diff --git a/.gitignore b/.gitignore
index ac4f55092f..3c87fb3377 100644
--- a/.gitignore
+++ b/.gitignore
@@ -59,3 +59,7 @@ water/build_tools/wheel/water_mlir/water_mlir
 
 # rocm version detection
 requirements-pytorch-rocm-generated.txt
+
+# AI Agents
+CLAUDE.local.md
+AGENTS.local.md
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000000..f7f1fcc34a
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,75 @@
+Wave is a Python DSL for high-performance ML kernel development targeting AMD GPUs (ROCm). The default compilation path is pure Python using IREE for codegen. Water and WaveASM are optional C++ extensions that replace parts of the IREE path.
+
+## Commands
+
+### Setup
+```bash
+python -m venv .venv && source .venv/bin/activate
+pip install -r requirements-iree-pinned.txt
+pip install -r pytorch-cpu-requirements.txt  # CPU-only dev/testing
+pip install -e ".[dev]"
+pre-commit install && pre-commit install --hook-type commit-msg
+```
+
+### Testing
+```bash
+pytest -n 4 --capture=tee-sys -vv ./tests/unittests/  # unit tests
+pytest -s tests/unittests/test_file.py::test_name -v  # single test
+lit lit_tests/ -vv                                    # MLIR LIT tests
+pytest -s tests/ --run-e2e                            # GPU tests (requires hardware)
+```
+
+### Linting
+```bash
+mypy            # type check wave_lang
+pre-commit run  # run Black, Ruff, clang-format against currently staged files
+```
+
+### Gotchas
+- **Always set `WAVE_CACHE_ON=0`** when testing code changes — stale cache entries hide the effect of edits: `WAVE_CACHE_ON=0 pytest ...`
+- Dump MLIR for debugging: `pytest --dump-mlir-files-path=/tmp/mlir tests/`
+
+## Architecture
+
+### Compilation Flow
+
+```
+Wave Python DSL
+    ↓ graph transformation passes    [wave_lang/kernel/wave/codegen/]
+Transformed FX graph
+    ↓ WaveEmitter                    [compiler/wave_codegen/emitter.py]
+stream.executable MLIR
+    ↓ iree.compiler.compile_str()    [wave/utils/compile_utils.py]
+VMFB (IREE bytecode module)
+    ↓ iree.runtime.VmModule
+GPU kernel execution
+```
+
+Entry point: `wave_compile()` in `wave_lang/kernel/wave/compile.py`.
+
+### Runtimes
+
+**IREE runtime (default):** Loads VMFB into IREE's VM. Handles GPU command buffers, queue submission, benchmarking, multi-device.
+
+**Wave runtime (`options.wave_runtime=True`):** Launches HSACO kernels directly via HIP API. Supports dynamic strides and custom grid layout. Typically paired with WaveASM. Entry point: `invoke_with_wave_runtime()` in `wave_lang/kernel/wave/utils/run_utils.py`.
+
+### Key Source Locations
+
+- `wave_lang/kernel/wave/compile.py` — pipeline orchestration, backend/runtime selection
+- `wave_lang/kernel/wave/codegen/` — graph transformation passes (scheduling, barriers, index analysis)
+- `wave_lang/kernel/compiler/wave_codegen/emitter.py` — lowers FX graph to MLIR
+- `wave_lang/kernel/wave/water.py` — Water/WaveASM lowering pipeline entry points
+- `wave_lang/kernel/wave/mlir_converter/` — Wave FX ↔ Water MLIR conversion; runs in a subprocess to avoid MLIR library conflicts (Water backend only)
+
+### Optional Extensions
+
+Water and WaveASM intercept MLIR before IREE and produce HSACO directly. Enable via env vars:
+
+| Variable | Purpose |
+|---|---|
+| `WAVE_BUILD_WATER=1` | Build Water from source |
+| `WAVE_BUILD_WAVEASM=1` | Build WaveASM from source |
+| `WAVE_WATER_DIR=water/build` | Use existing Water build (fast) |
+| `WAVE_WAVEASM_DIR=waveasm/build` | Use existing WaveASM build (fast) |
+
+When both active: stream.executable MLIR → `water-opt` → `waveasm-translate` → `water-opt` → ExecutionEngine.
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000000..10ddb199c8
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+See @AGENTS.md
diff --git a/water/AGENTS.md b/water/AGENTS.md
new file mode 100644
index 0000000000..0c36da9975
--- /dev/null
+++ b/water/AGENTS.md
@@ -0,0 +1,103 @@
+Water is an optional MLIR layer in the Wave compiler stack that replaces IREE's middle-end lowering. It defines the `wave.*` and `normalform.*` dialects, transformation passes, and Python bindings (`water_mlir` package).
+
+## Building
+
+Water must be built with CMake first. `pip install` alone does not build Water — `WAVE_WATER_DIR` is required to point Wave at an existing Water build.
+
+LLVM is pinned at `water/llvm-sha.txt`. CLI tool: `water-opt` (analogous to `mlir-opt`).
+
+### Step 1: Build Water with CMake
+
+Requires a pre-built LLVM/MLIR. Set `$BUILD_DIR` to your LLVM build or install tree.
+
+```bash
+# Configure
+cmake -G Ninja \
+    -B water/build \
+    water/ \
+    -DMLIR_DIR=$BUILD_DIR/lib/cmake/mlir \
+    -DBUILD_SHARED_LIBS=ON \
+    -DPython3_EXECUTABLE="$(which python)" \
+    -DWATER_ENABLE_PYTHON=ON
+
+# Optional: faster builds with clang + ccache + lld
+cmake -B water/build \
+    -DCMAKE_C_COMPILER=clang \
+    -DCMAKE_CXX_COMPILER=clang++ \
+    -DCMAKE_C_COMPILER_LAUNCHER=ccache \
+    -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
+    -DLLVM_USE_LINKER=lld
+
+# Build
+cmake --build water/build
+```
+
+### Step 2: Install Wave with Water bindings
+
+```bash
+WAVE_WATER_DIR=water/build pip install -e ".[dev]"
+```
+
+`WAVE_WATER_DIR` tells Wave where to find the Water build. Without it, Water is not included.
+
+### Iterating on C++ changes
+
+```bash
+ninja -C water/build  # rebuild changed C++ targets and Python bindings
+```
+
+## Formatting
+
+C++ code is formatted with `git clang-format` which formats only the lines changed relative to a commit (default: `HEAD`)
+```bash
+git clang-format         # format staged changes
+git clang-format HEAD~1  # also include most recent commit
+git clang-format main    # format everything touched on your branch
+```
+
+## Testing
+
+```bash
+ninja -C water/build check-water  # all lit tests
+lit test/Dialect/Wave/.mlir -vv   # single test
+```
+
+Tests use lit + FileCheck. `.mlir` files use `// CHECK` comments. Negative tests are named `*-invalid.mlir`.
+
+## Architecture
+
+### Dialects
+
+**`wave.*`** — primary dialect. `wave.tensor` has symbolic shapes (unknown until inferred by passes) and an address space (`Global`, `Shared`, `Register`). Each op carries a `WaveIndexMappingAttr` encoding element distribution across device/workgroup/workitem/register dimensions as `(offset, count, step)` triples.
+
+**`normalform.*`** — `normalform.module` wraps IR and enforces declared invariants. Passes declare pre/post-conditions as normal form attributes, enabling composable pass ordering without new IR constructs.
+
+### Pass Pipeline
+
+`water-middle-end-lowering` runs these in order (`include/water/Dialect/Wave/Transforms/Passes.td`):
+
+| Pass | Purpose |
+|---|---|
+| `water-wave-detect-normal-forms` | Detect satisfied invariants |
+| `water-wave-infer-types` | Shape inference via dataflow |
+| `water-wave-infer-index-exprs` | Forward/backward index expression propagation |
+| `water-wave-propagate-elements-per-thread` | Replace register tensors with vector types |
+| `water-wave-resolve-distributed-allocations` | Map distributed shapes to concrete memref layouts |
+| `lower-wave-to-mlir` | Lower to arith/math/vector/memref dialects |
+| `lower-normalform-module` | Remove the normalform wrapper |
+
+Generic passes include SLP vectorization, bounds-checking assertions, alloc-to-alloca, and GPU module serialization (ROCDL).
+
+### Python Bindings
+
+Package `water_mlir` (prefixed to avoid IREE conflicts):
+- `water_mlir.dialects.wave` — auto-generated op bindings from `WaveOps.td`
+- `water_mlir.sympy_to_affine_converter` — converts SymPy expressions to MLIR affine expressions
+- C++ extension via nanobind (`WaterExtensionNanobind.cpp`)
+
+### Key Design Principles
+
+- **Lazy type inference**: `wave.tensor` shapes start unknown — don't assume they're set at construction.
+- **Elements-per-thread (EPT)**: tracked separately from types; required before register tensors can be lowered to vector types. A pass that changes element counts must update EPT.
+- **`water_mlir` prefix**: the Python package is prefixed to avoid conflicts with IREE's MLIR bindings. Import as `from water_mlir.dialects import wave`, not `mlir.dialects.wave`.
+- **subprocess isolation**: the Wave-side `mlir_converter` runs Water in a subprocess specifically to avoid MLIR library symbol clashes with IREE.
diff --git a/water/CLAUDE.md b/water/CLAUDE.md
new file mode 100644
index 0000000000..10ddb199c8
--- /dev/null
+++ b/water/CLAUDE.md
@@ -0,0 +1 @@
+See @AGENTS.md
diff --git a/waveasm/AGENTS.md b/waveasm/AGENTS.md
new file mode 100644
index 0000000000..e63b8172e8
--- /dev/null
+++ b/waveasm/AGENTS.md
@@ -0,0 +1,60 @@
+WaveASM is an optional C++ backend in the Wave compiler stack that replaces IREE's GPU codegen. It translates MLIR into AMDGCN assembly for AMD GPUs (gfx942/CDNA3, gfx950/CDNA3.5, gfx1250/RDNA4) and produces `.hsaco` binaries via its own `waveasm.*` MLIR dialect, linear-scan register allocator, and assembly emitter.
+
+## Building
+
+```bash
+# First build
+WAVE_BUILD_WAVEASM=1 pip install -e ".[dev]"
+
+# Iterating on C++ changes (same pattern as Water)
+ninja -C waveasm/build
+pip install -e ".[dev]"  # re-links extension, skips CMake
+```
+
+Set `WAVE_WAVEASM_DIR=waveasm/build` after first build to avoid full rebuilds on pip install. CLI tool: `waveasm-translate`.
+
+## Formatting
+
+C++ code is formatted with `git clang-format` which formats only the lines changed relative to a commit (default: `HEAD`)
+
+```bash
+git clang-format         # format staged changes
+git clang-format HEAD~1  # also include most recent commit
+git clang-format main    # format everything touched on your branch
+```
+
+## Testing
+
+```bash
+ninja -C waveasm/build check-waveasm      # lit regression tests
+ninja -C waveasm/build check-waveasm-all  # + GPU functional tests (requires hardware)
+lit test/Transforms/.mlir -vv             # single test
+```
+
+## Architecture
+
+### Compilation Pipeline
+
+```
+Input MLIR (gpu, arith, vector, memref, scf, amdgpu dialects)
+    ↓ TranslateFromMLIR    [lib/Transforms/TranslateFromMLIR.cpp]
+WaveASM IR (virtual registers, pseudo-ops)
+    ↓ ScopedCSE, Peephole, BufferLoadStrengthReduction
+    ↓ ArithLegalization
+Concrete SALU/VALU machine ops
+    ↓ Liveness → LinearScanRegAlloc → VGPRCompaction
+Physical register assignments
+    ↓ Ticketing, HazardMitigation
+    ↓ AssemblyEmitter → clang++
+.hsaco GPU binary
+```
+
+### Dialect
+
+Types (`WaveASMTypes.td`): virtual (`!waveasm.vreg/sreg/areg`) and physical (`!waveasm.pvreg/psreg/pareg`) register types, plus `!waveasm.imm` and `!waveasm.scc`. The two-phase virtual→physical split is intentional — optimization passes run on virtual SSA, allocation happens once at the end.
+
+~300 machine ops in `WaveASMOps.td`: VALU, SALU, MFMA, memory (global/LDS/SMEM), control flow, and utility ops. Pseudo-ops (`waveasm.arith.*`) exist for cases where the concrete instruction depends on register class — ArithLegalization resolves them.
+
+### Adding New Dialect Support
+
+`TranslateFromMLIR` uses a handler registry. To translate a new upstream op, add a handler to the appropriate file in `lib/Transforms/handlers/` and register it in the `TranslationContext`. The `TranslationContext` also manages the SRD (Shader Resource Descriptor) table and expression cache — use it rather than tracking state locally in handlers.
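Review note (not part of the patch): the handler-registry description in "Adding New Dialect Support" may be easier to follow with a concrete shape in mind. The sketch below is a minimal Python illustration of the pattern only. The real code is C++ under `lib/Transforms/handlers/`, and every name here (`register`, `translate`, `expr_cache`, `handle_addi`, the dict-based op) is a hypothetical stand-in, not WaveASM's API. It shows handlers being looked up by op name, with shared state kept on the context instead of inside each handler.

```python
from typing import Callable, Dict, List

# Hypothetical names throughout: this mirrors the registry pattern only,
# not WaveASM's actual C++ TranslationContext.
Handler = Callable[["TranslationContext", dict], None]


class TranslationContext:
    """Owns shared translation state so handlers do not track it locally."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self.expr_cache: Dict[str, str] = {}  # expression key -> virtual register
        self.emitted: List[str] = []          # emitted pseudo-instructions

    def register(self, op_name: str, handler: Handler) -> None:
        """Associate one handler with one upstream op name."""
        if op_name in self._handlers:
            raise ValueError(f"handler already registered for {op_name}")
        self._handlers[op_name] = handler

    def translate(self, op: dict) -> None:
        """Dispatch an op to its registered handler, or fail loudly."""
        handler = self._handlers.get(op["name"])
        if handler is None:
            raise NotImplementedError(f"no handler for {op['name']}")
        handler(self, op)


def handle_addi(ctx: TranslationContext, op: dict) -> None:
    """Toy handler: reuse the cached result of a repeated expression."""
    key = f"add({op['lhs']},{op['rhs']})"
    if key not in ctx.expr_cache:
        ctx.expr_cache[key] = f"v{len(ctx.expr_cache)}"
        ctx.emitted.append(f"{ctx.expr_cache[key]} = add {op['lhs']}, {op['rhs']}")


ctx = TranslationContext()
ctx.register("arith.addi", handle_addi)
for _ in range(2):  # the second identical op hits the cache and emits nothing
    ctx.translate({"name": "arith.addi", "lhs": "a", "rhs": "b"})
print(ctx.emitted)
```

Because the cache lives on the context, a second handler for a different op could reuse the same entries, which is the reason the document gives for not tracking state locally in handlers.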
diff --git a/waveasm/CLAUDE.md b/waveasm/CLAUDE.md
new file mode 100644
index 0000000000..10ddb199c8
--- /dev/null
+++ b/waveasm/CLAUDE.md
@@ -0,0 +1 @@
+See @AGENTS.md