From 1c040ae611856e633c0998e4668c0f4c68e67c65 Mon Sep 17 00:00:00 2001 From: Tim Gymnich Date: Tue, 7 Apr 2026 14:56:16 +0200 Subject: [PATCH 1/5] Add CLAUDE.md files for Claude Code guidance Add top-level CLAUDE.md with project overview, build/test commands, compilation flow, runtime options, and architecture. Add water/CLAUDE.md and waveasm/CLAUDE.md covering each optional extension's build workflow, dialect design, and pass pipeline. Co-Authored-By: Claude Sonnet 4.6 Signed-off-by: Tim Gymnich --- .gitignore | 3 ++ CLAUDE.md | 76 +++++++++++++++++++++++++++++++++++++++++++++++ water/CLAUDE.md | 63 +++++++++++++++++++++++++++++++++++++++ waveasm/CLAUDE.md | 57 +++++++++++++++++++++++++++++++++++ 4 files changed, 199 insertions(+) create mode 100644 CLAUDE.md create mode 100644 water/CLAUDE.md create mode 100644 waveasm/CLAUDE.md diff --git a/.gitignore b/.gitignore index ac4f55092f..e482958e5b 100644 --- a/.gitignore +++ b/.gitignore @@ -59,3 +59,6 @@ water/build_tools/wheel/water_mlir/water_mlir # rocm version detection requirements-pytorch-rocm-generated.txt + +# Claude +CLAUDE.local.md diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000000..627c4be2cb --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,76 @@ +Wave is a Python DSL for high-performance ML kernel development targeting AMD GPUs (ROCm). The default compilation path is pure Python using IREE for codegen. Water and WaveASM are optional C++ extensions that replace parts of the IREE path — see @water/CLAUDE.md and @waveasm/CLAUDE.md. 
+ +## Commands + +### Setup +```bash +python -m venv .venv && source .venv/bin/activate +pip install -r requirements-iree-pinned.txt +pip install -r pytorch-cpu-requirements.txt # CPU-only dev/testing +pip install -e ".[dev]" +pre-commit install && pre-commit install --hook-type commit-msg +``` + +### Testing +```bash +pytest -n 4 --capture=tee-sys -vv ./tests/unittests/ # unit tests +pytest -s tests/unittests/test_file.py::test_name -v # single test +lit lit_tests/ -vv # MLIR LIT tests +pytest -s tests/ --run-e2e # GPU tests (requires hardware) +``` + +### Linting +```bash +mypy # type check wave_lang +pre-commit run --all-files # Black, Ruff, clang-format +``` + +### Gotchas +- **Always set `WAVE_CACHE_ON=0`** when testing code changes — stale cache entries hide the effect of edits: `WAVE_CACHE_ON=0 pytest ...` +- DCO sign-off required on commits: `git commit -s` +- Dump MLIR for debugging: `pytest --dump-mlir-files-path=/tmp/mlir tests/` + +## Architecture + +### Compilation Flow + +``` +Wave Python DSL + ↓ graph transformation passes [wave_lang/kernel/wave/codegen/] +Transformed FX graph + ↓ WaveEmitter [compiler/wave_codegen/emitter.py] +stream.executable MLIR + ↓ iree.compiler.compile_str() [wave/utils/compile_utils.py] +VMFB (IREE bytecode module) + ↓ iree.runtime.VmModule +GPU kernel execution +``` + +Entry point: `wave_compile()` in `wave_lang/kernel/wave/compile.py`. + +### Runtimes + +**IREE runtime (default):** Loads VMFB into IREE's VM. Handles GPU command buffers, queue submission, benchmarking, multi-device. + +**Wave runtime (`options.wave_runtime=True`):** Launches HSACO kernels directly via HIP API. Supports dynamic strides and custom grid layout. Typically paired with WaveASM. Entry point: `invoke_with_wave_runtime()` in `wave_lang/kernel/wave/utils/run_utils.py`. 
+ +### Key Source Locations + +- `wave_lang/kernel/wave/compile.py` — pipeline orchestration, backend/runtime selection +- `wave_lang/kernel/wave/codegen/` — graph transformation passes (scheduling, barriers, index analysis) +- `wave_lang/kernel/compiler/wave_codegen/emitter.py` — lowers FX graph to MLIR +- `wave_lang/kernel/wave/water.py` — Water/WaveASM lowering pipeline entry points +- `wave_lang/kernel/wave/mlir_converter/` — Wave FX ↔ Water MLIR conversion; runs in a subprocess to avoid MLIR library conflicts (Water backend only) + +### Optional Extensions + +Water and WaveASM intercept MLIR before IREE and produce HSACO directly. Enable via env vars: + +| Variable | Purpose | +|---|---| +| `WAVE_BUILD_WATER=1` | Build Water from source | +| `WAVE_BUILD_WAVEASM=1` | Build WaveASM from source | +| `WAVE_WATER_DIR=water/build` | Use existing Water build (fast) | +| `WAVE_WAVEASM_DIR=waveasm/build` | Use existing WaveASM build (fast) | + +When both active: stream.executable MLIR → `water-opt` → `waveasm-translate` → `water-opt` → ExecutionEngine. diff --git a/water/CLAUDE.md b/water/CLAUDE.md new file mode 100644 index 0000000000..7387433689 --- /dev/null +++ b/water/CLAUDE.md @@ -0,0 +1,63 @@ +Water is an optional MLIR layer in the Wave compiler stack that replaces IREE's middle-end lowering. It defines the `wave.*` and `normalform.*` dialects, transformation passes, and Python bindings (`water_mlir` package). + +## Building + +```bash +# First build — builds LLVM from source, takes a while +WAVE_WATER_DIR=water/build pip install -e ".[dev]" + +# Iterating on C++ changes +ninja -C water/build # rebuild changed targets only +pip install -e ".[dev]" # re-links Python extension (fast, skips CMake) +``` + +`WAVE_WATER_DIR` tells the Wave build system to use an existing build directory instead of rebuilding from scratch. Without it, the full LLVM + Water CMake build runs on every `pip install`. + +LLVM is pinned at `water/llvm-sha.txt`. 
CLI tool: `water-opt` (analogous to `mlir-opt`). + +## Testing + +```bash +ninja -C water/build check-water # all lit tests +lit test/Dialect/Wave/.mlir -vv # single test +``` + +Tests use lit + FileCheck. `.mlir` files use `// CHECK` comments. Negative tests are named `*-invalid.mlir`. + +## Architecture + +### Dialects + +**`wave.*`** — primary dialect. `wave.tensor` has symbolic shapes (unknown until inferred by passes) and an address space (`Global`, `Shared`, `Register`). Each op carries a `WaveIndexMappingAttr` encoding element distribution across device/workgroup/workitem/register dimensions as `(offset, count, step)` triples. + +**`normalform.*`** — `normalform.module` wraps IR and enforces declared invariants. Passes declare pre/post-conditions as normal form attributes, enabling composable pass ordering without new IR constructs. + +### Pass Pipeline + +`water-middle-end-lowering` runs these in order (`include/water/Dialect/Wave/Transforms/Passes.td`): + +| Pass | Purpose | +|---|---| +| `water-wave-detect-normal-forms` | Detect satisfied invariants | +| `water-wave-infer-types` | Shape inference via dataflow | +| `water-wave-infer-index-exprs` | Forward/backward index expression propagation | +| `water-wave-propagate-elements-per-thread` | Replace register tensors with vector types | +| `water-wave-resolve-distributed-allocations` | Map distributed shapes to concrete memref layouts | +| `lower-wave-to-mlir` | Lower to arith/math/vector/memref dialects | +| `lower-normalform-module` | Remove the normalform wrapper | + +Generic passes include SLP vectorization, bounds-checking assertions, alloc-to-alloca, and GPU module serialization (ROCDL). 
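The pass order in the table above can also be spelled out explicitly, which may help when bisecting a failure to a single pass. This is a minimal sketch, assuming `water-opt` accepts the `--pass-pipeline` textual syntax of upstream `mlir-opt` and that the passes anchor on `builtin.module`; neither is confirmed by this document, and `input.mlir` is a placeholder.

```bash
#!/usr/bin/env bash
# Pass names copied from the table above, in pipeline order.
passes=(
  water-wave-detect-normal-forms
  water-wave-infer-types
  water-wave-infer-index-exprs
  water-wave-propagate-elements-per-thread
  water-wave-resolve-distributed-allocations
  lower-wave-to-mlir
  lower-normalform-module
)

# Join the first N passes with commas (N defaults to all of them).
join_passes() {
  local n="${1:-${#passes[@]}}"
  local IFS=,
  echo "${passes[*]:0:n}"
}

# Assumption: mlir-opt style --pass-pipeline syntax anchored on builtin.module.
pipeline_flag() {
  echo "--pass-pipeline=builtin.module($(join_passes "$@"))"
}

# Only invoke the tool when it is on PATH; input.mlir is a hypothetical file.
if command -v water-opt >/dev/null 2>&1 && [ -f input.mlir ]; then
  water-opt "$(pipeline_flag 3)" input.mlir
fi
```

Calling `pipeline_flag 3` stops after index-expression inference, so the IR can be inspected before elements-per-thread propagation runs. If `water-opt` expects a different anchor op, only the `builtin.module(...)` wrapper in `pipeline_flag` needs to change.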
+ +### Python Bindings + +Package `water_mlir` (prefixed to avoid IREE conflicts): +- `water_mlir.dialects.wave` — auto-generated op bindings from `WaveOps.td` +- `water_mlir.sympy_to_affine_converter` — converts SymPy expressions to MLIR affine expressions +- C++ extension via nanobind (`WaterExtensionNanobind.cpp`) + +### Key Design Principles + +- **Lazy type inference**: `wave.tensor` shapes start unknown — don't assume they're set at construction. +- **Elements-per-thread (EPT)**: tracked separately from types; required before register tensors can be lowered to vector types. A pass that changes element counts must update EPT. +- **`water_mlir` prefix**: the Python package is prefixed to avoid conflicts with IREE's MLIR bindings. Import as `from water_mlir.dialects import wave`, not `mlir.dialects.wave`. +- **subprocess isolation**: the Wave-side `mlir_converter` runs Water in a subprocess specifically to avoid MLIR library symbol clashes with IREE. diff --git a/waveasm/CLAUDE.md b/waveasm/CLAUDE.md new file mode 100644 index 0000000000..47b676655d --- /dev/null +++ b/waveasm/CLAUDE.md @@ -0,0 +1,57 @@ +WaveASM is an optional C++ backend in the Wave compiler stack that replaces IREE's GPU codegen. It translates MLIR into AMDGCN assembly for AMD GPUs (gfx942/CDNA3, gfx950/CDNA3.5, gfx1250/RDNA4) and produces `.hsaco` binaries via its own `waveasm.*` MLIR dialect, linear-scan register allocator, and assembly emitter. + +## Building + +```bash +# First build +WAVE_BUILD_WAVEASM=1 pip install -e ".[dev]" + +# Iterating on C++ changes (same pattern as Water) +ninja -C waveasm/build +pip install -e ".[dev]" # re-links extension, skips CMake +``` + +Set `WAVE_WAVEASM_DIR=waveasm/build` after first build to avoid full rebuilds on pip install. CLI tool: `waveasm-translate`. 
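The two WaveASM build modes above differ only in which environment variable is set. The sketch below shows a wrapper that picks one, assuming that a non-empty `waveasm/build` directory means an earlier build succeeded. The function name `waveasm_install_env` is hypothetical and not part of the repository.

```bash
# Print the environment assignment to use for `pip install -e ".[dev]"`.
waveasm_install_env() {
  local build_dir="${1:-waveasm/build}"
  # Reuse the build directory only if it exists and is not empty.
  if [ -d "$build_dir" ] && [ -n "$(ls -A "$build_dir" 2>/dev/null)" ]; then
    echo "WAVE_WAVEASM_DIR=$build_dir"
  else
    # No usable build yet: request a full build from source.
    echo "WAVE_BUILD_WAVEASM=1"
  fi
}

# Usage (run from the repository root):
#   env "$(waveasm_install_env)" pip install -e ".[dev]"
```

A non-empty directory is only a heuristic. A build that failed halfway would still be reused, so delete the directory to force a clean rebuild.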
+ +## Testing + +```bash +ninja -C waveasm/build check-waveasm # lit regression tests +ninja -C waveasm/build check-waveasm-all # + GPU functional tests (requires hardware) +lit test/Transforms/.mlir -vv # single test +``` + +## Architecture + +### Compilation Pipeline + +``` +Input MLIR (gpu, arith, vector, memref, scf, amdgpu dialects) + ↓ TranslateFromMLIR [lib/Transforms/TranslateFromMLIR.cpp] +WaveASM IR (virtual registers, pseudo-ops) + ↓ ScopedCSE, Peephole, BufferLoadStrengthReduction + ↓ ArithLegalization +Concrete SALU/VALU machine ops + ↓ Liveness → LinearScanRegAlloc → VGPRCompaction +Physical register assignments + ↓ Ticketing, HazardMitigation + ↓ AssemblyEmitter → clang++ +.hsaco GPU binary +``` + +### Dialect + +Types (`WaveASMTypes.td`): virtual (`!waveasm.vreg/sreg/areg`) and physical (`!waveasm.pvreg/psreg/pareg`) register types, plus `!waveasm.imm` and `!waveasm.scc`. The two-phase virtual→physical split is intentional — optimization passes run on virtual SSA, allocation happens once at the end. + +~300 machine ops in `WaveASMOps.td`: VALU, SALU, MFMA, memory (global/LDS/SMEM), control flow, and utility ops. Pseudo-ops (`waveasm.arith.*`) exist for cases where the concrete instruction depends on register class — ArithLegalization resolves them. + +### Adding New Dialect Support + +`TranslateFromMLIR` uses a handler registry. To translate a new upstream op, add a handler to the appropriate file in `lib/Transforms/handlers/` and register it in the `TranslationContext`. The `TranslationContext` also manages the SRD (Shader Resource Descriptor) table and expression cache — use it rather than tracking state locally in handlers. + +### Non-Obvious Constraints + +- **No spilling**: `LinearScanRegAlloc` aborts if register pressure exceeds hardware limits. If you see allocation failures, the kernel uses too many live values simultaneously. +- **Tied operands**: MFMA accumulator input and output must share the same physical registers. 
This is expressed via `TiedClass` equivalence classes in `Liveness` — new MFMA variants must declare their ties correctly. +- **SCC liveness**: `!waveasm.scc` is an implicit 1-bit condition code, not a normal SSA value. The SCC verifier enforces that SCC is consumed before the next instruction that overwrites it. SCC spill/reload uses `s_cselect_b32` / `s_cmp_ne`. +- **Ticketing**: `s_waitcnt` insertion is demand-driven via ticket tracking, not conservative. Passes that add new memory ops must ensure they participate in the ticket system. From 616e8ca26eb5ba85eafe3c92d49dc5f853a0ed52 Mon Sep 17 00:00:00 2001 From: Tim Gymnich Date: Tue, 7 Apr 2026 15:09:47 +0200 Subject: [PATCH 2/5] Remove @ imports and mentions of water/waveasm CLAUDE.md from root @ imports always load at session start. Child CLAUDE.md files load on demand automatically when Claude works in those directories. Co-Authored-By: Claude Sonnet 4.6 Signed-off-by: Tim Gymnich --- CLAUDE.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CLAUDE.md b/CLAUDE.md index 627c4be2cb..e3685eed37 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,4 +1,4 @@ -Wave is a Python DSL for high-performance ML kernel development targeting AMD GPUs (ROCm). The default compilation path is pure Python using IREE for codegen. Water and WaveASM are optional C++ extensions that replace parts of the IREE path — see @water/CLAUDE.md and @waveasm/CLAUDE.md. +Wave is a Python DSL for high-performance ML kernel development targeting AMD GPUs (ROCm). The default compilation path is pure Python using IREE for codegen. Water and WaveASM are optional C++ extensions that replace parts of the IREE path. 
## Commands From 14a36a3b0d70c1a4ebe08088fd21cfc2ff49bbca Mon Sep 17 00:00:00 2001 From: Tim Gymnich Date: Wed, 8 Apr 2026 11:40:04 +0200 Subject: [PATCH 3/5] Add AGENTS.md Signed-off-by: Tim Gymnich --- AGENTS.md | 76 ++++++++++++++++++++++++++++++++++++++++++++++ CLAUDE.md | 77 +---------------------------------------------- water/AGENTS.md | 63 ++++++++++++++++++++++++++++++++++++++ water/CLAUDE.md | 64 +-------------------------------------- waveasm/AGENTS.md | 50 ++++++++++++++++++++++++++++++ waveasm/CLAUDE.md | 58 +---------------------------------- 6 files changed, 192 insertions(+), 196 deletions(-) create mode 100644 AGENTS.md create mode 100644 water/AGENTS.md create mode 100644 waveasm/AGENTS.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..e3685eed37 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,76 @@ +Wave is a Python DSL for high-performance ML kernel development targeting AMD GPUs (ROCm). The default compilation path is pure Python using IREE for codegen. Water and WaveASM are optional C++ extensions that replace parts of the IREE path. 
+ +## Commands + +### Setup +```bash +python -m venv .venv && source .venv/bin/activate +pip install -r requirements-iree-pinned.txt +pip install -r pytorch-cpu-requirements.txt # CPU-only dev/testing +pip install -e ".[dev]" +pre-commit install && pre-commit install --hook-type commit-msg +``` + +### Testing +```bash +pytest -n 4 --capture=tee-sys -vv ./tests/unittests/ # unit tests +pytest -s tests/unittests/test_file.py::test_name -v # single test +lit lit_tests/ -vv # MLIR LIT tests +pytest -s tests/ --run-e2e # GPU tests (requires hardware) +``` + +### Linting +```bash +mypy # type check wave_lang +pre-commit run --all-files # Black, Ruff, clang-format +``` + +### Gotchas +- **Always set `WAVE_CACHE_ON=0`** when testing code changes — stale cache entries hide the effect of edits: `WAVE_CACHE_ON=0 pytest ...` +- DCO sign-off required on commits: `git commit -s` +- Dump MLIR for debugging: `pytest --dump-mlir-files-path=/tmp/mlir tests/` + +## Architecture + +### Compilation Flow + +``` +Wave Python DSL + ↓ graph transformation passes [wave_lang/kernel/wave/codegen/] +Transformed FX graph + ↓ WaveEmitter [compiler/wave_codegen/emitter.py] +stream.executable MLIR + ↓ iree.compiler.compile_str() [wave/utils/compile_utils.py] +VMFB (IREE bytecode module) + ↓ iree.runtime.VmModule +GPU kernel execution +``` + +Entry point: `wave_compile()` in `wave_lang/kernel/wave/compile.py`. + +### Runtimes + +**IREE runtime (default):** Loads VMFB into IREE's VM. Handles GPU command buffers, queue submission, benchmarking, multi-device. + +**Wave runtime (`options.wave_runtime=True`):** Launches HSACO kernels directly via HIP API. Supports dynamic strides and custom grid layout. Typically paired with WaveASM. Entry point: `invoke_with_wave_runtime()` in `wave_lang/kernel/wave/utils/run_utils.py`. 
+ +### Key Source Locations + +- `wave_lang/kernel/wave/compile.py` — pipeline orchestration, backend/runtime selection +- `wave_lang/kernel/wave/codegen/` — graph transformation passes (scheduling, barriers, index analysis) +- `wave_lang/kernel/compiler/wave_codegen/emitter.py` — lowers FX graph to MLIR +- `wave_lang/kernel/wave/water.py` — Water/WaveASM lowering pipeline entry points +- `wave_lang/kernel/wave/mlir_converter/` — Wave FX ↔ Water MLIR conversion; runs in a subprocess to avoid MLIR library conflicts (Water backend only) + +### Optional Extensions + +Water and WaveASM intercept MLIR before IREE and produce HSACO directly. Enable via env vars: + +| Variable | Purpose | +|---|---| +| `WAVE_BUILD_WATER=1` | Build Water from source | +| `WAVE_BUILD_WAVEASM=1` | Build WaveASM from source | +| `WAVE_WATER_DIR=water/build` | Use existing Water build (fast) | +| `WAVE_WAVEASM_DIR=waveasm/build` | Use existing WaveASM build (fast) | + +When both active: stream.executable MLIR → `water-opt` → `waveasm-translate` → `water-opt` → ExecutionEngine. diff --git a/CLAUDE.md b/CLAUDE.md index e3685eed37..10ddb199c8 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,76 +1 @@ -Wave is a Python DSL for high-performance ML kernel development targeting AMD GPUs (ROCm). The default compilation path is pure Python using IREE for codegen. Water and WaveASM are optional C++ extensions that replace parts of the IREE path. 
- -## Commands - -### Setup -```bash -python -m venv .venv && source .venv/bin/activate -pip install -r requirements-iree-pinned.txt -pip install -r pytorch-cpu-requirements.txt # CPU-only dev/testing -pip install -e ".[dev]" -pre-commit install && pre-commit install --hook-type commit-msg -``` - -### Testing -```bash -pytest -n 4 --capture=tee-sys -vv ./tests/unittests/ # unit tests -pytest -s tests/unittests/test_file.py::test_name -v # single test -lit lit_tests/ -vv # MLIR LIT tests -pytest -s tests/ --run-e2e # GPU tests (requires hardware) -``` - -### Linting -```bash -mypy # type check wave_lang -pre-commit run --all-files # Black, Ruff, clang-format -``` - -### Gotchas -- **Always set `WAVE_CACHE_ON=0`** when testing code changes — stale cache entries hide the effect of edits: `WAVE_CACHE_ON=0 pytest ...` -- DCO sign-off required on commits: `git commit -s` -- Dump MLIR for debugging: `pytest --dump-mlir-files-path=/tmp/mlir tests/` - -## Architecture - -### Compilation Flow - -``` -Wave Python DSL - ↓ graph transformation passes [wave_lang/kernel/wave/codegen/] -Transformed FX graph - ↓ WaveEmitter [compiler/wave_codegen/emitter.py] -stream.executable MLIR - ↓ iree.compiler.compile_str() [wave/utils/compile_utils.py] -VMFB (IREE bytecode module) - ↓ iree.runtime.VmModule -GPU kernel execution -``` - -Entry point: `wave_compile()` in `wave_lang/kernel/wave/compile.py`. - -### Runtimes - -**IREE runtime (default):** Loads VMFB into IREE's VM. Handles GPU command buffers, queue submission, benchmarking, multi-device. - -**Wave runtime (`options.wave_runtime=True`):** Launches HSACO kernels directly via HIP API. Supports dynamic strides and custom grid layout. Typically paired with WaveASM. Entry point: `invoke_with_wave_runtime()` in `wave_lang/kernel/wave/utils/run_utils.py`. 
- -### Key Source Locations - -- `wave_lang/kernel/wave/compile.py` — pipeline orchestration, backend/runtime selection -- `wave_lang/kernel/wave/codegen/` — graph transformation passes (scheduling, barriers, index analysis) -- `wave_lang/kernel/compiler/wave_codegen/emitter.py` — lowers FX graph to MLIR -- `wave_lang/kernel/wave/water.py` — Water/WaveASM lowering pipeline entry points -- `wave_lang/kernel/wave/mlir_converter/` — Wave FX ↔ Water MLIR conversion; runs in a subprocess to avoid MLIR library conflicts (Water backend only) - -### Optional Extensions - -Water and WaveASM intercept MLIR before IREE and produce HSACO directly. Enable via env vars: - -| Variable | Purpose | -|---|---| -| `WAVE_BUILD_WATER=1` | Build Water from source | -| `WAVE_BUILD_WAVEASM=1` | Build WaveASM from source | -| `WAVE_WATER_DIR=water/build` | Use existing Water build (fast) | -| `WAVE_WAVEASM_DIR=waveasm/build` | Use existing WaveASM build (fast) | - -When both active: stream.executable MLIR → `water-opt` → `waveasm-translate` → `water-opt` → ExecutionEngine. +See @AGENTS.md diff --git a/water/AGENTS.md b/water/AGENTS.md new file mode 100644 index 0000000000..7387433689 --- /dev/null +++ b/water/AGENTS.md @@ -0,0 +1,63 @@ +Water is an optional MLIR layer in the Wave compiler stack that replaces IREE's middle-end lowering. It defines the `wave.*` and `normalform.*` dialects, transformation passes, and Python bindings (`water_mlir` package). + +## Building + +```bash +# First build — builds LLVM from source, takes a while +WAVE_WATER_DIR=water/build pip install -e ".[dev]" + +# Iterating on C++ changes +ninja -C water/build # rebuild changed targets only +pip install -e ".[dev]" # re-links Python extension (fast, skips CMake) +``` + +`WAVE_WATER_DIR` tells the Wave build system to use an existing build directory instead of rebuilding from scratch. Without it, the full LLVM + Water CMake build runs on every `pip install`. + +LLVM is pinned at `water/llvm-sha.txt`. 
CLI tool: `water-opt` (analogous to `mlir-opt`). + +## Testing + +```bash +ninja -C water/build check-water # all lit tests +lit test/Dialect/Wave/.mlir -vv # single test +``` + +Tests use lit + FileCheck. `.mlir` files use `// CHECK` comments. Negative tests are named `*-invalid.mlir`. + +## Architecture + +### Dialects + +**`wave.*`** — primary dialect. `wave.tensor` has symbolic shapes (unknown until inferred by passes) and an address space (`Global`, `Shared`, `Register`). Each op carries a `WaveIndexMappingAttr` encoding element distribution across device/workgroup/workitem/register dimensions as `(offset, count, step)` triples. + +**`normalform.*`** — `normalform.module` wraps IR and enforces declared invariants. Passes declare pre/post-conditions as normal form attributes, enabling composable pass ordering without new IR constructs. + +### Pass Pipeline + +`water-middle-end-lowering` runs these in order (`include/water/Dialect/Wave/Transforms/Passes.td`): + +| Pass | Purpose | +|---|---| +| `water-wave-detect-normal-forms` | Detect satisfied invariants | +| `water-wave-infer-types` | Shape inference via dataflow | +| `water-wave-infer-index-exprs` | Forward/backward index expression propagation | +| `water-wave-propagate-elements-per-thread` | Replace register tensors with vector types | +| `water-wave-resolve-distributed-allocations` | Map distributed shapes to concrete memref layouts | +| `lower-wave-to-mlir` | Lower to arith/math/vector/memref dialects | +| `lower-normalform-module` | Remove the normalform wrapper | + +Generic passes include SLP vectorization, bounds-checking assertions, alloc-to-alloca, and GPU module serialization (ROCDL). 
+ +### Python Bindings + +Package `water_mlir` (prefixed to avoid IREE conflicts): +- `water_mlir.dialects.wave` — auto-generated op bindings from `WaveOps.td` +- `water_mlir.sympy_to_affine_converter` — converts SymPy expressions to MLIR affine expressions +- C++ extension via nanobind (`WaterExtensionNanobind.cpp`) + +### Key Design Principles + +- **Lazy type inference**: `wave.tensor` shapes start unknown — don't assume they're set at construction. +- **Elements-per-thread (EPT)**: tracked separately from types; required before register tensors can be lowered to vector types. A pass that changes element counts must update EPT. +- **`water_mlir` prefix**: the Python package is prefixed to avoid conflicts with IREE's MLIR bindings. Import as `from water_mlir.dialects import wave`, not `mlir.dialects.wave`. +- **subprocess isolation**: the Wave-side `mlir_converter` runs Water in a subprocess specifically to avoid MLIR library symbol clashes with IREE. diff --git a/water/CLAUDE.md b/water/CLAUDE.md index 7387433689..10ddb199c8 100644 --- a/water/CLAUDE.md +++ b/water/CLAUDE.md @@ -1,63 +1 @@ -Water is an optional MLIR layer in the Wave compiler stack that replaces IREE's middle-end lowering. It defines the `wave.*` and `normalform.*` dialects, transformation passes, and Python bindings (`water_mlir` package). - -## Building - -```bash -# First build — builds LLVM from source, takes a while -WAVE_WATER_DIR=water/build pip install -e ".[dev]" - -# Iterating on C++ changes -ninja -C water/build # rebuild changed targets only -pip install -e ".[dev]" # re-links Python extension (fast, skips CMake) -``` - -`WAVE_WATER_DIR` tells the Wave build system to use an existing build directory instead of rebuilding from scratch. Without it, the full LLVM + Water CMake build runs on every `pip install`. - -LLVM is pinned at `water/llvm-sha.txt`. CLI tool: `water-opt` (analogous to `mlir-opt`). 
- -## Testing - -```bash -ninja -C water/build check-water # all lit tests -lit test/Dialect/Wave/.mlir -vv # single test -``` - -Tests use lit + FileCheck. `.mlir` files use `// CHECK` comments. Negative tests are named `*-invalid.mlir`. - -## Architecture - -### Dialects - -**`wave.*`** — primary dialect. `wave.tensor` has symbolic shapes (unknown until inferred by passes) and an address space (`Global`, `Shared`, `Register`). Each op carries a `WaveIndexMappingAttr` encoding element distribution across device/workgroup/workitem/register dimensions as `(offset, count, step)` triples. - -**`normalform.*`** — `normalform.module` wraps IR and enforces declared invariants. Passes declare pre/post-conditions as normal form attributes, enabling composable pass ordering without new IR constructs. - -### Pass Pipeline - -`water-middle-end-lowering` runs these in order (`include/water/Dialect/Wave/Transforms/Passes.td`): - -| Pass | Purpose | -|---|---| -| `water-wave-detect-normal-forms` | Detect satisfied invariants | -| `water-wave-infer-types` | Shape inference via dataflow | -| `water-wave-infer-index-exprs` | Forward/backward index expression propagation | -| `water-wave-propagate-elements-per-thread` | Replace register tensors with vector types | -| `water-wave-resolve-distributed-allocations` | Map distributed shapes to concrete memref layouts | -| `lower-wave-to-mlir` | Lower to arith/math/vector/memref dialects | -| `lower-normalform-module` | Remove the normalform wrapper | - -Generic passes include SLP vectorization, bounds-checking assertions, alloc-to-alloca, and GPU module serialization (ROCDL). 
- -### Python Bindings - -Package `water_mlir` (prefixed to avoid IREE conflicts): -- `water_mlir.dialects.wave` — auto-generated op bindings from `WaveOps.td` -- `water_mlir.sympy_to_affine_converter` — converts SymPy expressions to MLIR affine expressions -- C++ extension via nanobind (`WaterExtensionNanobind.cpp`) - -### Key Design Principles - -- **Lazy type inference**: `wave.tensor` shapes start unknown — don't assume they're set at construction. -- **Elements-per-thread (EPT)**: tracked separately from types; required before register tensors can be lowered to vector types. A pass that changes element counts must update EPT. -- **`water_mlir` prefix**: the Python package is prefixed to avoid conflicts with IREE's MLIR bindings. Import as `from water_mlir.dialects import wave`, not `mlir.dialects.wave`. -- **subprocess isolation**: the Wave-side `mlir_converter` runs Water in a subprocess specifically to avoid MLIR library symbol clashes with IREE. +See @AGENTS.md diff --git a/waveasm/AGENTS.md b/waveasm/AGENTS.md new file mode 100644 index 0000000000..43a3c852f5 --- /dev/null +++ b/waveasm/AGENTS.md @@ -0,0 +1,50 @@ +WaveASM is an optional C++ backend in the Wave compiler stack that replaces IREE's GPU codegen. It translates MLIR into AMDGCN assembly for AMD GPUs (gfx942/CDNA3, gfx950/CDNA3.5, gfx1250/RDNA4) and produces `.hsaco` binaries via its own `waveasm.*` MLIR dialect, linear-scan register allocator, and assembly emitter. + +## Building + +```bash +# First build +WAVE_BUILD_WAVEASM=1 pip install -e ".[dev]" + +# Iterating on C++ changes (same pattern as Water) +ninja -C waveasm/build +pip install -e ".[dev]" # re-links extension, skips CMake +``` + +Set `WAVE_WAVEASM_DIR=waveasm/build` after first build to avoid full rebuilds on pip install. CLI tool: `waveasm-translate`. 
+ +## Testing + +```bash +ninja -C waveasm/build check-waveasm # lit regression tests +ninja -C waveasm/build check-waveasm-all # + GPU functional tests (requires hardware) +lit test/Transforms/.mlir -vv # single test +``` + +## Architecture + +### Compilation Pipeline + +``` +Input MLIR (gpu, arith, vector, memref, scf, amdgpu dialects) + ↓ TranslateFromMLIR [lib/Transforms/TranslateFromMLIR.cpp] +WaveASM IR (virtual registers, pseudo-ops) + ↓ ScopedCSE, Peephole, BufferLoadStrengthReduction + ↓ ArithLegalization +Concrete SALU/VALU machine ops + ↓ Liveness → LinearScanRegAlloc → VGPRCompaction +Physical register assignments + ↓ Ticketing, HazardMitigation + ↓ AssemblyEmitter → clang++ +.hsaco GPU binary +``` + +### Dialect + +Types (`WaveASMTypes.td`): virtual (`!waveasm.vreg/sreg/areg`) and physical (`!waveasm.pvreg/psreg/pareg`) register types, plus `!waveasm.imm` and `!waveasm.scc`. The two-phase virtual→physical split is intentional — optimization passes run on virtual SSA, allocation happens once at the end. + +~300 machine ops in `WaveASMOps.td`: VALU, SALU, MFMA, memory (global/LDS/SMEM), control flow, and utility ops. Pseudo-ops (`waveasm.arith.*`) exist for cases where the concrete instruction depends on register class — ArithLegalization resolves them. + +### Adding New Dialect Support + +`TranslateFromMLIR` uses a handler registry. To translate a new upstream op, add a handler to the appropriate file in `lib/Transforms/handlers/` and register it in the `TranslationContext`. The `TranslationContext` also manages the SRD (Shader Resource Descriptor) table and expression cache — use it rather than tracking state locally in handlers. diff --git a/waveasm/CLAUDE.md b/waveasm/CLAUDE.md index 47b676655d..10ddb199c8 100644 --- a/waveasm/CLAUDE.md +++ b/waveasm/CLAUDE.md @@ -1,57 +1 @@ -WaveASM is an optional C++ backend in the Wave compiler stack that replaces IREE's GPU codegen. 
It translates MLIR into AMDGCN assembly for AMD GPUs (gfx942/CDNA3, gfx950/CDNA3.5, gfx1250/RDNA4) and produces `.hsaco` binaries via its own `waveasm.*` MLIR dialect, linear-scan register allocator, and assembly emitter. - -## Building - -```bash -# First build -WAVE_BUILD_WAVEASM=1 pip install -e ".[dev]" - -# Iterating on C++ changes (same pattern as Water) -ninja -C waveasm/build -pip install -e ".[dev]" # re-links extension, skips CMake -``` - -Set `WAVE_WAVEASM_DIR=waveasm/build` after first build to avoid full rebuilds on pip install. CLI tool: `waveasm-translate`. - -## Testing - -```bash -ninja -C waveasm/build check-waveasm # lit regression tests -ninja -C waveasm/build check-waveasm-all # + GPU functional tests (requires hardware) -lit test/Transforms/.mlir -vv # single test -``` - -## Architecture - -### Compilation Pipeline - -``` -Input MLIR (gpu, arith, vector, memref, scf, amdgpu dialects) - ↓ TranslateFromMLIR [lib/Transforms/TranslateFromMLIR.cpp] -WaveASM IR (virtual registers, pseudo-ops) - ↓ ScopedCSE, Peephole, BufferLoadStrengthReduction - ↓ ArithLegalization -Concrete SALU/VALU machine ops - ↓ Liveness → LinearScanRegAlloc → VGPRCompaction -Physical register assignments - ↓ Ticketing, HazardMitigation - ↓ AssemblyEmitter → clang++ -.hsaco GPU binary -``` - -### Dialect - -Types (`WaveASMTypes.td`): virtual (`!waveasm.vreg/sreg/areg`) and physical (`!waveasm.pvreg/psreg/pareg`) register types, plus `!waveasm.imm` and `!waveasm.scc`. The two-phase virtual→physical split is intentional — optimization passes run on virtual SSA, allocation happens once at the end. - -~300 machine ops in `WaveASMOps.td`: VALU, SALU, MFMA, memory (global/LDS/SMEM), control flow, and utility ops. Pseudo-ops (`waveasm.arith.*`) exist for cases where the concrete instruction depends on register class — ArithLegalization resolves them. - -### Adding New Dialect Support - -`TranslateFromMLIR` uses a handler registry. 
To translate a new upstream op, add a handler to the appropriate file in `lib/Transforms/handlers/` and register it in the `TranslationContext`. The `TranslationContext` also manages the SRD (Shader Resource Descriptor) table and expression cache — use it rather than tracking state locally in handlers. - -### Non-Obvious Constraints - -- **No spilling**: `LinearScanRegAlloc` aborts if register pressure exceeds hardware limits. If you see allocation failures, the kernel uses too many live values simultaneously. -- **Tied operands**: MFMA accumulator input and output must share the same physical registers. This is expressed via `TiedClass` equivalence classes in `Liveness` — new MFMA variants must declare their ties correctly. -- **SCC liveness**: `!waveasm.scc` is an implicit 1-bit condition code, not a normal SSA value. The SCC verifier enforces that SCC is consumed before the next instruction that overwrites it. SCC spill/reload uses `s_cselect_b32` / `s_cmp_ne`. -- **Ticketing**: `s_waitcnt` insertion is demand-driven via ticket tracking, not conservative. Passes that add new memory ops must ensure they participate in the ticket system. +See @AGENTS.md From 94d61643b9a5e23c9455672d39f1d781df7adafe Mon Sep 17 00:00:00 2001 From: Tim Gymnich Date: Wed, 8 Apr 2026 11:44:44 +0200 Subject: [PATCH 4/5] Add clang-format formatting guidance to water and waveasm AGENTS.md Co-Authored-By: Claude Sonnet 4.6 Signed-off-by: Tim Gymnich --- water/AGENTS.md | 9 +++++++++ waveasm/AGENTS.md | 9 +++++++++ 2 files changed, 18 insertions(+) diff --git a/water/AGENTS.md b/water/AGENTS.md index 7387433689..0c43cd491b 100644 --- a/water/AGENTS.md +++ b/water/AGENTS.md @@ -15,6 +15,15 @@ pip install -e ".[dev]" # re-links Python extension (fast, skips CMake) LLVM is pinned at `water/llvm-sha.txt`. CLI tool: `water-opt` (analogous to `mlir-opt`). +## Formatting + +C++ code is formatted with `clang-format`. 
Run via pre-commit or directly:
+
+```bash
+clang-format -i # format a single file in-place
+pre-commit run clang-format # format all staged files
+```
+
 ## Testing

 ```bash
diff --git a/waveasm/AGENTS.md b/waveasm/AGENTS.md
index 43a3c852f5..35ec55a338 100644
--- a/waveasm/AGENTS.md
+++ b/waveasm/AGENTS.md
@@ -13,6 +13,15 @@ pip install -e ".[dev]" # re-links extension, skips CMake

 Set `WAVE_WAVEASM_DIR=waveasm/build` after first build to avoid full rebuilds on pip install. CLI tool: `waveasm-translate`.

+## Formatting
+
+C++ code is formatted with `clang-format`. Run via pre-commit or directly:
+
+```bash
+clang-format -i # format a single file in-place
+pre-commit run clang-format # format all staged files
+```
+
 ## Testing

 ```bash

From 8c9cd08268b640a1e7c7472594d1f62386fb4a66 Mon Sep 17 00:00:00 2001
From: Tim Gymnich
Date: Wed, 8 Apr 2026 12:12:33 +0200
Subject: [PATCH 5/5] Update AGENTS.md files with build instructions and formatting guidance

- water/AGENTS.md: restructure Building section to clarify that Water must be built with CMake first, then pip install with WAVE_WATER_DIR; add full cmake configure/build commands and useful flags; note that ninja alone is sufficient for iterating after initial pip install; add git clang-format guidance; add lit location note; add Pipelines.cpp reference in Pass Pipeline section
- waveasm/AGENTS.md: add git clang-format guidance
- AGENTS.md: update pre-commit invocation; add AGENTS.local.md to .gitignore

Co-Authored-By: Claude Sonnet 4.6
Signed-off-by: Tim Gymnich
---
 .gitignore        |  3 ++-
 AGENTS.md         |  5 ++---
 water/AGENTS.md   | 53 +++++++++++++++++++++++++++++++++++++----------
 waveasm/AGENTS.md |  7 ++++---
 4 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/.gitignore b/.gitignore
index e482958e5b..3c87fb3377 100644
--- a/.gitignore
+++ b/.gitignore
@@ -60,5 +60,6 @@ water/build_tools/wheel/water_mlir/water_mlir
 # rocm version detection
 requirements-pytorch-rocm-generated.txt

-# Claude
+# AI Agents
 CLAUDE.local.md
+AGENTS.local.md
diff --git a/AGENTS.md b/AGENTS.md
index e3685eed37..f7f1fcc34a 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -21,13 +21,12 @@ pytest -s tests/ --run-e2e # GPU tests (requires har

 ### Linting
 ```bash
-mypy # type check wave_lang
-pre-commit run --all-files # Black, Ruff, clang-format
+mypy # type check wave_lang
+pre-commit run # run Black, Ruff, clang-format against currently staged files
 ```

 ### Gotchas
 - **Always set `WAVE_CACHE_ON=0`** when testing code changes — stale cache entries hide the effect of edits: `WAVE_CACHE_ON=0 pytest ...`
-- DCO sign-off required on commits: `git commit -s`
 - Dump MLIR for debugging: `pytest --dump-mlir-files-path=/tmp/mlir tests/`

 ## Architecture
diff --git a/water/AGENTS.md b/water/AGENTS.md
index 0c43cd491b..0c36da9975 100644
--- a/water/AGENTS.md
+++ b/water/AGENTS.md
@@ -2,26 +2,57 @@ Water is an optional MLIR layer in the Wave compiler stack that replaces IREE's

 ## Building

+Water must be built with CMake first. `pip install` alone does not build Water — `WAVE_WATER_DIR` is required to point Wave at an existing Water build.
+
+LLVM is pinned at `water/llvm-sha.txt`. CLI tool: `water-opt` (analogous to `mlir-opt`).
+
+### Step 1: Build Water with CMake
+
+Requires a pre-built LLVM/MLIR. Set `$BUILD_DIR` to your LLVM build or install tree.
+
 ```bash
-# First build — builds LLVM from source, takes a while
-WAVE_WATER_DIR=water/build pip install -e ".[dev]"
+# Configure
+cmake -G Ninja \
+  -B water/build \
+  water/ \
+  -DMLIR_DIR=$BUILD_DIR/lib/cmake/mlir \
+  -DBUILD_SHARED_LIBS=ON \
+  -DPython3_EXECUTABLE="$(which python)" \
+  -DWATER_ENABLE_PYTHON=ON
+
+# Optional: faster builds with clang + ccache + lld
+cmake -B water/build \
+  -DCMAKE_C_COMPILER=clang \
+  -DCMAKE_CXX_COMPILER=clang++ \
+  -DCMAKE_C_COMPILER_LAUNCHER=ccache \
+  -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
+  -DLLVM_USE_LINKER=lld
+
+# Build
+cmake --build water/build
+```
+
+### Step 2: Install Wave with Water bindings

-# Iterating on C++ changes
-ninja -C water/build # rebuild changed targets only
-pip install -e ".[dev]" # re-links Python extension (fast, skips CMake)
+```bash
+WAVE_WATER_DIR=water/build pip install -e ".[dev]"
 ```

-`WAVE_WATER_DIR` tells the Wave build system to use an existing build directory instead of rebuilding from scratch. Without it, the full LLVM + Water CMake build runs on every `pip install`.
+`WAVE_WATER_DIR` tells Wave where to find the Water build. Without it, Water is not included.

-LLVM is pinned at `water/llvm-sha.txt`. CLI tool: `water-opt` (analogous to `mlir-opt`).
+### Iterating on C++ changes

-## Formatting
+```bash
+ninja -C water/build # rebuild changed C++ targets and Python bindings
+```

-C++ code is formatted with `clang-format`.
Run via pre-commit or directly:
+## Formatting
+C++ code is formatted with `git clang-format` which formats only the lines changed relative to a commit (default: `HEAD`)

 ```bash
-clang-format -i # format a single file in-place
-pre-commit run clang-format # format all staged files
+git clang-format # format staged changes
+git clang-format HEAD~1 # also include most recent commit
+git clang-format main # format everything touched on your branch
 ```

 ## Testing
diff --git a/waveasm/AGENTS.md b/waveasm/AGENTS.md
index 35ec55a338..e63b8172e8 100644
--- a/waveasm/AGENTS.md
+++ b/waveasm/AGENTS.md
@@ -15,11 +15,12 @@ Set `WAVE_WAVEASM_DIR=waveasm/build` after first build to avoid full rebuilds on

 ## Formatting

-C++ code is formatted with `clang-format`. Run via pre-commit or directly:
+C++ code is formatted with `git clang-format` which formats only the lines changed relative to a commit (default: `HEAD`)

 ```bash
-clang-format -i # format a single file in-place
-pre-commit run clang-format # format all staged files
+git clang-format # format staged changes
+git clang-format HEAD~1 # also include most recent commit
+git clang-format main # format everything touched on your branch
+
 ```

 ## Testing