4 changes: 4 additions & 0 deletions .gitignore
@@ -59,3 +59,7 @@ water/build_tools/wheel/water_mlir/water_mlir

# rocm version detection
requirements-pytorch-rocm-generated.txt

# AI Agents
CLAUDE.local.md

Contributor

This gitignores a CLAUDE.local.md file, but I would like the main CLAUDE.md (or AGENTS.md) to explicitly reference an AGENTS.local.md file and tell the model to adhere to it as well. From what I've read, most agent programs don't read an AGENTS.local.md, and it seems that Claude doesn't use CLAUDE.local.md anymore either, which I think is a shame. We will all have some personal workflow instructions in addition to the shared AGENTS.md. There is the global ~/.claude/CLAUDE.md (and equivalents for other agents), but that affects all repos, and I really want personal, repo-specific instructions for agents.

Contributor Author

Claude does still use CLAUDE.local.md:

```
❯ /context
  ⎿  Context Usage
     ⛁ ⛁ ⛀ ⛀ ⛀ ⛶ ⛶ ⛶ ⛶ ⛶   Claude-Sonnet-4.6 · 5k/200k tokens (3%)
     ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶
     ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   Estimated usage by category
     ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ System prompt: 3.9k tokens (1.9%)
     ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ Memory files: 1.1k tokens (0.5%)
     ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ Skills: 349 tokens (0.2%)
     ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ Messages: 8 tokens (0.0%)
     ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛶ Free space: 162k (80.9%)
     ⛶ ⛶ ⛶ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝   ⛝ Autocompact buffer: 33k tokens (16.5%)
     ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝

     Memory files · /memory
     └ CLAUDE.md: 13 tokens
     └ AGENTS.md: 1k tokens
     └ CLAUDE.local.md: 19 tokens
     └ ~/.claude/projects/-home-tgymnich-wave/memory/MEMORY.md: 39 tokens

     Skills · /skills
```

So you should still be able to use per-repo configs.
I also added AGENTS.local.md to .gitignore.

AGENTS.local.md
75 changes: 75 additions & 0 deletions AGENTS.md
@@ -0,0 +1,75 @@
Wave is a Python DSL for high-performance ML kernel development targeting AMD GPUs (ROCm). The default compilation path is pure Python using IREE for codegen. Water and WaveASM are optional C++ extensions that replace parts of the IREE path.

## Commands

### Setup
```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements-iree-pinned.txt
pip install -r pytorch-cpu-requirements.txt # CPU-only dev/testing
pip install -e ".[dev]"
pre-commit install && pre-commit install --hook-type commit-msg
```

### Testing
```bash
pytest -n 4 --capture=tee-sys -vv ./tests/unittests/ # unit tests
pytest -s tests/unittests/test_file.py::test_name -v # single test
lit lit_tests/ -vv # MLIR LIT tests
pytest -s tests/ --run-e2e # GPU tests (requires hardware)
```

### Linting
```bash
mypy # type check wave_lang
pre-commit run # run Black, Ruff, clang-format against currently staged files
```

### Gotchas
- **Always set `WAVE_CACHE_ON=0`** when testing code changes — stale cache entries hide the effect of edits: `WAVE_CACHE_ON=0 pytest ...`
- Dump MLIR for debugging: `pytest --dump-mlir-files-path=/tmp/mlir tests/`

## Architecture

### Compilation Flow

```
Wave Python DSL
↓ graph transformation passes [wave_lang/kernel/wave/codegen/]
Transformed FX graph
↓ WaveEmitter [compiler/wave_codegen/emitter.py]
stream.executable MLIR
↓ iree.compiler.compile_str() [wave/utils/compile_utils.py]
VMFB (IREE bytecode module)
↓ iree.runtime.VmModule
GPU kernel execution
```

Entry point: `wave_compile()` in `wave_lang/kernel/wave/compile.py`.

### Runtimes

**IREE runtime (default):** Loads the VMFB into IREE's VM and handles GPU command buffers, queue submission, benchmarking, and multi-device execution.

**Wave runtime (`options.wave_runtime=True`):** Launches HSACO kernels directly via HIP API. Supports dynamic strides and custom grid layout. Typically paired with WaveASM. Entry point: `invoke_with_wave_runtime()` in `wave_lang/kernel/wave/utils/run_utils.py`.

### Key Source Locations

- `wave_lang/kernel/wave/compile.py` — pipeline orchestration, backend/runtime selection
- `wave_lang/kernel/wave/codegen/` — graph transformation passes (scheduling, barriers, index analysis)
- `wave_lang/kernel/compiler/wave_codegen/emitter.py` — lowers FX graph to MLIR
- `wave_lang/kernel/wave/water.py` — Water/WaveASM lowering pipeline entry points
- `wave_lang/kernel/wave/mlir_converter/` — Wave FX ↔ Water MLIR conversion; runs in a subprocess to avoid MLIR library conflicts (Water backend only)

### Optional Extensions

Water and WaveASM intercept MLIR before IREE and produce HSACO directly. Enable via env vars:

| Variable | Purpose |
|---|---|
| `WAVE_BUILD_WATER=1` | Build Water from source |
| `WAVE_BUILD_WAVEASM=1` | Build WaveASM from source |
| `WAVE_WATER_DIR=water/build` | Use an existing Water build (fast, skips rebuilding) |
| `WAVE_WAVEASM_DIR=waveasm/build` | Use an existing WaveASM build (fast, skips rebuilding) |

When both are active, the pipeline is: stream.executable MLIR → `water-opt` → `waveasm-translate` → `water-opt` → ExecutionEngine.
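
A minimal Python sketch of how the variables in the table translate into enabled extensions. It assumes an extension is picked up when it is either built from source (`WAVE_BUILD_*=1`) or pointed at an existing build (`WAVE_*_DIR`); `enabled_extensions` and `lowering_stages` are hypothetical helpers, not part of Wave, and only the both-active chain described above is modeled.

```python
import os
from collections.abc import Mapping
from typing import Optional


def enabled_extensions(env: Mapping[str, str]) -> set[str]:
    """Extensions a `pip install` would pick up from the variables above.

    Assumption: an extension is enabled if it is built from source
    (WAVE_BUILD_*=1) or pointed at an existing build (WAVE_*_DIR set).
    """
    enabled = set()
    if env.get("WAVE_BUILD_WATER") == "1" or env.get("WAVE_WATER_DIR"):
        enabled.add("water")
    if env.get("WAVE_BUILD_WAVEASM") == "1" or env.get("WAVE_WAVEASM_DIR"):
        enabled.add("waveasm")
    return enabled


def lowering_stages(env: Mapping[str, str]) -> Optional[list[str]]:
    """Stages applied to stream.executable MLIR when both extensions are active.

    Other combinations are not described here, so the sketch returns None.
    """
    if enabled_extensions(env) == {"water", "waveasm"}:
        return ["water-opt", "waveasm-translate", "water-opt", "ExecutionEngine"]
    return None


print(sorted(enabled_extensions(os.environ)))
print(lowering_stages({"WAVE_WATER_DIR": "water/build",
                       "WAVE_WAVEASM_DIR": "waveasm/build"}))
```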
1 change: 1 addition & 0 deletions CLAUDE.md
@@ -0,0 +1 @@
See @AGENTS.md
103 changes: 103 additions & 0 deletions water/AGENTS.md
@@ -0,0 +1,103 @@
Water is an optional MLIR layer in the Wave compiler stack that replaces IREE's middle-end lowering. It defines the `wave.*` and `normalform.*` dialects, transformation passes, and Python bindings (`water_mlir` package).

## Building

Water must be built with CMake first. `pip install` alone does not build Water — `WAVE_WATER_DIR` is required to point Wave at an existing Water build.

The LLVM revision is pinned in `water/llvm-sha.txt`. The CLI tool is `water-opt` (analogous to `mlir-opt`).

### Step 1: Build Water with CMake

Requires a pre-built LLVM/MLIR. Set `$BUILD_DIR` to your LLVM build or install tree.

```bash
# Configure
cmake -G Ninja \
  -B water/build \
  water/ \
  -DMLIR_DIR="$BUILD_DIR/lib/cmake/mlir" \
  -DBUILD_SHARED_LIBS=ON \
  -DPython3_EXECUTABLE="$(which python)" \
  -DWATER_ENABLE_PYTHON=ON

# Optional: for faster builds with clang + ccache + lld, add these flags to the
# configure command above. Set them on the first configure: changing the
# compiler of an existing build directory makes CMake discard its cache.
#   -DCMAKE_C_COMPILER=clang \
#   -DCMAKE_CXX_COMPILER=clang++ \
#   -DCMAKE_C_COMPILER_LAUNCHER=ccache \
#   -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
#   -DLLVM_USE_LINKER=lld

# Build
cmake --build water/build
```

### Step 2: Install Wave with Water bindings

```bash
WAVE_WATER_DIR=water/build pip install -e ".[dev]"
```

`WAVE_WATER_DIR` tells Wave where to find the Water build. Without it, Water is not included.

### Iterating on C++ changes

```bash
ninja -C water/build # rebuild changed C++ targets and Python bindings
```

## Formatting

C++ code is formatted with `git clang-format`, which formats only the lines changed relative to a commit (default: `HEAD`):
```bash
git clang-format # format staged changes
git clang-format HEAD~1 # also include most recent commit
git clang-format main # format everything touched on your branch
```

## Testing

```bash
ninja -C water/build check-water # all lit tests
lit test/Dialect/Wave/<test>.mlir -vv # single test
```

Tests use lit + FileCheck. `.mlir` files use `// CHECK` comments. Negative tests are named `*-invalid.mlir`.

## Architecture

### Dialects

**`wave.*`** — primary dialect. `wave.tensor` has symbolic shapes (unknown until inferred by passes) and an address space (`Global`, `Shared`, `Register`). Each op carries a `WaveIndexMappingAttr` encoding element distribution across device/workgroup/workitem/register dimensions as `(offset, count, step)` triples.
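
To make the `(offset, count, step)` encoding concrete, here is a minimal Python sketch. It assumes each triple enumerates `count` positions along one dimension, starting at `offset` and spaced `step` apart; this is an illustrative reading of the description above, not Water's actual `WaveIndexMappingAttr` API, and it uses plain integers where the real attribute may hold symbolic expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexTriple:
    """One (offset, count, step) triple along a single dimension (illustrative)."""
    offset: int
    count: int
    step: int

    def positions(self) -> list[int]:
        # Assumed reading: `count` positions starting at `offset`, stride `step`.
        return [self.offset + i * self.step for i in range(self.count)]


# Hypothetical example: a work item that starts at element 4 and touches
# 4 elements spaced 16 apart.
triple = IndexTriple(offset=4, count=4, step=16)
print(triple.positions())  # [4, 20, 36, 52]
```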

**`normalform.*`** — `normalform.module` wraps IR and enforces declared invariants. Passes declare pre/post-conditions as normal form attributes, enabling composable pass ordering without new IR constructs.
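
The idea of passes declaring pre- and post-conditions can be modeled in a few lines of Python. This is a conceptual sketch only: `Pass`, `check_pipeline`, and the invariant names are hypothetical and do not correspond to Water's C++ API or its actual normal forms. It shows why declared conditions make pass ordering checkable without inventing new IR constructs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Pass:
    """A pass with declared pre-/post-conditions (hypothetical model)."""
    name: str
    requires: frozenset = field(default_factory=frozenset)  # preconditions
    provides: frozenset = field(default_factory=frozenset)  # postconditions


def check_pipeline(passes: list[Pass], initial: frozenset = frozenset()) -> list[str]:
    """Return one error per pass whose preconditions are not yet satisfied."""
    satisfied = set(initial)
    errors = []
    for p in passes:
        missing = p.requires - satisfied
        if missing:
            errors.append(f"{p.name}: missing {sorted(missing)}")
        satisfied |= p.provides
    return errors


# Hypothetical passes and invariant names, for illustration only.
infer = Pass("infer", provides=frozenset({"fully-typed"}))
lower = Pass("lower", requires=frozenset({"fully-typed"}))

print(check_pipeline([infer, lower]))  # []
print(check_pipeline([lower, infer]))  # ["lower: missing ['fully-typed']"]
```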

### Pass Pipeline

`water-middle-end-lowering` runs these in order (`include/water/Dialect/Wave/Transforms/Passes.td`):

| Pass | Purpose |
|---|---|
| `water-wave-detect-normal-forms` | Detect satisfied invariants |
| `water-wave-infer-types` | Shape inference via dataflow |
| `water-wave-infer-index-exprs` | Forward/backward index expression propagation |
| `water-wave-propagate-elements-per-thread` | Replace register tensors with vector types |
| `water-wave-resolve-distributed-allocations` | Map distributed shapes to concrete memref layouts |
| `lower-wave-to-mlir` | Lower to arith/math/vector/memref dialects |
| `lower-normalform-module` | Remove the normalform wrapper |

Generic passes include SLP vectorization, bounds-checking assertions, alloc-to-alloca, and GPU module serialization (ROCDL).
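
When debugging the middle end, it can help to run the passes from the table one at a time and inspect the IR in between. The Python sketch below builds such a `water-opt` command line. It assumes `water-opt` exposes each pass as a `--<pass-name>` flag the way `mlir-opt` does; if some passes are anchored on nested ops, `--pass-pipeline` syntax may be needed instead, so check `water-opt --help` first. `build_water_opt_cmd` is a hypothetical helper.

```python
import shlex
from typing import Optional

# Pass order as listed in the table above.
MIDDLE_END_PASSES = [
    "water-wave-detect-normal-forms",
    "water-wave-infer-types",
    "water-wave-infer-index-exprs",
    "water-wave-propagate-elements-per-thread",
    "water-wave-resolve-distributed-allocations",
    "lower-wave-to-mlir",
    "lower-normalform-module",
]


def build_water_opt_cmd(input_path: str, stop_after: Optional[str] = None) -> list[str]:
    """Build a water-opt argv running the passes in order, optionally
    stopping after `stop_after` to inspect the intermediate IR."""
    passes = MIDDLE_END_PASSES
    if stop_after is not None:
        passes = passes[: passes.index(stop_after) + 1]
    return ["water-opt", input_path, *(f"--{p}" for p in passes)]


print(shlex.join(build_water_opt_cmd("kernel.mlir", stop_after="water-wave-infer-types")))
# water-opt kernel.mlir --water-wave-detect-normal-forms --water-wave-infer-types
```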

### Python Bindings

Package `water_mlir` (prefixed to avoid IREE conflicts):
- `water_mlir.dialects.wave` — auto-generated op bindings from `WaveOps.td`
- `water_mlir.sympy_to_affine_converter` — converts SymPy expressions to MLIR affine expressions
- C++ extension via nanobind (`WaterExtensionNanobind.cpp`)
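
Because Water is optional, Python code that uses the bindings may want to check for the `water_mlir` package before importing it. A minimal sketch using only the standard library; the helper name `water_available` is hypothetical, and the import path follows the package layout listed above.

```python
import importlib.util


def water_available() -> bool:
    """True if the water_mlir package (Water's Python bindings) is importable."""
    return importlib.util.find_spec("water_mlir") is not None


if water_available():
    # Import from the prefixed package, not from IREE's `mlir` namespace.
    from water_mlir.dialects import wave  # noqa: F401
else:
    print('Water bindings not found; reinstall with '
          'WAVE_WATER_DIR=water/build pip install -e ".[dev]"')
```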

### Key Design Principles

- **Lazy type inference**: `wave.tensor` shapes start unknown — don't assume they're set at construction.
- **Elements-per-thread (EPT)**: tracked separately from types; required before register tensors can be lowered to vector types. A pass that changes element counts must update EPT.
- **`water_mlir` prefix**: the Python package is prefixed to avoid conflicts with IREE's MLIR bindings. Import as `from water_mlir.dialects import wave`, not `mlir.dialects.wave`.
- **Subprocess isolation**: the Wave-side `mlir_converter` runs Water in a subprocess specifically to avoid MLIR library symbol clashes with IREE.
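
The subprocess-isolation principle can be sketched with the standard library: run the conflicting work in a fresh interpreter so its native libraries never load into the parent process, and exchange data as text. This is a generic illustration of the technique, not the actual `mlir_converter` implementation; the child script is a stand-in that only upper-cases its input.

```python
import json
import subprocess
import sys

# Stand-in for code that would import Water's MLIR bindings. Running it in a
# child interpreter keeps those native libraries out of the parent process,
# which may already have IREE's MLIR libraries loaded.
CHILD_SCRIPT = """
import json, sys
request = json.load(sys.stdin)
json.dump({"result": request["ir"].upper()}, sys.stdout)
"""


def run_isolated(ir: str) -> str:
    """Send `ir` to a fresh Python process and return its JSON result."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        input=json.dumps({"ir": ir}),
        capture_output=True,
        text=True,
        check=True,  # raise if the child fails instead of returning garbage
    )
    return json.loads(proc.stdout)["result"]


print(run_isolated("module {}"))  # MODULE {}
```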
1 change: 1 addition & 0 deletions water/CLAUDE.md
@@ -0,0 +1 @@
See @AGENTS.md
60 changes: 60 additions & 0 deletions waveasm/AGENTS.md
@@ -0,0 +1,60 @@
WaveASM is an optional C++ backend in the Wave compiler stack that replaces IREE's GPU codegen. It translates MLIR into AMDGCN assembly for AMD GPUs (gfx942/CDNA3, gfx950/CDNA4, gfx1250) and produces `.hsaco` binaries via its own `waveasm.*` MLIR dialect, linear-scan register allocator, and assembly emitter.

## Building

```bash
# First build
WAVE_BUILD_WAVEASM=1 pip install -e ".[dev]"

# Iterating on C++ changes (same pattern as Water)
ninja -C waveasm/build
pip install -e ".[dev]" # re-links extension, skips CMake
```

After the first build, set `WAVE_WAVEASM_DIR=waveasm/build` so that later `pip install` runs reuse the existing build instead of rebuilding from scratch. The CLI tool is `waveasm-translate`.

## Formatting

C++ code is formatted with `git clang-format`, which formats only the lines changed relative to a commit (default: `HEAD`):

```bash
git clang-format # format staged changes
git clang-format HEAD~1 # also include most recent commit
git clang-format main # format everything touched on your branch
```

## Testing

```bash
ninja -C waveasm/build check-waveasm # lit regression tests
ninja -C waveasm/build check-waveasm-all # + GPU functional tests (requires hardware)
lit test/Transforms/<test>.mlir -vv # single test
```

## Architecture

### Compilation Pipeline

```
Input MLIR (gpu, arith, vector, memref, scf, amdgpu dialects)
↓ TranslateFromMLIR [lib/Transforms/TranslateFromMLIR.cpp]
WaveASM IR (virtual registers, pseudo-ops)
↓ ScopedCSE, Peephole, BufferLoadStrengthReduction
↓ ArithLegalization
Concrete SALU/VALU machine ops
↓ Liveness → LinearScanRegAlloc → VGPRCompaction
Physical register assignments
↓ Ticketing, HazardMitigation
↓ AssemblyEmitter → clang++
.hsaco GPU binary
```

### Dialect

Types (`WaveASMTypes.td`): virtual (`!waveasm.vreg/sreg/areg`) and physical (`!waveasm.pvreg/psreg/pareg`) register types, plus `!waveasm.imm` and `!waveasm.scc`. The two-phase virtual→physical split is intentional — optimization passes run on virtual SSA, allocation happens once at the end.
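
To illustrate what the final allocation step does with those virtual registers, here is a textbook linear-scan allocator over live intervals in Python. It is not WaveASM's `LinearScanRegAlloc`: it ignores register classes, alignment, and spilling, and simply raises if it runs out of physical registers. Interval endpoints are inclusive instruction indices.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    vreg: str   # virtual register name, e.g. "v0"
    start: int  # first instruction index where the value is live
    end: int    # last instruction index where the value is live


def linear_scan(intervals: list[Interval], num_regs: int) -> dict[str, int]:
    """Map each virtual register to a physical register index.

    Raises RuntimeError instead of spilling when registers run out.
    """
    assignment: dict[str, int] = {}
    active: list[Interval] = []   # intervals currently holding a register
    free = list(range(num_regs))  # free physical registers
    for cur in sorted(intervals, key=lambda iv: iv.start):
        # Expire intervals that ended before the current one starts.
        for iv in [a for a in active if a.end < cur.start]:
            active.remove(iv)
            free.append(assignment[iv.vreg])
        if not free:
            raise RuntimeError(f"out of registers at {cur.vreg}")
        free.sort()
        assignment[cur.vreg] = free.pop(0)  # lowest-numbered free register
        active.append(cur)
    return assignment


ivs = [Interval("v0", 0, 3), Interval("v1", 1, 2), Interval("v2", 3, 5)]
print(linear_scan(ivs, num_regs=2))  # {'v0': 0, 'v1': 1, 'v2': 1}
```

Because `v1` dies before `v2` is defined, `v2` reuses its register; this reuse is only possible once all virtual SSA values are known, which is why allocation runs once at the end.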

~300 machine ops in `WaveASMOps.td`: VALU, SALU, MFMA, memory (global/LDS/SMEM), control flow, and utility ops. Pseudo-ops (`waveasm.arith.*`) exist for cases where the concrete instruction depends on register class — ArithLegalization resolves them.
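
As an illustration of why some arithmetic must stay as pseudo-ops until register classes are known: on AMD GPUs a value held only in scalar registers is uniform across the wavefront and can use a SALU instruction, while anything involving a vector register needs a VALU instruction. The sketch below picks between the gfx9-style mnemonics `s_add_u32` and `v_add_u32` on that basis. It is a simplified, hypothetical model of the decision, not ArithLegalization's actual rules, which must also handle immediates, SCC, 64-bit ops, and other targets.

```python
from enum import Enum


class RegClass(Enum):
    SGPR = "sreg"  # scalar: one value shared by the whole wavefront
    VGPR = "vreg"  # vector: one value per lane


def legalize_add(lhs: RegClass, rhs: RegClass) -> tuple[str, RegClass]:
    """Pick a concrete gfx9-style add for a pseudo add (simplified model).

    Returns the mnemonic and the register class of the result.
    """
    if lhs is RegClass.SGPR and rhs is RegClass.SGPR:
        # Both operands are uniform: the scalar ALU computes it once.
        return "s_add_u32", RegClass.SGPR
    # Any per-lane operand forces a vector instruction; VALU ops can read an
    # SGPR operand, so mixed inputs are fine.
    return "v_add_u32", RegClass.VGPR


print(legalize_add(RegClass.SGPR, RegClass.SGPR))
print(legalize_add(RegClass.SGPR, RegClass.VGPR))
```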

### Adding New Dialect Support

`TranslateFromMLIR` uses a handler registry. To translate a new upstream op, add a handler to the appropriate file in `lib/Transforms/handlers/` and register it in the `TranslationContext`. The `TranslationContext` also manages the SRD (Shader Resource Descriptor) table and expression cache — use it rather than tracking state locally in handlers.
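
The handler-registry pattern described above can be sketched in Python as a dictionary from op name to handler, plus a shared context object that owns cross-handler state. Everything here (`Context`, `register`, the op dicts, and the emitted strings) is a hypothetical model for illustration; the real registry and `TranslationContext` live in C++ under `lib/Transforms/handlers/`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    """Toy stand-in for a translation context: owns state shared by handlers.

    A real context would also hold an expression cache; omitted here.
    """
    srd_table: dict[str, int] = field(default_factory=dict)  # buffer -> SRD slot
    output: list[str] = field(default_factory=list)

    def srd_for(self, buffer: str) -> int:
        # Allocate each buffer's descriptor once and reuse it afterwards.
        return self.srd_table.setdefault(buffer, len(self.srd_table))


Handler = Callable[[Context, dict], None]
HANDLERS: dict[str, Handler] = {}


def register(op_name: str) -> Callable[[Handler], Handler]:
    """Decorator that adds a handler to the registry under `op_name`."""
    def decorator(fn: Handler) -> Handler:
        HANDLERS[op_name] = fn
        return fn
    return decorator


@register("memref.load")
def handle_load(ctx: Context, op: dict) -> None:
    slot = ctx.srd_for(op["buffer"])  # shared state lives in the context
    ctx.output.append(f"buffer_load srd{slot}")


def translate(ops: list[dict]) -> Context:
    ctx = Context()
    for op in ops:
        handler = HANDLERS.get(op["name"])
        if handler is None:
            raise NotImplementedError(f"no handler for {op['name']}")
        handler(ctx, op)
    return ctx


ctx = translate([{"name": "memref.load", "buffer": "A"},
                 {"name": "memref.load", "buffer": "B"},
                 {"name": "memref.load", "buffer": "A"}])
print(ctx.output)  # ['buffer_load srd0', 'buffer_load srd1', 'buffer_load srd0']
```

Keeping the SRD table in the context is what lets the third load reuse descriptor slot 0; a handler tracking descriptors locally would allocate a fresh one each time.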
1 change: 1 addition & 0 deletions waveasm/CLAUDE.md
@@ -0,0 +1 @@
See @AGENTS.md