Pull requests: ROCm/aiter
[Tune] Add qwen3.5-397B MXFP4 a16w16 GEMM tuning configs
#3974 opened Jun 28, 2026 by yichiche (Contributor), 2 of 3 tasks
[CK] Fix MoE 2-stage dispatch for non-128-divisible inter_dim
#3973 opened Jun 27, 2026 by jonahbernard, 1 task done
Add gelu_tanh activation to no-quant CK 2-stage fused MoE
#3972 opened Jun 27, 2026 by jonahbernard, 1 task done
[Triton] [gfx12] Tuning of A8W8 blockscale GEMM
#3967 opened Jun 27, 2026 by k50112113 (Contributor)
[Kernel][Perf] split-K long-context decode for shuffled fp8 SWA path
#3962 opened Jun 26, 2026 by reger-men, 3 tasks done
[Kernel][Triton] sliding-window decode over shuffled fp8 paged KV
#3959 opened Jun 26, 2026 by reger-men, 2 of 3 tasks
bf16 asm mha: add mask=0 kernel
#3957 opened Jun 26, 2026 by tingchen988 (Contributor), 1 task
fix(triton): support gfx1201 unified attention within LDS limits
#3956 opened Jun 26, 2026 by papadako
CI: avoid fp8 KV cache in Kimi vLLM gate
Label ci:kimi: Trigger Kimi-K2.5 downstream accuracy gates (vLLM+SGLang)
#3954 opened Jun 26, 2026 by gyohuangxin (Member)
[Configs] DSv3.2 gfx942 (MI325X): tuned a8w8 blockscale GEMM + FMoE configs (TP8)
#3951 opened Jun 26, 2026 by frida-andersson (Contributor)
[Triton] Add fused_gemm_a16w16_split_cat
#3940 opened Jun 25, 2026 by rbrugaro-amd (Contributor)
Map top-left map to bottom-right for self-attn
#3939 opened Jun 25, 2026 by Micky774 (Contributor), 1 task
gate custom all-reduce on XGMI topology
#3938 opened Jun 25, 2026 by skysnow2001 (Contributor), 1 task done
Spatial Attention: XCD-aware spatial workgroup mapping for MHA and GQA (SWIZZLE=1)
#3936 opened Jun 25, 2026 by mc186
[test] test_topk_plain: parametrize sweep to fix collection-time OOM
#3934 opened Jun 25, 2026 by JohnQinAMD (Contributor), 1 task
edit aiter_opus_plus.h using opus api instead of asm code
#3932 opened Jun 25, 2026 by junhaha666 (Contributor), 1 task
fix(quick_all_reduce): make flag sync CUDA-graph-safe
#3928 opened Jun 25, 2026 by Jasen2201 (Contributor)