
Map top-left map to bottom-right for self-attn #3939

Open · Micky774 wants to merge 1 commit into main from zain/tl-br-sym


Micky774 (Contributor) commented Jun 25, 2026

Motivation

gfx1250 only has kernels that support bottom-right causal masking. When s_q = s_kv, bottom-right and top-left masking are identical, so top-left-masked self-attention can be dispatched to the bottom-right kernels. This mirrors a similar symmetry already in place for gfx942.
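To illustrate the symmetry, here is a minimal pure-Python sketch (not aiter code) that builds both causal masks as boolean grids. It assumes the usual convention: with top-left alignment, query i may attend key j when j <= i; with bottom-right alignment, the diagonal is shifted by s_kv - s_q so that it ends at the last query/key pair. The helper name `causal_mask` is hypothetical.

```python
def causal_mask(s_q: int, s_kv: int, bottom_right: bool) -> list:
    """Build an s_q x s_kv causal mask; True means query i may attend key j.

    Hypothetical helper for illustration only, not part of aiter.
    """
    # Top-left: the diagonal starts at (0, 0).
    # Bottom-right: the diagonal is shifted by s_kv - s_q so that it ends at
    # (s_q - 1, s_kv - 1).
    offset = (s_kv - s_q) if bottom_right else 0
    return [[j <= i + offset for j in range(s_kv)] for i in range(s_q)]


# Self-attention (s_q == s_kv): the offset is 0, so both masks coincide.
assert causal_mask(4, 4, bottom_right=False) == causal_mask(4, 4, bottom_right=True)

# s_q != s_kv: the masks differ, so swapping one kernel for the other is not valid.
print(causal_mask(2, 4, bottom_right=False))  # [[True, False, False, False], [True, True, False, False]]
print(causal_mask(2, 4, bottom_right=True))   # [[True, True, True, False], [True, True, True, True]]
```

Because the offset is exactly s_kv - s_q, the equality holds for every square shape, not just the one checked here.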

Technical Details

Test Plan

Tested by building with TE and running E2E models with this change applied.

Test Result

The E2E model succeeds.

Submission Checklist

Micky774 requested review from a team and Copilot June 25, 2026 21:23

github-actions (Contributor)

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

  • ci:triton-300x: run an additional Triton test job on MI300X in PRs; the main branch always runs both MI35X and MI300X
  • ci:sglang: SGLang integration tests (DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy)
  • ci:atom: ATOM benchmark (DeepSeek-R1-0528, GPT-OSS-120B)
  • ci:atom_full: ATOM accuracy suite for PR and main models from ATOM models_accuracy.json
  • ci:vllm: vLLM benchmark (GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5)
  • ci:all: all standard extended tests (excludes ci:atom_full)

Only add ci:atom_full for FlyDSL or Triton upgrades.
Add labels via the sidebar or gh pr edit 3939 --add-label <label>

Copilot AI left a comment


Pull request overview

Updates the FMHA v3 backward ASM kernel selection to handle gfx1250’s limited causal-mask kernel availability by remapping an equivalent causal mask in the self-attention (square) case, aligning with an existing symmetry already used for gfx942.

Changes:

  • For arch_id == "gfx1250", remap mask_type from top-left (1) to bottom-right (2) when seqlen_q == seqlen_k so existing gfx1250 causal BWD kernels can be selected.
  • Keep the existing gfx942 square-seqlen remap logic (2 → 1) and apply the analogous inverse remap for gfx1250.
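The remap described above can be sketched as a small dispatch helper. This is a hypothetical Python sketch, not the actual aiter kernel-selection code: the function name `remap_mask_type` and the constant names are assumptions, and it uses the mask encoding stated in the overview (1 = top-left, 2 = bottom-right).

```python
# Hypothetical constants; the values follow the PR overview: 1 = top-left, 2 = bottom-right.
MASK_TOP_LEFT = 1
MASK_BOTTOM_RIGHT = 2


def remap_mask_type(arch_id: str, mask_type: int, seqlen_q: int, seqlen_k: int) -> int:
    """Return the mask type to use for kernel selection (illustrative sketch).

    The two causal alignments are only interchangeable when the attention
    matrix is square, so every remap is guarded by seqlen_q == seqlen_k.
    """
    if seqlen_q != seqlen_k:
        # Non-square: top-left and bottom-right masks differ; leave as-is.
        return mask_type
    if arch_id == "gfx942" and mask_type == MASK_BOTTOM_RIGHT:
        # Existing gfx942 symmetry: 2 -> 1.
        return MASK_TOP_LEFT
    if arch_id == "gfx1250" and mask_type == MASK_TOP_LEFT:
        # gfx1250 only has bottom-right causal kernels: 1 -> 2.
        return MASK_BOTTOM_RIGHT
    return mask_type
```

For example, `remap_mask_type("gfx1250", 1, 1024, 1024)` returns 2, while `remap_mask_type("gfx1250", 1, 512, 1024)` leaves the top-left request unchanged, because the two masks are not equivalent when seqlen_q != seqlen_k.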


zufayu requested a review from yzhou103 June 26, 2026 08:53

Labels: None yet

Projects: None yet

2 participants