
edit aiter_opus_plus.h using opus api instead of asm code #3932

Open
junhaha666 wants to merge 1 commit into main from jun/aiter_opus_plus

Conversation

@junhaha666

Contributor

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

@junhaha666 junhaha666 requested review from a team and Copilot June 25, 2026 15:41
@github-actions

Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-300x | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| ci:sglang | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| ci:atom | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| ci:atom_full | ATOM accuracy suite for PR and main models from ATOM models_accuracy.json |
| ci:vllm | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| ci:all | All standard extended tests (excludes ci:atom_full) |

Only add ci:atom_full for FlyDSL or Triton upgrades.
Add labels via the sidebar or gh pr edit 3932 --add-label <label>

Copilot AI left a comment

Contributor

Pull request overview

This PR refactors parts of the low-level quantization/packing utilities to rely on OPUS API/builtins rather than inline GCN asm, and extends opus.hpp with additional packing helpers needed by aiter_opus_plus.h.

Changes:

  • Add fp32 -> bf8 packed conversion helper in opus.hpp.
  • Add fp16 -> fp4 packed helpers in opus.hpp and switch aiter_opus_plus.h to delegate to these helpers.
  • Replace inline asm sequences for fp32 -> {fp8,bf8} packed conversion in aiter_opus_plus.h with OPUS helpers + med3 clamp.
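The "med3 clamp" in the last bullet can be pictured with a host-side sketch. A median-of-three of `x`, `-limit`, and `limit` is the same as clamping `x` into `[-limit, limit]`, presumably so that out-of-range inputs saturate before the packed conversion. The helper names `med3_ref` and `clamp_med3` are hypothetical stand-ins written for this illustration, not the OPUS API or an AMDGCN builtin, and the limit of 448.0f is an assumed example value; the real limit depends on the target fp8/bf8 format.

```cpp
#include <algorithm>
#include <cassert>

// Host-side stand-in for a median-of-three operation: returns the middle
// value of a, b, c. Hypothetical helper for illustration only.
inline float med3_ref(float a, float b, float c) {
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Clamp x into [-limit, limit] by taking the median of {x, -limit, limit}.
// `limit` is assumed to be the largest finite value of the destination format.
inline float clamp_med3(float x, float limit) {
    return med3_ref(x, -limit, limit);
}

// Small self-check: the median-of-three clamp agrees with std::clamp.
inline void check_clamp_med3() {
    const float limit = 448.0f;  // assumed example limit, not taken from the PR
    const float samples[] = {-1000.0f, -448.0f, -1.5f, 0.0f, 3.25f, 448.0f, 1e6f};
    for (float x : samples) {
        assert(clamp_med3(x, limit) == std::clamp(x, -limit, limit));
    }
}
```

On the device, the same clamp would feed its result into the packed conversion helper; this sketch only models the arithmetic of the clamp itself.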

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| csrc/include/opus/opus.hpp | Adds new bf8/fp4 packing helpers and adjusts some fp4/bf16 conversion plumbing. |
| csrc/include/aiter_opus_plus.h | Removes inline asm for packed fp8/bf8 conversion and delegates fp16→fp4 packing to opus.hpp. |


Comment on lines +1258 to +1259
int w ; w = __builtin_amdgcn_cvt_pk_bf8_f32(s[0], s[1], w, sel);
return __builtin_bit_cast(bf8x2_t, static_cast<short>(w));
Comment on lines +1351 to +1352
u32_t w; w = __builtin_amdgcn_cvt_scalef32_pk_fp4_f16(w, s, scale, sel);
return __builtin_bit_cast(array<fp4_t, 1>, static_cast<u8_t>(w));
Comment on lines +1356 to +1357
u32_t w; w = __builtin_amdgcn_cvt_scalef32_pk_fp4_f16(w, fp16x2_t{s[0], s[1]}, scale, 0); w = __builtin_amdgcn_cvt_scalef32_pk_fp4_f16(w, fp16x2_t{s[2], s[3]}, scale, 1);
return __builtin_bit_cast(array<fp4_t, 2>, static_cast<u16_t>(w));
Comment on lines +1361 to +1363
u32_t w; w = __builtin_amdgcn_cvt_scalef32_pk_fp4_f16(w, fp16x2_t{s[0], s[1]}, scale, 0); w = __builtin_amdgcn_cvt_scalef32_pk_fp4_f16(w, fp16x2_t{s[2], s[3]}, scale, 1);
w = __builtin_amdgcn_cvt_scalef32_pk_fp4_f16(w, fp16x2_t{s[4], s[5]}, scale, 2); w = __builtin_amdgcn_cvt_scalef32_pk_fp4_f16(w, fp16x2_t{s[6], s[7]}, scale, 3);
return __builtin_bit_cast(array<fp4_t, 4>, w);
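In each quoted snippet, `w` is declared without an initializer and then passed to the builtin as its "old" operand. Assuming these builtins write only the lane selected by `sel` and preserve the remaining bits of the old operand, the behavior can be modeled on the host as below. The helpers `insert_half`, `insert_byte`, and `pack4` are hypothetical names for this sketch and do not perform any fp8/fp4 conversion; they only model where the converted bits land in the 32-bit word.

```cpp
#include <cassert>
#include <cstdint>

// Model of a packed convert that writes one 16-bit half of `old`
// (sel == false: low half, sel == true: high half) and keeps the other half.
inline std::uint32_t insert_half(std::uint32_t old, std::uint16_t value, bool sel) {
    const unsigned shift = sel ? 16u : 0u;
    const std::uint32_t mask = 0xFFFFu << shift;
    return (old & ~mask) | (static_cast<std::uint32_t>(value) << shift);
}

// Model of a packed convert that writes one byte of `old` (sel in 0..3)
// and keeps the other three bytes.
inline std::uint32_t insert_byte(std::uint32_t old, std::uint8_t value, unsigned sel) {
    const unsigned shift = (sel & 3u) * 8u;
    const std::uint32_t mask = 0xFFu << shift;
    return (old & ~mask) | (static_cast<std::uint32_t>(value) << shift);
}

// Fill all four byte lanes, starting from a defined value instead of an
// uninitialized word.
inline std::uint32_t pack4(const std::uint8_t (&bytes)[4]) {
    std::uint32_t w = 0;  // explicit starting value for the "old" operand
    for (unsigned i = 0; i < 4; ++i) {
        w = insert_byte(w, bytes[i], i);
    }
    return w;
}
```

Under this model, the bits that survive the final cast never depend on the initial contents of `w`, but reading an uninitialized variable is still generally undefined behavior in C++, so starting from `w = 0` is the conservative choice.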
@zufayu zufayu requested a review from valarLip June 26, 2026 08:53
@valarLip valarLip requested a review from carlushuang June 28, 2026 15:43