CI: avoid fp8 KV cache in Kimi vLLM gate by gyohuangxin · Pull Request #3954 · ROCm/aiter

gyohuangxin · 2026-06-26T09:36:02Z

Summary

- Stop forcing fp8 KV cache for the Kimi-K2.5 vLLM accuracy gate; leave fp8 opt-in via `KIMI_VLLM_KV_CACHE_DTYPE`.
- Set the vLLM block size to 1 for the ROCm AITER MLA path.
- Print the vLLM server log tail when the server dies, does not become ready, or lm-eval produces no JSON.
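As a rough illustration of the first two items, the opt-in logic could look like the sketch below. The function name `build_vllm_args` is hypothetical and not taken from the actual script; only `KIMI_VLLM_KV_CACHE_DTYPE`, `--kv-cache-dtype`, and `--block-size` come from this PR.

```shell
# Hypothetical helper (not the script's real code): assemble the extra
# `vllm serve` arguments for the Kimi gate.
build_vllm_args() {
  # Block size 1 is always passed for the ROCm AITER MLA path.
  set -- --block-size 1
  # fp8 (or any other dtype) is only added when the caller opts in.
  if [ -n "${KIMI_VLLM_KV_CACHE_DTYPE:-}" ]; then
    set -- "$@" --kv-cache-dtype "$KIMI_VLLM_KV_CACHE_DTYPE"
  fi
  # Print the arguments space-separated on one line.
  printf '%s\n' "$*"
}
```

With the variable unset, no `--kv-cache-dtype` flag is emitted and vLLM uses its own default; running with `KIMI_VLLM_KV_CACHE_DTYPE=fp8` restores the previous behavior.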

Context

The Kimi Downstream Test on the main branch is consistently failing in the vLLM backend while the SGLang backend passes. The failing vLLM job reaches /health, but the first gsm8k completion requests then return EngineCore 500 errors and the server stops accepting connections. Recent ROCm vLLM changes reverted fp8 MLA decode support, so the CI should not force fp8 KV cache by default.
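The failure sequence above (the server answers /health, then dies) suggests a readiness loop that checks both the probe and the server process. The following is only a sketch under assumptions: `wait_for_ready`, its argument order, and the `WAIT_INTERVAL` variable are hypothetical names, not part of the actual script.

```shell
# Hypothetical readiness loop: wait_for_ready PID ATTEMPTS PROBE_COMMAND...
# Returns 0 when the probe succeeds, 1 if the attempts run out,
# and 2 if the server process is no longer alive.
wait_for_ready() {
  pid=$1
  attempts=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # kill -0 sends no signal; it only checks that the process exists.
    if ! kill -0 "$pid" 2>/dev/null; then
      echo "server died"
      return 2
    fi
    # Run the caller-supplied probe, e.g. a curl against /health.
    if "$@" >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep "${WAIT_INTERVAL:-1}"
  done
  echo "not ready"
  return 1
}
```

A call might look like `wait_for_ready "$SERVER_PID" 60 curl -sf http://localhost:8000/health`, where the port and the `SERVER_PID` variable are assumptions. Since a passing /health check did not guarantee working completions in this case, the distinct return codes make it easier to decide when to print the server log.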

Validation

bash -n .github/scripts/kimi_vllm_accuracy.sh
git diff --check

github-actions · 2026-06-26T09:36:14Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| `ci:triton-300x` | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| `ci:sglang` | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| `ci:atom` | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| `ci:atom_full` | ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json` |
| `ci:vllm` | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| `ci:all` | All standard extended tests (excludes `ci:atom_full`) |

Only add `ci:atom_full` for FlyDSL or Triton upgrades.
Add labels via the sidebar or `gh pr edit 3954 --add-label <label>`.

Copilot

Pull request overview

Updates the Kimi-K2.5 downstream vLLM CI gate to avoid unstable FP8 KV-cache defaults in current ROCm nightly images, while improving diagnostics when the vLLM server or eval step fails.

Changes:

- Stop forcing `--kv-cache-dtype fp8`; keep FP8 opt-in via `KIMI_VLLM_KV_CACHE_DTYPE`.
- Set `vllm serve --block-size 1` for the ROCm AITER MLA path.
- Add a `dump_server_log` helper and emit the server log tail on common failure paths (server death, not-ready, missing lm-eval JSON).
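For readers unfamiliar with this pattern, a log-tail helper of the kind described might look roughly like the sketch below. Only the name `dump_server_log` comes from the PR; the arguments, the default of 200 lines, and the message format are assumptions made for illustration.

```shell
# Sketch of a log-tail helper: dump_server_log LOG_FILE [LINES]
# The signature and the 200-line default are assumed, not the script's real ones.
dump_server_log() {
  if [ -f "$1" ]; then
    echo "==== last ${2:-200} lines of $1 ===="
    tail -n "${2:-200}" "$1"
  else
    # Report the missing file instead of failing, so the caller's own
    # error message is still the last thing printed.
    echo "server log not found: $1"
  fi
}
```

Such a helper could be called from each failure path before the script exits, so the CI output shows the last server messages next to the error.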


CI: avoid fp8 KV cache in Kimi vLLM gate

3c6a671

gyohuangxin requested review from a team and Copilot June 26, 2026 09:36

Copilot started reviewing on behalf of gyohuangxin June 26, 2026 09:37

gyohuangxin added the `ci:kimi` label ("Trigger Kimi-K2.5 downstream accuracy gates (vLLM+SGLang)") Jun 26, 2026

Copilot AI reviewed Jun 26, 2026

gyohuangxin wants to merge 1 commit into main from ci/fix-kimi-vllm-kv-cache
