CI: avoid fp8 KV cache in Kimi vLLM gate #3954

Open: gyohuangxin wants to merge 1 commit into main from ci/fix-kimi-vllm-kv-cache

@gyohuangxin

Member

Summary

  • stop forcing fp8 KV cache for the Kimi-K2.5 vLLM accuracy gate; leave fp8 opt-in via KIMI_VLLM_KV_CACHE_DTYPE
  • set vLLM block size to 1 for the ROCm AITER MLA path
  • print the vLLM server log tail when the server dies, does not become ready, or lm-eval produces no JSON
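
The first two bullets could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual contents of `kimi_vllm_accuracy.sh`: `KIMI_VLLM_KV_CACHE_DTYPE` comes from this PR, and `--block-size` and `--kv-cache-dtype` are existing `vllm serve` flags, but the function name `start_vllm` and the `VLLM_BIN` override are hypothetical.

```bash
# Hypothetical sketch; the real kimi_vllm_accuracy.sh may be structured differently.
# VLLM_BIN is an assumed override so the command line can be inspected in a dry run.
start_vllm() {
    local model="$1"

    # Block size 1 for the ROCm AITER MLA path, per the PR summary.
    set -- serve "$model" --block-size 1

    # The KV cache dtype is opt-in: pass the flag only when the variable is
    # set and non-empty; otherwise vLLM keeps its own default.
    if [ -n "${KIMI_VLLM_KV_CACHE_DTYPE:-}" ]; then
        set -- "$@" --kv-cache-dtype "$KIMI_VLLM_KV_CACHE_DTYPE"
    fi

    "${VLLM_BIN:-vllm}" "$@"
}
```

For a dry run, `VLLM_BIN=echo start_vllm some-model` prints the argument list instead of launching a server; exporting `KIMI_VLLM_KV_CACHE_DTYPE=fp8` beforehand appends `--kv-cache-dtype fp8`.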

Context

On main, the Kimi Downstream Test fails consistently on the vLLM backend while the SGLang backend passes. The failing vLLM job gets as far as a successful /health check, but the first gsm8k completion requests return EngineCore 500 errors, after which the server stops accepting connections. Recent ROCm vLLM changes reverted fp8 MLA decode support, so CI should not force an fp8 KV cache by default.

Validation

  • bash -n .github/scripts/kimi_vllm_accuracy.sh
  • git diff --check

gyohuangxin requested review from a team and Copilot on June 26, 2026 at 09:36
@github-actions

Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-300x | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| ci:sglang | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| ci:atom | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| ci:atom_full | ATOM accuracy suite for PR and main models from ATOM models_accuracy.json |
| ci:vllm | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| ci:all | All standard extended tests (excludes ci:atom_full) |

Only add ci:atom_full for FlyDSL or Triton upgrades.
Add labels via the sidebar or gh pr edit 3954 --add-label <label>

gyohuangxin added the ci:kimi label (Trigger Kimi-K2.5 downstream accuracy gates, vLLM+SGLang) on Jun 26, 2026

Copilot AI left a comment

Contributor

Pull request overview

Updates the Kimi-K2.5 downstream vLLM CI gate to avoid unstable FP8 KV-cache defaults in current ROCm nightly images, while improving diagnostics when the vLLM server or eval step fails.

Changes:

  • Stop forcing --kv-cache-dtype fp8; keep FP8 opt-in via KIMI_VLLM_KV_CACHE_DTYPE.
  • Set vllm serve --block-size 1 for the ROCm AITER MLA path.
  • Add a dump_server_log helper and emit server log tail on common failure paths (server death, not-ready, missing lm-eval JSON).
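
One way the diagnostics in the last bullet might look is sketched below. Only the helper name `dump_server_log` appears in this PR; its body, the 200-line default, and the `wait_for_server` wrapper are assumptions made for illustration.

```bash
# Hypothetical sketch of the log-tail helper; the PR's actual body may differ.
dump_server_log() {
    local log_file="$1"
    local lines="${2:-200}"   # assumed default tail length
    if [ -f "$log_file" ]; then
        echo "===== last $lines lines of $log_file =====" >&2
        tail -n "$lines" "$log_file" >&2
    else
        echo "server log not found: $log_file" >&2
    fi
}

# Hypothetical readiness loop covering two failure paths from the list above:
# the server process dies, or it never becomes ready.
# Usage: wait_for_server PID LOG_FILE ATTEMPTS CHECK_COMMAND [ARGS...]
wait_for_server() {
    local pid="$1" log_file="$2" attempts="$3"
    shift 3
    while [ "$attempts" -gt 0 ]; do
        # kill -0 sends no signal; it only tests whether the process exists.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "server process exited before becoming ready" >&2
            dump_server_log "$log_file"
            return 1
        fi
        # The remaining arguments form the readiness check command.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    echo "server did not become ready in time" >&2
    dump_server_log "$log_file"
    return 1
}
```

A call such as `wait_for_server "$SERVER_PID" server.log 600 curl -sf http://localhost:8000/health` would poll the health endpoint once per second; the port and log file name here are placeholders. The missing lm-eval JSON case could call `dump_server_log` the same way after the eval step.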
