ci(sglang): add Kimi-K2 e2e accuracy + perf regression #985

Draft
sunway513 wants to merge 1 commit into ROCm:main from sunway513:feat/sglang-kimik2-regression

@sunway513
Collaborator

What

Adds Kimi-K2 end-to-end regression coverage to the SGLang backend (both accuracy and performance), reaching parity with the Kimi coverage that already exists for vLLM.

Why

Kimi-K2 was already covered on every vLLM lane (in-tree accuracy/perf and OOT accuracy/perf) but had no coverage on SGLang: neither the accuracy matrix nor the perf catalog contained a Kimi entry. This PR closes that gap.

Coverage before vs after

| Lane | Before | After |
| --- | --- | --- |
| vLLM in-tree accuracy | Kimi-K2.5-MXFP4 | (unchanged) |
| vLLM in-tree perf | Kimi-K2.5-MXFP4 | (unchanged) |
| vLLM OOT accuracy | K2-Thinking + K2.5 TP8 | (unchanged) |
| vLLM OOT perf | K2-Thinking/K2.5 TP4+TP8 | (unchanged) |
| SGLang accuracy | none | + K2-Thinking-MXFP4 TP8, K2.5-MXFP4 TP8 |
| SGLang perf | none | + K2-Thinking-MXFP4 TP8, K2.5-MXFP4 TP8 |

Changes

  1. .github/workflows/atom-sglang-test.yaml: two Kimi cases added to the accuracy matrix.include (TP8, linux-atom-mi35x-8, SGLANG_USE_AITER=1).
  2. .github/benchmark/sglang_benchmark_models.json: two Kimi perf cases (nightly_group: B, atom-mi355-8gpu-aac-runner). Group B keeps them out of the daily group-A sweep; they run in the Friday C-ALL full sweep and on manual dispatch.
  3. .github/workflows/atom-sglang-benchmark.yaml: kimi-k2-thinking-mxfp4-tp8, kimi-k25-mxfp4-tp8, and an all-kimi option added to the oob_model_preset dropdown and selector logic.
  4. .github/runner-config.yml: documents runner labels that are in use but were missing from the GPU-arch map, namely atom-mi355-8gpu-oot-benchmark and the linux-atom-mi35x-{1,4,8} family.
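
For reviewers who want a mental model of how the group-B tagging and the new presets interact, here is a minimal Python sketch. It is not the workflow's actual selector code. The catalog shape, the `name` field, the `example-daily-model` entry, and both function names are assumptions made for illustration. The reading that C-ALL runs every group is also an assumption, inferred from "full sweep". Only the preset names, `nightly_group: B`, and the sweep names come from this PR.

```python
from typing import Dict, List

# Hypothetical catalog shape; the real sglang_benchmark_models.json schema may differ.
CATALOG: List[Dict[str, str]] = [
    # Placeholder group-A entry, not taken from the PR.
    {"name": "example-daily-model", "nightly_group": "A"},
    {"name": "kimi-k2-thinking-mxfp4-tp8", "nightly_group": "B"},
    {"name": "kimi-k25-mxfp4-tp8", "nightly_group": "B"},
]


def select_for_sweep(catalog: List[Dict[str, str]], sweep: str) -> List[str]:
    """Return the case names a scheduled sweep would run.

    Assumption: "C-ALL" runs every group, any other value runs only
    the cases tagged with that nightly_group.
    """
    if sweep == "C-ALL":
        return [case["name"] for case in catalog]
    return [case["name"] for case in catalog if case["nightly_group"] == sweep]


def resolve_preset(catalog: List[Dict[str, str]], preset: str) -> List[str]:
    """Expand a manual-dispatch preset into case names.

    Assumption: "all-kimi" expands to every case whose name starts with "kimi-".
    """
    names = [case["name"] for case in catalog]
    if preset == "all-kimi":
        return [name for name in names if name.startswith("kimi-")]
    if preset not in names:
        raise ValueError(f"unknown preset: {preset}")
    return [preset]
```

Under these assumptions, `select_for_sweep(CATALOG, "A")` skips both Kimi cases, while `select_for_sweep(CATALOG, "C-ALL")` and `resolve_preset(CATALOG, "all-kimi")` include them.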

Open items for reviewers (draft)

  • Accuracy thresholds need calibration. 0.90 (K2-Thinking) and 0.92 (K2.5) are mirrored from the vLLM baselines as placeholders. Please recalibrate from the first green SGLang run.
  • Model paths / flags. Using amd/Kimi-K2-Thinking-MXFP4 and amd/Kimi-K2.5-MXFP4 with --tensor-parallel-size 8 --trust-remote-code + SGLANG_USE_AITER=1. Confirm these are the intended SGLang serving args (DeepSeek entries also set SGLANG_AITER_FP8_PREFILL_ATTN=0 / fusion flags — Kimi MXFP4 may want different env).
  • mi35x GPU-arch label. The linux-atom-mi35x-* pool schedules onto MI350 or MI355. I mapped it to MI355 with a comment; confirm whether the devops-dashboard supports an MI35X heterogeneous-pool value before relabeling.
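
To make the first two open items concrete, the sketch below spells out the per-case configuration as this PR currently assumes it. It is illustrative only. The `KimiCase` type, the `CASES` keys (borrowed from the preset names), and both helper functions are hypothetical. The `python3 -m sglang.launch_server --model-path` entry point is also an assumption about how the workflow starts the server. The model paths, the two flags, `SGLANG_USE_AITER=1`, and the placeholder thresholds are the values quoted above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class KimiCase:
    model_path: str
    threshold: float  # placeholder mirrored from the vLLM baseline, to be recalibrated


# Keys reuse the preset names purely for illustration.
CASES: Dict[str, KimiCase] = {
    "kimi-k2-thinking-mxfp4-tp8": KimiCase("amd/Kimi-K2-Thinking-MXFP4", 0.90),
    "kimi-k25-mxfp4-tp8": KimiCase("amd/Kimi-K2.5-MXFP4", 0.92),
}


def serve_invocation(case: KimiCase) -> Tuple[Dict[str, str], List[str]]:
    """Build the extra environment and argv for one case; nothing is launched here."""
    env = {"SGLANG_USE_AITER": "1"}
    argv = [
        "python3", "-m", "sglang.launch_server",  # assumed entry point
        "--model-path", case.model_path,
        "--tensor-parallel-size", "8",
        "--trust-remote-code",
    ]
    return env, argv


def passes_gate(score: float, case: KimiCase) -> bool:
    """Accuracy gate: the measured score must meet or exceed the case threshold."""
    return score >= case.threshold
```

If Kimi MXFP4 turns out to need different environment variables than the DeepSeek entries, `env` in `serve_invocation` is the only part of this sketch that would change.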

Draft — opening for review of approach/config before enabling on real runners.

Add Kimi-K2-Thinking-MXFP4 and Kimi-K2.5-MXFP4 (TP8) to the SGLang
regression suite, reaching parity with the existing vLLM coverage.

- atom-sglang-test.yaml: add two Kimi accuracy cases to the matrix
- sglang_benchmark_models.json: add two Kimi perf cases (nightly group B)
- atom-sglang-benchmark.yaml: add kimi presets + all-kimi selector
- runner-config.yml: document the in-use linux-atom-mi35x-{1,4,8} and
  atom-mi355-8gpu-oot-benchmark runner labels (GPU-arch mapping gap)

Accuracy thresholds mirror the vLLM baselines and must be recalibrated
from the first green SGLang run.