ci(sglang): add Kimi-K2 e2e accuracy + perf regression #985
Draft
sunway513 wants to merge 1 commit into
Add Kimi-K2-Thinking-MXFP4 and Kimi-K2.5-MXFP4 (TP8) to the SGLang
regression suite, reaching parity with the existing vLLM coverage.
- atom-sglang-test.yaml: add two Kimi accuracy cases to the matrix
- sglang_benchmark_models.json: add two Kimi perf cases (nightly group B)
- atom-sglang-benchmark.yaml: add kimi presets + all-kimi selector
- runner-config.yml: document the in-use linux-atom-mi35x-{1,4,8} and
atom-mi355-8gpu-oot-benchmark runner labels (GPU-arch mapping gap)
Accuracy thresholds mirror the vLLM baselines and must be recalibrated
from the first green SGLang run.
What
Adds Kimi-K2 end-to-end regression coverage to the SGLang backend (both accuracy and performance), reaching parity with the Kimi coverage that already exists for vLLM.
Why
Kimi-K2 was already covered on every vLLM lane (in-tree accuracy/perf and OOT accuracy/perf) but had zero coverage on SGLang: neither the accuracy matrix nor the perf catalog contained any Kimi entry. This closes that gap.
Coverage before vs after
Changes
- `.github/workflows/atom-sglang-test.yaml`: two Kimi cases added to the accuracy `matrix.include` (TP8, `linux-atom-mi35x-8`, `SGLANG_USE_AITER=1`).
- `.github/benchmark/sglang_benchmark_models.json`: two Kimi perf cases (`nightly_group: B`, `atom-mi355-8gpu-aac-runner`). Group B keeps them out of the daily group-A sweep; they run in the Friday `C-ALL` full sweep and on manual dispatch.
- `.github/workflows/atom-sglang-benchmark.yaml`: `kimi-k2-thinking-mxfp4-tp8`, `kimi-k25-mxfp4-tp8` and an `all-kimi` option added to the `oob_model_preset` dropdown + selector logic.
- `.github/runner-config.yml`: documents the runner labels actually in use but previously missing from the GPU-arch map: `atom-mi355-8gpu-oot-benchmark` and the `linux-atom-mi35x-{1,4,8}` family.

Open items for reviewers (draft)
- The accuracy thresholds `0.90` (K2-Thinking) and `0.92` (K2.5) are mirrored from the vLLM baselines as placeholders. Please recalibrate from the first green SGLang run.
- Both models, `amd/Kimi-K2-Thinking-MXFP4` and `amd/Kimi-K2.5-MXFP4`, are served with `--tensor-parallel-size 8 --trust-remote-code` + `SGLANG_USE_AITER=1`. Confirm these are the intended SGLang serving args (DeepSeek entries also set `SGLANG_AITER_FP8_PREFILL_ATTN=0` / fusion flags; Kimi MXFP4 may want a different env).
- `mi35x` GPU-arch label. The `linux-atom-mi35x-*` pool schedules onto MI350 or MI355. I mapped it to `MI355` with a comment; confirm whether the devops-dashboard supports an `MI35X` heterogeneous-pool value before relabeling.
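
For reviewers checking the scheduling behavior described under Changes, here is a minimal Python sketch of the intended selection rule: group-B cases are skipped by the daily sweep, picked up by the Friday full sweep, and selectable on manual dispatch through a preset or `all-kimi`. It is not the workflow's actual selector logic. The `PerfCase` record, the trigger names, the group-A placeholder entry, and the assumption that catalog case names match the preset names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfCase:
    # Hypothetical record; field names are illustrative, not the real JSON schema.
    name: str
    nightly_group: str  # "A" (daily sweep) or "B" (full sweep / manual only)


CASES = [
    PerfCase("example-group-a-model", "A"),  # placeholder, not a real catalog entry
    PerfCase("kimi-k2-thinking-mxfp4-tp8", "B"),
    PerfCase("kimi-k25-mxfp4-tp8", "B"),
]


def select_cases(cases, trigger, preset=None):
    """Return the case names that would run for a given trigger.

    trigger is one of "daily", "friday-full", "manual" (hypothetical names).
    preset is only consulted for manual dispatch.
    """
    if trigger == "daily":
        # The daily sweep only covers group A, so group-B Kimi cases are skipped.
        return [c.name for c in cases if c.nightly_group == "A"]
    if trigger == "friday-full":
        # The full sweep runs every group.
        return [c.name for c in cases]
    if trigger == "manual":
        if preset == "all-kimi":
            # Assumed expansion: every case whose name starts with "kimi-".
            return [c.name for c in cases if c.name.startswith("kimi-")]
        return [c.name for c in cases if c.name == preset]
    raise ValueError(f"unknown trigger: {trigger!r}")
```

Calling `select_cases(CASES, "manual", "all-kimi")` returns both Kimi entries, while `select_cases(CASES, "daily")` returns none of them.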