ci(sglang): add Kimi-K2 e2e accuracy + perf regression by sunway513 · Pull Request #985 · ROCm/ATOM

sunway513 · 2026-05-30T13:38:31Z

What

Adds Kimi-K2 end-to-end regression coverage to the SGLang backend (both accuracy and performance), reaching parity with the Kimi coverage that already exists for vLLM.

Why

Kimi-K2 was already covered on every vLLM lane (in-tree accuracy/perf and OOT accuracy/perf) but had no coverage on SGLang: neither the SGLang accuracy matrix nor the perf catalog contained a Kimi entry. This PR closes that gap.

Coverage before vs after

Lane	Before	After
vLLM in-tree accuracy	Kimi-K2.5-MXFP4	(unchanged)
vLLM in-tree perf	Kimi-K2.5-MXFP4	(unchanged)
vLLM OOT accuracy	K2-Thinking + K2.5 TP8	(unchanged)
vLLM OOT perf	K2-Thinking/K2.5 TP4+TP8	(unchanged)
SGLang accuracy	none	+ K2-Thinking-MXFP4 TP8, K2.5-MXFP4 TP8
SGLang perf	none	+ K2-Thinking-MXFP4 TP8, K2.5-MXFP4 TP8
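The parity claim in the table can be checked mechanically by encoding it as data. This is a minimal illustrative sketch: the dictionary layout and the helper names (`BEFORE`, `ADDED`, `apply_changes`, `uncovered_lanes`) are hypothetical and are not part of the PR or the ATOM CI.

```python
# Hypothetical encoding of the coverage table: lane -> Kimi configs covered.
BEFORE = {
    "vLLM in-tree accuracy": ["Kimi-K2.5-MXFP4"],
    "vLLM in-tree perf": ["Kimi-K2.5-MXFP4"],
    "vLLM OOT accuracy": ["K2-Thinking + K2.5 TP8"],
    "vLLM OOT perf": ["K2-Thinking/K2.5 TP4+TP8"],
    "SGLang accuracy": [],
    "SGLang perf": [],
}

# Cases this PR adds (SGLang lanes only).
ADDED = {
    "SGLang accuracy": ["K2-Thinking-MXFP4 TP8", "K2.5-MXFP4 TP8"],
    "SGLang perf": ["K2-Thinking-MXFP4 TP8", "K2.5-MXFP4 TP8"],
}


def apply_changes(before: dict[str, list[str]], added: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return a new coverage map with the added cases appended to each lane."""
    return {lane: before[lane] + added.get(lane, []) for lane in before}


def uncovered_lanes(coverage: dict[str, list[str]]) -> list[str]:
    """Return the lanes with no Kimi coverage, in table order."""
    return [lane for lane, cases in coverage.items() if not cases]


AFTER = apply_changes(BEFORE, ADDED)
```

Before the change `uncovered_lanes(BEFORE)` lists both SGLang lanes; after it, the list is empty and the vLLM lanes are unchanged.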

Changes

.github/workflows/atom-sglang-test.yaml — two Kimi cases added to the accuracy matrix.include (TP8, linux-atom-mi35x-8, SGLANG_USE_AITER=1).
.github/benchmark/sglang_benchmark_models.json — two Kimi perf cases (nightly_group: B, atom-mi355-8gpu-aac-runner). Group B keeps them out of the daily group-A sweep; they run in the Friday C-ALL full sweep and on manual dispatch.
.github/workflows/atom-sglang-benchmark.yaml — kimi-k2-thinking-mxfp4-tp8, kimi-k25-mxfp4-tp8 and an all-kimi option added to the oob_model_preset dropdown + selector logic.
.github/runner-config.yml — documents the runner labels actually in use but previously missing from the GPU-arch map: atom-mi355-8gpu-oot-benchmark and the linux-atom-mi35x-{1,4,8} family.
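The nightly-group routing and the new presets described above can be illustrated with a small Python sketch. It is hypothetical throughout: the `PerfCase` shape, the trigger names (`daily`, `friday`, `manual`), the placeholder group-A entry, and the `select_cases` helper are illustrative assumptions, not the contents of `sglang_benchmark_models.json` or the selector logic in `atom-sglang-benchmark.yaml`. It also assumes the Friday C-ALL sweep runs every group.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PerfCase:
    # Hypothetical catalog-entry shape; only nightly_group is named in the PR.
    preset: str
    nightly_group: str


# Hypothetical catalog: the two Kimi entries this PR adds plus a placeholder group-A entry.
CATALOG = [
    PerfCase("kimi-k2-thinking-mxfp4-tp8", "B"),
    PerfCase("kimi-k25-mxfp4-tp8", "B"),
    PerfCase("example-group-a-model", "A"),
]


def select_cases(trigger: str, preset: Optional[str] = None) -> List[str]:
    """Return the presets a run would execute for a given trigger.

    daily  -> group-A sweep only (Kimi group-B cases are skipped)
    friday -> full sweep across all groups (assumed meaning of C-ALL)
    manual -> the dispatched preset, or every Kimi case for "all-kimi"
    """
    if trigger == "daily":
        return [c.preset for c in CATALOG if c.nightly_group == "A"]
    if trigger == "friday":
        return [c.preset for c in CATALOG]
    if trigger == "manual":
        if preset == "all-kimi":
            return [c.preset for c in CATALOG if c.preset.startswith("kimi-")]
        return [c.preset for c in CATALOG if c.preset == preset]
    raise ValueError(f"unknown trigger: {trigger!r}")
```

Keying the schedule on the group rather than on individual models means new heavy TP8 entries can be kept out of the daily sweep just by setting their group.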

Open items for reviewers (draft)

Accuracy thresholds need calibration. 0.90 (K2-Thinking) and 0.92 (K2.5) are mirrored from the vLLM baselines as placeholders. Please recalibrate from the first green SGLang run.
Model paths / flags. Using amd/Kimi-K2-Thinking-MXFP4 and amd/Kimi-K2.5-MXFP4 with --tensor-parallel-size 8 --trust-remote-code plus SGLANG_USE_AITER=1. Confirm these are the intended SGLang serving args. The DeepSeek entries also set SGLANG_AITER_FP8_PREFILL_ATTN=0 and fusion flags; Kimi MXFP4 may need a different environment.
mi35x GPU-arch label. The linux-atom-mi35x-* pool schedules onto MI350 or MI355. I mapped it to MI355 with a comment; confirm whether the devops-dashboard supports an MI35X heterogeneous-pool value before relabeling.
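The threshold gate from the first open item can be sketched as follows. The two values are the PR's placeholder thresholds; the `passes` helper is a hypothetical illustration of the comparison, and `recalibrate` with its `margin` parameter is one possible rule for deriving a new threshold from the first green run, not something the PR specifies.

```python
# Placeholder thresholds from the PR, mirrored from the vLLM baselines.
THRESHOLDS = {
    "amd/Kimi-K2-Thinking-MXFP4": 0.90,
    "amd/Kimi-K2.5-MXFP4": 0.92,
}


def passes(model: str, score: float) -> bool:
    """Return True if the measured accuracy meets the model's threshold."""
    return score >= THRESHOLDS[model]


def recalibrate(first_green_score: float, margin: float = 0.02) -> float:
    """Hypothetical rule: threshold = first green SGLang score minus a safety margin."""
    return round(first_green_score - margin, 2)
```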

Draft: opened for review of the approach and configuration before enabling on real runners.

Add Kimi-K2-Thinking-MXFP4 and Kimi-K2.5-MXFP4 (TP8) to the SGLang regression suite, reaching parity with the existing vLLM coverage.

- atom-sglang-test.yaml: add two Kimi accuracy cases to the matrix
- sglang_benchmark_models.json: add two Kimi perf cases (nightly group B)
- atom-sglang-benchmark.yaml: add kimi presets + all-kimi selector
- runner-config.yml: document the in-use linux-atom-mi35x-{1,4,8} and atom-mi355-8gpu-oot-benchmark runner labels (GPU-arch mapping gap)

Accuracy thresholds mirror the vLLM baselines and must be recalibrated from the first green SGLang run.

sunway513 wants to merge 1 commit into ROCm:main from sunway513:feat/sglang-kimik2-regression