[sgl][recipe]update sglang-atom ds fp4 recipe by zhuyuhua-v · Pull Request #975 · ROCm/ATOM

zhuyuhua-v · 2026-05-29T08:49:41Z

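PR description: Update ATOM SGLang benchmark coverage

Summary

This PR updates the ATOM SGLang benchmark workflow and the DeepSeek SGLang recipe so that both match the current benchmark launch commands for DeepSeek-R1-0528 FP8/MXFP4.

Write the SGLang benchmark server arguments explicitly into the model JSON instead of relying on the hidden defaults in SGLANG_DEFAULT_SERVER_ARGS.
Add DeepSeek-R1-0528 FP8 SGLang-Mesh DPA4/EP4 and DPA8/EP8 benchmark cases.
Rearrange the scheduled benchmarks so that Mesh non-MTP, Mesh MTP, full Mesh, and full OOB run on separate nights.
Add separate FP4 and FP8 presets for manual Mesh triggers, while keeping the internal all, non-mtp, and mtp tags that the schedule uses.
Set SGLANG_ENABLE_SPEC_V2=1 explicitly for every MTP benchmark case so that the default sglang-atom path also enables speculative v2.
Change the benchmark request counts for all Mesh DP+EP cases to num_prompts=CONC*3 and num_warmups=CONC, which lowers the run cost of high-concurrency DP+EP cases.
Update the DeepSeek-R1 SGLang recipe to use amd/DeepSeek-R1-0528-MXFP4-v2 and add TP/DP/EP/MTP launch commands.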
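
The SGLANG_ENABLE_SPEC_V2 rule lends itself to a small consistency check. The sketch below flags MTP cases whose `env_vars` string lacks the setting. It is only an illustration: the `name` and `server_args` keys and the sample cases are assumptions made for this example, while `env_vars` (newline-separated KEY=VALUE entries) and `mesh_spec_mode` are the names used in this PR.

```python
def is_mtp_case(case: dict) -> bool:
    """Treat a case as MTP if it passes --speculative-algorithm or sets mesh_spec_mode=mtp."""
    has_spec_flag = "--speculative-algorithm" in case.get("server_args", "")
    return has_spec_flag or case.get("mesh_spec_mode") == "mtp"


def cases_missing_spec_v2(cases: list) -> list:
    """Return the names of MTP cases that do not set SGLANG_ENABLE_SPEC_V2=1."""
    missing = []
    for case in cases:
        env_lines = case.get("env_vars", "").splitlines()
        if is_mtp_case(case) and "SGLANG_ENABLE_SPEC_V2=1" not in env_lines:
            missing.append(case["name"])
    return missing


# Hypothetical cases, not copied from sglang_benchmark_models.json.
SAMPLE_CASES = [
    {"name": "tp8-mtp", "server_args": "--speculative-algorithm EAGLE",
     "env_vars": "SGLANG_USE_AITER=1\nSGLANG_ENABLE_SPEC_V2=1"},
    {"name": "dpa8-ep8-mtp", "mesh_spec_mode": "mtp",
     "env_vars": "SGLANG_USE_AITER=1"},
    {"name": "tp8", "server_args": "--trust-remote-code",
     "env_vars": "SGLANG_USE_AITER=1"},
]

print(cases_missing_spec_v2(SAMPLE_CASES))
```

A check of this kind could run in CI against the model JSON, so that a newly added MTP case cannot silently fall back to the non-v2 speculative path.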

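Schedule

All scheduled runs trigger at 10 p.m. Beijing time, which is 14:00 UTC.

| Cron | Coverage | Current matrix size |
| --- | --- | --- |
| `0 14 * * 1,3` | SGLang-Mesh non-MTP configurations | 6 models, 40 cases |
| `0 14 * * 2,4` | SGLang-Mesh MTP configurations | 4 models, 31 cases |
| `0 14 * * 5` | Full SGLang-Mesh configuration | 10 models, 71 cases |
| `0 14 * * 6` | Full SGLang-OOB configuration | 11 models, 110 cases |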
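
To make the cron-to-workload mapping concrete, here is a minimal sketch. The cron strings and the all, non-mtp, and mtp tags come from this PR; the `(workload, tag)` tuple layout, the function names, and the use of the `all` tag for the OOB run are assumptions for illustration and may not match how the workflow stores the mapping. The second helper confirms the Beijing-time conversion.

```python
from datetime import datetime, timedelta, timezone

# Cron string -> (workload, model tag). The tuple layout is an assumption.
SCHEDULES = {
    "0 14 * * 1,3": ("SGLang-Mesh", "non-mtp"),
    "0 14 * * 2,4": ("SGLang-Mesh", "mtp"),
    "0 14 * * 5": ("SGLang-Mesh", "all"),
    "0 14 * * 6": ("SGLang-OOB", "all"),
}

BEIJING = timezone(timedelta(hours=8))


def workload_for(cron: str) -> tuple:
    """Look up the workload and model tag for the cron string that fired."""
    try:
        return SCHEDULES[cron]
    except KeyError:
        raise ValueError(f"unknown schedule: {cron!r}") from None


def beijing_hour(utc_hour: int) -> int:
    """Convert an hour of the day in UTC to the hour in Beijing time (UTC+8)."""
    moment = datetime(2026, 6, 1, utc_hour, tzinfo=timezone.utc)
    return moment.astimezone(BEIJING).hour


print(workload_for("0 14 * * 5"))
print(beijing_hour(14))
```

Keying on the exact cron string means any edit to a schedule entry must be mirrored in the lookup table, which is why an unknown string raises instead of falling back to a default.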

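Mesh presets

Manual Mesh benchmark triggers now have explicit DeepSeek preset groups:

ds-all: all DeepSeek Mesh configurations, 71 cases in total.
ds-fp4-all: all FP4 Mesh configurations, 62 cases in total.
ds-fp8-all: all FP8 Mesh configurations, 9 cases in total.
ds-fp4-* presets select FP4 models only.
ds-fp8-* presets select FP8 models only.

The schedule still groups by the model tags all, non-mtp, and mtp, so each nightly split includes both FP8 and FP4 in the appropriate group.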
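
One way to picture the preset naming is to split a preset into an optional precision filter and a model tag. This is a sketch only: the function name is hypothetical, and it assumes the `*` suffix in ds-fp4-* and ds-fp8-* is one of the model tags (all, non-mtp, mtp), which the description implies but does not spell out.

```python
from typing import Optional, Tuple

PRECISIONS = ("fp4", "fp8")


def split_preset(preset: str) -> Tuple[Optional[str], str]:
    """Split a preset such as 'ds-fp4-mtp' into (precision filter, model tag).

    A preset without a precision part, such as 'ds-all', returns None as the
    filter, meaning that FP4 and FP8 models are both selected.
    """
    if not preset.startswith("ds-"):
        raise ValueError(f"not a DeepSeek preset: {preset!r}")
    rest = preset[len("ds-"):]
    for precision in PRECISIONS:
        prefix = precision + "-"
        if rest.startswith(prefix):
            return precision, rest[len(prefix):]
    return None, rest


print(split_preset("ds-all"))
print(split_preset("ds-fp4-all"))
print(split_preset("ds-fp8-non-mtp"))
```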

Benchmark configuration changes

The SGLang benchmark model JSON now includes the following updates:

DeepSeek OOB and Mesh server args consistently use --mem-fraction-static 0.85.
Mesh configurations add --trust-remote-code explicitly.
An empty SGLANG_DEFAULT_SERVER_ARGS= disables the hidden default server args in atom_sglang_test.sh.
Runtime environment variables that the benchmark needs are written explicitly into the JSON, including SGLANG_EXTERNAL_MODEL_PACKAGE and TORCHINDUCTOR_COMPILE_THREADS.
New FP8 Mesh DPA4/EP4 and DPA8/EP8 cases are based on deepseek-ai/DeepSeek-R1-0528.
ATOM_DUAL_STREAM_MOE_TOKEN_THRESHOLD=0 is kept only on MTP DPA/EP cases.
Every case with --speculative-algorithm or mesh_spec_mode=mtp explicitly includes SGLANG_ENABLE_SPEC_V2=1.
Overrides keyed by input/output length are retained:
- case_extra_args_by_pair supplies --chunked-prefill-size 65536.
- case_env_vars_by_pair enables ATOM_USE_FP4_TRITON_GEMM=1 for TP cases at 1x1024.
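
The per-pair overrides can be read as a merge of model-level settings with the entries for one input/output length pair. The sketch below assumes that both override maps are keyed by a pair string such as `1x1024` and hold plain strings, and that model-level arguments live under a `server_args` key; none of these details are confirmed by the description. The `8192x1024` key is made up for the example.

```python
def parse_env(raw: str) -> dict:
    """Parse newline-separated KEY=VALUE entries; an empty value is kept as ''."""
    env = {}
    for line in raw.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            env[key.strip()] = value
    return env


def resolve_case(model: dict, pair: str) -> tuple:
    """Combine model-level settings with the overrides for one length pair."""
    extra = model.get("case_extra_args_by_pair", {}).get(pair, "")
    base = model.get("server_args", "")
    server_args = " ".join(part for part in (base, extra) if part)
    env = parse_env(model.get("env_vars", ""))
    # Pair-level variables win over model-level ones with the same name.
    env.update(parse_env(model.get("case_env_vars_by_pair", {}).get(pair, "")))
    return server_args, env


# Hypothetical model entry; which pair carries which override is made up here.
MODEL = {
    "server_args": "--trust-remote-code --mem-fraction-static 0.85",
    "env_vars": "SGLANG_DEFAULT_SERVER_ARGS=\nSGLANG_USE_AITER=1",
    "case_extra_args_by_pair": {"8192x1024": "--chunked-prefill-size 65536"},
    "case_env_vars_by_pair": {"1x1024": "ATOM_USE_FP4_TRITON_GEMM=1"},
}

args, env = resolve_case(MODEL, "1x1024")
print(args)
print(env)
```

Note that `SGLANG_DEFAULT_SERVER_ARGS=` parses to an empty string and is not dropped, which matches the intent of overriding the script default with an explicitly empty value.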

Workflow changes

The benchmark workflow now:

Applies case_env_vars_by_pair before starting the benchmark container.
Applies case_extra_args_by_pair before starting SGLang, covering both the sglang-atom and sglang-mori server modes.
Uses num_prompts=CONC*3 and num_warmups=CONC in the default sglang-atom benchmark client for SGLang-Mesh DP+EP cases (dp_size > 1 and ep_size > 1); all other cases keep CONC*10 and CONC*2.
Maps workloads by the new schedule cron strings:
- Monday/Wednesday, Tuesday/Thursday, and Friday run SGLang-Mesh.
- Saturday runs SGLang-OOB.
Normalizes manual Mesh presets so that ds-fp4-* and ds-fp8-* each map to the corresponding model tag.
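
The request-count rule above reduces to a single condition. A minimal sketch, with a hypothetical function name and argument layout:

```python
def request_counts(conc: int, workload: str, dp_size: int, ep_size: int) -> tuple:
    """Return (num_prompts, num_warmups) for one benchmark case.

    SGLang-Mesh cases with dp_size > 1 and ep_size > 1 use the reduced counts;
    every other case keeps the larger defaults.
    """
    is_dp_ep = workload == "SGLang-Mesh" and dp_size > 1 and ep_size > 1
    if is_dp_ep:
        return conc * 3, conc
    return conc * 10, conc * 2


print(request_counts(64, "SGLang-Mesh", dp_size=8, ep_size=8))  # (192, 64)
print(request_counts(64, "SGLang-Mesh", dp_size=1, ep_size=1))  # (640, 128)
```

At a concurrency of 64, a DP+EP case therefore sends 192 prompts instead of 640.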

Mori Mesh script

atom_sglang_mesh_benchmark.sh now reads --chunked-prefill-size from SERVER_EXTRA_ARGS and converts it into the script's internal prefill_size instead of passing a duplicate server argument on to SGLang. This keeps --chunked-prefill-size, --max-prefill-tokens, and the benchmark matrix consistent.

The sglang-mori benchmark client inside the script applies the same reduced sampling to Mesh DP+EP cases: num_prompts=CONC*3 and num_warmups=CONC. Non-DP+EP cases keep CONC*10 and CONC*2.
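
The flag extraction can be sketched as follows. The real logic lives in a shell script; this Python version only illustrates the idea, and it assumes that the flag is removed from the forwarded arguments and that both the `--flag value` and `--flag=value` spellings should be accepted. The function name is hypothetical.

```python
import shlex

FLAG = "--chunked-prefill-size"


def extract_prefill_size(server_extra_args: str) -> tuple:
    """Pull --chunked-prefill-size out of an argument string.

    Returns (prefill_size or None, remaining arguments as a string), so the
    caller can set the size once instead of forwarding a duplicate flag.
    """
    tokens = shlex.split(server_extra_args)
    remaining = []
    prefill_size = None
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == FLAG and i + 1 < len(tokens):
            prefill_size = int(tokens[i + 1])
            i += 2
        elif token.startswith(FLAG + "="):
            prefill_size = int(token.split("=", 1)[1])
            i += 1
        else:
            remaining.append(token)
            i += 1
    return prefill_size, shlex.join(remaining)


size, rest = extract_prefill_size(
    "--trust-remote-code --chunked-prefill-size 65536 --mem-fraction-static 0.85"
)
print(size)
print(rest)
```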

ZhiweiYan-96 · 2026-05-29T12:20:21Z

      "1x1024": [64, 128, 256]
    },
-    "env_vars": "AITER_QUICK_REDUCE_QUANTIZATION=INT4\nSGLANG_AITER_FP8_PREFILL_ATTN=0\nSGLANG_USE_AITER=1\nATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1"
+    "env_vars": "SGLANG_DEFAULT_SERVER_ARGS=\nAITER_QUICK_REDUCE_QUANTIZATION=INT4\nSGLANG_AITER_FP8_PREFILL_ATTN=0\nSGLANG_USE_AITER=1\nATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1\nSGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models\nTORCHINDUCTOR_COMPILE_THREADS=128",


Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

qichu-yun previously approved these changes May 29, 2026

ZLkanyo009 previously approved these changes May 29, 2026

zhuyuhua-v dismissed stale reviews from ZLkanyo009 and qichu-yun via 8954765 May 29, 2026 09:54

Update SGLang benchmark coverage

3abdf6f

zhuyuhua-v force-pushed the yuhua/sgl-recipe branch from 47fea28 to 3abdf6f Compare May 29, 2026 12:07

ZhiweiYan-96 reviewed May 29, 2026

Comment thread .github/benchmark/sglang_benchmark_models.json Outdated

ZhiweiYan-96 reviewed May 29, 2026

Comment thread .github/benchmark/sglang_benchmark_models.json Outdated

ZhiweiYan-96 reviewed May 29, 2026

Comment thread .github/benchmark/sglang_benchmark_models.json Outdated

update benchmark

c5b30dd

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

zhuyuhua-v wants to merge 2 commits into main from yuhua/sgl-recipe

4 participants
