[sgl][recipe] update sglang-atom ds fp4 recipe #975
Draft
zhuyuhua-v wants to merge 2 commits into
qichu-yun
previously approved these changes
May 29, 2026
ZLkanyo009
previously approved these changes
May 29, 2026
47fea28 to
3abdf6f
Compare
 "1x1024": [64, 128, 256]
 },
-"env_vars": "AITER_QUICK_REDUCE_QUANTIZATION=INT4\nSGLANG_AITER_FP8_PREFILL_ATTN=0\nSGLANG_USE_AITER=1\nATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1"
+"env_vars": "SGLANG_DEFAULT_SERVER_ARGS=\nAITER_QUICK_REDUCE_QUANTIZATION=INT4\nSGLANG_AITER_FP8_PREFILL_ATTN=0\nSGLANG_USE_AITER=1\nATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1\nSGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models\nTORCHINDUCTOR_COMPILE_THREADS=128",
Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
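PR description: Update ATOM SGLang benchmark coverage
Summary
This PR updates the ATOM SGLang benchmark workflow and the DeepSeek SGLang recipe so that they match the current benchmark launch commands for DeepSeek-R1-0528 FP8/MXFP4.
- […] the hidden default value of SGLANG_DEFAULT_SERVER_ARGS.
- […] the internal tags all, non-mtp, and mtp.
- […] SGLANG_ENABLE_SPEC_V2=1, so that the default sglang-atom path also enables speculative v2.
- […] num_prompts=CONC*3 and num_warmups=CONC, which lowers the run cost of high-concurrency DP+EP cases.
- […] amd/DeepSeek-R1-0528-MXFP4-v2, and adds the TP/DP/EP/MTP launch commands.
Schedule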
All scheduled runs trigger at 10 PM Beijing time, which is 14:00 UTC.
- 0 14 * * 1,3
- 0 14 * * 2,4
- 0 14 * * 5
- 0 14 * * 6
Mesh Preset
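When a Mesh benchmark is triggered manually, the DeepSeek presets are now grouped explicitly:
- ds-all: all DeepSeek Mesh configurations, 71 cases in total.
- ds-fp4-all: all FP4 Mesh configurations, 62 cases in total.
- ds-fp8-all: all FP8 Mesh configurations, 9 cases in total.
- ds-fp4-* presets select only FP4 models.
- ds-fp8-* presets select only FP8 models.
Scheduled runs are still grouped by the model tags all, non-mtp, and mtp, so each nightly split includes both FP8 and FP4 in the appropriate group.
Benchmark configuration changes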
The SGLang benchmark model JSON now has the following updates:
- […] --mem-fraction-static 0.85.
- […] --trust-remote-code.
- […] SGLANG_DEFAULT_SERVER_ARGS= (set to empty) disables the hidden default server args in atom_sglang_test.sh.
- […] SGLANG_EXTERNAL_MODEL_PACKAGE and TORCHINDUCTOR_COMPILE_THREADS.
- […] FP8 Mesh DPA4/EP4 and DPA8/EP8 cases for deepseek-ai/DeepSeek-R1-0528.
- ATOM_DUAL_STREAM_MOE_TOKEN_THRESHOLD=0 is kept only on MTP DPA/EP cases.
- Every case with --speculative-algorithm or mesh_spec_mode=mtp explicitly includes SGLANG_ENABLE_SPEC_V2=1.
- case_extra_args_by_pair is used for --chunked-prefill-size 65536.
- case_env_vars_by_pair is used to enable ATOM_USE_FP4_TRITON_GEMM=1 for TP cases at 1x1024.
Workflow changes
The benchmark workflow now:
- […] case_env_vars_by_pair.
- […] case_extra_args_by_pair, covering both the sglang-atom and sglang-mori server modes.
- For SGLang-Mesh DP+EP cases with dp_size > 1 and ep_size > 1, the default sglang-atom benchmark client uses num_prompts=CONC*3 and num_warmups=CONC; other cases still use CONC*10 and CONC*2.
- […] SGLang-Mesh.
- […] SGLang-OOB.
- ds-fp4-* and ds-fp8-* each map to the corresponding model tag.
Mori Mesh script
atom_sglang_mesh_benchmark.sh now reads --chunked-prefill-size from SERVER_EXTRA_ARGS and converts it into the script's internal prefill_size, instead of passing the duplicate server argument on to SGLang. This keeps --chunked-prefill-size, --max-prefill-tokens, and the benchmark matrix consistent.
The sglang-mori benchmark client inside the script also applies the same downsampling to Mesh DP+EP cases: num_prompts=CONC*3 and num_warmups=CONC; non-DP+EP cases keep CONC*10 and CONC*2.
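For reviewers who want to reason about the two script behaviors described above, here is a minimal bash sketch. It is not the code in atom_sglang_mesh_benchmark.sh. The helper names split_prefill_size and client_sample_counts and the global remaining_args are hypothetical. Only SERVER_EXTRA_ARGS-style input, prefill_size, num_prompts, num_warmups, dp_size, and ep_size come from the description. It also assumes the extra args contain no quoted values with embedded spaces.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the real script may be structured differently.

# Hypothetical helper: pull --chunked-prefill-size out of an argument string.
# Sets two globals:
#   prefill_size    the flag's value, or empty if the flag is absent
#   remaining_args  every other argument, in the original order
split_prefill_size() {
  local -a args=() kept=()
  read -r -a args <<< "$1"      # split on whitespace (no quoted values assumed)
  prefill_size=""
  local i=0
  while (( i < ${#args[@]} )); do
    case "${args[i]}" in
      --chunked-prefill-size)   # form: --chunked-prefill-size 65536
        prefill_size="${args[i+1]:-}"
        i=$(( i + 2 ))
        ;;
      --chunked-prefill-size=*) # form: --chunked-prefill-size=65536
        prefill_size="${args[i]#*=}"
        i=$(( i + 1 ))
        ;;
      *)
        kept+=("${args[i]}")
        i=$(( i + 1 ))
        ;;
    esac
  done
  remaining_args="${kept[*]}"
}

# Hypothetical helper: choose client sample counts for one concurrency level.
# DP+EP cases (dp_size > 1 and ep_size > 1) use CONC*3 prompts and CONC warmups;
# all other cases use CONC*10 prompts and CONC*2 warmups.
client_sample_counts() {
  local conc="$1" dp_size="$2" ep_size="$3"
  if (( dp_size > 1 && ep_size > 1 )); then
    num_prompts=$(( conc * 3 ))
    num_warmups=$conc
  else
    num_prompts=$(( conc * 10 ))
    num_warmups=$(( conc * 2 ))
  fi
}

split_prefill_size "--trust-remote-code --chunked-prefill-size 65536 --mem-fraction-static 0.85"
echo "prefill_size=${prefill_size} remaining=${remaining_args}"
# prints: prefill_size=65536 remaining=--trust-remote-code --mem-fraction-static 0.85

client_sample_counts 64 8 8
echo "num_prompts=${num_prompts} num_warmups=${num_warmups}"
# prints: num_prompts=192 num_warmups=64
```

Removing the flag from the forwarded arguments, rather than only reading it, is what keeps the server from receiving the same option twice. At CONC=64, the DP+EP branch sends 192 prompts where the default branch would send 640.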