[Feat] enable fuse share expert in DeepSeek-R1-0528-MXFP4-v2 in sgl atom by ZLkanyo009 · Pull Request #958 · ROCm/ATOM

ZLkanyo009 · 2026-05-28T07:46:43Z

Motivation

In DeepSeek-R1-0528-MXFP4-v2, the shared experts and routed experts of layer 61 are of different types, while in every other layer they are of the same type. With the original is_rocm_aiter_fusion_shared_expert_enabled() function, this single mismatch causes all layers to skip shared expert fusion. This PR enables shared expert fusion for the compatible layers of DeepSeek-R1-0528-MXFP4-v2. For other models, the original behavior is preserved to ensure compatibility.
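The difference between a model-wide decision and a per-layer decision can be sketched as follows. This is only an illustration, not the code in this PR: the `LayerExperts` class, the helper names, and the dtype strings (`"mxfp4"`, `"bf16"`) are hypothetical placeholders, and the actual types used by layer 61 are not stated here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LayerExperts:
    """Hypothetical description of one MoE layer's expert types."""
    layer_id: int
    routed_type: str   # placeholder type name for routed experts
    shared_type: str   # placeholder type name for shared experts


def layer_is_compatible(layer: LayerExperts) -> bool:
    # Assumption: fusion is only possible when both expert kinds share a type.
    return layer.routed_type == layer.shared_type


def model_wide_decision(layers: List[LayerExperts]) -> bool:
    # A single global switch: one incompatible layer disables fusion everywhere.
    return all(layer_is_compatible(layer) for layer in layers)


def per_layer_decision(layers: List[LayerExperts]) -> Dict[int, bool]:
    # One decision per layer: only the incompatible layers skip fusion.
    return {layer.layer_id: layer_is_compatible(layer) for layer in layers}


# Illustrative layers: 59 and 60 match, 61 does not (type names are made up).
layers = [
    LayerExperts(59, "mxfp4", "mxfp4"),
    LayerExperts(60, "mxfp4", "mxfp4"),
    LayerExperts(61, "mxfp4", "bf16"),
]

global_flag = model_wide_decision(layers)
per_layer_flags = per_layer_decision(layers)
```

With these placeholder inputs, the model-wide switch turns fusion off for all three layers, whereas the per-layer map keeps it on for layers 59 and 60 and off only for layer 61.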

Command

export CUDA_VISIBLE_DEVICES=0,1,2,3

export AITER_QUICK_REDUCE_QUANTIZATION=INT4
export SGLANG_AITER_FP8_PREFILL_ATTN=0
export SGLANG_USE_AITER=1
export ATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1

model_path=/models/deepseek-ai/DeepSeek-R1-0528-MXFP4-v2

export SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models
 
TORCHINDUCTOR_COMPILE_THREADS=128 python3 -m sglang.launch_server \
    --model-path $model_path \
    --host localhost \
    --port 8000 \
    --trust-remote-code \
    --tensor-parallel-size 4 \
    --mem-fraction-static 0.9 \
    --disable-radix-cache \
    --attention-backend aiter \
    --kv-cache-dtype fp8_e4m3 \
    --max-running-requests 128

export CUDA_VISIBLE_DEVICES=0,1,2,3

export AITER_QUICK_REDUCE_QUANTIZATION=INT4
export SGLANG_AITER_FP8_PREFILL_ATTN=0
export SGLANG_USE_AITER=1
export ATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1

export MORI_SHMEM_MODE=ISOLATION

model_path=/models/deepseek-ai/DeepSeek-R1-0528-MXFP4-v2

export SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models
 
TORCHINDUCTOR_COMPILE_THREADS=128 python3 -m sglang.launch_server \
    --model-path $model_path \
    --host localhost \
    --port 8000 \
    --trust-remote-code \
    --tensor-parallel-size 4 \
    --data-parallel-size 4 \
    --expert-parallel-size 4 \
    --enable-dp-attention \
    --mem-fraction-static 0.9 \
    --disable-radix-cache \
    --attention-backend aiter \
    --kv-cache-dtype fp8_e4m3 \
    --max-running-requests 128

Performance

before:

The shared expert is not fused into the MoE.

after:

The shared expert is fused into the MoE.

[Feat] enable fuse share expert in DeepSeek-R1-0528-MXFP4-v2 in sgl atom

94417a5

ZLkanyo009 force-pushed the lingzha/fuse-share-expert branch from da41796 to 94417a5 on May 28, 2026 07:51

qichu-yun mentioned this pull request on May 28, 2026 in [Fix] Enable dpsk r1 mxfp4 V2 model #934 (Merged)

ZLkanyo009 wants to merge 1 commit into main from lingzha/fuse-share-expert
