
[Feat] Fused qknorm + quant for dpsk v2 model#963

Open
qichu-yun wants to merge 2 commits into main from fuse_quant


@qichu-yun
Contributor

Motivation

Improve DeepSeek V2 performance by enabling fused q/k RMSNorm + quantization for both FP8 and MXFP4 paths.

Technical Details

This commit enables fuse_qknorm_quant for MXFP4, preserves unshuffled MXFP4 attention projection weights for raw-scale GEMM, and routes SGLang q_b_proj through the proper FP4 GEMM path when fused quantization is used. Existing non-fused and fallback paths remain unchanged.

Test Result

before:
image

after:
image

Submission Checklist

qichu-yun force-pushed the fuse_quant branch 3 times, most recently from a984baa to 968577e (May 29, 2026 12:50)
)
return y

q_scale = fp4_utils.e8m0_shuffle(q_scale.view(torch.float8_e8m0fnu))
Collaborator


It would be better to eliminate this e8m0_shuffle call by fusing it into another kernel.

Contributor Author


This e8m0_shuffle() is only on the fallback path. The native SGLang-style FP4 GEMM path consumes raw q_scale directly and returns before this line, so the optimized path has no post-shuffle. It is only used when we fall back to ATOM q_b_proj, which still requires shuffled scale layout. Removing it without removing/replacing the fallback would reintroduce the scale-layout correctness issue.
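
To make the control flow described above concrete, here is a minimal sketch of the dispatch, not the actual code in this PR. The function `project_q` and its parameters `native_fp4_gemm`, `fallback_q_b_proj`, and `shuffle_scale` are hypothetical names introduced only for illustration; the real kernels and their signatures are assumed, not taken from the diff.

```python
from typing import Any, Callable, Optional


def project_q(
    q_fp4: Any,
    q_scale: Any,
    native_fp4_gemm: Optional[Callable[[Any, Any], Any]],
    fallback_q_b_proj: Callable[[Any, Any], Any],
    shuffle_scale: Callable[[Any], Any],
) -> Any:
    """Hypothetical dispatch: shuffle the scale only on the fallback path."""
    if native_fp4_gemm is not None:
        # Optimized path: the raw scale goes straight into the GEMM and we
        # return early, so no shuffle is ever executed here.
        return native_fp4_gemm(q_fp4, q_scale)
    # Fallback path: this projection is assumed to need a shuffled scale layout.
    return fallback_q_b_proj(q_fp4, shuffle_scale(q_scale))
```

With this shape, dropping the shuffle is only safe once the fallback branch itself is removed or taught to consume the raw scale layout.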

valarLip previously approved these changes Jun 1, 2026

2 participants