[Feat] Fused qknorm + quant for dpsk v2 model by qichu-yun · Pull Request #963 · ROCm/ATOM

qichu-yun · 2026-05-28T10:10:06Z

Motivation

Improve DeepSeek V2 performance by enabling fused q/k RMSNorm + quantization for both FP8 and MXFP4 paths.

Technical Details

This commit enables fuse_qknorm_quant for MXFP4, preserves unshuffled MXFP4 attention projection weights for raw-scale GEMM, and routes SGLang q_b_proj through the proper FP4 GEMM path when fused quantization is used. Existing non-fused and fallback paths remain unchanged.
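For readers unfamiliar with what the fused kernel computes, the unfused math can be sketched in plain Python. This is only a reference under stated assumptions, not the ATOM kernel: the function names (`rms_norm`, `quantize_block`, `rmsnorm_quant_reference`) are hypothetical, the block size of 32 and the E2M1 value grid follow the MX format convention, and the scale rule (`floor(log2(amax))` minus the largest E2M1 exponent) is one common choice that the real kernel may not share. The scales here are raw, in natural block order, with no shuffle applied.

```python
import math

# Magnitudes representable by FP4 E2M1 (the sign is stored separately).
FP4_E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
MX_BLOCK = 32  # assumed number of elements sharing one power-of-two scale


def rms_norm(x, weight, eps=1e-6):
    """Plain RMSNorm over one vector."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def quantize_block(block):
    """Quantize one block to E2M1 values plus a shared power-of-two exponent."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block), 0
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) == e - 1.
    _, e = math.frexp(amax)
    scale_exp = (e - 1) - 2  # 2 is the largest E2M1 exponent (6.0 = 1.5 * 2**2)
    scale = 2.0 ** scale_exp
    codes = []
    for v in block:
        target = abs(v) / scale
        # Nearest grid value; ties resolve to the smaller magnitude here,
        # and targets above 6.0 saturate to 6.0.
        mag = min(FP4_E2M1_MAGNITUDES, key=lambda c: abs(target - c))
        codes.append(math.copysign(mag, v))
    return codes, scale_exp


def rmsnorm_quant_reference(x, weight):
    """Normalize, then quantize block by block; returns raw (unshuffled) scales."""
    y = rms_norm(x, weight)
    codes, scale_exps = [], []
    for start in range(0, len(y), MX_BLOCK):
        block_codes, exp = quantize_block(y[start:start + MX_BLOCK])
        codes.extend(block_codes)
        scale_exps.append(exp)
    return y, codes, scale_exps
```

A fused kernel would produce the same kind of output (quantized values plus per-block scales) in one pass, without writing the normalized tensor back to memory in between.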

Test Result

before:

after:

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

valarLip · 2026-05-29T13:17:32Z

+        )
+        return y
+
+    q_scale = fp4_utils.e8m0_shuffle(q_scale.view(torch.float8_e8m0fnu))


It would be better to remove this e8m0_shuffle by fusing it into another kernel.

qichu-yun · Jun 1, 2026

This e8m0_shuffle() is only on the fallback path. The native SGLang-style FP4 GEMM path consumes the raw q_scale directly and returns before this line, so the optimized path performs no post-quantization shuffle. The shuffle is only used when we fall back to ATOM q_b_proj, which still requires the shuffled scale layout. Removing it without removing or replacing the fallback would reintroduce the scale-layout correctness issue.
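The control flow described in this reply can be sketched as follows. Everything here is illustrative: `project_q`, `native_gemm`, `fallback_gemm`, and `toy_shuffle` are hypothetical names, and `toy_shuffle` is a placeholder permutation that does not reproduce the real `e8m0_shuffle` layout.

```python
def project_q(q_codes, q_scale, fallback_gemm, shuffle_scale, native_gemm=None):
    """Dispatch between a raw-scale GEMM and a fallback that needs shuffled scales."""
    if native_gemm is not None:
        # Optimized path: consumes the raw scale and returns early,
        # so the shuffle below is never reached.
        return native_gemm(q_codes, q_scale)
    # Fallback path: the scale must be rearranged into the layout
    # that the fallback GEMM expects.
    return fallback_gemm(q_codes, shuffle_scale(q_scale))


def toy_shuffle(scale):
    """Placeholder permutation standing in for the real scale shuffle."""
    return list(reversed(scale))
```

With this shape, deleting the shuffle is only safe once the fallback branch itself is removed or changed to accept raw scales.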

qichu-yun force-pushed the fuse_quant branch 3 times, most recently from a984baa to 968577e on May 29, 2026 12:50

valarLip requested changes May 29, 2026

valarLip previously approved these changes Jun 1, 2026

qichu-yun added 2 commits June 2, 2026 02:57

[Feat] Fused qknorm + quant for dpsk v2 model

06060c2

[Fix] Localize SGLang MXFP4 projection preservation

118e3c7

qichu-yun dismissed valarLip’s stale review via 118e3c7 June 2, 2026 08:08

qichu-yun force-pushed the fuse_quant branch from 968577e to 118e3c7 on June 2, 2026 08:08

zhuyuhua-v requested a review from valarLip June 2, 2026 08:44

valarLip approved these changes Jun 2, 2026

qichu-yun wants to merge 2 commits into main from fuse_quant

2 participants
