
[None][feat] Support NVFP4 dsv4 #14026

Open

Tracin wants to merge 7 commits into NVIDIA:feat/deepseek_v4 from Tracin:nvfp4_dsv4

Conversation

Tracin (Collaborator) commented May 12, 2026

@coderabbitai summary

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Tracin and others added 3 commits May 11, 2026 20:19
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
…P4 block-scale kernel without bias

The TRTLLM-Gen fp4-block-scale fused-MoE kernel (`run_fp4_block_scale_moe`
in `fused_moe_trtllm_gen.py:760`, used for NVFP4 / W4A16-MXFP4 /
W4A8-MXFP4-MXFP8) only implements the GPT-OSS-style SwiGLU clamping
path, which expects `bias`, `swiglu_alpha`, and `swiglu_beta` to be set
alongside `swiglu_limit`. DeepSeek-V4 has no FFN bias, so passing
`swiglu_limit` through that kernel collapses the routed-experts output
to all zeros.
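
For contrast, here is a minimal PyTorch sketch of plain SwiGLU next to the
GPT-OSS-style clamped variant; the chunked gate/linear layout and the exact
roles of `swiglu_alpha`/`swiglu_beta`/`swiglu_limit` are assumptions based on
the public GPT-OSS reference, not the TRTLLM-Gen kernel source:

```python
import torch

def swiglu(x: torch.Tensor) -> torch.Tensor:
    # Plain SwiGLU: no bias, no clamping (the DeepSeek-V4 FFN case).
    gate, linear = x.chunk(2, dim=-1)
    return torch.nn.functional.silu(gate) * linear

def gptoss_clamped_swiglu(x: torch.Tensor, alpha: float, beta: float,
                          limit: float) -> torch.Tensor:
    # GPT-OSS-style variant: swiglu_limit clamps the pre-activations and
    # swiglu_alpha/swiglu_beta reshape the gate branch; it assumes the FFN
    # bias has already been added to x. Semantics here are an assumption
    # from the public GPT-OSS reference, not the TRTLLM-Gen kernel.
    gate, linear = x.chunk(2, dim=-1)
    gate = gate.clamp(max=limit)
    linear = linear.clamp(min=-limit, max=limit)
    return gate * torch.sigmoid(alpha * gate) * (linear + beta)
```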

`tests/unittest/_torch/modules/test_fused_moe.py::test_fused_moe_nvfp4`
only exercises `swiglu_limit` via `gptoss_style=True` (which also sets
the bias and alpha/beta tensors), so this bias-free combination is
not covered by any test.

Minimal repro (single-rank, layer-0 only):
  python scripts/dsv4_nvfp4_one_layer_repro.py --routing dsv4-hashed
  python scripts/dsv4_nvfp4_one_layer_repro.py --routing dsv4-hashed \
                                               --swiglu-limit 10.0
The only difference between these two invocations is the
`--swiglu-limit` flag, and that single change flips the routed-MoE
output absmax from ~0.17 to 0.

Workaround applied here: when `moe_cls` resolves to TRTLLMGenFusedMoE
or WideEPMoE and the experts quant mode uses any FP4 block-scale
variant, do not construct the `moe_swiglu_limit` tensor. The shared
experts still apply the limit via the standard `GatedMLP` linear stack
below, so only the routed-experts pre-activation clamping is dropped.
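
A minimal sketch of that gating, assuming a hypothetical construction helper
(`maybe_build_moe_swiglu_limit` and the `has_fp4_block_scale` predicate are
illustrative names; the real TRT-LLM call site may differ):

```python
import torch

def maybe_build_moe_swiglu_limit(moe_cls, quant_mode, swiglu_limit: float,
                                 num_experts: int):
    affected = moe_cls.__name__ in ("TRTLLMGenFusedMoE", "WideEPMoE")
    fp4_block_scale = getattr(quant_mode, "has_fp4_block_scale",
                              lambda: False)()
    if affected and fp4_block_scale:
        # run_fp4_block_scale_moe only implements the bias-ful GPT-OSS
        # clamping path; a bare swiglu_limit silently zeroes the routed
        # experts, so skip the tensor and drop the clamp here.
        return None
    return torch.full((num_experts,), swiglu_limit, dtype=torch.float32)
```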

Verified on a 2-layer truncated `flash-nvfp4-experts-v3.5` checkpoint
via the DSv4 dump infra: `layer.0.routed_output` went from absmax=0,
nnz=0/32768 to absmax=0.64, nnz=32768/32768.
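
For reference, the absmax/nnz check quoted above amounts to something like
this trivial sketch (the actual dump infra is not part of this PR):

```python
import torch

def tensor_stats(t: torch.Tensor) -> str:
    # absmax and nonzero count, in the format quoted above.
    return (f"absmax={t.abs().max().item():.2f}, "
            f"nnz={t.count_nonzero().item()}/{t.numel()}")
```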

The proper long-term fix is on the kernel side -- either add a no-bias
clamping variant to `run_fp4_block_scale_moe`, have the kernel skip the
clamp when `bias` is None, or enforce that all four parameters (`bias`,
`swiglu_alpha`, `swiglu_beta`, `swiglu_limit`) are set together and gate the
construction at the Python level so the silent-zero failure mode becomes
impossible (a guard along those lines is sketched below).
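
A sketch of that last option, as a Python-level guard (the parameter names
come from the kernel wrapper; the exact call site is an assumption):

```python
def check_clamped_swiglu_args(bias, swiglu_alpha, swiglu_beta, swiglu_limit):
    # Enforce the all-or-nothing contract so a bare swiglu_limit can never
    # silently select the bias-ful clamping path.
    args = {"bias": bias, "swiglu_alpha": swiglu_alpha,
            "swiglu_beta": swiglu_beta, "swiglu_limit": swiglu_limit}
    given = [name for name, value in args.items() if value is not None]
    if given and len(given) != len(args):
        missing = sorted(set(args) - set(given))
        raise ValueError(
            "run_fp4_block_scale_moe: the clamped-SwiGLU path requires "
            "bias, swiglu_alpha, swiglu_beta and swiglu_limit together; "
            f"missing {missing}")
```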

Signed-off-by: Barry Kang <jinshik@nvidia.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Tracin requested review from a team as code owners May 12, 2026 03:20
Tracin requested review from HuiGao-NV and syuoni and removed request for a team May 12, 2026 03:20
Tracin added 4 commits May 11, 2026 20:24
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>