[None][fix] Run DeepSeek V4 gate test on CUDA by lfr-0531 · Pull Request #13932 · NVIDIA/TensorRT-LLM

lfr-0531 · 2026-05-09T06:14:27Z

Description

The DeepSeek V4 gate unit test was added to the multi-GPU CI test list, where it exposed that test_deepseek_v4_gate_uses_fp32_reference_linear constructed the gate and input tensors on CPU.

DeepseekV4Gate.forward() calls trtllm::dsv3_router_gemm_op, which is a CUDA-only custom op. With CPU tensors the op cannot be dispatched, so the B200 multi-GPU stage failed with a CPU backend dispatch error.

This PR updates the test to run the gate, hidden states, and reference weights on CUDA, and skips the test when CUDA is unavailable.
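The pattern described above (skip without CUDA, otherwise allocate everything on the GPU) can be sketched roughly as follows. This is not the PR's actual diff: the class name GateOnCudaSketch, the helper cuda_available, and the tensor shapes are hypothetical, and torch.nn.functional.linear stands in for the real gate, which is assumed to need a TensorRT-LLM build that is not imported here.

```python
import importlib.util
import unittest


def cuda_available() -> bool:
    """Return True only if PyTorch is importable and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


# Hypothetical test case: skipped as a whole when CUDA is unavailable,
# so a CPU-only machine never reaches a CUDA-only op.
@unittest.skipUnless(cuda_available(), "requires CUDA")
class GateOnCudaSketch(unittest.TestCase):
    def test_fp32_reference_linear_on_cuda(self):
        import torch

        device = torch.device("cuda")
        # Hidden states and reference weights are created directly on the GPU.
        hidden_states = torch.randn(4, 16, dtype=torch.bfloat16, device=device)
        weight = torch.randn(8, 16, dtype=torch.float32, device=device)

        # Stand-in for the gate: an FP32 reference linear layer (assumption).
        logits = torch.nn.functional.linear(hidden_states.float(), weight)

        self.assertEqual(logits.device.type, "cuda")
        self.assertEqual(logits.dtype, torch.float32)
        self.assertEqual(tuple(logits.shape), (4, 8))
```

With pytest, the same guard is commonly written as a skipif marker on torch.cuda.is_available(); either form reports the test as skipped rather than failed on a CPU-only runner.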

Test Coverage

python3 -m pytest -q tests/unittest/_torch/modeling/test_modeling_deepseekv4.py::test_deepseek_v4_gate_uses_fp32_reference_linear
python3 -m pytest -q tests/unittest/_torch/modeling/test_modeling_deepseekv4.py
pre-commit run --files tests/unittest/_torch/modeling/test_modeling_deepseekv4.py

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

lfr-0531 · 2026-05-09T06:15:22Z

/bot run --add-multi-gpu-test

tensorrt-cicd · 2026-05-09T06:23:00Z

PR_Github #47482 [ run ] triggered by Bot. Commit: ab7217e

[None][fix] Run DeepSeek V4 gate test on CUDA

ab7217e

Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

github-actions Bot assigned lfr-0531 May 9, 2026

lfr-0531 requested a review from Shixiaowei02 May 9, 2026 06:15

longlee0622 approved these changes May 9, 2026

Shixiaowei02 approved these changes May 9, 2026

lfr-0531 wants to merge 1 commit into NVIDIA:feat/deepseek_v4 from lfr-0531:user/fanrongl/fix-dsv4-gate-ci

4 participants
