[https://nvbugs/6159132][fix] Differentiate the two paths via extra_acc_spec="tp_attn" when attention_dp=False by tensorrt-cicd · Pull Request #13922 · NVIDIA/TensorRT-LLM

tensorrt-cicd · 2026-05-08T21:12:37Z

Summary

Root cause: The GSM8K reference accuracy (93.75) for MiniMax-M2 FP8_BLOCK_SCALES was set based on the attention_dp=True path, but the attention_dp=False path uses the fused minimax_allreduce_rms_qk kernel whose numerics produce ~90.49, below the derived threshold of 90.547.
Fix: Differentiate the two paths via extra_acc_spec="tp_attn" when attention_dp=False, add a dedicated reference entry of 92.0 in gsm8k.yaml (threshold 88.797), and remove the existing waiver.
Automated fix generated by repair-bot

Test plan

Verify fix on the same GPU type as the original failure
Check for regressions in related tests

Links

Bug: https://nvbugs/6159132

…ion path The attention_dp=False variant of TestMiniMaxM2::test_4gpus uses the fused minimax_allreduce_rms_qk kernel for QK norm, which is numerically less precise than the per-rank RMSNorm path selected by attention_dp=True. The shared reference of 93.75 resulted in a threshold of 90.547 while the observed accuracy on the TP-sharded path is ~90.49, causing flaky failures. Differentiate the two paths via extra_acc_spec='tp_attn' and register a lower reference (92.0) for the TP-sharded path. Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>

coderabbitai · 2026-05-08T21:19:22Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 88195fe7-e50b-4d9a-b0d8-1bc3af117f37

📥 Commits

Reviewing files that changed from the base of the PR and between f8572ab and 17de91f.

📒 Files selected for processing (3)

tests/integration/defs/accuracy/references/gsm8k.yaml
tests/integration/defs/accuracy/test_llm_api_pytorch.py
tests/integration/test_lists/waives.txt

💤 Files with no reviewable changes (1)

tests/integration/test_lists/waives.txt

📝 Walkthrough

Walkthrough

This PR fixes MiniMax-M2 TP-sharded accuracy testing by adding a baseline accuracy reference with corrected quantization configuration, modifying the test to conditionally specify the TP-attention extra_acc_spec parameter based on the attention_dp flag, and removing the associated test waiver.

Changes

MiniMax-M2 TP-Sharded Accuracy Testing

Layer / File(s)	Summary
Accuracy Baseline Reference `tests/integration/defs/accuracy/references/gsm8k.yaml`	Adds FP8_BLOCK_SCALES quantization accuracy entry with `extra_acc_spec: tp_attn` configuration, setting expected accuracy to 92.0 for MiniMax-M2.
Test Implementation `tests/integration/defs/accuracy/test_llm_api_pytorch.py`	Updates TestMiniMaxM2.test_4gpus to conditionally pass `extra_acc_spec="tp_attn"` when `attention_dp` is disabled, with comment documenting TP-sharded fused-kernel numerics differences.
Test Waiver Removal `tests/integration/test_lists/waives.txt`	Removes skip waiver for TestMiniMaxM2::test_4gpus[attention_dp=False-...] since the test should now pass with corrected extra_acc_spec handling.

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The PR description explains the root cause, solution, test plan, and provides a bug link, but does not follow the template structure with explicit Description, Test Coverage, and Checklist sections.	Reformat the description to follow the template structure: add a Description section, clarify Test Coverage section, and include the PR Checklist with checkmarks.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: differentiating two execution paths using extra_acc_spec="tp_attn" when attention_dp=False.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

tensorrt-cicd requested a review from a team as a code owner May 8, 2026 21:12

github-actions Bot assigned tensorrt-cicd May 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[https://nvbugs/6159132][fix] Differentiate the two paths via extra_acc_spec="tp_attn" when attention_dp=False#13922

[https://nvbugs/6159132][fix] Differentiate the two paths via extra_acc_spec="tp_attn" when attention_dp=False#13922
tensorrt-cicd wants to merge 1 commit intoNVIDIA:mainfrom
tensorrt-cicd:repair-bot-bug6159132

tensorrt-cicd commented May 8, 2026

Uh oh!

coderabbitai Bot commented May 8, 2026

Walkthrough

Changes

❌ Failed checks (1 warning, 1 inconclusive)

Review ran into problems

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tensorrt-cicd commented May 8, 2026

Summary

Test plan

Links

Uh oh!

coderabbitai Bot commented May 8, 2026

Walkthrough

Changes

❌ Failed checks (1 warning, 1 inconclusive)

Review ran into problems

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant