[None][test] Add MLA chunked-prefill SM dispatch regression coverage #13904

DhineshPonnarasan wants to merge 3 commits into NVIDIA:main
Conversation
Signed-off-by: Dhinesh Ponnarasan <dhineshponnarasan@gmail.com>
📝 Walkthrough

This pull request adds a parametrized pytest test for Multi-Head Latent Attention (MLA) dispatch behavior. The test monkeypatches attention module dependencies to simulate SM-version-specific code paths and validates that each SM version dispatches to the expected forward path.

Changes: MLA Dispatch Test
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/unittest/_torch/attention/test_attention_mla.py (2)
359-360: ⚡ Quick win: Add type annotations to the new test signature

The new test function should include argument and return type annotations for consistency with repository Python typing rules.

As per coding guidelines, Python code should use type annotations for all function arguments and return types.

✍️ Suggested diff:

```diff
 def test_mla_chunked_prefill_dispatch_by_sm(sm_version, expected_path,
-                                            monkeypatch):
+                                            monkeypatch: pytest.MonkeyPatch) -> None:
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/_torch/attention/test_attention_mla.py` around lines 359 - 360, The test function test_mla_chunked_prefill_dispatch_by_sm should be updated to include Python type annotations for its parameters and return type: annotate sm_version, expected_path, and monkeypatch with the appropriate types used in tests (e.g., int/str/Path/pytest.MonkeyPatch or more specific fixtures) and add -> None as the return type; modify the function signature for test_mla_chunked_prefill_dispatch_by_sm accordingly so it matches repository typing conventions.
352-421: Good regression scope for dispatch gating; QA list updates are not needed

This unit test cleanly validates the SM gate without requiring GPU execution, and no tests/integration/test_lists/qa/* update is needed for this PR scope.

As per coding guidelines, unittest-only changes do not require QA integration list updates unless coverage is being promoted to integration/QA suites.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/_torch/attention/test_attention_mla.py` around lines 352 - 421, The test correctly validates the SM-version dispatch gate without GPU and requires no changes to QA integration lists; leave the test in tests/unittest/_torch/attention/test_attention_mla.py as a unit test (do not move it to integration), ensure the monkeypatches target attention_module.TrtllmAttention, TrtllmAttentionMetadata and get_sm_version as shown so the test runs offline, and do not add or update any tests/integration/test_lists/qa/* entries for this change.
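To make the pattern concrete, here is a minimal, self-contained sketch of the offline monkeypatch approach the comment describes. `attn` and `forward_path` are illustrative stand-ins; the real test patches `get_sm_version` (and the `TrtllmAttention` classes) on the actual attention module:

```python
import types

import pytest

# Stand-in for the attention module whose SM gate is under test.
attn = types.SimpleNamespace(get_sm_version=lambda: 90)


def forward_path() -> str:
    # Simplified mirror of the SM-gated dispatch being protected.
    return "chunked_prefill" if attn.get_sm_version() >= 100 else "cached_kv"


def test_gate_offline(monkeypatch: pytest.MonkeyPatch) -> None:
    # Patch the SM query so the gate can be exercised without a GPU.
    monkeypatch.setattr(attn, "get_sm_version", lambda: 100)
    assert forward_path() == "chunked_prefill"
```

Because only the SM query is patched, the assertion exercises the dispatch decision itself rather than any kernel execution, which is what keeps the test GPU-free.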
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/unittest/_torch/attention/test_attention_mla.py`:
- Around line 352-357: Update the pytest parametrization in the
test_attention_mla.py block that defines "sm_version,expected_path" to include a
near-threshold case for sm_version=99 expecting "cached_kv"; specifically add
the tuple (99, "cached_kv") alongside the existing (90, "cached_kv") and (100,
"chunked_prefill") entries so the SM >= 100 boundary is explicitly covered in
the test that uses these parameters.
---
Nitpick comments:
In `@tests/unittest/_torch/attention/test_attention_mla.py`:
- Around line 359-360: The test function test_mla_chunked_prefill_dispatch_by_sm
should be updated to include Python type annotations for its parameters and
return type: annotate sm_version, expected_path, and monkeypatch with the
appropriate types used in tests (e.g., int/str/Path/pytest.MonkeyPatch or more
specific fixtures) and add -> None as the return type; modify the function
signature for test_mla_chunked_prefill_dispatch_by_sm accordingly so it matches
repository typing conventions.
- Around line 352-421: The test correctly validates the SM-version dispatch gate
without GPU and requires no changes to QA integration lists; leave the test in
tests/unittest/_torch/attention/test_attention_mla.py as a unit test (do not
move it to integration), ensure the monkeypatches target
attention_module.TrtllmAttention, TrtllmAttentionMetadata and get_sm_version as
shown so the test runs offline, and do not add or update any
tests/integration/test_lists/qa/* entries for this change.
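The inline comment above suggests a near-threshold case; a hedged sketch of what that parametrization could look like (identifiers taken from the comment, exact decorator and test body in the repo may differ):

```python
import pytest


@pytest.mark.parametrize(
    "sm_version,expected_path",
    [
        (90, "cached_kv"),         # Hopper: gated onto the cached-KV path
        (99, "cached_kv"),         # near-threshold case suggested above
        (100, "chunked_prefill"),  # at/above the SM >= 100 boundary
    ],
)
def test_mla_chunked_prefill_dispatch_by_sm(
        sm_version: int, expected_path: str,
        monkeypatch: pytest.MonkeyPatch) -> None:
    ...  # monkeypatch-based dispatch check as in the PR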
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8a8b67ed-dead-4605-9012-7b5eba935e22
📒 Files selected for processing (1)
tests/unittest/_torch/attention/test_attention_mla.py
Signed-off-by: Dhinesh Ponnarasan <dhineshponnarasan@gmail.com>
Related to #12502
Summary
Adds focused regression coverage for the existing SM-gated MLA chunked-prefill dispatch behavior in the PyTorch backend.
The runtime mitigation already exists upstream: the chunked-prefill path is only taken on SM >= 100.

This PR intentionally does not change runtime behavior. It adds a targeted unit regression test to protect the intended dispatch contract from accidental behavioral drift during future refactors.
What Changed
Added test_mla_chunked_prefill_dispatch_by_sm in tests/unittest/_torch/attention/test_attention_mla.py.

The test verifies dispatch to:
- forward_context_with_cached_kv on SM < 100
- forward_context_with_chunked_prefill on SM >= 100

The test uses monkeypatch-based dispatch validation and does not require GPU execution.
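As a reference for the contract being pinned down, a minimal illustrative reduction (names mirror the description above; this is not the backend implementation):

```python
def choose_context_path(sm_version: int, chunked_prefill_requested: bool) -> str:
    """Illustrative reduction of the SM-gated MLA dispatch contract."""
    if chunked_prefill_requested and sm_version >= 100:
        return "forward_context_with_chunked_prefill"
    # Hopper (SM < 100) stays on the cached-KV path even when chunked
    # prefill is requested.
    return "forward_context_with_cached_kv"


assert choose_context_path(90, True) == "forward_context_with_cached_kv"
assert choose_context_path(100, True) == "forward_context_with_chunked_prefill"
```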
Testing
pytest tests/unittest/_torch/attention/test_attention_mla.py -k "test_mla_chunked_prefill_dispatch_by_sm" -v

Rationale
The SM-version dispatch gate is a correctness safeguard for Hopper (SM90), where MLA chunked-prefill online-softmax merging can produce inaccurate results.
Although the mitigation already exists upstream, there was no focused regression coverage protecting this behavior. This PR codifies the intended dispatch contract as an automated test while intentionally avoiding runtime logic changes.
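For background, chunked prefill combines per-chunk attention partials with the standard online-softmax merge; the formula below is the general technique (not code from this repo), and the SM90 gate keeps Hopper off this merge path:

$$
m = \max(m_1, m_2), \qquad
\ell = e^{m_1 - m}\,\ell_1 + e^{m_2 - m}\,\ell_2, \qquad
o = \frac{e^{m_1 - m}\,\ell_1\, o_1 + e^{m_2 - m}\,\ell_2\, o_2}{\ell}
$$

where $m_i$, $\ell_i$, and $o_i$ are the running max, softmax denominator, and partial output for chunk $i$. Floating-point error in the rescaling terms of this merge is the kind of inaccuracy the gate avoids on SM90.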
Requesting review from @kaiyux @akhoroshev since this regression coverage is related to the existing SM-gated MLA dispatch mitigation discussed in #12502.