[None][test] Add MLA chunked-prefill SM dispatch regression coverage#13904

Open
DhineshPonnarasan wants to merge 3 commits into NVIDIA:main from DhineshPonnarasan:fix/mla-sm90-dispatch-regression-test

Conversation

@DhineshPonnarasan DhineshPonnarasan commented May 8, 2026

Related to #12502

Summary

Adds focused regression coverage for the existing SM-gated MLA chunked-prefill dispatch behavior in the PyTorch backend.

The runtime mitigation already exists upstream:

  • SM90 falls back to the cached-KV MLA path
  • chunked-prefill MLA dispatch is gated behind SM >= 100
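The gate the two bullets describe can be sketched as a plain conditional. The path names `forward_context_with_cached_kv` and `forward_context_with_chunked_prefill` mirror identifiers that appear elsewhere in this PR; everything else below is an illustrative stand-in, not TensorRT-LLM's actual code:

```python
# Illustrative sketch of the SM-gated MLA dispatch contract; a simplified
# stand-in, not the repository's real implementation.
def dispatch_mla_context(sm_version: int, has_chunked_prefill: bool) -> str:
    # Chunked-prefill MLA is gated behind SM >= 100; SM90 (Hopper)
    # always takes the cached-KV fallback for accuracy.
    if has_chunked_prefill and sm_version >= 100:
        return "forward_context_with_chunked_prefill"
    return "forward_context_with_cached_kv"

print(dispatch_mla_context(90, True))   # SM90 falls back to cached KV
print(dispatch_mla_context(100, True))  # SM100 uses chunked prefill
```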

This PR intentionally does not change runtime behavior. It adds a targeted unit regression test to protect the intended dispatch contract from accidental behavioral drift during future refactors.

What Changed

Added test_mla_chunked_prefill_dispatch_by_sm in:

  • tests/unittest/_torch/attention/test_attention_mla.py

The test verifies:

  • SM90 routes MLA chunked-prefill through forward_context_with_cached_kv
  • SM100 routes MLA chunked-prefill through forward_context_with_chunked_prefill

The test uses monkeypatch-based dispatch validation and does not require GPU execution.
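A minimal sketch of what such a monkeypatch-based dispatch test can look like. The stand-in class below is hypothetical; the real test patches the repository's attention class, metadata, and SM-version getter:

```python
import pytest

# Hypothetical stand-in for the attention module under test; the real test
# patches TRT-LLM's attention class, metadata, and SM-version getter.
class FakeAttentionModule:
    @staticmethod
    def get_sm_version() -> int:
        raise RuntimeError("patched by the test")

    @classmethod
    def forward_context(cls) -> str:
        # Mirrors the dispatch contract being pinned down.
        if cls.get_sm_version() >= 100:
            return "chunked_prefill"
        return "cached_kv"

@pytest.mark.parametrize("sm_version,expected_path", [
    (90, "cached_kv"),
    (100, "chunked_prefill"),
])
def test_dispatch_by_sm(sm_version: int, expected_path: str,
                        monkeypatch: pytest.MonkeyPatch) -> None:
    # Simulate the GPU compute capability; no GPU execution required.
    monkeypatch.setattr(FakeAttentionModule, "get_sm_version",
                        lambda: sm_version)
    assert FakeAttentionModule.forward_context() == expected_path
```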

Testing

  • Ran pre-commit hooks on the modified test file
  • Verified focused staged diff contains only regression test coverage
  • Prepared targeted pytest validation:
pytest tests/unittest/_torch/attention/test_attention_mla.py -k "test_mla_chunked_prefill_dispatch_by_sm" -v

Rationale

The SM-version dispatch gate is a correctness safeguard for Hopper (SM90), where MLA chunked-prefill online-softmax merging can produce inaccurate results.
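For context, the online-softmax merge in question is, in its textbook form, a two-chunk combine of each chunk's running max, normalizer, and value accumulator. This scalar float sketch illustrates the general technique only, not TRT-LLM's fused kernel:

```python
import math

# Textbook online-softmax chunk merge: each chunk of attention scores
# yields a running max m, normalizer l, and weighted-value accumulator acc.
def chunk_partials(scores, values):
    m = max(scores)
    l = sum(math.exp(s - m) for s in scores)
    acc = sum(math.exp(s - m) * v for s, v in zip(scores, values))
    return m, l, acc

def merge(p1, p2):
    # Rescale both partials to a common max before summing; this rescaling
    # is the merge step the PR's rationale flags as accuracy-sensitive.
    (m1, l1, a1), (m2, l2, a2) = p1, p2
    m = max(m1, m2)
    c1, c2 = math.exp(m1 - m), math.exp(m2 - m)
    return m, c1 * l1 + c2 * l2, c1 * a1 + c2 * a2

scores, values = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]
_, l, acc = merge(chunk_partials(scores[:2], values[:2]),
                  chunk_partials(scores[2:], values[2:]))
print(acc / l)  # equals the single-pass softmax-weighted sum
```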

Although the mitigation already exists upstream, there was no focused regression coverage protecting this behavior. This PR codifies the intended dispatch contract as an automated test while intentionally avoiding runtime logic changes.


Requesting review from @kaiyux @akhoroshev since this regression coverage is related to the existing SM-gated MLA dispatch mitigation discussed in #12502.

Signed-off-by: Dhinesh Ponnarasan <dhineshponnarasan@gmail.com>

coderabbitai Bot commented May 8, 2026

📝 Walkthrough

This pull request adds a parametrized pytest test for Multi-Head Latent Attention (MLA) dispatch behavior. The test monkeypatches attention module dependencies to simulate SM-version-specific code paths and validates that Attention.forward_context returns the correct dispatch result based on simulated GPU compute capability.

Changes

MLA Dispatch Test

  • File: tests/unittest/_torch/attention/test_attention_mla.py
  • Summary: New parametrized test test_mla_chunked_prefill_dispatch_by_sm monkeypatches the attention class, metadata, and SM-version getter; constructs synthetic tensors; calls forward_context; and asserts the returned dispatch path (cached_kv for SM 90, chunked_prefill for SM 100).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks: ✅ 5 passed

  • Title check ✅ Passed: The title clearly and specifically describes the main change: adding a regression test for MLA chunked-prefill SM dispatch behavior.
  • Docstring Coverage ✅ Passed: No functions found in the changed files to evaluate; docstring coverage check skipped.
  • Linked Issues check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Description check ✅ Passed: The PR description is well structured and follows the template guidelines, with clear sections for Summary, What Changed, Testing, and Rationale.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/unittest/_torch/attention/test_attention_mla.py (2)

359-360: ⚡ Quick win

Add type annotations to the new test signature

The new test function should include argument and return type annotations for consistency with repository Python typing rules.

✍️ Suggested diff
 def test_mla_chunked_prefill_dispatch_by_sm(sm_version, expected_path,
-                                            monkeypatch):
+                                            monkeypatch: pytest.MonkeyPatch) -> None:
As per coding guidelines, Python code should use type annotations for all function arguments and return types.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/_torch/attention/test_attention_mla.py` around lines 359 -
360, The test function test_mla_chunked_prefill_dispatch_by_sm should be updated
to include Python type annotations for its parameters and return type: annotate
sm_version, expected_path, and monkeypatch with the appropriate types used in
tests (e.g., int/str/Path/pytest.MonkeyPatch or more specific fixtures) and add
-> None as the return type; modify the function signature for
test_mla_chunked_prefill_dispatch_by_sm accordingly so it matches repository
typing conventions.

352-421: Good regression scope for dispatch gating; QA list updates are not needed

This unit test cleanly validates the SM gate without requiring GPU execution, and no tests/integration/test_lists/qa/* update is needed for this PR scope.

As per coding guidelines, unittest-only changes do not require QA integration list updates unless coverage is being promoted to integration/QA suites.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/_torch/attention/test_attention_mla.py` around lines 352 -
421, The test correctly validates the SM-version dispatch gate without GPU and
requires no changes to QA integration lists; leave the test in
tests/unittest/_torch/attention/test_attention_mla.py as a unit test (do not
move it to integration), ensure the monkeypatches target
attention_module.TrtllmAttention, TrtllmAttentionMetadata and get_sm_version as
shown so the test runs offline, and do not add or update any
tests/integration/test_lists/qa/* entries for this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unittest/_torch/attention/test_attention_mla.py`:
- Around line 352-357: Update the pytest parametrization in the
test_attention_mla.py block that defines "sm_version,expected_path" to include a
near-threshold case for sm_version=99 expecting "cached_kv"; specifically add
the tuple (99, "cached_kv") alongside the existing (90, "cached_kv") and (100,
"chunked_prefill") entries so the SM >= 100 boundary is explicitly covered in
the test that uses these parameters.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8a8b67ed-dead-4605-9012-7b5eba935e22

📥 Commits

Reviewing files that changed from the base of the PR and between 2e4b05c and 6bf84c5.

📒 Files selected for processing (1)
  • tests/unittest/_torch/attention/test_attention_mla.py

svc-trtllm-gh-bot added the Community want to contribute (PRs initiated from Community) label May 8, 2026

Labels

Community want to contribute (PRs initiated from Community)
