[None][test] Add MLA chunked-prefill SM dispatch regression coverage #13904

DhineshPonnarasan wants to merge 3 commits into NVIDIA:main
Conversation
Signed-off-by: Dhinesh Ponnarasan <dhineshponnarasan@gmail.com>
📝 Walkthrough

This pull request adds a parametrized pytest test for Multi-Head Latent Attention (MLA) dispatch behavior. The test monkeypatches attention module dependencies to simulate SM-version-specific code paths and validates that each SM version dispatches to the expected forward path.

Changes: MLA Dispatch Test
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/unittest/_torch/attention/test_attention_mla.py (2)
359-360: ⚡ Quick win: Add type annotations to the new test signature

The new test function should include argument and return type annotations for consistency with repository Python typing rules.

As per coding guidelines, Python code should use type annotations for all function arguments and return types.

✍️ Suggested diff:

```diff
 def test_mla_chunked_prefill_dispatch_by_sm(sm_version, expected_path,
-                                            monkeypatch):
+                                            monkeypatch: pytest.MonkeyPatch) -> None:
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/_torch/attention/test_attention_mla.py` around lines 359 - 360, The test function test_mla_chunked_prefill_dispatch_by_sm should be updated to include Python type annotations for its parameters and return type: annotate sm_version, expected_path, and monkeypatch with the appropriate types used in tests (e.g., int/str/Path/pytest.MonkeyPatch or more specific fixtures) and add -> None as the return type; modify the function signature for test_mla_chunked_prefill_dispatch_by_sm accordingly so it matches repository typing conventions.
352-421: Good regression scope for dispatch gating; QA list updates are not needed

This unit test cleanly validates the SM gate without requiring GPU execution, and no tests/integration/test_lists/qa/* update is needed for this PR scope.

As per coding guidelines, unittest-only changes do not require QA integration list updates unless coverage is being promoted to integration/QA suites.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/_torch/attention/test_attention_mla.py` around lines 352 - 421, The test correctly validates the SM-version dispatch gate without GPU and requires no changes to QA integration lists; leave the test in tests/unittest/_torch/attention/test_attention_mla.py as a unit test (do not move it to integration), ensure the monkeypatches target attention_module.TrtllmAttention, TrtllmAttentionMetadata and get_sm_version as shown so the test runs offline, and do not add or update any tests/integration/test_lists/qa/* entries for this change.
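To make the pattern concrete, here is a minimal, self-contained sketch of the offline monkeypatch approach the comment describes. `attn` and `forward_path` are illustrative stand-ins; the real test patches `get_sm_version` (and the `TrtllmAttention` classes) on the actual attention module:

```python
import types

import pytest

# Stand-in for the attention module whose SM gate is under test.
attn = types.SimpleNamespace(get_sm_version=lambda: 90)


def forward_path() -> str:
    # Simplified mirror of the SM-gated dispatch being protected.
    return "chunked_prefill" if attn.get_sm_version() >= 100 else "cached_kv"


def test_gate_offline(monkeypatch: pytest.MonkeyPatch) -> None:
    # Patch the SM query so the gate can be exercised without a GPU.
    monkeypatch.setattr(attn, "get_sm_version", lambda: 100)
    assert forward_path() == "chunked_prefill"
```

Because only the SM query is patched, the assertion exercises the dispatch decision itself rather than any kernel execution, which is what keeps the test GPU-free.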
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/unittest/_torch/attention/test_attention_mla.py`:
- Around line 352-357: Update the pytest parametrization in the
test_attention_mla.py block that defines "sm_version,expected_path" to include a
near-threshold case for sm_version=99 expecting "cached_kv"; specifically add
the tuple (99, "cached_kv") alongside the existing (90, "cached_kv") and (100,
"chunked_prefill") entries so the SM >= 100 boundary is explicitly covered in
the test that uses these parameters.
---
Nitpick comments:
In `@tests/unittest/_torch/attention/test_attention_mla.py`:
- Around line 359-360: The test function test_mla_chunked_prefill_dispatch_by_sm
should be updated to include Python type annotations for its parameters and
return type: annotate sm_version, expected_path, and monkeypatch with the
appropriate types used in tests (e.g., int/str/Path/pytest.MonkeyPatch or more
specific fixtures) and add -> None as the return type; modify the function
signature for test_mla_chunked_prefill_dispatch_by_sm accordingly so it matches
repository typing conventions.
- Around line 352-421: The test correctly validates the SM-version dispatch gate
without GPU and requires no changes to QA integration lists; leave the test in
tests/unittest/_torch/attention/test_attention_mla.py as a unit test (do not
move it to integration), ensure the monkeypatches target
attention_module.TrtllmAttention, TrtllmAttentionMetadata and get_sm_version as
shown so the test runs offline, and do not add or update any
tests/integration/test_lists/qa/* entries for this change.
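The inline comment above suggests a near-threshold case; a hedged sketch of what that parametrization could look like (identifiers taken from the comment, exact decorator and test body in the repo may differ):

```python
import pytest


@pytest.mark.parametrize(
    "sm_version,expected_path",
    [
        (90, "cached_kv"),         # Hopper: gated onto the cached-KV path
        (99, "cached_kv"),         # near-threshold case suggested above
        (100, "chunked_prefill"),  # at/above the SM >= 100 boundary
    ],
)
def test_mla_chunked_prefill_dispatch_by_sm(
        sm_version: int, expected_path: str,
        monkeypatch: pytest.MonkeyPatch) -> None:
    ...  # monkeypatch-based dispatch check as in the PR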
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8a8b67ed-dead-4605-9012-7b5eba935e22
📒 Files selected for processing (1)
tests/unittest/_torch/attention/test_attention_mla.py
Signed-off-by: Dhinesh Ponnarasan <dhineshponnarasan@gmail.com>
Related to #12502
Summary
Adds focused regression coverage for the existing SM-gated MLA chunked-prefill dispatch behavior in the PyTorch backend.
The runtime mitigation already exists upstream: the chunked-prefill path is only taken on SM >= 100.

This PR intentionally does not change runtime behavior. It adds a targeted unit regression test to protect the intended dispatch contract from accidental behavioral drift during future refactors.
What Changed
Added test_mla_chunked_prefill_dispatch_by_sm in tests/unittest/_torch/attention/test_attention_mla.py.

The test verifies dispatch to:
- forward_context_with_cached_kv on SM < 100
- forward_context_with_chunked_prefill on SM >= 100

The test uses monkeypatch-based dispatch validation and does not require GPU execution.
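As a reference for the contract being pinned down, a minimal illustrative reduction (names mirror the description above; this is not the backend implementation):

```python
def choose_context_path(sm_version: int, chunked_prefill_requested: bool) -> str:
    """Illustrative reduction of the SM-gated MLA dispatch contract."""
    if chunked_prefill_requested and sm_version >= 100:
        return "forward_context_with_chunked_prefill"
    # Hopper (SM < 100) stays on the cached-KV path even when chunked
    # prefill is requested.
    return "forward_context_with_cached_kv"


assert choose_context_path(90, True) == "forward_context_with_cached_kv"
assert choose_context_path(100, True) == "forward_context_with_chunked_prefill"
```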
Testing
pytest tests/unittest/_torch/attention/test_attention_mla.py -k "test_mla_chunked_prefill_dispatch_by_sm" -v

Rationale
The SM-version dispatch gate is a correctness safeguard for Hopper (SM90), where MLA chunked-prefill online-softmax merging can produce inaccurate results.
Although the mitigation already exists upstream, there was no focused regression coverage protecting this behavior. This PR codifies the intended dispatch contract as an automated test while intentionally avoiding runtime logic changes.
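For background, chunked prefill combines per-chunk attention partials with the standard online-softmax merge; the formula below is the general technique (not code from this repo), and the SM90 gate keeps Hopper off this merge path:

$$
m = \max(m_1, m_2), \qquad
\ell = e^{m_1 - m}\,\ell_1 + e^{m_2 - m}\,\ell_2, \qquad
o = \frac{e^{m_1 - m}\,\ell_1\, o_1 + e^{m_2 - m}\,\ell_2\, o_2}{\ell}
$$

where $m_i$, $\ell_i$, and $o_i$ are the running max, softmax denominator, and partial output for chunk $i$. Floating-point error in the rescaling terms of this merge is the kind of inaccuracy the gate avoids on SM90.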
Requesting review from @kaiyux @akhoroshev since this regression coverage is related to the existing SM-gated MLA dispatch mitigation discussed in #12502.