
[https://nvbugs/6160248][fix] AutoDeploy: fixed broken pattern matching of fuse_rope_into_trtllm_attention transform #14038

Open: MrGeva wants to merge 1 commit into NVIDIA:main from nv-auto-deploy:fix/ad-rope-fusion-unwrap-contiguous-call-method

Conversation

Collaborator

MrGeva commented May 12, 2026

The recent commit 6cd23bc changed fuse_gemms_mixed_children to emit narrow → contiguous (call_method) → view instead of split_with_sizes (closure) → getitem. But _try_trace_to_fused_qkv._trace_narrow only handles view → narrow directly — it does not unwrap a call_method("contiguous", ...) node sitting between the view and the narrow, and _unwrap_contiguous only handles call_function contiguous, not the call_method form. Result: the passthrough leg silently fails and _trtllm_fused_qkv is never set.

Fix:

  • Extend _unwrap_contiguous to also skip call_method nodes whose target is the string "contiguous".
  • In _trace_narrow and _trace_split, apply _unwrap_contiguous to the view's input before testing for narrow / getitem.

Summary by CodeRabbit

  • Bug Fixes
    • Improved attention optimization logic to correctly handle additional tensor operation patterns, ensuring more reliable detection and fusion of fused QKV operations across diverse model architectures.

Review Change Stack

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

…ugh rope trace

After fuse_gemms_mixed_children switched its post-fusion split path from
``split_with_sizes-closure → getitem`` to ``narrow → contiguous → view``,
the contiguous node is emitted via ``graph.call_method("contiguous", ...)``,
i.e. a ``call_method`` op rather than a ``call_function``. This broke
``fuse_rope_into_trtllm_attention``'s QKV-passthrough leg: ``_unwrap_contiguous``
only handled ``call_function`` forms, and ``_trace_narrow`` walked
``view → narrow`` directly without peeling intermediate contiguous nodes.
As a result the passthrough silently failed to fire and ``_trtllm_fused_qkv``
was never set, causing
``test_gemm_fusion_trtllm.py::test_fuse_qkv_passthrough_with_rope`` to fail.

Fix:
- Extend ``_unwrap_contiguous`` to also skip ``call_method`` nodes whose
  target is the string ``"contiguous"``.
- In ``_trace_narrow`` and ``_trace_split`` apply ``_unwrap_contiguous`` to
  the view's input before testing for narrow / getitem.

Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
Collaborator Author

MrGeva commented May 12, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast

@MrGeva MrGeva marked this pull request as ready for review May 12, 2026 07:35
@MrGeva MrGeva requested a review from a team as a code owner May 12, 2026 07:35
@MrGeva MrGeva requested a review from Fridah-nv May 12, 2026 07:35
Contributor

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 189305ee-d4e0-41af-95cb-7623e43f0f82

📥 Commits

Reviewing files that changed from the base of the PR and between 7bc328f and d0f0d60.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/auto_deploy/transform/library/fuse_rope_into_trtllm_attention.py

📝 Walkthrough

The PR enhances RoPE-attention graph tracing by expanding the _unwrap_contiguous helper to recognize additional PyTorch contiguous emission patterns, then applies this enhanced helper in both split and narrow QKV tracing paths to ensure intervening contiguous nodes don't block correct source identification.

Changes

Contiguous unwrapping in fused RoPE-attention tracing
(tensorrt_llm/_torch/auto_deploy/transform/library/fuse_rope_into_trtllm_attention.py)

  • Expand contiguous-unwrapping helper: _unwrap_contiguous now recognizes call_method("contiguous") and call_function forms (aten.contiguous.default overloads and Tensor.contiguous method call_function variants) in addition to the original pattern, allowing the trace to skip past these contiguous emissions.
  • Apply unwrapping in QKV tracing paths: _unwrap_contiguous is called on view_input in both _trace_split and _trace_narrow to remove contiguous nodes between the view/reshape operation and its input, ensuring the fused QKV source is correctly identified.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The PR description clearly explains the issue and the solution; however, the Description and Test Coverage sections are missing or not filled in according to the template structure. Resolution: add a dedicated Description section explaining the issue and solution, and a Test Coverage section listing the relevant tests (e.g., test_fuse_qkv_passthrough_with_rope) that validate this fix.

✅ Passed checks (4 passed)

  • Title check: The title clearly identifies the specific issue (broken pattern matching in fuse_rope_into_trtllm_attention) and the fix category (fix), with proper NVBugs ticket reference.
  • Docstring Coverage: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@tensorrt-cicd
Collaborator

PR_Github #47923 [ run ] triggered by Bot. Commit: d0f0d60 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47923 [ run ] completed with state SUCCESS. Commit: d0f0d60
/LLM/main/L0_MergeRequest_PR pipeline #37770 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation
