Skip to content

Fused QKV add node issue for GQA graph surgery #1057

Open
hthadicherla wants to merge 1 commit intomainfrom
hthadicherla/gqa-bias-add-fix
Open

Fused QKV add node issue for GQA graph surgery #1057
hthadicherla wants to merge 1 commit intomainfrom
hthadicherla/gqa-bias-add-fix

Conversation

@hthadicherla
Copy link
Contributor

@hthadicherla hthadicherla commented Mar 17, 2026

What does this PR do?

Type of change: Bug fix

There was a small issue where for models like qwen which have bias add nodes, while fusing the q,k,v matmul and q,k,v add nodes , the fused qkv bias add node was added to the graph before the fused qkv matmul node, causing the removal script to assume that the fused matmul and the add node were part of dead subgraph hence removing them. I just changed the order in which they are added. Now there are no issues.

Summary by CodeRabbit

  • Refactor
    • Optimized graph surgery operations for ONNX model processing by adjusting node insertion timing during the multi-head to grouped-query attention transformation, maintaining functional equivalence while improving internal processing flow.

… before qkv matmul was fused causing the script to remove this node assuming is was a dead node

Signed-off-by: Hrishith Thadicherla <hthadicherla@nvidia.com>
@hthadicherla hthadicherla requested a review from a team as a code owner March 17, 2026 08:22
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 688966e0-3276-4c2d-a638-69a54a49a1bb

📥 Commits

Reviewing files that changed from the base of the PR and between 00fa5bd and d849182.

📒 Files selected for processing (1)
  • modelopt/onnx/graph_surgery/gqa_replacement.py

📝 Walkthrough

Walkthrough

A single-file change modifies the timing of when a newly created Add node (qkv_proj/Add) is appended during bias fusion in GQA replacement. The node is now collected in qkv_matmul_nodes for later insertion rather than appended directly to the graph immediately.

Changes

Cohort / File(s) Summary
GQA Bias Fusion Timing
modelopt/onnx/graph_surgery/gqa_replacement.py
Modified _fuse_qkv_and_create_gqa to defer Add node insertion by collecting it in qkv_matmul_nodes list instead of appending directly to graph node list, altering insertion order during fusion process.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main issue being fixed: a bug with the fused QKV add node during GQA graph surgery operations.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Security Anti-Patterns ✅ Passed The pull request fixes a graph insertion ordering bug in gqa_replacement.py with no security anti-patterns detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch hthadicherla/gqa-bias-add-fix
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can use TruffleHog to scan for secrets in your code with verification capabilities.

Add a TruffleHog config file (e.g. trufflehog-config.yml, trufflehog.yml) to your project to customize detectors and scanning behavior. The tool runs only when a config file is present.

@codecov
Copy link

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 70.31%. Comparing base (00fa5bd) to head (d849182).

Files with missing lines Patch % Lines
modelopt/onnx/graph_surgery/gqa_replacement.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1057      +/-   ##
==========================================
+ Coverage   70.30%   70.31%   +0.01%     
==========================================
  Files         227      227              
  Lines       25847    25847              
==========================================
+ Hits        18172    18175       +3     
+ Misses       7675     7672       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant