
[None][fix] Fix and unwaive AutoDeploy accuracy tests#13925

Open
bmarimuthu-nv wants to merge 3 commits into NVIDIA:main from nv-auto-deploy:bala/ci-acc-test-fix2

Conversation

@bmarimuthu-nv
Collaborator

@bmarimuthu-nv bmarimuthu-nv commented May 8, 2026

Summary by CodeRabbit

  • New Features

    • Gemma4 tokenizer now uses standard HuggingFace transformers library integration.
    • Qwen3.5 MoE multimodal input handling refined for improved compatibility.
  • Bug Fixes

    • Corrected MiniMax-M2 GSM8k model accuracy benchmark reference value.
  • Tests

    • Removed multiple test waivers to improve test coverage and reliability.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
@bmarimuthu-nv bmarimuthu-nv force-pushed the bala/ci-acc-test-fix2 branch from 62866f2 to 107756e on May 8, 2026 22:50
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
@bmarimuthu-nv
Collaborator Author

@coderabbitai summary

@coderabbitai
Contributor

coderabbitai Bot commented May 9, 2026

✅ Actions performed

Summary regeneration triggered.

@bmarimuthu-nv bmarimuthu-nv marked this pull request as ready for review May 9, 2026 00:02
@bmarimuthu-nv bmarimuthu-nv requested review from a team as code owners May 9, 2026 00:02
@bmarimuthu-nv bmarimuthu-nv requested a review from marinayanov May 9, 2026 00:02
@bmarimuthu-nv
Collaborator Author

/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1, DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1" --disable-fail-fast

@coderabbitai
Contributor

coderabbitai Bot commented May 9, 2026


📝 Walkthrough

Walkthrough

This PR contains three independent model and test updates: Gemma4 switches from a custom tokenizer wrapper to transformers.AutoTokenizer with corrected tied-weights mapping; Qwen3.5 MoE removes token-type tensor fields from multimodal input processing paths; test reference accuracy, import wiring, and waiver entries are updated.

Changes

Gemma4 AutoTokenizer Migration

  • Imports and Type Updates (tensorrt_llm/_torch/auto_deploy/models/custom/modeling_gemma4.py): Imports updated to use AutoTokenizer directly; Gemma4ForCausalLM._tied_weights_keys corrected to map lm_head.weight to embed_tokens.weight instead of model.embed_tokens.weight.
  • Wrapper Class Removal (tensorrt_llm/_torch/auto_deploy/models/custom/modeling_gemma4.py): Tokenizer-specific constants block replaced; ADGemma4Tokenizer class completely removed.
  • Processor Integration (tensorrt_llm/_torch/auto_deploy/models/custom/modeling_gemma4.py): ADGemma4Processor.__init__ now accepts tokenizer: Any; from_pretrained constructs the tokenizer via AutoTokenizer.from_pretrained.
  • Factory Initialization (tensorrt_llm/_torch/auto_deploy/models/custom/modeling_gemma4.py): Gemma4ForConditionalGenerationFactory.init_tokenizer updated to return AutoTokenizer.from_pretrained instead of the wrapper method.
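The migration described above can be sketched as follows. This is an illustrative reconstruction, not the actual diff: the class and attribute names (Gemma4ForCausalLM, _tied_weights_keys, ADGemma4Processor) come from the walkthrough, while the method bodies are assumptions.

```python
from typing import Any


class Gemma4ForCausalLM:  # stand-in for the real model class
    # Corrected tied-weights mapping: lm_head.weight now ties to
    # embed_tokens.weight rather than model.embed_tokens.weight.
    _tied_weights_keys = {"lm_head.weight": "embed_tokens.weight"}


class ADGemma4Processor:
    def __init__(self, tokenizer: Any):
        # The custom ADGemma4Tokenizer wrapper is gone; any HF tokenizer
        # instance is accepted directly.
        self.tokenizer = tokenizer

    @classmethod
    def from_pretrained(cls, model_dir: str, **kwargs) -> "ADGemma4Processor":
        # Tokenizer is now built via the standard HF entry point.
        # Imported lazily here so this sketch runs without transformers.
        from transformers import AutoTokenizer

        return cls(AutoTokenizer.from_pretrained(model_dir, **kwargs))
```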

Qwen3.5 MoE Token-Type Field Cleanup

  • Forward and Input Processing (tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_5_moe.py): Qwen3_5MoeModel.forward cleanup list now includes mm_token_type_ids; Qwen3_5MoeADInputProcessor removes token_type_ids and mm_token_type_ids from multimodal_data before layout metadata attachment.
  • IR Processor Cleanup (tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_5_moe_ir.py): Multimodal wrapper removes mm_token_type_ids from kwargs; Qwen3_5MoeADInputProcessor removes token-type fields from multimodal_data and deletes the unused get_vocab_size() method.
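The token-type cleanup amounts to popping those entries out of the multimodal payload before layout metadata is attached. A minimal sketch, assuming the field names from the walkthrough; the helper function itself is hypothetical:

```python
def strip_token_type_fields(multimodal_data: dict) -> dict:
    """Hypothetical helper mirroring the cleanup above: drop token-type
    tensors from the multimodal payload before attaching layout metadata."""
    for key in ("token_type_ids", "mm_token_type_ids"):
        # pop with a default so a missing key is a no-op
        multimodal_data.pop(key, None)
    return multimodal_data
```

For example, a payload of `{"pixel_values": ..., "mm_token_type_ids": ...}` would come back with only `pixel_values` remaining.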

Test Reference and Configuration Updates

  • Import Wiring and Docstrings (tests/integration/defs/accuracy/test_llm_api_autodeploy.py): llm_models_root imported from ..conftest instead of test_common.llm_data; minor docstring formatting and punctuation adjustments.
  • Reference Accuracy (tests/integration/defs/accuracy/references/gsm8k.yaml): MiniMaxAI/MiniMax-M2 top-level accuracy changed from 93.75 to 89.045.
  • Test Waiver Updates (tests/integration/test_lists/waives.txt): Removed waivers for GLM4Flash, Gemma4MoE, and MiniMaxM2 tests; adjusted waiver entries accordingly.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check ⚠️ Warning: The PR description is incomplete; it contains only the repository template with placeholder sections and no substantive explanation of the changes, rationale, or test coverage details. Resolution: Fill in the Description section explaining the root causes of the accuracy failures and how they were fixed, and provide specific test coverage details for the changes made.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 29.41%, below the required threshold of 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
  • Title check ✅ Passed: The title clearly identifies this as a fix for AutoDeploy accuracy tests, directly corresponding to the code changes that restore test coverage and remove test waivers.
  • Linked Issues check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Inline comment on tests/integration/defs/accuracy/references/gsm8k.yaml:

```diff
- accuracy: 93.75
+ accuracy: 89.045
  - quant_algo: FP8_BLOCK_SCALES
    accuracy: 93.75
```
Collaborator


this needs update too?

Collaborator Author


No, this was registered by the PyTorch backend; the AD flow doesn't use the quant_algo row.
The difference could be due to some setup changes, I think.

@tensorrt-cicd
Collaborator

PR_Github #47450 [ run ] triggered by Bot. Commit: 2e89ebb Link to invocation

Contributor

@coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_gemma4.py (1)

2291-2299: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update transformers requirement to ≥5.5.0 or add version guard before AutoTokenizer.from_pretrained() calls.

The code uses AutoTokenizer.from_pretrained() at lines 2296 and 2635 to load Gemma4 tokenizers, but the repo is pinned to transformers==5.3.0. Gemma4 was introduced in Transformers v5.5.0; attempting to load it with 5.3.0 will fail with "Transformers does not recognize this architecture." Either update requirements.txt to transformers>=5.5.0 throughout the repo, or add a version check (similar to the pattern in transformers_causal_mask.py) to guard these calls and fail fast with a clear error message if the requirement is not met.

Also applies to: 2632-2635
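The runtime guard the review suggests could look roughly like the sketch below. The minimum version (5.5.0) comes from the comment above; the function name is hypothetical, and the check is written against a version string so that in real code it would be called with transformers.__version__:

```python
from packaging.version import parse


def check_transformers_version(installed: str, minimum: str = "5.5.0") -> None:
    """Fail fast with a clear error when the installed transformers version
    is too old for Gemma4. In the real guard, `installed` would be
    transformers.__version__."""
    if parse(installed) < parse(minimum):
        raise RuntimeError(
            f"Gemma4 tokenizer support requires transformers>={minimum}, "
            f"but {installed} is installed; "
            "AutoTokenizer.from_pretrained would fail for this architecture."
        )
```

Calling this guard immediately before each AutoTokenizer.from_pretrained() use would turn the opaque "does not recognize this architecture" failure into an actionable error message.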

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/auto_deploy/models/custom/modeling_gemma4.py` around
lines 2291 - 2299, The Gemma4 tokenizer calls in
ADGemma4Processor.from_pretrained (and the similar call near the end of
modeling_gemma4.py) rely on Transformers >=5.5.0; either update the repo
requirement to transformers>=5.5.0 or add a runtime version guard before calling
AutoTokenizer.from_pretrained() that checks transformers.__version__ (or uses
packaging.version.parse) and raises a clear, fast-failing error indicating the
minimum version required. Locate the two AutoTokenizer.from_pretrained usages in
modeling_gemma4.py (inside ADGemma4Processor.from_pretrained and the other
Gemma4 loader) and implement the version check there so the code never calls
AutoTokenizer when the installed transformers is <5.5.0.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4e8b3ad0-f423-48b0-a7ea-a9494d07e7bc

📥 Commits

Reviewing files that changed from the base of the PR and between 43f4b94 and 2e89ebb.

📒 Files selected for processing (6)
  • tensorrt_llm/_torch/auto_deploy/models/custom/modeling_gemma4.py
  • tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_5_moe.py
  • tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_5_moe_ir.py
  • tests/integration/defs/accuracy/references/gsm8k.yaml
  • tests/integration/defs/accuracy/test_llm_api_autodeploy.py
  • tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/waives.txt

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47450 [ run ] completed with state SUCCESS. Commit: 2e89ebb
/LLM/main/L0_MergeRequest_PR pipeline #37371 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

