
[None][fix] Fix and unwaive AutoDeploy accuracy tests#13925

Open
bmarimuthu-nv wants to merge 3 commits into NVIDIA:main from nv-auto-deploy:bala/ci-acc-test-fix2

Conversation

@bmarimuthu-nv
Collaborator

@bmarimuthu-nv bmarimuthu-nv commented May 8, 2026

Summary by CodeRabbit

  • New Features

    • Gemma4 tokenizer now uses standard HuggingFace transformers library integration.
    • Qwen3.5 MoE multimodal input handling refined for improved compatibility.
  • Bug Fixes

    • Corrected MiniMax-M2 GSM8k model accuracy benchmark reference value.
  • Tests

    • Removed multiple test waivers to improve test coverage and reliability.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
@bmarimuthu-nv bmarimuthu-nv force-pushed the bala/ci-acc-test-fix2 branch from 62866f2 to 107756e on May 8, 2026 22:50
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
@bmarimuthu-nv
Collaborator Author

@coderabbitai summary

@coderabbitai
Contributor

coderabbitai Bot commented May 9, 2026

✅ Actions performed

Summary regeneration triggered.

@bmarimuthu-nv bmarimuthu-nv marked this pull request as ready for review May 9, 2026 00:02
@bmarimuthu-nv bmarimuthu-nv requested review from a team as code owners May 9, 2026 00:02
@bmarimuthu-nv bmarimuthu-nv requested a review from marinayanov May 9, 2026 00:02
@bmarimuthu-nv
Collaborator Author

/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1, DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1" --disable-fail-fast

@coderabbitai
Contributor

coderabbitai Bot commented May 9, 2026


📝 Walkthrough

Walkthrough

This PR contains three independent model and test updates: Gemma4 switches from a custom tokenizer wrapper to transformers.AutoTokenizer with corrected tied-weights mapping; Qwen3.5 MoE removes token-type tensor fields from multimodal input processing paths; test reference accuracy, import wiring, and waiver entries are updated.

Changes

Gemma4 AutoTokenizer Migration

  • Imports and Type Updates (tensorrt_llm/_torch/auto_deploy/models/custom/modeling_gemma4.py): Imports updated to use AutoTokenizer directly; Gemma4ForCausalLM._tied_weights_keys corrected to map lm_head.weight to embed_tokens.weight instead of model.embed_tokens.weight.
  • Wrapper Class Removal (tensorrt_llm/_torch/auto_deploy/models/custom/modeling_gemma4.py): Tokenizer-specific constants block replaced; ADGemma4Tokenizer class completely removed.
  • Processor Integration (tensorrt_llm/_torch/auto_deploy/models/custom/modeling_gemma4.py): ADGemma4Processor.__init__ now accepts tokenizer: Any; from_pretrained constructs the tokenizer via AutoTokenizer.from_pretrained.
  • Factory Initialization (tensorrt_llm/_torch/auto_deploy/models/custom/modeling_gemma4.py): Gemma4ForConditionalGenerationFactory.init_tokenizer updated to return AutoTokenizer.from_pretrained instead of the wrapper method.
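The migration described above can be sketched as follows. This is an illustrative reconstruction, not the actual diff: the class and attribute names (Gemma4ForCausalLM, _tied_weights_keys, ADGemma4Processor) come from the walkthrough, while the method bodies are assumptions.

```python
from typing import Any


class Gemma4ForCausalLM:  # stand-in for the real model class
    # Corrected tied-weights mapping: lm_head.weight now ties to
    # embed_tokens.weight rather than model.embed_tokens.weight.
    _tied_weights_keys = {"lm_head.weight": "embed_tokens.weight"}


class ADGemma4Processor:
    def __init__(self, tokenizer: Any):
        # The custom ADGemma4Tokenizer wrapper is gone; any HF tokenizer
        # instance is accepted directly.
        self.tokenizer = tokenizer

    @classmethod
    def from_pretrained(cls, model_dir: str, **kwargs) -> "ADGemma4Processor":
        # Tokenizer is now built via the standard HF entry point.
        # Imported lazily here so this sketch runs without transformers.
        from transformers import AutoTokenizer

        return cls(AutoTokenizer.from_pretrained(model_dir, **kwargs))
```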

Qwen3.5 MoE Token-Type Field Cleanup

  • Forward and Input Processing (tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_5_moe.py): Qwen3_5MoeModel.forward cleanup list now includes mm_token_type_ids; Qwen3_5MoeADInputProcessor removes token_type_ids and mm_token_type_ids from multimodal_data before layout metadata attachment.
  • IR Processor Cleanup (tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_5_moe_ir.py): Multimodal wrapper removes mm_token_type_ids from kwargs; Qwen3_5MoeADInputProcessor removes token-type fields from multimodal_data and deletes the unused get_vocab_size() method.
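The token-type cleanup amounts to popping those entries out of the multimodal payload before layout metadata is attached. A minimal sketch, assuming the field names from the walkthrough; the helper function itself is hypothetical:

```python
def strip_token_type_fields(multimodal_data: dict) -> dict:
    """Hypothetical helper mirroring the cleanup above: drop token-type
    tensors from the multimodal payload before attaching layout metadata."""
    for key in ("token_type_ids", "mm_token_type_ids"):
        # pop with a default so a missing key is a no-op
        multimodal_data.pop(key, None)
    return multimodal_data
```

For example, a payload of `{"pixel_values": ..., "mm_token_type_ids": ...}` would come back with only `pixel_values` remaining.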

Test Reference and Configuration Updates

  • Import Wiring and Docstrings (tests/integration/defs/accuracy/test_llm_api_autodeploy.py): llm_models_root imported from ..conftest instead of test_common.llm_data; minor docstring formatting and punctuation adjustments.
  • Reference Accuracy (tests/integration/defs/accuracy/references/gsm8k.yaml): MiniMaxAI/MiniMax-M2 top-level accuracy changed from 93.75 to 89.045.
  • Test Waiver Updates (tests/integration/test_lists/waives.txt): Removed waivers for GLM4Flash, Gemma4MoE, and MiniMaxM2 tests; adjusted waiver entries accordingly.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check ⚠️ Warning: The PR description is incomplete; it contains only the repository template with placeholder sections and no substantive explanation of the changes, rationale, or test coverage details. Resolution: Fill in the Description section explaining the root causes of the accuracy failures and how they were fixed, and provide specific test coverage details for the changes made.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 29.41%, below the required threshold of 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
  • Title check ✅ Passed: The title clearly identifies this as a fix for AutoDeploy accuracy tests, directly corresponding to the code changes that restore test coverage and remove test waivers.
  • Linked Issues check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Inline comment on tests/integration/defs/accuracy/references/gsm8k.yaml:

```diff
- accuracy: 93.75
+ accuracy: 89.045
  - quant_algo: FP8_BLOCK_SCALES
    accuracy: 93.75
```
Collaborator


this needs update too?

Collaborator Author


No, this was registered by the PyTorch backend; the AD flow doesn't use the quant_algo row.
The difference could be due to some setup changes, I think.

@tensorrt-cicd
Collaborator

PR_Github #47450 [ run ] triggered by Bot. Commit: 2e89ebb Link to invocation

Contributor

@coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_gemma4.py (1)

2291-2299: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update transformers requirement to ≥5.5.0 or add version guard before AutoTokenizer.from_pretrained() calls.

The code uses AutoTokenizer.from_pretrained() at lines 2296 and 2635 to load Gemma4 tokenizers, but the repo is pinned to transformers==5.3.0. Gemma4 was introduced in Transformers v5.5.0; attempting to load it with 5.3.0 will fail with "Transformers does not recognize this architecture." Either update requirements.txt to transformers>=5.5.0 throughout the repo, or add a version check (similar to the pattern in transformers_causal_mask.py) to guard these calls and fail fast with a clear error message if the requirement is not met.

Also applies to: 2632-2635
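The runtime guard the review suggests could look roughly like the sketch below. The minimum version (5.5.0) comes from the comment above; the function name is hypothetical, and the check is written against a version string so that in real code it would be called with transformers.__version__:

```python
from packaging.version import parse


def check_transformers_version(installed: str, minimum: str = "5.5.0") -> None:
    """Fail fast with a clear error when the installed transformers version
    is too old for Gemma4. In the real guard, `installed` would be
    transformers.__version__."""
    if parse(installed) < parse(minimum):
        raise RuntimeError(
            f"Gemma4 tokenizer support requires transformers>={minimum}, "
            f"but {installed} is installed; "
            "AutoTokenizer.from_pretrained would fail for this architecture."
        )
```

Calling this guard immediately before each AutoTokenizer.from_pretrained() use would turn the opaque "does not recognize this architecture" failure into an actionable error message.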

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/auto_deploy/models/custom/modeling_gemma4.py` around
lines 2291 - 2299, The Gemma4 tokenizer calls in
ADGemma4Processor.from_pretrained (and the similar call near the end of
modeling_gemma4.py) rely on Transformers >=5.5.0; either update the repo
requirement to transformers>=5.5.0 or add a runtime version guard before calling
AutoTokenizer.from_pretrained() that checks transformers.__version__ (or uses
packaging.version.parse) and raises a clear, fast-failing error indicating the
minimum version required. Locate the two AutoTokenizer.from_pretrained usages in
modeling_gemma4.py (inside ADGemma4Processor.from_pretrained and the other
Gemma4 loader) and implement the version check there so the code never calls
AutoTokenizer when the installed transformers is <5.5.0.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4e8b3ad0-f423-48b0-a7ea-a9494d07e7bc

📥 Commits

Reviewing files that changed from the base of the PR and between 43f4b94 and 2e89ebb.

📒 Files selected for processing (6)
  • tensorrt_llm/_torch/auto_deploy/models/custom/modeling_gemma4.py
  • tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_5_moe.py
  • tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_5_moe_ir.py
  • tests/integration/defs/accuracy/references/gsm8k.yaml
  • tests/integration/defs/accuracy/test_llm_api_autodeploy.py
  • tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/waives.txt

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47450 [ run ] completed with state SUCCESS. Commit: 2e89ebb
/LLM/main/L0_MergeRequest_PR pipeline #37371 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

