[#12332][feat] AutoDeploy: SuperV3 MTP Support #12326

Merged
govind-ramnarayan merged 2 commits into NVIDIA:main from nv-auto-deploy:gramnarayan/load-superv3-mtp-head-rebased
Apr 2, 2026
Conversation

@govind-ramnarayan
Collaborator

@govind-ramnarayan govind-ramnarayan commented Mar 18, 2026

Resolves: #12332

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for MTP (Multi-Token Prediction) speculative decoding alongside existing Eagle3 one-model support.
    • Introduced intermediate state caching for speculative decoding to optimize memory and performance.
  • Refactor

    • Generalized Eagle model architecture to support multiple model types (Llama, NemotronH) for improved flexibility.
    • Restructured attention and custom operations to support optional speculative decoding configuration.
    • Enhanced resource handler system with specialized handlers for speculative decoding scenarios.
  • Tests

    • Added integration tests for MTP-based speculative decoding.
    • Added unit tests for speculative resource handlers and model weight loading.
    • Expanded smoke tests for speculative decoding workflows.
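
For readers unfamiliar with speculative decoding, the following is a minimal, generic sketch of the greedy accept/reject step that draft-based schemes such as MTP and Eagle3 commonly rely on. It is not taken from this PR: the function name `accept_draft_tokens` and its arguments are hypothetical, and the actual AutoDeploy implementation may verify drafts differently.

```python
from typing import List, Sequence


def accept_draft_tokens(
    draft_tokens: Sequence[int], target_tokens: Sequence[int]
) -> List[int]:
    """Greedy verification of drafted tokens (illustrative only).

    draft_tokens:  k tokens proposed by the draft head.
    target_tokens: k + 1 tokens the target model selects when it scores the
                   drafted sequence in a single forward pass.
    Returns the tokens to emit this step: the longest prefix on which draft
    and target agree, followed by one token from the target model (either
    the correction at the first mismatch or a bonus token if all matched).
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have exactly one more entry than draft_tokens")

    accepted: List[int] = []
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            break  # first disagreement: discard this and all later drafts
        accepted.append(draft)

    # The target model always contributes one token after the accepted prefix.
    accepted.append(target_tokens[len(accepted)])
    return accepted


if __name__ == "__main__":
    # Two of three drafts match; the third is replaced by the target's choice.
    print(accept_draft_tokens([5, 7, 9], [5, 7, 2, 4]))
```

Under this scheme each step emits between one and k + 1 tokens, which is why caching intermediate state for the drafted positions matters: state computed for rejected positions has to be discarded or rolled back.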

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@govind-ramnarayan govind-ramnarayan changed the title from "Gramnarayan/load superv3 mtp head rebased" to "[feat][AutoDeploy] SuperV3 MTP Support" on Mar 18, 2026
Comment thread tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_causal_conv.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_causal_conv.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/models/custom/modeling_nemotron_h.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/models/custom/modeling_nemotron_h.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/models/custom/modeling_nemotron_h.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/models/eagle.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/models/eagle.py
Comment thread tensorrt_llm/_torch/auto_deploy/models/eagle.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/models/eagle.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/shim/interface.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/shim/interface.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/shim/interface.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/shim/interface.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/shim/interface.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/shim/interface.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/shim/interface.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/llm_args.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/llm_args.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/llm_args.py Outdated
Comment thread tensorrt_llm/_torch/auto_deploy/llm_args.py Outdated
Comment thread tensorrt_llm/_torch/pyexecutor/mamba_cache_manager.py Outdated
Comment thread tests/integration/defs/accuracy/references/gsm8k.yaml Outdated
Comment thread tests/integration/defs/accuracy/test_llm_api_autodeploy.py
Comment thread tests/integration/defs/accuracy/test_llm_api_autodeploy.py Outdated
@tensorrt-cicd
Collaborator

PR_Github #40746 [ run ] completed with state ABORTED. Commit: f6bd237

Link to invocation

Collaborator

@venkywonka venkywonka left a comment

lgtm from llmapi side

@tensorrt-cicd
Collaborator

PR_Github #40755 [ run ] completed with state SUCCESS. Commit: 12ae4e3
/LLM/main/L0_MergeRequest_PR pipeline #31773 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@govind-ramnarayan
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40840 [ run ] triggered by Bot. Commit: 12ae4e3 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40840 [ run ] completed with state SUCCESS. Commit: 12ae4e3
/LLM/main/L0_MergeRequest_PR pipeline #31849 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@govind-ramnarayan govind-ramnarayan force-pushed the gramnarayan/load-superv3-mtp-head-rebased branch from 12ae4e3 to 7b587ca on March 31, 2026 17:33
@govind-ramnarayan
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40991 [ run ] triggered by Bot. Commit: 7b587ca Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40991 [ run ] completed with state FAILURE. Commit: 7b587ca
/LLM/main/L0_MergeRequest_PR pipeline #31972 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@govind-ramnarayan govind-ramnarayan force-pushed the gramnarayan/load-superv3-mtp-head-rebased branch from 7b587ca to 042fbbf on April 1, 2026 04:18
@govind-ramnarayan
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41101 [ run ] triggered by Bot. Commit: 042fbbf Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41101 [ run ] completed with state SUCCESS. Commit: 042fbbf
/LLM/main/L0_MergeRequest_PR pipeline #32075 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@govind-ramnarayan
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41234 [ run ] triggered by Bot. Commit: 042fbbf Link to invocation

Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
@govind-ramnarayan govind-ramnarayan force-pushed the gramnarayan/load-superv3-mtp-head-rebased branch from 042fbbf to 2c429f6 on April 1, 2026 22:03
@govind-ramnarayan
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41266 [ run ] triggered by Bot. Commit: 2c429f6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41266 [ run ] completed with state SUCCESS. Commit: 2c429f6
/LLM/main/L0_MergeRequest_PR pipeline #32224 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@govind-ramnarayan
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41366 [ run ] triggered by Bot. Commit: 2c429f6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41366 [ run ] completed with state SUCCESS. Commit: 2c429f6
/LLM/main/L0_MergeRequest_PR pipeline #32309 completed with status: 'SUCCESS'

CI Report

Link to invocation

@govind-ramnarayan govind-ramnarayan merged commit 3b08bed into NVIDIA:main Apr 2, 2026
5 checks passed
@github-project-automation github-project-automation Bot moved this from In review to Done in AutoDeploy Board Apr 2, 2026
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[Feature]: AutoDeploy: SuperV3 MTP Support

9 participants