[TRTLLM-12647][feat] Parallelize LTX-2 LoRA weight loading by yibinl-nvidia · Pull Request #13911 · NVIDIA/TensorRT-LLM

yibinl-nvidia · 2026-05-08T15:25:27Z

Summary by CodeRabbit

Refactor
- Refactored LoRA delta loading to remap parameter names with helper functions and to skip deltas that do not target transformer parameters.
Performance
- Added optional parallel loading of LoRA deltas during model initialization for faster startup times.
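To illustrate the optional parallel loading summarized above, the sketch below overlaps a background LoRA task with the rest of component loading using `concurrent.futures.ThreadPoolExecutor`. It is not the PR's code. `load_with_lora_overlap`, its callables, and the dict stand-ins for tensors are hypothetical. Only the general pattern follows the PR's description: submit the LoRA work, load the other components, then wait on the future. A `disable_overlap_lora_load` flag falls back to sequential loading.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: LoRA deltas are modeled as name -> list of floats
# instead of torch tensors, and components as name -> str, so the sketch
# runs without torch.
Deltas = Dict[str, List[float]]
Components = Dict[str, str]


def load_with_lora_overlap(
    load_lora_deltas: Callable[[], Deltas],
    load_components: Callable[[], Components],
    disable_overlap_lora_load: bool = False,
) -> Tuple[Components, Deltas]:
    """Load pipeline components while LoRA deltas are computed in the background."""
    if disable_overlap_lora_load:
        # Sequential fallback: no background thread is created.
        return load_components(), load_lora_deltas()

    # One worker is enough because only the LoRA task leaves the main thread.
    # The `with` block shuts the executor down even if component loading raises.
    with ThreadPoolExecutor(max_workers=1) as executor:
        lora_future: "Future[Deltas]" = executor.submit(load_lora_deltas)
        components = load_components()  # runs while the LoRA task is in flight
        deltas = lora_future.result()  # blocks; re-raises any worker exception
    return components, deltas
```

How much this overlap helps depends on the background work releasing the GIL. File I/O and most PyTorch operators typically do, but pure-Python work would mostly serialize with the main thread.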

Description

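Overlap LoRA weight loading with the loading of other pipeline components to speed up LTX-2 model loading. Depending on precision, this reduces model load time by 21.0% to 34.0% and end-to-end pipeline initialization (model load + warmup) by 10.7% to 17.9%: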

| Precision | Mode | Pipeline initialization (model load + warmup) | Model load | Warmup | Transformer load | LoRA load |
| --- | --- | --- | --- | --- | --- | --- |
| BF16 | no overlap | 65.15 s | 32.82 s | 32.33 s | 5 s | 11 s |
| BF16 | overlap | 58.18 s (-10.7%) | 25.94 s (-21.0%) | 32.24 s | 4 s | 22 s |
| FP8 | no overlap | 62.08 s | 32.03 s | 30.05 s | 4 s | 11 s |
| FP8 | overlap | 54.36 s (-12.4%) | 22.32 s (-30.3%) | 32.05 s | 3 s | 19 s |
| FP4 | no overlap | 63.11 s | 33.87 s | 29.24 s | 3 s | 12 s |
| FP4 | overlap | 51.79 s (-17.9%) | 22.34 s (-34.0%) | 29.44 s | 3 s | 18 s |
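The percentages in parentheses are relative reductions against the matching no-overlap row, that is (no overlap - overlap) / no overlap. The short check below recomputes them from the table values. It is only an illustrative cross-check and is not part of the PR.

```python
from typing import Dict, Tuple

# Timings in seconds, copied from the table: (no overlap, overlap).
TIMINGS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "BF16": {"pipeline_init": (65.15, 58.18), "model_load": (32.82, 25.94)},
    "FP8": {"pipeline_init": (62.08, 54.36), "model_load": (32.03, 22.32)},
    "FP4": {"pipeline_init": (63.11, 51.79), "model_load": (33.87, 22.34)},
}


def reduction_pct(baseline: float, overlapped: float) -> float:
    """Relative reduction of `overlapped` versus `baseline`, in percent."""
    return (baseline - overlapped) / baseline * 100.0


for precision, metrics in TIMINGS.items():
    for metric, (base, over) in metrics.items():
        print(f"{precision:5s} {metric:14s} -{reduction_pct(base, over):.1f}%")
```

In the table, the LoRA load column itself grows under overlap (for example 11 s to 22 s for BF16). The gain comes from hiding that work behind the other components rather than from making it faster.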

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

coderabbitai · 2026-05-08T15:30:20Z

📝 Walkthrough

Walkthrough

This PR refactors LoRA delta loading in the LTX-2 visual generation pipeline. It introduces parameter-mapping helpers and can optionally pre-compute LoRA deltas on a background `ThreadPoolExecutor` thread, so that this work overlaps with loading the other components.

Changes

LoRA Delta Loading Optimization with Threading

| Layer | File(s) | Summary |
| --- | --- | --- |
| Imports and Module Constants | `tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2_two_stages.py` | Adds `os` and `ThreadPoolExecutor` imports; defines the `_QKV_SUFFIXES` tuple and the `_DISABLE_OVERLAP_LORA_LOAD_ENV` constant for QKV fusion and overlap control. |
| LoRA Parameter Helper Functions | `tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2_two_stages.py` | Introduces `_map_lora_param_name()` to remap LoRA parameter keys to TRT-LLM naming and `_has_lora_target()` to detect whether a LoRA delta targets transformer parameters, with special Q/K/V suffix handling. |
| LoRA Delta Loading Core Logic | `tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2_two_stages.py` | Refactors `_load_lora_deltas()` to use the helper functions for key remapping, skip and count non-target deltas, compute fused QKV deltas, and report the skipped count in the success log. |
| Pipeline Threading Integration | `tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2_two_stages.py` | `load_standard_components()` optionally starts background LoRA pre-computation with `ThreadPoolExecutor` (unless disabled by the env var) and ensures executor shutdown; `_load_two_stage_components()` waits on the background future or loads synchronously. |
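The helper logic in this table can be pictured with a simplified sketch. A LoRA delta is `scale * (B @ A)`. Deltas for separate Q/K/V projections are stacked into one delta for a fused QKV weight. Deltas with no matching parameter are skipped and counted. All names here are hypothetical stand-ins: `QKV_SUFFIXES`, `to_qkv.weight`, and the helper functions. The thread does not show the PR's actual suffixes, name mapping, or scaling, and plain lists replace torch tensors.

```python
from typing import Dict, List, Set, Tuple

Matrix = List[List[float]]

# Hypothetical naming: the PR's real _QKV_SUFFIXES values and fused
# parameter names are not shown in this thread.
QKV_SUFFIXES: Tuple[str, ...] = ("to_q.weight", "to_k.weight", "to_v.weight")
FUSED_QKV_SUFFIX = "to_qkv.weight"


def lora_delta(lora_b: Matrix, lora_a: Matrix, scale: float) -> Matrix:
    """Compute scale * (B @ A) for B of shape [out, r] and A of shape [r, in]."""
    rank, in_features = len(lora_a), len(lora_a[0])
    return [
        [scale * sum(row[k] * lora_a[k][j] for k in range(rank)) for j in range(in_features)]
        for row in lora_b
    ]


def has_lora_target(param_name: str, model_params: Set[str]) -> bool:
    """True if the delta maps to a model parameter, directly or via a fused QKV weight."""
    if param_name in model_params:
        return True
    for suffix in QKV_SUFFIXES:
        if param_name.endswith(suffix):
            return (param_name[: -len(suffix)] + FUSED_QKV_SUFFIX) in model_params
    return False


def fuse_qkv_deltas(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Stack Q, K and V deltas along the output (row) dimension."""
    return q + k + v


def filter_deltas(
    deltas: Dict[str, Matrix], model_params: Set[str]
) -> Tuple[Dict[str, Matrix], int]:
    """Keep deltas that have a target; also return how many were skipped."""
    kept = {name: d for name, d in deltas.items() if has_lora_target(name, model_params)}
    return kept, len(deltas) - len(kept)
```

Stacking along the row dimension assumes `[out_features, in_features]` weights, as in `torch.nn.Linear`, with the fused QKV weight laid out as Q rows, then K rows, then V rows.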

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description provides a clear summary of the change and includes performance benchmarks, but the Test Coverage section is empty. | Complete the Test Coverage section by listing the relevant test cases that safeguard the parallel LoRA loading changes and ensure sufficient test coverage for the new code paths. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title '[TRTLLM-12647][feat] Parallelize LTX-2 LoRA weight loading' directly and clearly describes the main change: implementing parallel/overlapped LoRA weight loading for the LTX-2 model pipeline. This matches the file summary, which describes overlapping LoRA delta pre-computation with component loading using a background thread. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

🧹 Nitpick comments (1)

tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2_two_stages.py (1)

671-679: 💤 Low value

Add type annotation for lora_future parameter.

The lora_future parameter lacks a type annotation, which violates the coding guideline requiring type annotations for all function arguments. As per coding guidelines, Python code should use type annotations for all function arguments and return types.

Suggested fix

+from concurrent.futures import Future, ThreadPoolExecutor
...
     def _load_two_stage_components(
         self,
         device: torch.device,
         dtype: torch.dtype,
         spatial_upsampler_path: str,
         distilled_lora_path: str,
-        lora_future,
+        lora_future: Optional[Future[Dict[str, torch.Tensor]]],
         disable_overlap_lora_load: bool,
     ) -> None:
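Here is a minimal sketch of how the annotated parameter might be consumed. It assumes that `None` means overlap was disabled and no background future was created. `resolve_lora_deltas` and the `Deltas` alias are hypothetical, and `Dict[str, List[float]]` stands in for `Dict[str, torch.Tensor]`. The `from __future__ import annotations` line keeps `Future[...]` from being evaluated at runtime, which matters on Python versions before 3.9, where `concurrent.futures.Future` is not subscriptable.

```python
from __future__ import annotations

from concurrent.futures import Future
from typing import Callable, Dict, List, Optional

Deltas = Dict[str, List[float]]  # stand-in for Dict[str, torch.Tensor]


def resolve_lora_deltas(
    lora_future: Optional[Future[Deltas]],
    load_lora_deltas_sync: Callable[[], Deltas],
) -> Deltas:
    """Return precomputed deltas if a background future exists, else load now."""
    if lora_future is not None:
        # Blocks until the background task finishes and re-raises its exception.
        return lora_future.result()
    return load_lora_deltas_sync()
```

Making the annotation `Optional` documents the synchronous path in the signature itself, so a type checker flags any caller that forgets to handle the no-future case.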

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2_two_stages.py`
around lines 671 - 679, The parameter lora_future in _load_two_stage_components
lacks a type annotation; update the function signature to annotate it (e.g., use
concurrent.futures.Future or asyncio.Future depending on implementation) like
lora_future: concurrent.futures.Future, add the corresponding import (from
concurrent.futures import Future or import concurrent.futures) at the top of the
module, and adjust to Optional[Future] if the value can be None—ensure the
annotation matches the actual future type used elsewhere in this file.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e4e89d16-9958-418e-bdca-0485cd5a82a6

📥 Commits

Reviewing files that changed from the base of the PR and between 1651d1b and f921c18.

📒 Files selected for processing (1)

tensorrt_llm/_torch/visual_gen/models/ltx2/pipeline_ltx2_two_stages.py

yibinl-nvidia · 2026-05-08T18:08:58Z

/bot run

tensorrt-cicd · 2026-05-08T18:21:44Z

PR_Github #47432 [ run ] triggered by Bot. Commit: 2c2f177 Link to invocation

tensorrt-cicd · 2026-05-08T19:38:32Z

PR_Github #47432 [ run ] completed with state ABORTED. Commit: 2c2f177

Link to invocation

yibinl-nvidia · 2026-05-08T20:16:53Z

/bot run

tensorrt-cicd · 2026-05-08T20:22:12Z

PR_Github #47441 [ run ] triggered by Bot. Commit: 2c2f177 Link to invocation

tensorrt-cicd · 2026-05-09T01:12:20Z

PR_Github #47441 [ run ] completed with state SUCCESS. Commit: 2c2f177
/LLM/main/L0_MergeRequest_PR pipeline #37364 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

yibinl-nvidia · 2026-05-11T21:54:12Z

/bot run

tensorrt-cicd · 2026-05-11T22:00:20Z

PR_Github #47794 [ run ] triggered by Bot. Commit: 2c2f177 Link to invocation

tensorrt-cicd · 2026-05-11T22:57:21Z

PR_Github #47794 [ run ] completed with state SUCCESS. Commit: 2c2f177
/LLM/main/L0_MergeRequest_PR pipeline #37685 completed with status: 'SUCCESS'

CI Report

Link to invocation

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

yibinl-nvidia · 2026-05-14T21:47:58Z

/bot run

tensorrt-cicd · 2026-05-14T21:53:55Z

PR_Github #48444 [ run ] triggered by Bot. Commit: aaaeeab Link to invocation

tensorrt-cicd · 2026-05-14T23:18:07Z

PR_Github #48444 [ run ] completed with state SUCCESS. Commit: aaaeeab
/LLM/main/L0_MergeRequest_PR pipeline #38241 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

yibinl-nvidia · 2026-05-14T23:19:11Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-14T23:26:51Z

PR_Github #48459 [ run ] triggered by Bot. Commit: aaaeeab Link to invocation

yibinl-nvidia · 2026-05-15T02:42:05Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-15T02:48:11Z

PR_Github #48492 [ run ] triggered by Bot. Commit: aaaeeab Link to invocation

tensorrt-cicd · 2026-05-15T02:48:14Z

PR_Github #48459 [ run ] completed with state ABORTED. Commit: aaaeeab
/LLM/main/L0_MergeRequest_PR pipeline #38255 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

tensorrt-cicd · 2026-05-15T05:42:14Z

PR_Github #48492 [ run ] completed with state SUCCESS. Commit: aaaeeab
/LLM/main/L0_MergeRequest_PR pipeline #38288 completed with status: 'SUCCESS'

CI Report

Link to invocation

yibinl-nvidia requested a review from a team as a code owner May 8, 2026 15:25

github-actions Bot assigned yibinl-nvidia May 8, 2026

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

yibinl-nvidia changed the title ~~[TRTLLM-12527][feat] Parallel LTX-2 LoRA weight loading~~ [TRTLLM-12527][feat] Parallelize LTX-2 LoRA weight loading May 12, 2026

yibinl-nvidia changed the title ~~[TRTLLM-12527][feat] Parallelize LTX-2 LoRA weight loading~~ [TRTLLM-12647][feat] Parallelize LTX-2 LoRA weight loading May 12, 2026

yibinl-nvidia requested a review from chang-l May 12, 2026 19:27

yibinl-nvidia added 2 commits May 14, 2026 21:28

parallel LoRA weight loading

7b1ed0a

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

add tests and remove env variable

af700c8

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

yibinl-nvidia force-pushed the dev-yibinl-TRT-12527 branch from 2c2f177 to af700c8 Compare May 14, 2026 21:37

Add LoRA future type annotation

aaaeeab

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
