[TRTLLM-11410][feat] MoT World Model Support #14012

Open
NVShreyas wants to merge 6 commits into NVIDIA:main from NVShreyas:user/shreyasm/world-models

Conversation

@NVShreyas NVShreyas (Collaborator) commented May 12, 2026

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added Cosmos3 video generation pipeline supporting text-to-video and image-to-video generation modes.
    • Introduced text and video guardrails with profanity filtering, content safety classification, and face anonymization capabilities.
    • Enhanced diffusion model architecture with improved attention mechanisms for visual generation tasks.

Review Change Stack

NVShreyas added 6 commits May 11, 2026 10:38
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
@NVShreyas NVShreyas requested review from a team as code owners May 12, 2026 00:15
@coderabbitai coderabbitai Bot (Contributor) commented May 12, 2026

📝 Walkthrough

This PR integrates the Cosmos3 video diffusion pipeline into TensorRT-LLM, introducing text and video guardrails, a dual-pathway mRoPE-based transformer, and a complete generation flow that supports both text-to-video and image-to-video modes. It also refines Ulysses sequence-parallelism control across pipelines.

Changes

Cosmos3 Pipeline & Guardrails

  • Configuration & Dependencies (requirements.txt, tensorrt_llm/_torch/visual_gen/config.py, tensorrt_llm/_torch/visual_gen/models/cosmos3/defaults.py): Three new guardrail dependencies added (nltk, better-profanity, retinaface). Pipeline configuration extended with TEXT_GUARDRAIL/VIDEO_GUARDRAIL enum members, a guardrail_checkpoint_dir field in VisualGenArgs, and Cosmos3 generation defaults (720p spatial/temporal params, guidance scale, inference steps).

  • Text & Video Guardrails (tensorrt_llm/_torch/visual_gen/models/cosmos3/guardrails.py): SafetyClassifier neural network for content safety classification. The text guardrail applies an optional profanity blocklist and a Qwen3Guard model check. The video guardrail runs a SigLIP content safety filter (rejecting the video if the unsafe frame ratio exceeds a threshold) and RetinaFace face detection with pixelation for anonymization. check_video_safety() converts tensors to NumPy, applies filtering, and returns None if rejected.

  • Cosmos3 Transformer & Attention (tensorrt_llm/_torch/visual_gen/models/cosmos3/transformer_cosmos3.py): Dual-pathway transformer: the UND path applies causal self-attention with mRoPE and RMSNorm to text tokens (caching K/V); the GEN path applies cross-attention to the UND K/V with mRoPE applied to visual queries. Includes mRoPE position-ID computation for 3D (T,H,W) grids, timestep embedding, a language model, and the full VFMTransformer with checkpoint compatibility and sequence-parallel support.

  • Cosmos3 Pipeline Generation (tensorrt_llm/_torch/visual_gen/models/cosmos3/pipeline_cosmos3.py): Cosmos3OmniMoTPipeline orchestrates generation: it loads the Qwen2 tokenizer, WAN VAE, and UniPCMultistep scheduler; applies prompt templates (optional duration/resolution metadata); tokenizes via the chat template; and supports text-to-video (noise init) and image-to-video (encoding the conditioning image with velocity masking). Implements CFG via extra_cfg_tensors for token splitting, a denoising loop with the scheduler, latent decoding with VAE denormalization, and video guardrails on rank 0.

  • Module Exports & Registration (tensorrt_llm/_torch/visual_gen/models/__init__.py, tensorrt_llm/_torch/visual_gen/models/cosmos3/__init__.py, tensorrt_llm/_torch/visual_gen/pipeline_registry.py): Cosmos3OmniMoTPipeline added to pipeline module exports. Auto-detection in AutoPipeline._detect_from_checkpoint recognizes Cosmos3 checkpoints by _class_name match.

  • Ulysses Parallelism Control (tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py, tensorrt_llm/_torch/visual_gen/modules/attention.py, tensorrt_llm/_torch/visual_gen/models/ltx2/transformer_ltx2.py, tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py): SDPA now explicitly sets enable_gqa when query heads differ from KV heads (see the sketch after this list). The Attention base class accepts an enable_ulysses: bool = True flag. LTX2 disables Ulysses for cross-attention only (enable_ulysses=use_ulysses and not self._is_cross_attn); WAN disables Ulysses for its cross-attention path (enable_ulysses=False).
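
A minimal sketch of the enable_gqa behavior described in the Ulysses row above (PyTorch 2.5+ SDPA; shapes and head counts are illustrative, not taken from the PR):

import torch
import torch.nn.functional as F

# 8 query heads attending over 2 KV heads: a grouped-query configuration.
q = torch.randn(1, 8, 128, 64)  # [batch, num_query_heads, seq_len, head_dim]
k = torch.randn(1, 2, 128, 64)  # [batch, num_kv_heads, seq_len, head_dim]
v = torch.randn(1, 2, 128, 64)

# enable_gqa lets SDPA broadcast each KV head across its group of query
# heads instead of requiring matching head counts.
out = F.scaled_dot_product_attention(q, k, v, enable_gqa=q.shape[1] != k.shape[1])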

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check: ⚠️ Warning. The PR description is empty; it contains only the repository template with no actual explanation of changes, test coverage, or checklist completion. Resolution: add a description explaining the Cosmos3 implementation, guardrails, and test coverage, confirm the checklist items, and reference the commit messages and file summaries for context.
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 35.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Title check: ✅ Passed. The title clearly identifies the main feature addition: Cosmos3 (MoT World Model) support with guardrails, attention updates, and pipeline integration.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
tensorrt_llm/_torch/visual_gen/config.py (1)

1-3: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA SPDX/copyright header.

This modified Python source file currently has no header block at the top.

As per coding guidelines: All C++, Python, and other source files must contain NVIDIA copyright header with current modification year.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/config.py` around lines 1 - 3, This file is
missing the required NVIDIA SPDX/copyright header; add the standard NVIDIA
header block (including the SPDX-License-Identifier and the copyright line with
the current modification year) at the very top of the file before any imports
(i.e., before the existing import json / from enum import Enum / from pathlib
import Path lines), preserving encoding and formatting conventions used in other
Python sources in the repo.
tensorrt_llm/_torch/visual_gen/modules/attention.py (1)

1-3: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA SPDX/copyright header.

This modified Python source file currently has no header block at the top.

As per coding guidelines: All C++, Python, and other source files must contain NVIDIA copyright header with current modification year.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/modules/attention.py` around lines 1 - 3, Add
the required NVIDIA copyright/SPDX header at the very top of the file
tensorrt_llm/_torch/visual_gen/modules/attention.py (above the existing
imports), replacing the missing header; include the current modification year,
the NVIDIA copyright owner wording, and the SPDX-License-Identifier line as
required by project policy so the header appears before the lines containing
Enum and typing imports.
tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py (1)

1-3: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA SPDX/copyright header.

This modified file is missing the header block required for Python source files.

As per coding guidelines: All C++, Python, and other source files must contain NVIDIA copyright header with current modification year.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py` around lines 1
- 3, Add the required NVIDIA copyright/SPDX header to the top of
transformer_wan.py (above the existing imports like "import math" and "from
typing import Tuple"); include the current modification year, the NVIDIA
Corporation copyright statement and the SPDX-License-Identifier line exactly as
required by the project's header policy so the file complies with the C++/Python
source file header guidelines.
tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py (1)

1-2: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update the copyright year on this modified file.

Line 1 still reflects 2025, but this file is modified in 2026.

As per coding guidelines: Include NVIDIA copyright header on all new files; update year on modified files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py` around lines 1 -
2, Update the SPDX header year in the file
tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py by changing the
copyright year on the top-of-file header (the lines starting with "#
SPDX-FileCopyrightText" and "# SPDX-License-Identifier") from 2025 to 2026 so
the modified file reflects the current year.
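
All four header comments above ask for the same two-line block; a sketch based on the SPDX lines the vanilla.py comment quotes (the exact owner wording and license identifier used in the repo are assumptions here):

# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

The block goes at the very top of each new file, before any imports; on modified files only the year changes.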
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@requirements.txt`:
- Line 92: The requirement line "retinaface @
git+https://github.com/NVShreyas/retinaface.git@main" uses a moving branch; pin
it to an immutable commit by replacing "@main" with a specific commit SHA (e.g.
"@<commit-hash>") from the retinaface repo, pick the desired stable commit via
the repo's commits page or git ls-remote, update the line in requirements.txt to
"retinaface @ git+https://github.com/NVShreyas/retinaface.git@<commit-hash>",
then regenerate/verify your lock or dependency install (pip-compile / pip
install -r requirements.txt) to ensure reproducible builds.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/__init__.py`:
- Around line 1-3: This new module lacks the required NVIDIA copyright header;
add the repository’s standard NVIDIA copyright header block at the very top of
tensorrt_llm/_torch/visual_gen/models/cosmos3/__init__.py before any imports so
the file complies with the policy; keep the existing import of
Cosmos3OmniMoTPipeline and the __all__ export unchanged (refer to the module
name Cosmos3OmniMoTPipeline and the file __init__.py when making the edit).

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/defaults.py`:
- Around line 15-19: Module docstring incorrectly references "Wan" pipelines;
update the module-level docstring in defaults.py to reference "Cosmos3" (e.g.,
"Per-model default generation parameters for Cosmos3 pipelines.") and replace or
remove any mentions of WanPipeline and WanImageToVideoPipeline in that
docstring—if there are Cosmos3 pipeline class names (e.g., Cosmos3Pipeline /
Cosmos3ImageToVideoPipeline) use those exact names instead.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/guardrails.py`:
- Around line 1-13: This file is missing the required NVIDIA SPDX/copyright
header; add the repository-standard header comment block (including
SPDX-License-Identifier and NVIDIA copyright line with the correct year) at the
very top of tensorrt_llm/_torch/visual_gen/models/cosmos3/guardrails.py before
the existing from __future__ import, ensuring the header format matches other
files in the repo and update the year if this is a modified file.
- Around line 126-233: Optional guardrail initialization may raise other runtime
exceptions (OSError, RuntimeError, etc.) beyond ImportError/FileNotFoundError;
wrap the Qwen guardrail and the video safety filter loading in broader exception
handlers so failures disable the guardrail instead of hard-failing.
Specifically, for the Qwen block that defines qwen_tokenizer, qwen_model and
_qwen_check (used by text_guardrail), replace the bare except ImportError with
except Exception as e and log the exception detail and skip adding the checker;
likewise in build_video_guardrail wrap the SiglipModel/SiglipProcessor and
classifier loading (siglip_model, siglip_processor, classifier, and the
_safety_check closure) with except Exception as e (not just
ImportError/FileNotFoundError), log the error including e, and leave
safety_checker as None so the pipeline continues. Ensure logs include the
exception message to aid debugging.
- Around line 361-370: The output-rank restoration is inverted: record the
original input rank (e.g., orig_dim = video_tensor.dim()) before converting to
numpy, then after creating result = torch.from_numpy(frames_np) restore to the
original shape—if orig_dim == 5 then result = result.unsqueeze(0) to return a 5D
tensor, and if orig_dim == 4 leave result as 4D; update the code around v,
video_tensor, v.dim(), torch.from_numpy and video_guardrail in
check_video_safety accordingly so the unsqueeze matches the original input rank.
- Line 206: Remove the unnecessary nonlocal declarations for siglip_model and
classifier in the guardrails closure(s): locate the lines containing "nonlocal
siglip_model, classifier" and any other "nonlocal classifier" inside the nested
function scopes in guardrails.py and delete those nonlocal statements (these
variables are only read, not reassigned). Keep all other logic unchanged so the
nested functions continue to reference the outer-scope siglip_model and
classifier as read-only. Run lint/tests to confirm the F824 error is resolved.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/pipeline_cosmos3.py`:
- Around line 61-63: Normalize skip_components at the start of
load_standard_components (e.g., set to an empty list or set if None) so
membership checks like "PipelineComponent.VAE not in skip_components" are safe,
and update the logic around self.vae_scale_factor_spatial to only reference it
when the VAE is actually being loaded (i.e., when PipelineComponent.VAE is not
in skip_components and/or when the VAE has been initialized), avoiding
unconditional access that breaks skip_components=[PipelineComponent.VAE].
- Around line 439-445: Rank 0 currently checks text_guardrail (and similarly the
video-guardrail later) and returns early, but other ranks continue causing
deadlocks; change this so rank 0 computes a single boolean "blocked" (e.g.,
iterate prompts and set blocked=True if any text_guardrail returns false) and
then synchronize that decision across ranks using a torch.distributed collective
(broadcast or all_reduce) before any return; update the block that uses
self.rank, use_guardrails, text_guardrail, logger.warning, timer.mark_end,
timer.fill, PipelineOutput to have rank 0 set the blocked flag and an optional
message, broadcast them to all ranks, and have every rank perform
timer.mark_end() and return timer.fill(PipelineOutput()) when blocked; apply the
same pattern to the video-guardrail branch around the code at the later block
(lines ~575-580) so all ranks make the same early-exit decision.
- Around line 1-29: Add the standard NVIDIA copyright header at the top of the
new module pipeline_cosmos3.py before the first import; ensure the exact header
text used across the repo (including current year) is inserted as a block
comment so licensing scanners pick it up. Locate the file by the module name or
imports such as AutoencoderKLWan, Cosmos3VFMTransformer, and add the header
above those imports, keeping existing imports and code unchanged.
- Around line 90-104: Current logic gates both guardrails together; update the
block so text and video guardrails are loaded independently: check
TRTLLM_DISABLE_COSMOS3_GUARDRAILS and then separately if
PipelineComponent.TEXT_GUARDRAIL not in skip_components to initialize
self.text_guardrail (use model_config.extra_attrs['guardrail_checkpoint_dir'] or
download_guardrail_checkpoint() once) via build_text_guardrail(), and separately
if PipelineComponent.VIDEO_GUARDRAIL not in skip_components to initialize
self.video_guardrail via build_video_guardrail(); ensure the checkpoint
directory is resolved only once and reused for both builds and respect the
global TRTLLM_DISABLE_COSMOS3_GUARDRAILS flag.
- Around line 221-229: The code truncates token_ids before appending the
mandatory EOS and "<|vision_start|>" tokens, which can exceed
max_sequence_length; update the logic in the block handling token_ids and
max_sequence_length so you first ensure there is room for two suffix tokens by
truncating to max_sequence_length - 2 (or 0 min) before appending
self.tokenizer.eos_token_id and
self.tokenizer.convert_tokens_to_ids("<|vision_start|>"), then compute seq_len,
pad_len, attention_mask and pad with self.tokenizer.pad_token_id (or 0) as
before; apply this change around the token_ids manipulation to prevent exceeding
the model positional limit.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/transformer_cosmos3.py`:
- Around line 1-19: This new module transformer_cosmos3.py is missing the
required NVIDIA copyright header at the top of the file; add the standard NVIDIA
license/header block as the very first lines (before the imports such as import
math, import torch, etc.), using the canonical header used across the repo and
updating the year if needed, so files like transformer_cosmos3.py (containing
symbols like TimestepEmbedding, DiffusionModelConfig, Attention, GatedMLP,
DynamicLinearWeightLoader) conform to the repository's copyright guidelines.
- Around line 662-671: The Ulysses divisibility check currently uses a logical
AND so it only raises when both self.num_attention_heads and self.num_kv_heads
are non-divisible by ulysses_size; change the condition in the block that
references use_ulysses, self.num_attention_heads, self.num_kv_heads, and
ulysses_size to use OR so the validation raises if either head count is not
divisible by ulysses_size (keep the ValueError message but ensure it triggers
when one or the other fails).
- Around line 85-98: The fps-division branch uses fps even when effective_fps
can be None (T<=1); update the enable_fps_modulation block (symbols:
enable_fps_modulation, fps, temporal_compression_factor, base_fps,
frame_indices, t_index, temporal_offset, grid_t, grid_h, grid_w) to first guard
on fps being not None — if fps is None, fall back to the non-modulation logic
that builds integer frame indices (like the else branch) or compute using
base_fps only; ensure you avoid doing fps / temporal_compression_factor when fps
is None and preserve the same tensor shapes/expansions for t_index.
- Around line 928-948: The current seq-parallel padding logic miscomputes shards
and pads the wrong axis: when S_gen % seq_parallel_size != 0 you must compute
pad and apply it to hidden_gen and to the sequence axis of cached_freqs_gen,
then compute S_shard from the padded sequence length (not the original S_gen)
and slice both hidden_gen and the padded cos/sin using seq_parallel_rank *
S_shard : (seq_parallel_rank+1) * S_shard; update cached_freqs_gen to the padded
versions and set freqs_gen from those padded, sliced cos/sin so no tail tokens
are dropped and shapes match. Ensure you reference hidden_gen,
self.cached_freqs_gen, S_gen, pad, S_shard, self.seq_parallel_size,
self.seq_parallel_rank, and freqs_gen when making the changes.
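
The guardrails.py rank-restoration comment above translates to a sketch like the following (check_video_safety and video_guardrail follow the comment's naming; to_numpy_frames is a hypothetical stand-in for the actual tensor-to-NumPy conversion):

import torch

def check_video_safety(video_tensor, video_guardrail):
    orig_dim = video_tensor.dim()  # record the input rank first (4D or 5D)
    v = video_tensor[0] if orig_dim == 5 else video_tensor
    frames_np = to_numpy_frames(v)  # hypothetical tensor -> NumPy helper
    frames_np = video_guardrail(frames_np)
    if frames_np is None:
        return None  # video rejected by the guardrail
    result = torch.from_numpy(frames_np)
    # Restore the original rank: unsqueeze only if the input was 5D.
    return result.unsqueeze(0) if orig_dim == 5 else result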
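
Likewise, the pipeline_cosmos3.py truncation comment amounts to reserving room for the suffix before cutting; a sketch with assumed names (truncate_with_suffix is illustrative, not the repo's API):

def truncate_with_suffix(token_ids, max_sequence_length, tokenizer):
    # Reserve two slots for the mandatory suffix tokens so the final
    # sequence never exceeds max_sequence_length.
    token_ids = token_ids[: max(max_sequence_length - 2, 0)]
    token_ids.append(tokenizer.eos_token_id)
    token_ids.append(tokenizer.convert_tokens_to_ids("<|vision_start|>"))
    return token_ids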

---

Outside diff comments:
In `@tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py`:
- Around line 1-2: Update the SPDX header year in the file
tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py by changing the
copyright year on the top-of-file header (the lines starting with "#
SPDX-FileCopyrightText" and "# SPDX-License-Identifier") from 2025 to 2026 so
the modified file reflects the current year.

In `@tensorrt_llm/_torch/visual_gen/config.py`:
- Around line 1-3: This file is missing the required NVIDIA SPDX/copyright
header; add the standard NVIDIA header block (including the
SPDX-License-Identifier and the copyright line with the current modification
year) at the very top of the file before any imports (i.e., before the existing
import json / from enum import Enum / from pathlib import Path lines),
preserving encoding and formatting conventions used in other Python sources in
the repo.

In `@tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py`:
- Around line 1-3: Add the required NVIDIA copyright/SPDX header to the top of
transformer_wan.py (above the existing imports like "import math" and "from
typing import Tuple"); include the current modification year, the NVIDIA
Corporation copyright statement and the SPDX-License-Identifier line exactly as
required by the project's header policy so the file complies with the C++/Python
source file header guidelines.

In `@tensorrt_llm/_torch/visual_gen/modules/attention.py`:
- Around line 1-3: Add the required NVIDIA copyright/SPDX header at the very top
of the file tensorrt_llm/_torch/visual_gen/modules/attention.py (above the
existing imports), replacing the missing header; include the current
modification year, the NVIDIA copyright owner wording, and the
SPDX-License-Identifier line as required by project policy so the header appears
before the lines containing Enum and typing imports.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: abfe7a97-dcbb-477f-9bc9-a2f4dc9f9a7a

📥 Commits

Reviewing files that changed from the base of the PR and between 61eac2e and 049d813.

📒 Files selected for processing (13)
  • requirements.txt
  • tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py
  • tensorrt_llm/_torch/visual_gen/config.py
  • tensorrt_llm/_torch/visual_gen/models/__init__.py
  • tensorrt_llm/_torch/visual_gen/models/cosmos3/__init__.py
  • tensorrt_llm/_torch/visual_gen/models/cosmos3/defaults.py
  • tensorrt_llm/_torch/visual_gen/models/cosmos3/guardrails.py
  • tensorrt_llm/_torch/visual_gen/models/cosmos3/pipeline_cosmos3.py
  • tensorrt_llm/_torch/visual_gen/models/cosmos3/transformer_cosmos3.py
  • tensorrt_llm/_torch/visual_gen/models/ltx2/transformer_ltx2.py
  • tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py
  • tensorrt_llm/_torch/visual_gen/modules/attention.py
  • tensorrt_llm/_torch/visual_gen/pipeline_registry.py

Comment thread: requirements.txt
cache-dit>=1.3.5
nltk==3.9.4
better_profanity==0.7.0
retinaface @ git+https://github.com/NVShreyas/retinaface.git@main

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify mutable VCS refs in requirements
rg -n '^\s*[^#].*@\s*git\+https?://.*@(main|master|HEAD)\b' requirements.txt

Repository: NVIDIA/TensorRT-LLM

Length of output: 132


🏁 Script executed:

#!/bin/bash
# Check all git/VCS references in requirements.txt to ensure completeness
rg -n '@\s*git\+' requirements.txt

Repository: NVIDIA/TensorRT-LLM

Length of output: 132


Pin RetinaFace to an immutable commit, not @main.

Line 92 uses a moving branch ref, which makes builds non-reproducible and increases supply-chain risk.

🔒 Proposed fix
-retinaface @ git+https://github.com/NVShreyas/retinaface.git@main
+retinaface @ git+https://github.com/NVShreyas/retinaface.git@<commit_sha>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@requirements.txt` at line 92, The requirement line "retinaface @
git+https://github.com/NVShreyas/retinaface.git@main" uses a moving branch; pin
it to an immutable commit by replacing "@main" with a specific commit SHA (e.g.
"@<commit-hash>") from the retinaface repo, pick the desired stable commit via
the repo's commits page or git ls-remote, update the line in requirements.txt to
"retinaface @ git+https://github.com/NVShreyas/retinaface.git@<commit-hash>",
then regenerate/verify your lock or dependency install (pip-compile / pip
install -r requirements.txt) to ensure reproducible builds.

Comment on lines +1 to +3
from .pipeline_cosmos3 import Cosmos3OmniMoTPipeline

__all__ = ["Cosmos3OmniMoTPipeline"]

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the NVIDIA copyright header to this new module.

This file is newly added, but it starts directly with imports. The repo requires the standard NVIDIA header on every new Python source file.

As per coding guidelines, **/*.{cpp,cc,cxx,h,hpp,py} must “Include NVIDIA copyright header on all new files; update year on modified files”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/__init__.py` around lines 1 -
3, This new module lacks the required NVIDIA copyright header; add the
repository’s standard NVIDIA copyright header block at the very top of
tensorrt_llm/_torch/visual_gen/models/cosmos3/__init__.py before any imports so
the file complies with the policy; keep the existing import of
Cosmos3OmniMoTPipeline and the __all__ export unchanged (refer to the module
name Cosmos3OmniMoTPipeline and the file __init__.py when making the edit).

Comment on lines +15 to +19
"""Per-model default generation parameters for Wan pipelines.

Deduction cascade: model version (2.1/2.2) → model size → model name.
Shared by WanPipeline (T2V) and WanImageToVideoPipeline (I2V).
"""

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the module docstring to reference Cosmos3, not Wan.

Line 15 currently describes Wan pipelines, but this module defines Cosmos3 defaults.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/defaults.py` around lines 15 -
19, Module docstring incorrectly references "Wan" pipelines; update the
module-level docstring in defaults.py to reference "Cosmos3" (e.g., "Per-model
default generation parameters for Cosmos3 pipelines.") and replace or remove any
mentions of WanPipeline and WanImageToVideoPipeline in that docstring—if there
are Cosmos3 pipeline class names (e.g., Cosmos3Pipeline /
Cosmos3ImageToVideoPipeline) use those exact names instead.

Comment on lines +1 to +13
from __future__ import annotations

import os
import warnings
from typing import Callable

import cv2
import numpy as np
import torch
import torch.nn as nn

from tensorrt_llm.logger import logger


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA SPDX/copyright header to this new file.

This new Python source file is missing the repository-required header block.

As per coding guidelines: Include NVIDIA copyright header on all new files; update year on modified files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/guardrails.py` around lines 1 -
13, This file is missing the required NVIDIA SPDX/copyright header; add the
repository-standard header comment block (including SPDX-License-Identifier and
NVIDIA copyright line with the correct year) at the very top of
tensorrt_llm/_torch/visual_gen/models/cosmos3/guardrails.py before the existing
from __future__ import, ensuring the header format matches other files in the
repo and update the year if this is a modified file.

Comment on lines +126 to +233
    try:
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_id = "Qwen/Qwen3Guard-Gen-0.6B"
        qwen_tokenizer = AutoTokenizer.from_pretrained(model_id)
        qwen_model = (
            AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.bfloat16,
            )
            .to("cuda")
            .eval()
        )

        def _qwen_check(prompt: str) -> tuple[bool, str]:
            conversations = [{"role": "user", "content": prompt}]
            input_ids = qwen_tokenizer.apply_chat_template(
                conversations,
                tokenize=True,
                return_tensors="pt",
                add_generation_prompt=True,
                return_dict=False,
            ).to("cuda")
            with torch.no_grad():
                output_ids = qwen_model.generate(input_ids, max_new_tokens=128)
            response = qwen_tokenizer.decode(
                output_ids[0][input_ids.shape[1] :],
                skip_special_tokens=True,
            )
            if "unsafe" in response.lower():
                return False, f"Qwen3Guard: {response.strip()}"
            return True, ""

        checkers.append(_qwen_check)
        logger.info("Qwen3Guard guardrail loaded")
    except ImportError:
        logger.warning("transformers not installed; skipping Qwen3Guard")

    def text_guardrail(prompt: str) -> tuple[bool, str]:
        for checker in checkers:
            is_safe, msg = checker(prompt)
            if not is_safe:
                return is_safe, msg
        return True, ""

    return text_guardrail


def build_video_guardrail(guardrail_ckpt_dir: str) -> VideoGuardrailFn:
    safety_checker: Callable[[np.ndarray], tuple[bool, str]] | None = None
    face_blurrer: Callable[[np.ndarray], np.ndarray] | None = None

    # 1. Video content safety filter: SigLIP so400m + SafetyClassifier
    try:
        from PIL import Image
        from transformers import SiglipModel, SiglipProcessor

        siglip_dir = os.path.join(
            guardrail_ckpt_dir,
            "video_content_safety_filter",
            "models--google--siglip-so400m-patch14-384/snapshots/9fdffc58afc957d1a03a25b10dba0329ab15c2a3",
        )
        if not os.path.exists(siglip_dir):
            raise FileNotFoundError(siglip_dir)

        siglip_model = (
            SiglipModel.from_pretrained(siglip_dir).to("cuda", dtype=torch.float32).eval()
        )
        siglip_processor = SiglipProcessor.from_pretrained(siglip_dir)

        classifier = SafetyClassifier(input_size=1152, num_classes=7)
        ckpt_path = os.path.join(
            guardrail_ckpt_dir, "video_content_safety_filter", "safety_filter.pt"
        )
        checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=True)
        state = {k.removeprefix("network."): v for k, v in checkpoint["model"].items()}
        classifier.load_state_dict(state)
        classifier = classifier.to("cuda", dtype=torch.float32).eval()

        def _safety_check(frames: np.ndarray) -> tuple[bool, str]:
            nonlocal siglip_model, classifier

            unsafe_count = 0
            total = len(frames)
            for frame in frames:
                img = Image.fromarray(frame)
                inputs = siglip_processor(images=img, return_tensors="pt").to(
                    "cuda", dtype=torch.float32
                )
                with torch.no_grad():
                    siglip_out = siglip_model.get_image_features(**inputs)
                features = siglip_out.pooler_output
                features = features / features.norm(dim=-1, keepdim=True)
                logits = classifier(features)
                pred = logits.argmax(dim=-1).item()
                class_name = CLASS_IDX_TO_NAME.get(pred, "Unknown")
                if class_name != "Safe":
                    unsafe_count += 1

            if unsafe_count / total > CUTOFF_UNSAFE_FRAMES_PERCENT / 100:
                return False, f"Video content safety: {unsafe_count}/{total} frames unsafe"
            return True, ""

        safety_checker = _safety_check
        logger.info("Video content safety filter loaded (SigLIP so400m + classifier)")
    except (ImportError, FileNotFoundError) as e:
        logger.warning("Could not load video safety filter: %s", e)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Optional guardrail initialization can hard-fail on runtime load errors.

These blocks only catch ImportError/FileNotFoundError. Runtime failures from model load/device placement (e.g., OSError, RuntimeError) can still abort pipeline init instead of gracefully disabling optional guardrails.

🛠️ Suggested pattern
-    except ImportError:
+    except (ImportError, OSError, RuntimeError, ValueError) as e:
+        logger.warning("Could not load Qwen3Guard guardrail: %s", e)

-    except (ImportError, FileNotFoundError) as e:
+    except (ImportError, FileNotFoundError, OSError, RuntimeError, ValueError) as e:
         logger.warning("Could not load video safety filter: %s", e)

-    except (ImportError, FileNotFoundError) as e:
+    except (ImportError, FileNotFoundError, OSError, RuntimeError, ValueError) as e:
         logger.warning("Could not load face blur filter: %s", e)

Also applies to: 235-343

🧰 Tools
🪛 Flake8 (7.3.0)

[error] 206-206: nonlocal siglip_model is unused: name is never assigned in scope

(F824)


[error] 206-206: nonlocal classifier is unused: name is never assigned in scope

(F824)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/guardrails.py` around lines 126
- 233, Optional guardrail initialization may raise other runtime exceptions
(OSError, RuntimeError, etc.) beyond ImportError/FileNotFoundError; wrap the
Qwen guardrail and the video safety filter loading in broader exception handlers
so failures disable the guardrail instead of hard-failing. Specifically, for the
Qwen block that defines qwen_tokenizer, qwen_model and _qwen_check (used by
text_guardrail), replace the bare except ImportError with except Exception as e
and log the exception detail and skip adding the checker; likewise in
build_video_guardrail wrap the SiglipModel/SiglipProcessor and classifier
loading (siglip_model, siglip_processor, classifier, and the _safety_check
closure) with except Exception as e (not just ImportError/FileNotFoundError),
log the error including e, and leave safety_checker as None so the pipeline
continues. Ensure logs include the exception message to aid debugging.
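
The hardening this prompt describes generalizes to a small helper; an illustrative sketch (build_optional_component is not part of the repo):

from tensorrt_llm.logger import logger

def build_optional_component(loader, name):
    """Load an optional guardrail component, disabling it on any failure."""
    try:
        return loader()
    except Exception as e:  # ImportError, FileNotFoundError, OSError, RuntimeError, ...
        logger.warning("Could not load %s: %s", name, e)
        return None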

Comment on lines +439 to +445
        if self.rank == 0 and use_guardrails:
            for p in prompt:
                is_safe, msg = self.text_guardrail(p)
                if not is_safe:
                    logger.warning(f"Text guardrail blocked prompt: {msg}")
                    timer.mark_end()
                    return timer.fill(PipelineOutput())

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Synchronize guardrail block decisions across ranks before returning.

Only rank 0 evaluates guardrails and returns early. In distributed runs the other ranks keep going into denoise/decode collectives, so a blocked prompt or blocked video can deadlock the job.

Suggested fix
+        blocked = torch.zeros(1, device=self.device, dtype=torch.int32)
         if self.rank == 0 and use_guardrails:
             for p in prompt:
                 is_safe, msg = self.text_guardrail(p)
                 if not is_safe:
                     logger.warning(f"Text guardrail blocked prompt: {msg}")
-                    timer.mark_end()
-                    return timer.fill(PipelineOutput())
+                    blocked[0] = 1
+                    break
+
+        if torch.distributed.is_initialized():
+            torch.distributed.broadcast(blocked, src=0)
+
+        if blocked.item():
+            timer.mark_end()
+            return timer.fill(PipelineOutput())

Apply the same pattern to the video-guardrail branch before any rank returns.

Also applies to: 575-580

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/pipeline_cosmos3.py` around
lines 439 - 445, Rank 0 currently checks text_guardrail (and similarly the
video-guardrail later) and returns early, but other ranks continue causing
deadlocks; change this so rank 0 computes a single boolean "blocked" (e.g.,
iterate prompts and set blocked=True if any text_guardrail returns false) and
then synchronize that decision across ranks using a torch.distributed collective
(broadcast or all_reduce) before any return; update the block that uses
self.rank, use_guardrails, text_guardrail, logger.warning, timer.mark_end,
timer.fill, PipelineOutput to have rank 0 set the blocked flag and an optional
message, broadcast them to all ranks, and have every rank perform
timer.mark_end() and return timer.fill(PipelineOutput()) when blocked; apply the
same pattern to the video-guardrail branch around the code at the later block
(lines ~575-580) so all ranks make the same early-exit decision.

Comment on lines +1 to +19
import math
from typing import Tuple

import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
from diffusers.models.embeddings import TimestepEmbedding

from tensorrt_llm._torch.attention_backend.interface import PredefinedAttentionMask
from tensorrt_llm._torch.modules.embedding import Embedding
from tensorrt_llm._torch.modules.gated_mlp import GatedMLP
from tensorrt_llm._torch.modules.linear import Linear
from tensorrt_llm._torch.visual_gen.config import DiffusionModelConfig
from tensorrt_llm._torch.visual_gen.modules.attention import Attention, QKVMode
from tensorrt_llm._torch.visual_gen.quantization.loader import DynamicLinearWeightLoader
from tensorrt_llm.logger import logger
from tensorrt_llm.models.modeling_utils import QuantConfig


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the NVIDIA copyright header to this new module.

This file is newly added, but it starts directly with imports. The repo requires the standard NVIDIA header on every new Python source file.

As per coding guidelines, **/*.{cpp,cc,cxx,h,hpp,py} must “Include NVIDIA copyright header on all new files; update year on modified files”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/transformer_cosmos3.py` around
lines 1 - 19, This new module transformer_cosmos3.py is missing the required
NVIDIA copyright header at the top of the file; add the standard NVIDIA
license/header block as the very first lines (before the imports such as import
math, import torch, etc.), using the canonical header used across the repo and
updating the year if needed, so files like transformer_cosmos3.py (containing
symbols like TimestepEmbedding, DiffusionModelConfig, Attention, GatedMLP,
DynamicLinearWeightLoader) conform to the repository's copyright guidelines.

Comment on lines +85 to +98
if enable_fps_modulation:
tps = fps / temporal_compression_factor
base_tps = base_fps / temporal_compression_factor
frame_indices = torch.arange(grid_t, dtype=torch.float32)
t_index = (
(frame_indices / tps * base_tps + temporal_offset)
.view(-1, 1)
.expand(-1, grid_h * grid_w)
.flatten()
)
else:
t_index = torch.arange(grid_t, dtype=torch.long).view(-1, 1).expand(
-1, grid_h * grid_w
).flatten() + int(temporal_offset)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard FPS modulation when fps is unavailable.

_compute_rope_freqs() deliberately passes effective_fps=None when T <= 1, but this branch still does fps / temporal_compression_factor. That makes single-frame runs crash as soon as enable_fps_modulation is enabled.

Suggested fix
-    if enable_fps_modulation:
+    if enable_fps_modulation and fps is not None:
         tps = fps / temporal_compression_factor
         base_tps = base_fps / temporal_compression_factor
         frame_indices = torch.arange(grid_t, dtype=torch.float32)
         t_index = (
             (frame_indices / tps * base_tps + temporal_offset)
             .view(-1, 1)
             .expand(-1, grid_h * grid_w)
             .flatten()
         )
     else:
         t_index = torch.arange(grid_t, dtype=torch.long).view(-1, 1).expand(
             -1, grid_h * grid_w
         ).flatten() + int(temporal_offset)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/transformer_cosmos3.py` around
lines 85 - 98, The fps-division branch uses fps even when effective_fps can be
None (T<=1); update the enable_fps_modulation block (symbols:
enable_fps_modulation, fps, temporal_compression_factor, base_fps,
frame_indices, t_index, temporal_offset, grid_t, grid_h, grid_w) to first guard
on fps being not None — if fps is None, fall back to the non-modulation logic
that builds integer frame indices (like the else branch) or compute using
base_fps only; ensure you avoid doing fps / temporal_compression_factor when fps
is None and preserve the same tensor shapes/expansions for t_index.

Comment on lines +662 to +671
if (
use_ulysses
and self.num_attention_heads % ulysses_size != 0
and self.num_kv_heads % ulysses_size != 0
):
raise ValueError(
f"num_attention_heads ({self.num_attention_heads}) and "
f"num_kv_heads ({self.num_kv_heads}) must be divisible by "
f"ulysses_size ({ulysses_size})"
)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use or in the Ulysses divisibility check.

Right now the validation only fails when both num_attention_heads and num_kv_heads are non-divisible. A config where just one of them is non-divisible still passes here and then breaks later during head sharding.

Suggested fix
         if (
             use_ulysses
-            and self.num_attention_heads % ulysses_size != 0
-            and self.num_kv_heads % ulysses_size != 0
+            and (
+                self.num_attention_heads % ulysses_size != 0
+                or self.num_kv_heads % ulysses_size != 0
+            )
         ):
             raise ValueError(
                 f"num_attention_heads ({self.num_attention_heads}) and "
                 f"num_kv_heads ({self.num_kv_heads}) must be divisible by "
                 f"ulysses_size ({ulysses_size})"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/transformer_cosmos3.py` around
lines 662 - 671, The Ulysses divisibility check currently uses a logical AND so
it only raises when both self.num_attention_heads and self.num_kv_heads are
non-divisible by ulysses_size; change the condition in the block that references
use_ulysses, self.num_attention_heads, self.num_kv_heads, and ulysses_size to
use OR so the validation raises if either head count is not divisible by
ulysses_size (keep the ValueError message but ensure it triggers when one or the
other fails).

Comment on lines +928 to +948
if self.use_seq_parallel:
S_gen = hidden_gen.shape[1]
pad = (self.seq_parallel_size - S_gen % self.seq_parallel_size) % self.seq_parallel_size
if pad > 0:
# This will cause minor noise in softmax due to padding.
hidden_gen = F.pad(hidden_gen, (0, 0, 0, pad))
cos, sin = self.cached_freqs_gen
self.cached_freqs_gen = (
F.pad(cos, (0, 0, 0, pad)),
F.pad(sin, (0, 0, 0, pad)),
)
S_shard = S_gen // self.seq_parallel_size
hidden_gen = hidden_gen[
:, self.seq_parallel_rank * S_shard : (self.seq_parallel_rank + 1) * S_shard
]
# Shard freqs_gen to match
cos, sin = self.cached_freqs_gen
freqs_gen = (
cos[:, self.seq_parallel_rank * S_shard : (self.seq_parallel_rank + 1) * S_shard],
sin[:, self.seq_parallel_rank * S_shard : (self.seq_parallel_rank + 1) * S_shard],
)

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Non-divisible GEN lengths are sharded incorrectly after padding.

This block has two coupled bugs when S_gen % seq_parallel_size != 0: S_shard is computed from the pre-pad length, so tail tokens are dropped, and F.pad(cos, (0, 0, 0, pad)) pads the singleton head axis on [B, S, 1, D] instead of the sequence axis. Seq-parallel inference will mis-shard or shape-mismatch on common video sizes.

Suggested fix
         if self.use_seq_parallel:
             S_gen = hidden_gen.shape[1]
             pad = (self.seq_parallel_size - S_gen % self.seq_parallel_size) % self.seq_parallel_size
             if pad > 0:
                 # This will cause minor noise in softmax due to padding.
                 hidden_gen = F.pad(hidden_gen, (0, 0, 0, pad))
                 cos, sin = self.cached_freqs_gen
                 self.cached_freqs_gen = (
-                    F.pad(cos, (0, 0, 0, pad)),
-                    F.pad(sin, (0, 0, 0, pad)),
+                    F.pad(cos, (0, 0, 0, 0, 0, pad)),
+                    F.pad(sin, (0, 0, 0, 0, 0, pad)),
                 )
-            S_shard = S_gen // self.seq_parallel_size
+            padded_s_gen = S_gen + pad
+            S_shard = padded_s_gen // self.seq_parallel_size
             hidden_gen = hidden_gen[
                 :, self.seq_parallel_rank * S_shard : (self.seq_parallel_rank + 1) * S_shard
             ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/transformer_cosmos3.py` around
lines 928 - 948, The current seq-parallel padding logic miscomputes shards and
pads the wrong axis: when S_gen % seq_parallel_size != 0 you must compute pad
and apply it to hidden_gen and to the sequence axis of cached_freqs_gen, then
compute S_shard from the padded sequence length (not the original S_gen) and
slice both hidden_gen and the padded cos/sin using seq_parallel_rank * S_shard :
(seq_parallel_rank+1) * S_shard; update cached_freqs_gen to the padded versions
and set freqs_gen from those padded, sliced cos/sin so no tail tokens are
dropped and shapes match. Ensure you reference hidden_gen,
self.cached_freqs_gen, S_gen, pad, S_shard, self.seq_parallel_size,
self.seq_parallel_rank, and freqs_gen when making the changes.
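
A worked example of why S_shard must come from the padded length (sizes are illustrative):

# 1026 GEN tokens sharded across 4 sequence-parallel ranks.
S_gen, seq_parallel_size = 1026, 4
pad = (seq_parallel_size - S_gen % seq_parallel_size) % seq_parallel_size  # 2

S_shard_buggy = S_gen // seq_parallel_size          # 256: 4 * 256 = 1024 < 1026, tail tokens dropped
S_shard_fixed = (S_gen + pad) // seq_parallel_size  # 257: 4 * 257 = 1028, every token covered once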
