[TRTLLM-11410][feat] MoT World Model Support #14012

Open
NVShreyas wants to merge 6 commits into NVIDIA:main from NVShreyas:user/shreyasm/world-models

Conversation

@NVShreyas NVShreyas (Collaborator) commented May 12, 2026

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added Cosmos3 video generation pipeline supporting text-to-video and image-to-video generation modes.
    • Introduced text and video guardrails with profanity filtering, content safety classification, and face anonymization capabilities.
    • Enhanced diffusion model architecture with improved attention mechanisms for visual generation tasks.

Review Change Stack

NVShreyas added 6 commits May 11, 2026 10:38
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
@NVShreyas NVShreyas requested review from a team as code owners May 12, 2026 00:15
@coderabbitai coderabbitai Bot (Contributor) commented May 12, 2026

📝 Walkthrough

This PR integrates the Cosmos3 video diffusion pipeline into TensorRT-LLM, introducing text and video guardrails, a dual-pathway mRoPE-based transformer, and a complete generation flow that supports both text-to-video and image-to-video modes. It also refines Ulysses sequence-parallelism control across pipelines.

Changes

Cosmos3 Pipeline & Guardrails

  • Configuration & Dependencies (requirements.txt, tensorrt_llm/_torch/visual_gen/config.py, tensorrt_llm/_torch/visual_gen/models/cosmos3/defaults.py): Three new guardrail dependencies added (nltk, better-profanity, retinaface). Pipeline configuration extended with TEXT_GUARDRAIL/VIDEO_GUARDRAIL enum members, a guardrail_checkpoint_dir field in VisualGenArgs, and Cosmos3 generation defaults (720p spatial/temporal params, guidance scale, inference steps).

  • Text & Video Guardrails (tensorrt_llm/_torch/visual_gen/models/cosmos3/guardrails.py): SafetyClassifier neural network for content safety classification. The text guardrail applies an optional profanity blocklist and a Qwen3Guard model check. The video guardrail runs a SigLIP content safety filter (rejecting the video if the unsafe frame ratio exceeds a threshold) and RetinaFace face detection with pixelation for anonymization. check_video_safety() converts tensors to NumPy, applies filtering, and returns None if rejected.

  • Cosmos3 Transformer & Attention (tensorrt_llm/_torch/visual_gen/models/cosmos3/transformer_cosmos3.py): Dual-pathway transformer: the UND path applies causal self-attention with mRoPE and RMSNorm to text tokens (caching K/V); the GEN path applies cross-attention to the UND K/V with mRoPE applied to visual queries. Includes mRoPE position-ID computation for 3D (T,H,W) grids, timestep embedding, a language model, and the full VFMTransformer with checkpoint compatibility and sequence-parallel support.

  • Cosmos3 Pipeline Generation (tensorrt_llm/_torch/visual_gen/models/cosmos3/pipeline_cosmos3.py): Cosmos3OmniMoTPipeline orchestrates generation: it loads the Qwen2 tokenizer, WAN VAE, and UniPCMultistep scheduler; applies prompt templates (optional duration/resolution metadata); tokenizes via the chat template; and supports text-to-video (noise init) and image-to-video (encoding the conditioning image with velocity masking). Implements CFG via extra_cfg_tensors for token splitting, a denoising loop with the scheduler, latent decoding with VAE denormalization, and video guardrails on rank 0.

  • Module Exports & Registration (tensorrt_llm/_torch/visual_gen/models/__init__.py, tensorrt_llm/_torch/visual_gen/models/cosmos3/__init__.py, tensorrt_llm/_torch/visual_gen/pipeline_registry.py): Cosmos3OmniMoTPipeline added to pipeline module exports. Auto-detection in AutoPipeline._detect_from_checkpoint recognizes Cosmos3 checkpoints by _class_name match.

  • Ulysses Parallelism Control (tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py, tensorrt_llm/_torch/visual_gen/modules/attention.py, tensorrt_llm/_torch/visual_gen/models/ltx2/transformer_ltx2.py, tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py): SDPA now explicitly sets enable_gqa when query heads differ from KV heads (see the sketch after this list). The Attention base class accepts an enable_ulysses: bool = True flag. LTX2 disables Ulysses for cross-attention only (enable_ulysses=use_ulysses and not self._is_cross_attn); WAN disables Ulysses for its cross-attention path (enable_ulysses=False).
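
A minimal sketch of the enable_gqa behavior described in the Ulysses row above (PyTorch 2.5+ SDPA; shapes and head counts are illustrative, not taken from the PR):

import torch
import torch.nn.functional as F

# 8 query heads attending over 2 KV heads: a grouped-query configuration.
q = torch.randn(1, 8, 128, 64)  # [batch, num_query_heads, seq_len, head_dim]
k = torch.randn(1, 2, 128, 64)  # [batch, num_kv_heads, seq_len, head_dim]
v = torch.randn(1, 2, 128, 64)

# enable_gqa lets SDPA broadcast each KV head across its group of query
# heads instead of requiring matching head counts.
out = F.scaled_dot_product_attention(q, k, v, enable_gqa=q.shape[1] != k.shape[1])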

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check: ⚠️ Warning. The PR description is empty; it contains only the repository template with no actual explanation of changes, test coverage, or checklist completion. Resolution: add a description explaining the Cosmos3 implementation, guardrails, and test coverage, confirm the checklist items, and reference the commit messages and file summaries for context.
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 35.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Title check: ✅ Passed. The title clearly identifies the main feature addition: Cosmos3 (MoT World Model) support with guardrails, attention updates, and pipeline integration.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
tensorrt_llm/_torch/visual_gen/config.py (1)

1-3: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA SPDX/copyright header.

This modified Python source file currently has no header block at the top.

As per coding guidelines: All C++, Python, and other source files must contain NVIDIA copyright header with current modification year.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/config.py` around lines 1 - 3, This file is
missing the required NVIDIA SPDX/copyright header; add the standard NVIDIA
header block (including the SPDX-License-Identifier and the copyright line with
the current modification year) at the very top of the file before any imports
(i.e., before the existing import json / from enum import Enum / from pathlib
import Path lines), preserving encoding and formatting conventions used in other
Python sources in the repo.
tensorrt_llm/_torch/visual_gen/modules/attention.py (1)

1-3: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA SPDX/copyright header.

This modified Python source file currently has no header block at the top.

As per coding guidelines: All C++, Python, and other source files must contain NVIDIA copyright header with current modification year.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/modules/attention.py` around lines 1 - 3, Add
the required NVIDIA copyright/SPDX header at the very top of the file
tensorrt_llm/_torch/visual_gen/modules/attention.py (above the existing
imports), replacing the missing header; include the current modification year,
the NVIDIA copyright owner wording, and the SPDX-License-Identifier line as
required by project policy so the header appears before the lines containing
Enum and typing imports.
tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py (1)

1-3: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA SPDX/copyright header.

This modified file is missing the header block required for Python source files.

As per coding guidelines: All C++, Python, and other source files must contain NVIDIA copyright header with current modification year.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py` around lines 1
- 3, Add the required NVIDIA copyright/SPDX header to the top of
transformer_wan.py (above the existing imports like "import math" and "from
typing import Tuple"); include the current modification year, the NVIDIA
Corporation copyright statement and the SPDX-License-Identifier line exactly as
required by the project's header policy so the file complies with the C++/Python
source file header guidelines.
tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py (1)

1-2: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update the copyright year on this modified file.

Line 1 still reflects 2025, but this file is modified in 2026.

As per coding guidelines: Include NVIDIA copyright header on all new files; update year on modified files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py` around lines 1 -
2, Update the SPDX header year in the file
tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py by changing the
copyright year on the top-of-file header (the lines starting with "#
SPDX-FileCopyrightText" and "# SPDX-License-Identifier") from 2025 to 2026 so
the modified file reflects the current year.
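
All four header comments above ask for the same two-line block; a sketch based on the SPDX lines the vanilla.py comment quotes (the exact owner wording and license identifier used in the repo are assumptions here):

# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

The block goes at the very top of each new file, before any imports; on modified files only the year changes.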
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@requirements.txt`:
- Line 92: The requirement line "retinaface @
git+https://github.com/NVShreyas/retinaface.git@main" uses a moving branch; pin
it to an immutable commit by replacing "@main" with a specific commit SHA (e.g.
"@<commit-hash>") from the retinaface repo, pick the desired stable commit via
the repo's commits page or git ls-remote, update the line in requirements.txt to
"retinaface @ git+https://github.com/NVShreyas/retinaface.git@<commit-hash>",
then regenerate/verify your lock or dependency install (pip-compile / pip
install -r requirements.txt) to ensure reproducible builds.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/__init__.py`:
- Around line 1-3: This new module lacks the required NVIDIA copyright header;
add the repository’s standard NVIDIA copyright header block at the very top of
tensorrt_llm/_torch/visual_gen/models/cosmos3/__init__.py before any imports so
the file complies with the policy; keep the existing import of
Cosmos3OmniMoTPipeline and the __all__ export unchanged (refer to the module
name Cosmos3OmniMoTPipeline and the file __init__.py when making the edit).

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/defaults.py`:
- Around line 15-19: Module docstring incorrectly references "Wan" pipelines;
update the module-level docstring in defaults.py to reference "Cosmos3" (e.g.,
"Per-model default generation parameters for Cosmos3 pipelines.") and replace or
remove any mentions of WanPipeline and WanImageToVideoPipeline in that
docstring—if there are Cosmos3 pipeline class names (e.g., Cosmos3Pipeline /
Cosmos3ImageToVideoPipeline) use those exact names instead.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/guardrails.py`:
- Around line 1-13: This file is missing the required NVIDIA SPDX/copyright
header; add the repository-standard header comment block (including
SPDX-License-Identifier and NVIDIA copyright line with the correct year) at the
very top of tensorrt_llm/_torch/visual_gen/models/cosmos3/guardrails.py before
the existing from __future__ import, ensuring the header format matches other
files in the repo and update the year if this is a modified file.
- Around line 126-233: Optional guardrail initialization may raise other runtime
exceptions (OSError, RuntimeError, etc.) beyond ImportError/FileNotFoundError;
wrap the Qwen guardrail and the video safety filter loading in broader exception
handlers so failures disable the guardrail instead of hard-failing.
Specifically, for the Qwen block that defines qwen_tokenizer, qwen_model and
_qwen_check (used by text_guardrail), replace the bare except ImportError with
except Exception as e and log the exception detail and skip adding the checker;
likewise in build_video_guardrail wrap the SiglipModel/SiglipProcessor and
classifier loading (siglip_model, siglip_processor, classifier, and the
_safety_check closure) with except Exception as e (not just
ImportError/FileNotFoundError), log the error including e, and leave
safety_checker as None so the pipeline continues. Ensure logs include the
exception message to aid debugging.
- Around line 361-370: The output-rank restoration is inverted: record the
original input rank (e.g., orig_dim = video_tensor.dim()) before converting to
numpy, then after creating result = torch.from_numpy(frames_np) restore to the
original shape—if orig_dim == 5 then result = result.unsqueeze(0) to return a 5D
tensor, and if orig_dim == 4 leave result as 4D; update the code around v,
video_tensor, v.dim(), torch.from_numpy and video_guardrail in
check_video_safety accordingly so the unsqueeze matches the original input rank.
- Line 206: Remove the unnecessary nonlocal declarations for siglip_model and
classifier in the guardrails closure(s): locate the lines containing "nonlocal
siglip_model, classifier" and any other "nonlocal classifier" inside the nested
function scopes in guardrails.py and delete those nonlocal statements (these
variables are only read, not reassigned). Keep all other logic unchanged so the
nested functions continue to reference the outer-scope siglip_model and
classifier as read-only. Run lint/tests to confirm the F824 error is resolved.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/pipeline_cosmos3.py`:
- Around line 61-63: Normalize skip_components at the start of
load_standard_components (e.g., set to an empty list or set if None) so
membership checks like "PipelineComponent.VAE not in skip_components" are safe,
and update the logic around self.vae_scale_factor_spatial to only reference it
when the VAE is actually being loaded (i.e., when PipelineComponent.VAE is not
in skip_components and/or when the VAE has been initialized), avoiding
unconditional access that breaks skip_components=[PipelineComponent.VAE].
- Around line 439-445: Rank 0 currently checks text_guardrail (and similarly the
video-guardrail later) and returns early, but other ranks continue causing
deadlocks; change this so rank 0 computes a single boolean "blocked" (e.g.,
iterate prompts and set blocked=True if any text_guardrail returns false) and
then synchronize that decision across ranks using a torch.distributed collective
(broadcast or all_reduce) before any return; update the block that uses
self.rank, use_guardrails, text_guardrail, logger.warning, timer.mark_end,
timer.fill, PipelineOutput to have rank 0 set the blocked flag and an optional
message, broadcast them to all ranks, and have every rank perform
timer.mark_end() and return timer.fill(PipelineOutput()) when blocked; apply the
same pattern to the video-guardrail branch around the code at the later block
(lines ~575-580) so all ranks make the same early-exit decision.
- Around line 1-29: Add the standard NVIDIA copyright header at the top of the
new module pipeline_cosmos3.py before the first import; ensure the exact header
text used across the repo (including current year) is inserted as a block
comment so licensing scanners pick it up. Locate the file by the module name or
imports such as AutoencoderKLWan, Cosmos3VFMTransformer, and add the header
above those imports, keeping existing imports and code unchanged.
- Around line 90-104: Current logic gates both guardrails together; update the
block so text and video guardrails are loaded independently: check
TRTLLM_DISABLE_COSMOS3_GUARDRAILS and then separately if
PipelineComponent.TEXT_GUARDRAIL not in skip_components to initialize
self.text_guardrail (use model_config.extra_attrs['guardrail_checkpoint_dir'] or
download_guardrail_checkpoint() once) via build_text_guardrail(), and separately
if PipelineComponent.VIDEO_GUARDRAIL not in skip_components to initialize
self.video_guardrail via build_video_guardrail(); ensure the checkpoint
directory is resolved only once and reused for both builds and respect the
global TRTLLM_DISABLE_COSMOS3_GUARDRAILS flag.
- Around line 221-229: The code truncates token_ids before appending the
mandatory EOS and "<|vision_start|>" tokens, which can exceed
max_sequence_length; update the logic in the block handling token_ids and
max_sequence_length so you first ensure there is room for two suffix tokens by
truncating to max_sequence_length - 2 (or 0 min) before appending
self.tokenizer.eos_token_id and
self.tokenizer.convert_tokens_to_ids("<|vision_start|>"), then compute seq_len,
pad_len, attention_mask and pad with self.tokenizer.pad_token_id (or 0) as
before; apply this change around the token_ids manipulation to prevent exceeding
the model positional limit.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/transformer_cosmos3.py`:
- Around line 1-19: This new module transformer_cosmos3.py is missing the
required NVIDIA copyright header at the top of the file; add the standard NVIDIA
license/header block as the very first lines (before the imports such as import
math, import torch, etc.), using the canonical header used across the repo and
updating the year if needed, so files like transformer_cosmos3.py (containing
symbols like TimestepEmbedding, DiffusionModelConfig, Attention, GatedMLP,
DynamicLinearWeightLoader) conform to the repository's copyright guidelines.
- Around line 662-671: The Ulysses divisibility check currently uses a logical
AND so it only raises when both self.num_attention_heads and self.num_kv_heads
are non-divisible by ulysses_size; change the condition in the block that
references use_ulysses, self.num_attention_heads, self.num_kv_heads, and
ulysses_size to use OR so the validation raises if either head count is not
divisible by ulysses_size (keep the ValueError message but ensure it triggers
when one or the other fails).
- Around line 85-98: The fps-division branch uses fps even when effective_fps
can be None (T<=1); update the enable_fps_modulation block (symbols:
enable_fps_modulation, fps, temporal_compression_factor, base_fps,
frame_indices, t_index, temporal_offset, grid_t, grid_h, grid_w) to first guard
on fps being not None — if fps is None, fall back to the non-modulation logic
that builds integer frame indices (like the else branch) or compute using
base_fps only; ensure you avoid doing fps / temporal_compression_factor when fps
is None and preserve the same tensor shapes/expansions for t_index.
- Around line 928-948: The current seq-parallel padding logic miscomputes shards
and pads the wrong axis: when S_gen % seq_parallel_size != 0 you must compute
pad and apply it to hidden_gen and to the sequence axis of cached_freqs_gen,
then compute S_shard from the padded sequence length (not the original S_gen)
and slice both hidden_gen and the padded cos/sin using seq_parallel_rank *
S_shard : (seq_parallel_rank+1) * S_shard; update cached_freqs_gen to the padded
versions and set freqs_gen from those padded, sliced cos/sin so no tail tokens
are dropped and shapes match. Ensure you reference hidden_gen,
self.cached_freqs_gen, S_gen, pad, S_shard, self.seq_parallel_size,
self.seq_parallel_rank, and freqs_gen when making the changes.
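
The guardrails.py rank-restoration comment above translates to a sketch like the following (check_video_safety and video_guardrail follow the comment's naming; to_numpy_frames is a hypothetical stand-in for the actual tensor-to-NumPy conversion):

import torch

def check_video_safety(video_tensor, video_guardrail):
    orig_dim = video_tensor.dim()  # record the input rank first (4D or 5D)
    v = video_tensor[0] if orig_dim == 5 else video_tensor
    frames_np = to_numpy_frames(v)  # hypothetical tensor -> NumPy helper
    frames_np = video_guardrail(frames_np)
    if frames_np is None:
        return None  # video rejected by the guardrail
    result = torch.from_numpy(frames_np)
    # Restore the original rank: unsqueeze only if the input was 5D.
    return result.unsqueeze(0) if orig_dim == 5 else result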
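
Likewise, the pipeline_cosmos3.py truncation comment amounts to reserving room for the suffix before cutting; a sketch with assumed names (truncate_with_suffix is illustrative, not the repo's API):

def truncate_with_suffix(token_ids, max_sequence_length, tokenizer):
    # Reserve two slots for the mandatory suffix tokens so the final
    # sequence never exceeds max_sequence_length.
    token_ids = token_ids[: max(max_sequence_length - 2, 0)]
    token_ids.append(tokenizer.eos_token_id)
    token_ids.append(tokenizer.convert_tokens_to_ids("<|vision_start|>"))
    return token_ids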

---

Outside diff comments:
In `@tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py`:
- Around line 1-2: Update the SPDX header year in the file
tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py by changing the
copyright year on the top-of-file header (the lines starting with "#
SPDX-FileCopyrightText" and "# SPDX-License-Identifier") from 2025 to 2026 so
the modified file reflects the current year.

In `@tensorrt_llm/_torch/visual_gen/config.py`:
- Around line 1-3: This file is missing the required NVIDIA SPDX/copyright
header; add the standard NVIDIA header block (including the
SPDX-License-Identifier and the copyright line with the current modification
year) at the very top of the file before any imports (i.e., before the existing
import json / from enum import Enum / from pathlib import Path lines),
preserving encoding and formatting conventions used in other Python sources in
the repo.

In `@tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py`:
- Around line 1-3: Add the required NVIDIA copyright/SPDX header to the top of
transformer_wan.py (above the existing imports like "import math" and "from
typing import Tuple"); include the current modification year, the NVIDIA
Corporation copyright statement and the SPDX-License-Identifier line exactly as
required by the project's header policy so the file complies with the C++/Python
source file header guidelines.

In `@tensorrt_llm/_torch/visual_gen/modules/attention.py`:
- Around line 1-3: Add the required NVIDIA copyright/SPDX header at the very top
of the file tensorrt_llm/_torch/visual_gen/modules/attention.py (above the
existing imports), replacing the missing header; include the current
modification year, the NVIDIA copyright owner wording, and the
SPDX-License-Identifier line as required by project policy so the header appears
before the lines containing Enum and typing imports.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: abfe7a97-dcbb-477f-9bc9-a2f4dc9f9a7a

📥 Commits

Reviewing files that changed from the base of the PR and between 61eac2e and 049d813.

📒 Files selected for processing (13)
  • requirements.txt
  • tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py
  • tensorrt_llm/_torch/visual_gen/config.py
  • tensorrt_llm/_torch/visual_gen/models/__init__.py
  • tensorrt_llm/_torch/visual_gen/models/cosmos3/__init__.py
  • tensorrt_llm/_torch/visual_gen/models/cosmos3/defaults.py
  • tensorrt_llm/_torch/visual_gen/models/cosmos3/guardrails.py
  • tensorrt_llm/_torch/visual_gen/models/cosmos3/pipeline_cosmos3.py
  • tensorrt_llm/_torch/visual_gen/models/cosmos3/transformer_cosmos3.py
  • tensorrt_llm/_torch/visual_gen/models/ltx2/transformer_ltx2.py
  • tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py
  • tensorrt_llm/_torch/visual_gen/modules/attention.py
  • tensorrt_llm/_torch/visual_gen/pipeline_registry.py

Comment thread: requirements.txt
cache-dit>=1.3.5
nltk==3.9.4
better_profanity==0.7.0
retinaface @ git+https://github.com/NVShreyas/retinaface.git@main

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify mutable VCS refs in requirements
rg -n '^\s*[^#].*@\s*git\+https?://.*@(main|master|HEAD)\b' requirements.txt

Repository: NVIDIA/TensorRT-LLM

Length of output: 132


🏁 Script executed:

#!/bin/bash
# Check all git/VCS references in requirements.txt to ensure completeness
rg -n '@\s*git\+' requirements.txt

Repository: NVIDIA/TensorRT-LLM

Length of output: 132


Pin RetinaFace to an immutable commit, not @main.

Line 92 uses a moving branch ref, which makes builds non-reproducible and increases supply-chain risk.

🔒 Proposed fix
-retinaface @ git+https://github.com/NVShreyas/retinaface.git@main
+retinaface @ git+https://github.com/NVShreyas/retinaface.git@<commit_sha>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@requirements.txt` at line 92, The requirement line "retinaface @
git+https://github.com/NVShreyas/retinaface.git@main" uses a moving branch; pin
it to an immutable commit by replacing "@main" with a specific commit SHA (e.g.
"@<commit-hash>") from the retinaface repo, pick the desired stable commit via
the repo's commits page or git ls-remote, update the line in requirements.txt to
"retinaface @ git+https://github.com/NVShreyas/retinaface.git@<commit-hash>",
then regenerate/verify your lock or dependency install (pip-compile / pip
install -r requirements.txt) to ensure reproducible builds.

Comment on lines +1 to +3
from .pipeline_cosmos3 import Cosmos3OmniMoTPipeline

__all__ = ["Cosmos3OmniMoTPipeline"]

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the NVIDIA copyright header to this new module.

This file is newly added, but it starts directly with imports. The repo requires the standard NVIDIA header on every new Python source file.

As per coding guidelines, **/*.{cpp,cc,cxx,h,hpp,py} must “Include NVIDIA copyright header on all new files; update year on modified files”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/__init__.py` around lines 1 -
3, This new module lacks the required NVIDIA copyright header; add the
repository’s standard NVIDIA copyright header block at the very top of
tensorrt_llm/_torch/visual_gen/models/cosmos3/__init__.py before any imports so
the file complies with the policy; keep the existing import of
Cosmos3OmniMoTPipeline and the __all__ export unchanged (refer to the module
name Cosmos3OmniMoTPipeline and the file __init__.py when making the edit).

Comment on lines +15 to +19
"""Per-model default generation parameters for Wan pipelines.

Deduction cascade: model version (2.1/2.2) → model size → model name.
Shared by WanPipeline (T2V) and WanImageToVideoPipeline (I2V).
"""

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the module docstring to reference Cosmos3, not Wan.

Line 15 currently describes Wan pipelines, but this module defines Cosmos3 defaults.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/defaults.py` around lines 15 -
19, Module docstring incorrectly references "Wan" pipelines; update the
module-level docstring in defaults.py to reference "Cosmos3" (e.g., "Per-model
default generation parameters for Cosmos3 pipelines.") and replace or remove any
mentions of WanPipeline and WanImageToVideoPipeline in that docstring—if there
are Cosmos3 pipeline class names (e.g., Cosmos3Pipeline /
Cosmos3ImageToVideoPipeline) use those exact names instead.

Comment on lines +1 to +13
from __future__ import annotations

import os
import warnings
from typing import Callable

import cv2
import numpy as np
import torch
import torch.nn as nn

from tensorrt_llm.logger import logger


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA SPDX/copyright header to this new file.

This new Python source file is missing the repository-required header block.

As per coding guidelines: Include NVIDIA copyright header on all new files; update year on modified files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/guardrails.py` around lines 1 -
13, This file is missing the required NVIDIA SPDX/copyright header; add the
repository-standard header comment block (including SPDX-License-Identifier and
NVIDIA copyright line with the correct year) at the very top of
tensorrt_llm/_torch/visual_gen/models/cosmos3/guardrails.py before the existing
from __future__ import, ensuring the header format matches other files in the
repo and update the year if this is a modified file.

Comment on lines +126 to +233
    try:
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_id = "Qwen/Qwen3Guard-Gen-0.6B"
        qwen_tokenizer = AutoTokenizer.from_pretrained(model_id)
        qwen_model = (
            AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.bfloat16,
            )
            .to("cuda")
            .eval()
        )

        def _qwen_check(prompt: str) -> tuple[bool, str]:
            conversations = [{"role": "user", "content": prompt}]
            input_ids = qwen_tokenizer.apply_chat_template(
                conversations,
                tokenize=True,
                return_tensors="pt",
                add_generation_prompt=True,
                return_dict=False,
            ).to("cuda")
            with torch.no_grad():
                output_ids = qwen_model.generate(input_ids, max_new_tokens=128)
            response = qwen_tokenizer.decode(
                output_ids[0][input_ids.shape[1] :],
                skip_special_tokens=True,
            )
            if "unsafe" in response.lower():
                return False, f"Qwen3Guard: {response.strip()}"
            return True, ""

        checkers.append(_qwen_check)
        logger.info("Qwen3Guard guardrail loaded")
    except ImportError:
        logger.warning("transformers not installed; skipping Qwen3Guard")

    def text_guardrail(prompt: str) -> tuple[bool, str]:
        for checker in checkers:
            is_safe, msg = checker(prompt)
            if not is_safe:
                return is_safe, msg
        return True, ""

    return text_guardrail


def build_video_guardrail(guardrail_ckpt_dir: str) -> VideoGuardrailFn:
    safety_checker: Callable[[np.ndarray], tuple[bool, str]] | None = None
    face_blurrer: Callable[[np.ndarray], np.ndarray] | None = None

    # 1. Video content safety filter: SigLIP so400m + SafetyClassifier
    try:
        from PIL import Image
        from transformers import SiglipModel, SiglipProcessor

        siglip_dir = os.path.join(
            guardrail_ckpt_dir,
            "video_content_safety_filter",
            "models--google--siglip-so400m-patch14-384/snapshots/9fdffc58afc957d1a03a25b10dba0329ab15c2a3",
        )
        if not os.path.exists(siglip_dir):
            raise FileNotFoundError(siglip_dir)

        siglip_model = (
            SiglipModel.from_pretrained(siglip_dir).to("cuda", dtype=torch.float32).eval()
        )
        siglip_processor = SiglipProcessor.from_pretrained(siglip_dir)

        classifier = SafetyClassifier(input_size=1152, num_classes=7)
        ckpt_path = os.path.join(
            guardrail_ckpt_dir, "video_content_safety_filter", "safety_filter.pt"
        )
        checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=True)
        state = {k.removeprefix("network."): v for k, v in checkpoint["model"].items()}
        classifier.load_state_dict(state)
        classifier = classifier.to("cuda", dtype=torch.float32).eval()

        def _safety_check(frames: np.ndarray) -> tuple[bool, str]:
            nonlocal siglip_model, classifier

            unsafe_count = 0
            total = len(frames)
            for frame in frames:
                img = Image.fromarray(frame)
                inputs = siglip_processor(images=img, return_tensors="pt").to(
                    "cuda", dtype=torch.float32
                )
                with torch.no_grad():
                    siglip_out = siglip_model.get_image_features(**inputs)
                features = siglip_out.pooler_output
                features = features / features.norm(dim=-1, keepdim=True)
                logits = classifier(features)
                pred = logits.argmax(dim=-1).item()
                class_name = CLASS_IDX_TO_NAME.get(pred, "Unknown")
                if class_name != "Safe":
                    unsafe_count += 1

            if unsafe_count / total > CUTOFF_UNSAFE_FRAMES_PERCENT / 100:
                return False, f"Video content safety: {unsafe_count}/{total} frames unsafe"
            return True, ""

        safety_checker = _safety_check
        logger.info("Video content safety filter loaded (SigLIP so400m + classifier)")
    except (ImportError, FileNotFoundError) as e:
        logger.warning("Could not load video safety filter: %s", e)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Optional guardrail initialization can hard-fail on runtime load errors.

These blocks only catch ImportError/FileNotFoundError. Runtime failures from model load/device placement (e.g., OSError, RuntimeError) can still abort pipeline init instead of gracefully disabling optional guardrails.

🛠️ Suggested pattern
-    except ImportError:
+    except (ImportError, OSError, RuntimeError, ValueError) as e:
+        logger.warning("Could not load Qwen3Guard guardrail: %s", e)

-    except (ImportError, FileNotFoundError) as e:
+    except (ImportError, FileNotFoundError, OSError, RuntimeError, ValueError) as e:
         logger.warning("Could not load video safety filter: %s", e)

-    except (ImportError, FileNotFoundError) as e:
+    except (ImportError, FileNotFoundError, OSError, RuntimeError, ValueError) as e:
         logger.warning("Could not load face blur filter: %s", e)

Also applies to: 235-343

🧰 Tools
🪛 Flake8 (7.3.0)

[error] 206-206: nonlocal siglip_model is unused: name is never assigned in scope

(F824)


[error] 206-206: nonlocal classifier is unused: name is never assigned in scope

(F824)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/guardrails.py` around lines 126
- 233, Optional guardrail initialization may raise other runtime exceptions
(OSError, RuntimeError, etc.) beyond ImportError/FileNotFoundError; wrap the
Qwen guardrail and the video safety filter loading in broader exception handlers
so failures disable the guardrail instead of hard-failing. Specifically, for the
Qwen block that defines qwen_tokenizer, qwen_model and _qwen_check (used by
text_guardrail), replace the bare except ImportError with except Exception as e
and log the exception detail and skip adding the checker; likewise in
build_video_guardrail wrap the SiglipModel/SiglipProcessor and classifier
loading (siglip_model, siglip_processor, classifier, and the _safety_check
closure) with except Exception as e (not just ImportError/FileNotFoundError),
log the error including e, and leave safety_checker as None so the pipeline
continues. Ensure logs include the exception message to aid debugging.
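
The hardening this prompt describes generalizes to a small helper; an illustrative sketch (build_optional_component is not part of the repo):

from tensorrt_llm.logger import logger

def build_optional_component(loader, name):
    """Load an optional guardrail component, disabling it on any failure."""
    try:
        return loader()
    except Exception as e:  # ImportError, FileNotFoundError, OSError, RuntimeError, ...
        logger.warning("Could not load %s: %s", name, e)
        return None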

Comment on lines +439 to +445
        if self.rank == 0 and use_guardrails:
            for p in prompt:
                is_safe, msg = self.text_guardrail(p)
                if not is_safe:
                    logger.warning(f"Text guardrail blocked prompt: {msg}")
                    timer.mark_end()
                    return timer.fill(PipelineOutput())

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Synchronize guardrail block decisions across ranks before returning.

Only rank 0 evaluates guardrails and returns early. In distributed runs the other ranks keep going into denoise/decode collectives, so a blocked prompt or blocked video can deadlock the job.

Suggested fix
+        blocked = torch.zeros(1, device=self.device, dtype=torch.int32)
         if self.rank == 0 and use_guardrails:
             for p in prompt:
                 is_safe, msg = self.text_guardrail(p)
                 if not is_safe:
                     logger.warning(f"Text guardrail blocked prompt: {msg}")
-                    timer.mark_end()
-                    return timer.fill(PipelineOutput())
+                    blocked[0] = 1
+                    break
+
+        if torch.distributed.is_initialized():
+            torch.distributed.broadcast(blocked, src=0)
+
+        if blocked.item():
+            timer.mark_end()
+            return timer.fill(PipelineOutput())

Apply the same pattern to the video-guardrail branch before any rank returns.

Also applies to: 575-580

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/pipeline_cosmos3.py` around
lines 439 - 445, Rank 0 currently checks text_guardrail (and similarly the
video-guardrail later) and returns early, but other ranks continue causing
deadlocks; change this so rank 0 computes a single boolean "blocked" (e.g.,
iterate prompts and set blocked=True if any text_guardrail returns false) and
then synchronize that decision across ranks using a torch.distributed collective
(broadcast or all_reduce) before any return; update the block that uses
self.rank, use_guardrails, text_guardrail, logger.warning, timer.mark_end,
timer.fill, PipelineOutput to have rank 0 set the blocked flag and an optional
message, broadcast them to all ranks, and have every rank perform
timer.mark_end() and return timer.fill(PipelineOutput()) when blocked; apply the
same pattern to the video-guardrail branch around the code at the later block
(lines ~575-580) so all ranks make the same early-exit decision.

Comment on lines +1 to +19
import math
from typing import Tuple

import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
from diffusers.models.embeddings import TimestepEmbedding

from tensorrt_llm._torch.attention_backend.interface import PredefinedAttentionMask
from tensorrt_llm._torch.modules.embedding import Embedding
from tensorrt_llm._torch.modules.gated_mlp import GatedMLP
from tensorrt_llm._torch.modules.linear import Linear
from tensorrt_llm._torch.visual_gen.config import DiffusionModelConfig
from tensorrt_llm._torch.visual_gen.modules.attention import Attention, QKVMode
from tensorrt_llm._torch.visual_gen.quantization.loader import DynamicLinearWeightLoader
from tensorrt_llm.logger import logger
from tensorrt_llm.models.modeling_utils import QuantConfig


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the NVIDIA copyright header to this new module.

This file is newly added, but it starts directly with imports. The repo requires the standard NVIDIA header on every new Python source file.

As per coding guidelines, **/*.{cpp,cc,cxx,h,hpp,py} must “Include NVIDIA copyright header on all new files; update year on modified files”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/transformer_cosmos3.py` around
lines 1 - 19, This new module transformer_cosmos3.py is missing the required
NVIDIA copyright header at the top of the file; add the standard NVIDIA
license/header block as the very first lines (before the imports such as import
math, import torch, etc.), using the canonical header used across the repo and
updating the year if needed, so files like transformer_cosmos3.py (containing
symbols like TimestepEmbedding, DiffusionModelConfig, Attention, GatedMLP,
DynamicLinearWeightLoader) conform to the repository's copyright guidelines.

Comment on lines +85 to +98
if enable_fps_modulation:
tps = fps / temporal_compression_factor
base_tps = base_fps / temporal_compression_factor
frame_indices = torch.arange(grid_t, dtype=torch.float32)
t_index = (
(frame_indices / tps * base_tps + temporal_offset)
.view(-1, 1)
.expand(-1, grid_h * grid_w)
.flatten()
)
else:
t_index = torch.arange(grid_t, dtype=torch.long).view(-1, 1).expand(
-1, grid_h * grid_w
).flatten() + int(temporal_offset)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard FPS modulation when fps is unavailable.

_compute_rope_freqs() deliberately passes effective_fps=None when T <= 1, but this branch still does fps / temporal_compression_factor. That makes single-frame runs crash as soon as enable_fps_modulation is enabled.

Suggested fix
-    if enable_fps_modulation:
+    if enable_fps_modulation and fps is not None:
         tps = fps / temporal_compression_factor
         base_tps = base_fps / temporal_compression_factor
         frame_indices = torch.arange(grid_t, dtype=torch.float32)
         t_index = (
             (frame_indices / tps * base_tps + temporal_offset)
             .view(-1, 1)
             .expand(-1, grid_h * grid_w)
             .flatten()
         )
     else:
         t_index = torch.arange(grid_t, dtype=torch.long).view(-1, 1).expand(
             -1, grid_h * grid_w
         ).flatten() + int(temporal_offset)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/transformer_cosmos3.py` around
lines 85 - 98, The fps-division branch uses fps even when effective_fps can be
None (T<=1); update the enable_fps_modulation block (symbols:
enable_fps_modulation, fps, temporal_compression_factor, base_fps,
frame_indices, t_index, temporal_offset, grid_t, grid_h, grid_w) to first guard
on fps being not None — if fps is None, fall back to the non-modulation logic
that builds integer frame indices (like the else branch) or compute using
base_fps only; ensure you avoid doing fps / temporal_compression_factor when fps
is None and preserve the same tensor shapes/expansions for t_index.

Comment on lines +662 to +671
if (
use_ulysses
and self.num_attention_heads % ulysses_size != 0
and self.num_kv_heads % ulysses_size != 0
):
raise ValueError(
f"num_attention_heads ({self.num_attention_heads}) and "
f"num_kv_heads ({self.num_kv_heads}) must be divisible by "
f"ulysses_size ({ulysses_size})"
)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use or in the Ulysses divisibility check.

Right now the validation only fails when both num_attention_heads and num_kv_heads are non-divisible. A config where just one of them is non-divisible still passes here and then breaks later during head sharding.

Suggested fix
         if (
             use_ulysses
-            and self.num_attention_heads % ulysses_size != 0
-            and self.num_kv_heads % ulysses_size != 0
+            and (
+                self.num_attention_heads % ulysses_size != 0
+                or self.num_kv_heads % ulysses_size != 0
+            )
         ):
             raise ValueError(
                 f"num_attention_heads ({self.num_attention_heads}) and "
                 f"num_kv_heads ({self.num_kv_heads}) must be divisible by "
                 f"ulysses_size ({ulysses_size})"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/transformer_cosmos3.py` around
lines 662 - 671, The Ulysses divisibility check currently uses a logical AND so
it only raises when both self.num_attention_heads and self.num_kv_heads are
non-divisible by ulysses_size; change the condition in the block that references
use_ulysses, self.num_attention_heads, self.num_kv_heads, and ulysses_size to
use OR so the validation raises if either head count is not divisible by
ulysses_size (keep the ValueError message but ensure it triggers when one or the
other fails).

Comment on lines +928 to +948
if self.use_seq_parallel:
S_gen = hidden_gen.shape[1]
pad = (self.seq_parallel_size - S_gen % self.seq_parallel_size) % self.seq_parallel_size
if pad > 0:
# This will cause minor noise in softmax due to padding.
hidden_gen = F.pad(hidden_gen, (0, 0, 0, pad))
cos, sin = self.cached_freqs_gen
self.cached_freqs_gen = (
F.pad(cos, (0, 0, 0, pad)),
F.pad(sin, (0, 0, 0, pad)),
)
S_shard = S_gen // self.seq_parallel_size
hidden_gen = hidden_gen[
:, self.seq_parallel_rank * S_shard : (self.seq_parallel_rank + 1) * S_shard
]
# Shard freqs_gen to match
cos, sin = self.cached_freqs_gen
freqs_gen = (
cos[:, self.seq_parallel_rank * S_shard : (self.seq_parallel_rank + 1) * S_shard],
sin[:, self.seq_parallel_rank * S_shard : (self.seq_parallel_rank + 1) * S_shard],
)

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Non-divisible GEN lengths are sharded incorrectly after padding.

This block has two coupled bugs when S_gen % seq_parallel_size != 0: S_shard is computed from the pre-pad length, so tail tokens are dropped, and F.pad(cos, (0, 0, 0, pad)) pads the singleton head axis on [B, S, 1, D] instead of the sequence axis. Seq-parallel inference will mis-shard or shape-mismatch on common video sizes.

Suggested fix
         if self.use_seq_parallel:
             S_gen = hidden_gen.shape[1]
             pad = (self.seq_parallel_size - S_gen % self.seq_parallel_size) % self.seq_parallel_size
             if pad > 0:
                 # This will cause minor noise in softmax due to padding.
                 hidden_gen = F.pad(hidden_gen, (0, 0, 0, pad))
                 cos, sin = self.cached_freqs_gen
                 self.cached_freqs_gen = (
-                    F.pad(cos, (0, 0, 0, pad)),
-                    F.pad(sin, (0, 0, 0, pad)),
+                    F.pad(cos, (0, 0, 0, 0, 0, pad)),
+                    F.pad(sin, (0, 0, 0, 0, 0, pad)),
                 )
-            S_shard = S_gen // self.seq_parallel_size
+            padded_s_gen = S_gen + pad
+            S_shard = padded_s_gen // self.seq_parallel_size
             hidden_gen = hidden_gen[
                 :, self.seq_parallel_rank * S_shard : (self.seq_parallel_rank + 1) * S_shard
             ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/cosmos3/transformer_cosmos3.py` around
lines 928 - 948, The current seq-parallel padding logic miscomputes shards and
pads the wrong axis: when S_gen % seq_parallel_size != 0 you must compute pad
and apply it to hidden_gen and to the sequence axis of cached_freqs_gen, then
compute S_shard from the padded sequence length (not the original S_gen) and
slice both hidden_gen and the padded cos/sin using seq_parallel_rank * S_shard :
(seq_parallel_rank+1) * S_shard; update cached_freqs_gen to the padded versions
and set freqs_gen from those padded, sliced cos/sin so no tail tokens are
dropped and shapes match. Ensure you reference hidden_gen,
self.cached_freqs_gen, S_gen, pad, S_shard, self.seq_parallel_size,
self.seq_parallel_rank, and freqs_gen when making the changes.
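
A worked example of why S_shard must come from the padded length (sizes are illustrative):

# 1026 GEN tokens sharded across 4 sequence-parallel ranks.
S_gen, seq_parallel_size = 1026, 4
pad = (seq_parallel_size - S_gen % seq_parallel_size) % seq_parallel_size  # 2

S_shard_buggy = S_gen // seq_parallel_size          # 256: 4 * 256 = 1024 < 1026, tail tokens dropped
S_shard_fixed = (S_gen + pad) // seq_parallel_size  # 257: 4 * 257 = 1028, every token covered once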
