Add VSA #1053

Open
kaix-nv wants to merge 10 commits into main from kaix/vsa

Conversation


@kaix-nv kaix-nv commented Mar 17, 2026

What does this PR do?

Type of change: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Added Video Sparse Attention (VSA), a new sparsity method for attention operations that supports 3D video blocks with configurable block sizes, top-K block selection, and gating mechanisms for balanced compression and sparse attention branches.
  • Tests

    • Added comprehensive unit test suite for VSA covering configuration validation, tile-based utilities, metadata computation, and ModelOpt integration workflows.
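
The gating mechanism mentioned above can be pictured as a sigmoid-weighted blend of the two branch outputs. This scalar sketch is purely illustrative; the PR's actual gate_compress is a tensor applied per head, and the function name here is invented:

```python
import math

# Illustrative sigmoid gate blending the compression-branch output with the
# sparse-branch output. VSA's real gate_compress is a per-head tensor; this
# scalar toy version only shows the blending idea.
def gated_combine(compress_out, sparse_out, gate_logit):
    g = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid squashes the logit into (0, 1)
    return [g * c + (1.0 - g) * s for c, s in zip(compress_out, sparse_out)]
```

With gate_logit = 0 the gate is 0.5 and the two branches are simply averaged; large positive or negative logits favor one branch.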

kaix-nv added 7 commits March 16, 2026 23:15
Signed-off-by: Kai Xu <kaix@nvidia.com>

copy-pr-bot bot commented Mar 17, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.



coderabbitai bot commented Mar 17, 2026

📝 Walkthrough


This PR introduces Video Sparse Attention (VSA), a new sparse attention method for video processing. It adds configuration classes (VSAAttributeConfig, VSAConfig), a VSA implementation with compression and sparse branches operating on 3D video blocks, tile-based utility functions, method registration, dynamic configuration selection, and comprehensive unit tests.
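
As a rough illustration of the tile-based utilities described above, the sketch below builds tile-partition indices and their inverse for a flattened video token sequence. The function names, block-major ordering, and no-padding assumption are mine; the PR's vsa_utils.py also handles padding and variable block sizes:

```python
from itertools import product

def tile_partition_indices(video_shape, block_size):
    """Map flat row-major (t, h, w) token order to block-major order.

    Returns a permutation `perm` such that tokens[perm] places each
    bt x bh x bw tile contiguously. Sketch only: assumes the video
    dimensions divide evenly by the block size (no padding).
    """
    T, H, W = video_shape
    bt, bh, bw = block_size
    assert T % bt == 0 and H % bh == 0 and W % bw == 0, "sketch assumes no padding"
    perm = []
    # iterate blocks in row-major order, then positions within each block
    for t0, h0, w0 in product(range(0, T, bt), range(0, H, bh), range(0, W, bw)):
        for dt, dh, dw in product(range(bt), range(bh), range(bw)):
            t, h, w = t0 + dt, h0 + dh, w0 + dw
            perm.append((t * H + h) * W + w)
    return perm

def reverse_indices(perm):
    """Inverse permutation: untile back to the original sequence order."""
    inv = [0] * len(perm)
    for tiled_pos, orig_pos in enumerate(perm):
        inv[orig_pos] = tiled_pos
    return inv
```

Applying perm groups each 3D tile's tokens contiguously before the kernel runs; the inverse permutation restores the original sequence order afterwards.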

Changes

Cohort / File(s) Summary
Configuration Classes
modelopt/torch/sparsity/attention_sparsity/config.py
Introduces VSAAttributeConfig and VSAConfig classes with fields for VSA method, block size, top_k_ratio, video_shape, and calibration settings. Adds VSA_DEFAULT preset and exports to public API.
VSA Implementation
modelopt/torch/sparsity/attention_sparsity/methods/vsa.py, modelopt/torch/sparsity/attention_sparsity/methods/vsa_utils.py
Implements VSA with two-branch architecture (compression and sparse branches), forward_attention method, metadata computation and caching, tiling/untiling utilities, and Triton kernel integration. Provides tile partition indices, reverse indices, variable block sizes, and non-padding index utilities.
Method Registration & Integration
modelopt/torch/sparsity/attention_sparsity/methods/__init__.py, modelopt/torch/sparsity/attention_sparsity/methods/registry.py, modelopt/torch/sparsity/attention_sparsity/sparse_attention.py
Registers vsa submodule, adds set_calibration_mode() public method to SparseAttentionMethod base class, and implements dynamic config class selection based on sparse attention method in sparse_attention module.
Test Coverage
tests/unit/torch/sparsity/attention_sparsity/test_vsa.py
Comprehensive unit tests covering tile utilities, VSA initialization, metadata computation, configuration validation, integration with sparsify(), enable/disable toggling, and state save/restore.
Cleanup
tests/examples/llm_eval/test_llm_eval.py
Removes FP8 LLM evaluation test file.
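
To make the configuration surface in the table above concrete, here is a minimal dataclass stand-in for VSAAttributeConfig. The real class is Pydantic-based; field names follow the walkthrough, and the default values are guesses:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Minimal stand-in for the PR's Pydantic-based VSAAttributeConfig.
# Field names follow the walkthrough; defaults are assumptions.
@dataclass
class VSAAttributeConfigSketch:
    method: str = "vsa"
    block_size: Tuple[int, int, int] = (4, 4, 4)        # 3D (t, h, w) block size
    top_k_ratio: float = 0.25                           # fraction of blocks kept
    video_shape: Optional[Tuple[int, int, int]] = None  # (T, H, W), or None

    def __post_init__(self):
        # the review below notes the validator bound is 0.0 < top_k_ratio <= 1.0
        if not 0.0 < self.top_k_ratio <= 1.0:
            raise ValueError(f"top_k_ratio must be in (0, 1], got {self.top_k_ratio}")
```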

Sequence Diagram(s)

sequenceDiagram
    participant Client as Forward Call
    participant VSA as VSA.forward_attention()
    participant Meta as Metadata Computation
    participant Tile as Tiling Utilities
    participant Kernel as Triton Kernel<br/>(video_sparse_attn)
    participant Untile as Untiling Utilities
    
    Client->>VSA: query, key, value, gate_compress, video_shape
    VSA->>Meta: Compute/retrieve metadata (tiles, block sizes)
    Meta-->>VSA: tile_indices, variable_block_sizes, non_pad_indices
    VSA->>Tile: Tile Q, K, V, gate_compress
    Tile-->>VSA: tiled tensors (padded layout)
    VSA->>Kernel: Invoke video_sparse_attn(Q, K, V, block_size_3d, top_k_ratio)
    Kernel-->>VSA: attended output (tiled, padded)
    VSA->>Untile: Untile output back to original sequence
    Untile-->>VSA: output (original sequence order)
    VSA-->>Client: final_output, stats dict (sparsity, block_stats)
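
The top-K selection step in the diagram can be sketched in plain Python. The mean-pooled scoring rule and all names here are assumptions for illustration; the actual work happens on-device inside the video_sparse_attn Triton kernel:

```python
import math

def select_top_k_blocks(query, key_blocks, top_k_ratio):
    """Pick the key blocks the sparse branch attends to.

    query: list[float] (a single query vector);
    key_blocks: list of blocks, each a list of key vectors.
    Sketch only: scores each block by the dot product of the query with
    the block's mean-pooled key, then keeps ceil(top_k_ratio * num_blocks).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scores = []
    for block in key_blocks:
        # compression branch: mean-pool the keys within the block
        dim = len(block[0])
        pooled = [sum(k[d] for k in block) / len(block) for d in range(dim)]
        scores.append(dot(query, pooled))

    k = max(1, math.ceil(top_k_ratio * len(key_blocks)))
    ranked = sorted(range(len(key_blocks)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # block indices kept by the sparse branch
```

With top_k_ratio = 1.0 every block is kept, which is why the boundary case called out in the review comment below matters.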

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 70.91% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Add VSA' is too vague and generic. While it refers to a real component (VSA = Video Sparse Attention), it lacks specificity about what VSA entails and doesn't convey the substantial scope of changes across multiple modules, config classes, and utilities. Consider a more descriptive title such as 'Implement Video Sparse Attention (VSA) method for attention sparsity' to better communicate the main change and its significance.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Security Anti-Patterns ✅ Passed PR introduces VSA sparse attention implementation without security anti-patterns: no unsafe torch.load/numpy.load, hardcoded trust_remote_code, eval/exec, nosec comments, or non-permissive dependencies.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Kai Xu <kaix@nvidia.com>

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 86.11111% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.41%. Comparing base (cb1ff32) to head (8c76a00).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...t/torch/sparsity/attention_sparsity/methods/vsa.py 75.86% 21 Missing ⚠️
...delopt/torch/sparsity/attention_sparsity/config.py 94.73% 2 Missing ⚠️
...ch/sparsity/attention_sparsity/sparse_attention.py 81.81% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1053      +/-   ##
==========================================
+ Coverage   70.18%   70.41%   +0.22%     
==========================================
  Files         223      229       +6     
  Lines       25712    26039     +327     
==========================================
+ Hits        18047    18336     +289     
- Misses       7665     7703      +38     

☔ View full report in Codecov by Sentry.

Signed-off-by: Kai Xu <kaix@nvidia.com>
@kaix-nv kaix-nv marked this pull request as ready for review March 18, 2026 00:35
@kaix-nv kaix-nv requested a review from a team as a code owner March 18, 2026 00:35
@kaix-nv kaix-nv requested a review from kevalmorabia97 March 18, 2026 00:35

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/unit/torch/sparsity/attention_sparsity/test_vsa.py (1)

269-299: Consider adding edge case test for top_k_ratio=1.0.

The tests cover invalid values (0.0, 1.5) but don't explicitly test that top_k_ratio=1.0 (keep all blocks) is valid. This boundary is allowed by the validator (0.0 < v <= 1.0).

💡 Suggested addition
     def test_video_shape_none_allowed(self):
         cfg = VSAAttributeConfig(video_shape=None)
         assert cfg.video_shape is None
 
+    def test_top_k_ratio_boundary_valid(self):
+        """top_k_ratio=1.0 (keep all blocks) should be valid."""
+        cfg = VSAAttributeConfig(top_k_ratio=1.0)
+        assert cfg.top_k_ratio == 1.0
+
     def test_vsa_config_defaults(self):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/torch/sparsity/attention_sparsity/test_vsa.py` around lines 269 -
299, Add a test ensuring top_k_ratio == 1.0 is accepted: inside
TestVSAAttributeConfig add a new test (e.g., test_top_k_ratio_one_valid) that
constructs VSAAttributeConfig(top_k_ratio=1.0) and asserts cfg.top_k_ratio ==
1.0 (and/or no ValidationError is raised); reference VSAAttributeConfig and the
TestVSAAttributeConfig class so the new test sits with the existing tests for
top_k_ratio edge cases.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 855aa644-273c-466e-b11c-4b3b2e9d9e99

📥 Commits

Reviewing files that changed from the base of the PR and between 7c33d85 and 8c76a00.

📒 Files selected for processing (8)
  • modelopt/torch/sparsity/attention_sparsity/config.py
  • modelopt/torch/sparsity/attention_sparsity/methods/__init__.py
  • modelopt/torch/sparsity/attention_sparsity/methods/registry.py
  • modelopt/torch/sparsity/attention_sparsity/methods/vsa.py
  • modelopt/torch/sparsity/attention_sparsity/methods/vsa_utils.py
  • modelopt/torch/sparsity/attention_sparsity/sparse_attention.py
  • tests/examples/llm_eval/test_llm_eval.py
  • tests/unit/torch/sparsity/attention_sparsity/test_vsa.py
💤 Files with no reviewable changes (1)
  • tests/examples/llm_eval/test_llm_eval.py

Comment on lines +73 to +92

        from .config import VSAAttributeConfig

        # Determine which config class to use based on method
        config_dict = attribute_cfg or {}
        if isinstance(attribute_cfg, dict):
            method = config_dict.get("method", "flash_skip_softmax")
        elif attribute_cfg is not None and hasattr(attribute_cfg, "method"):
            method = attribute_cfg.method
        else:
            method = "flash_skip_softmax"

        # Select appropriate config class based on method
        if method == "vsa":
            config_class = VSAAttributeConfig
        else:
            config_class = SparseAttentionAttributeConfig

        # Ensure config is validated through Pydantic
        if not isinstance(attribute_cfg, (SparseAttentionAttributeConfig, VSAAttributeConfig)):
            attribute_cfg = config_class(**(config_dict))

⚠️ Potential issue | 🟡 Minor

Potential data loss when config object type doesn't match selected method.

If attribute_cfg is a config object (not a dict) with method="vsa", but is of type SparseAttentionAttributeConfig, the code selects VSAAttributeConfig as the config_class. Then the isinstance check at line 91 fails, and line 92 creates a new config from config_dict which is empty {} (since line 76-82 only populates it for dict inputs). This silently discards the original config values.

🐛 Proposed fix to preserve config values
         # Determine which config class to use based on method
         config_dict = attribute_cfg or {}
         if isinstance(attribute_cfg, dict):
             method = config_dict.get("method", "flash_skip_softmax")
         elif attribute_cfg is not None and hasattr(attribute_cfg, "method"):
             method = attribute_cfg.method
+            config_dict = attribute_cfg.model_dump()
         else:
             method = "flash_skip_softmax"
 
         # Select appropriate config class based on method
         if method == "vsa":
             config_class = VSAAttributeConfig
         else:
             config_class = SparseAttentionAttributeConfig
 
         # Ensure config is validated through Pydantic
         if not isinstance(attribute_cfg, (SparseAttentionAttributeConfig, VSAAttributeConfig)):
             attribute_cfg = config_class(**(config_dict))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/sparsity/attention_sparsity/sparse_attention.py` around lines
73 - 92, The code can lose config data when attribute_cfg is a config object of
the wrong class because config_dict is only populated for dict inputs; fix by
populating config_dict from the object before re-instantiation: if attribute_cfg
is not a dict but is an object (e.g., a Pydantic model), extract its fields into
config_dict (use attribute_cfg.dict() if available, else attribute_cfg.__dict__
or vars(attribute_cfg)) and then instantiate config_class(**config_dict); update
the logic around attribute_cfg, config_dict, config_class, VSAAttributeConfig
and SparseAttentionAttributeConfig so the original object's values are preserved
when converting between classes.
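
The conversion pattern the reviewer proposes can be shown in miniature with plain dataclasses. All names here are invented; in the PR's Pydantic models, model_dump() plays the role asdict() plays below:

```python
from dataclasses import dataclass, asdict

@dataclass
class BaseCfg:
    method: str = "flash_skip_softmax"
    threshold: float = 0.5

@dataclass
class VsaCfg(BaseCfg):
    top_k_ratio: float = 0.25

def upgrade_cfg(cfg):
    """Convert cfg (dict or BaseCfg) to VsaCfg without losing field values.

    The buggy pattern re-instantiates from an empty dict, discarding the
    source object's values; the fix is to dump its fields first and pass
    them through to the target class.
    """
    fields = dict(cfg) if isinstance(cfg, dict) else asdict(cfg)
    return VsaCfg(**fields)
```

The key point is the dump step: whether the input is a dict or a config object, the target class is always constructed from the full set of source fields.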
