Add VSA #1053

Open
kaix-nv wants to merge 10 commits into main from kaix/vsa

Conversation


@kaix-nv kaix-nv commented Mar 17, 2026

What does this PR do?

Type of change: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Added Video Sparse Attention (VSA), a new sparsity method for attention operations that supports 3D video blocks with configurable block sizes, top-K block selection, and gating mechanisms for balanced compression and sparse attention branches.
  • Tests

    • Added comprehensive unit test suite for VSA covering configuration validation, tile-based utilities, metadata computation, and ModelOpt integration workflows.
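
The gating mechanism mentioned above can be pictured as a sigmoid-weighted blend of the two branch outputs. This scalar sketch is purely illustrative; the PR's actual gate_compress is a tensor applied per head, and the function name here is invented:

```python
import math

# Illustrative sigmoid gate blending the compression-branch output with the
# sparse-branch output. VSA's real gate_compress is a per-head tensor; this
# scalar toy version only shows the blending idea.
def gated_combine(compress_out, sparse_out, gate_logit):
    g = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid squashes the logit into (0, 1)
    return [g * c + (1.0 - g) * s for c, s in zip(compress_out, sparse_out)]
```

With gate_logit = 0 the gate is 0.5 and the two branches are simply averaged; large positive or negative logits favor one branch.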

kaix-nv added 7 commits March 16, 2026 23:15
Signed-off-by: Kai Xu <kaix@nvidia.com>

copy-pr-bot bot commented Mar 17, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.



coderabbitai bot commented Mar 17, 2026

📝 Walkthrough


This PR introduces Video Sparse Attention (VSA), a new sparse attention method for video processing. It adds configuration classes (VSAAttributeConfig, VSAConfig), a VSA implementation with compression and sparse branches operating on 3D video blocks, tile-based utility functions, method registration, dynamic configuration selection, and comprehensive unit tests.
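
As a rough illustration of the tile-based utilities described above, the sketch below builds tile-partition indices and their inverse for a flattened video token sequence. The function names, block-major ordering, and no-padding assumption are mine; the PR's vsa_utils.py also handles padding and variable block sizes:

```python
from itertools import product

def tile_partition_indices(video_shape, block_size):
    """Map flat row-major (t, h, w) token order to block-major order.

    Returns a permutation `perm` such that tokens[perm] places each
    bt x bh x bw tile contiguously. Sketch only: assumes the video
    dimensions divide evenly by the block size (no padding).
    """
    T, H, W = video_shape
    bt, bh, bw = block_size
    assert T % bt == 0 and H % bh == 0 and W % bw == 0, "sketch assumes no padding"
    perm = []
    # iterate blocks in row-major order, then positions within each block
    for t0, h0, w0 in product(range(0, T, bt), range(0, H, bh), range(0, W, bw)):
        for dt, dh, dw in product(range(bt), range(bh), range(bw)):
            t, h, w = t0 + dt, h0 + dh, w0 + dw
            perm.append((t * H + h) * W + w)
    return perm

def reverse_indices(perm):
    """Inverse permutation: untile back to the original sequence order."""
    inv = [0] * len(perm)
    for tiled_pos, orig_pos in enumerate(perm):
        inv[orig_pos] = tiled_pos
    return inv
```

Applying perm groups each 3D tile's tokens contiguously before the kernel runs; the inverse permutation restores the original sequence order afterwards.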

Changes

Cohort / File(s) Summary
Configuration Classes
modelopt/torch/sparsity/attention_sparsity/config.py
Introduces VSAAttributeConfig and VSAConfig classes with fields for VSA method, block size, top_k_ratio, video_shape, and calibration settings. Adds VSA_DEFAULT preset and exports to public API.
VSA Implementation
modelopt/torch/sparsity/attention_sparsity/methods/vsa.py, modelopt/torch/sparsity/attention_sparsity/methods/vsa_utils.py
Implements VSA with two-branch architecture (compression and sparse branches), forward_attention method, metadata computation and caching, tiling/untiling utilities, and Triton kernel integration. Provides tile partition indices, reverse indices, variable block sizes, and non-padding index utilities.
Method Registration & Integration
modelopt/torch/sparsity/attention_sparsity/methods/__init__.py, modelopt/torch/sparsity/attention_sparsity/methods/registry.py, modelopt/torch/sparsity/attention_sparsity/sparse_attention.py
Registers vsa submodule, adds set_calibration_mode() public method to SparseAttentionMethod base class, and implements dynamic config class selection based on sparse attention method in sparse_attention module.
Test Coverage
tests/unit/torch/sparsity/attention_sparsity/test_vsa.py
Comprehensive unit tests covering tile utilities, VSA initialization, metadata computation, configuration validation, integration with sparsify(), enable/disable toggling, and state save/restore.
Cleanup
tests/examples/llm_eval/test_llm_eval.py
Removes FP8 LLM evaluation test file.
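
To make the configuration surface in the table above concrete, here is a minimal dataclass stand-in for VSAAttributeConfig. The real class is Pydantic-based; field names follow the walkthrough, and the default values are guesses:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Minimal stand-in for the PR's Pydantic-based VSAAttributeConfig.
# Field names follow the walkthrough; defaults are assumptions.
@dataclass
class VSAAttributeConfigSketch:
    method: str = "vsa"
    block_size: Tuple[int, int, int] = (4, 4, 4)        # 3D (t, h, w) block size
    top_k_ratio: float = 0.25                           # fraction of blocks kept
    video_shape: Optional[Tuple[int, int, int]] = None  # (T, H, W), or None

    def __post_init__(self):
        # the review below notes the validator bound is 0.0 < top_k_ratio <= 1.0
        if not 0.0 < self.top_k_ratio <= 1.0:
            raise ValueError(f"top_k_ratio must be in (0, 1], got {self.top_k_ratio}")
```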

Sequence Diagram(s)

sequenceDiagram
    participant Client as Forward Call
    participant VSA as VSA.forward_attention()
    participant Meta as Metadata Computation
    participant Tile as Tiling Utilities
    participant Kernel as Triton Kernel<br/>(video_sparse_attn)
    participant Untile as Untiling Utilities
    
    Client->>VSA: query, key, value, gate_compress, video_shape
    VSA->>Meta: Compute/retrieve metadata (tiles, block sizes)
    Meta-->>VSA: tile_indices, variable_block_sizes, non_pad_indices
    VSA->>Tile: Tile Q, K, V, gate_compress
    Tile-->>VSA: tiled tensors (padded layout)
    VSA->>Kernel: Invoke video_sparse_attn(Q, K, V, block_size_3d, top_k_ratio)
    Kernel-->>VSA: attended output (tiled, padded)
    VSA->>Untile: Untile output back to original sequence
    Untile-->>VSA: output (original sequence order)
    VSA-->>Client: final_output, stats dict (sparsity, block_stats)
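
The top-K selection step in the diagram can be sketched in plain Python. The mean-pooled scoring rule and all names here are assumptions for illustration; the actual work happens on-device inside the video_sparse_attn Triton kernel:

```python
import math

def select_top_k_blocks(query, key_blocks, top_k_ratio):
    """Pick the key blocks the sparse branch attends to.

    query: list[float] (a single query vector);
    key_blocks: list of blocks, each a list of key vectors.
    Sketch only: scores each block by the dot product of the query with
    the block's mean-pooled key, then keeps ceil(top_k_ratio * num_blocks).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scores = []
    for block in key_blocks:
        # compression branch: mean-pool the keys within the block
        dim = len(block[0])
        pooled = [sum(k[d] for k in block) / len(block) for d in range(dim)]
        scores.append(dot(query, pooled))

    k = max(1, math.ceil(top_k_ratio * len(key_blocks)))
    ranked = sorted(range(len(key_blocks)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # block indices kept by the sparse branch
```

With top_k_ratio = 1.0 every block is kept, which is why the boundary case called out in the review comment below matters.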

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 70.91% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Add VSA' is too vague and generic. While it refers to a real component (VSA = Video Sparse Attention), it lacks specificity about what VSA entails and doesn't convey the substantial scope of changes across multiple modules, config classes, and utilities. Consider a more descriptive title such as 'Implement Video Sparse Attention (VSA) method for attention sparsity' to better communicate the main change and its significance.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Security Anti-Patterns ✅ Passed PR introduces VSA sparse attention implementation without security anti-patterns: no unsafe torch.load/numpy.load, hardcoded trust_remote_code, eval/exec, nosec comments, or non-permissive dependencies.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Kai Xu <kaix@nvidia.com>

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 86.11111% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.41%. Comparing base (cb1ff32) to head (8c76a00).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...t/torch/sparsity/attention_sparsity/methods/vsa.py 75.86% 21 Missing ⚠️
...delopt/torch/sparsity/attention_sparsity/config.py 94.73% 2 Missing ⚠️
...ch/sparsity/attention_sparsity/sparse_attention.py 81.81% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1053      +/-   ##
==========================================
+ Coverage   70.18%   70.41%   +0.22%     
==========================================
  Files         223      229       +6     
  Lines       25712    26039     +327     
==========================================
+ Hits        18047    18336     +289     
- Misses       7665     7703      +38     

☔ View full report in Codecov by Sentry.

Signed-off-by: Kai Xu <kaix@nvidia.com>
@kaix-nv kaix-nv marked this pull request as ready for review March 18, 2026 00:35
@kaix-nv kaix-nv requested a review from a team as a code owner March 18, 2026 00:35
@kaix-nv kaix-nv requested a review from kevalmorabia97 March 18, 2026 00:35

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/unit/torch/sparsity/attention_sparsity/test_vsa.py (1)

269-299: Consider adding edge case test for top_k_ratio=1.0.

The tests cover invalid values (0.0, 1.5) but don't explicitly test that top_k_ratio=1.0 (keep all blocks) is valid. This boundary is allowed by the validator (0.0 < v <= 1.0).

💡 Suggested addition
     def test_video_shape_none_allowed(self):
         cfg = VSAAttributeConfig(video_shape=None)
         assert cfg.video_shape is None
 
+    def test_top_k_ratio_boundary_valid(self):
+        """top_k_ratio=1.0 (keep all blocks) should be valid."""
+        cfg = VSAAttributeConfig(top_k_ratio=1.0)
+        assert cfg.top_k_ratio == 1.0
+
     def test_vsa_config_defaults(self):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/torch/sparsity/attention_sparsity/test_vsa.py` around lines 269 -
299, Add a test ensuring top_k_ratio == 1.0 is accepted: inside
TestVSAAttributeConfig add a new test (e.g., test_top_k_ratio_one_valid) that
constructs VSAAttributeConfig(top_k_ratio=1.0) and asserts cfg.top_k_ratio ==
1.0 (and/or no ValidationError is raised); reference VSAAttributeConfig and the
TestVSAAttributeConfig class so the new test sits with the existing tests for
top_k_ratio edge cases.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 855aa644-273c-466e-b11c-4b3b2e9d9e99

📥 Commits

Reviewing files that changed from the base of the PR and between 7c33d85 and 8c76a00.

📒 Files selected for processing (8)
  • modelopt/torch/sparsity/attention_sparsity/config.py
  • modelopt/torch/sparsity/attention_sparsity/methods/__init__.py
  • modelopt/torch/sparsity/attention_sparsity/methods/registry.py
  • modelopt/torch/sparsity/attention_sparsity/methods/vsa.py
  • modelopt/torch/sparsity/attention_sparsity/methods/vsa_utils.py
  • modelopt/torch/sparsity/attention_sparsity/sparse_attention.py
  • tests/examples/llm_eval/test_llm_eval.py
  • tests/unit/torch/sparsity/attention_sparsity/test_vsa.py
💤 Files with no reviewable changes (1)
  • tests/examples/llm_eval/test_llm_eval.py

Comment on lines +73 to +92

        from .config import VSAAttributeConfig

        # Determine which config class to use based on method
        config_dict = attribute_cfg or {}
        if isinstance(attribute_cfg, dict):
            method = config_dict.get("method", "flash_skip_softmax")
        elif attribute_cfg is not None and hasattr(attribute_cfg, "method"):
            method = attribute_cfg.method
        else:
            method = "flash_skip_softmax"

        # Select appropriate config class based on method
        if method == "vsa":
            config_class = VSAAttributeConfig
        else:
            config_class = SparseAttentionAttributeConfig

        # Ensure config is validated through Pydantic
        if not isinstance(attribute_cfg, (SparseAttentionAttributeConfig, VSAAttributeConfig)):
            attribute_cfg = config_class(**(config_dict))

⚠️ Potential issue | 🟡 Minor

Potential data loss when config object type doesn't match selected method.

If attribute_cfg is a config object (not a dict) with method="vsa", but is of type SparseAttentionAttributeConfig, the code selects VSAAttributeConfig as the config_class. Then the isinstance check at line 91 fails, and line 92 creates a new config from config_dict which is empty {} (since line 76-82 only populates it for dict inputs). This silently discards the original config values.

🐛 Proposed fix to preserve config values
         # Determine which config class to use based on method
         config_dict = attribute_cfg or {}
         if isinstance(attribute_cfg, dict):
             method = config_dict.get("method", "flash_skip_softmax")
         elif attribute_cfg is not None and hasattr(attribute_cfg, "method"):
             method = attribute_cfg.method
+            config_dict = attribute_cfg.model_dump()
         else:
             method = "flash_skip_softmax"
 
         # Select appropriate config class based on method
         if method == "vsa":
             config_class = VSAAttributeConfig
         else:
             config_class = SparseAttentionAttributeConfig
 
         # Ensure config is validated through Pydantic
         if not isinstance(attribute_cfg, (SparseAttentionAttributeConfig, VSAAttributeConfig)):
             attribute_cfg = config_class(**(config_dict))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/sparsity/attention_sparsity/sparse_attention.py` around lines
73 - 92, The code can lose config data when attribute_cfg is a config object of
the wrong class because config_dict is only populated for dict inputs; fix by
populating config_dict from the object before re-instantiation: if attribute_cfg
is not a dict but is an object (e.g., a Pydantic model), extract its fields into
config_dict (use attribute_cfg.dict() if available, else attribute_cfg.__dict__
or vars(attribute_cfg)) and then instantiate config_class(**config_dict); update
the logic around attribute_cfg, config_dict, config_class, VSAAttributeConfig
and SparseAttentionAttributeConfig so the original object's values are preserved
when converting between classes.
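
The conversion pattern the reviewer proposes can be shown in miniature with plain dataclasses. All names here are invented; in the PR's Pydantic models, model_dump() plays the role asdict() plays below:

```python
from dataclasses import dataclass, asdict

@dataclass
class BaseCfg:
    method: str = "flash_skip_softmax"
    threshold: float = 0.5

@dataclass
class VsaCfg(BaseCfg):
    top_k_ratio: float = 0.25

def upgrade_cfg(cfg):
    """Convert cfg (dict or BaseCfg) to VsaCfg without losing field values.

    The buggy pattern re-instantiates from an empty dict, discarding the
    source object's values; the fix is to dump its fields first and pass
    them through to the target class.
    """
    fields = dict(cfg) if isinstance(cfg, dict) else asdict(cfg)
    return VsaCfg(**fields)
```

The key point is the dump step: whether the input is a dict or a config object, the target class is always constructed from the full set of source fields.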
