Fix pinBlocks/unpinBlocksById to correctly handle multiple window sizes #12325
thorjohnsen wants to merge 3 commits into NVIDIA:main from
Conversation
pinBlocks now returns an unordered_map<windowSize, blockIds> so callers know exactly which blocks were pinned per window. unpinBlocksById accepts the same map type and correctly routes each block to its window manager, fixing a bug where only the first window manager's blocks were unpinned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
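For illustration, a minimal sketch of the new round-trip call pattern (the surrounding transfer scenario is hypothetical; only pinBlocks/unpinBlocksById come from this PR):

```cpp
// Sketch only: kvCacheManager is any implementation of the updated interface.
// Pin every block the request currently holds, across all attention windows.
auto pinnedBlocks = kvCacheManager.pinBlocks(requestId);
// pinnedBlocks maps window size -> block IDs pinned in that window,
// e.g. {1024: [0, 1, 2], 512: [7, 8]}.

// ... the sequence can be removed while a transfer is in flight;
// pinned blocks survive eviction in every window ...

// Hand the same map back so each entry is routed to its own window manager.
kvCacheManager.unpinBlocksById(pinnedBlocks);
```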
📝 Walkthrough
The changes modify KV cache block management APIs to track pinned blocks per attention window.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h`:
- Line 1564: Re-run LLVM clang-format (configured for 120-column width) on the
header and commit the resulting formatting changes; ensure the declaration of
virtual std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>>
pinBlocks(LlmRequest::RequestIdType requestId) and the nearby declarations (the
symbols around the pinBlocks method and the other affected declarations near the
KVCacheManager interface) are reformatted to match the project's clang-format
rules and line-length limits, then stage and commit the updated header so CI no
longer reports formatting diffs.
In `@cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp`:
- Around line 2676-2683: The file fails clang-format checks; run LLVM
clang-format over the implementation (including the KVCacheManager::pinBlocks
and KVCacheManager::unpinBlocksById regions) to apply the project's formatting
rules, verify the formatting changes (e.g., spacing/brace alignment) are
correct, and commit the formatted file so the CI formatting hook passes.
In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp`:
- Around line 4225-4233: The test only checks the pinnedBlocks entry for
maxAttentionWindow so it misses the multi-window unpin regression; update the
test to configure at least two attention windows in the setup, then after
calling kvCacheManager.pinBlocks(requestId) iterate over every entry in the
returned pinnedBlocks map (not just pinnedBlocks.at(maxAttentionWindow)) and for
each window key verify each blockId is present in the corresponding result of
kvCacheManager.getCacheBlockIds(requestId, windowKey) (or the appropriate index)
so pin/unpin behavior is validated across all windows.
📒 Files selected for processing (3)
- cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
- cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp
- cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp
    /// @brief Pin blocks associated with a request to prevent eviction.
    /// @param requestId The ID of the request whose blocks should be pinned.
-   virtual void pinBlocks(LlmRequest::RequestIdType requestId) = 0;
+   virtual std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> pinBlocks(LlmRequest::RequestIdType requestId) = 0;
Re-run clang-format for updated declarations in this header.
CI reports formatting changes were auto-applied, and this file is listed as affected. Please format and commit the resulting diff before merge (notably around Lines 1564, 1738, and 2037-2039).
Suggested formatting shape (example):
-virtual std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> pinBlocks(LlmRequest::RequestIdType requestId) = 0;
+virtual std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> pinBlocks(
+    LlmRequest::RequestIdType requestId)
+    = 0;
-virtual void unpinBlocksById(std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> const& pinnedBlocks) = 0;
+virtual void unpinBlocksById(
+    std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> const& pinnedBlocks)
+    = 0;

Also applies to: 1738, 2037-2039
 std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> KVCacheManager::pinBlocks(RequestIdType requestId)
 {
     auto& sequence = getSequence(requestId);
-    mBlockManager.pinBlocks(sequence);
+    return mBlockManager.pinBlocks(sequence);
 }

-void KVCacheManager::unpinBlocksById(std::vector<KVCacheBlock::IdType> const& blockIds)
+void KVCacheManager::unpinBlocksById(
+    std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> const& pinnedBlocks)
Format this implementation file before merge (CI is failing on clang-format).
Release checks indicate formatting hooks modified this file. Please run clang-format and commit the result.
As per coding guidelines: "Use the LLVM clang-format tool for formatting C++ code changes prior to submitting the PR".
    auto const pinnedBlocks = kvCacheManager.pinBlocks(requestId);
    EXPECT_FALSE(pinnedBlocks.empty());
    ASSERT_TRUE(pinnedBlocks.count(maxAttentionWindow));
    auto const& pinnedIds = pinnedBlocks.at(maxAttentionWindow);
    auto const& allBlockIds = kvCacheManager.getCacheBlockIds(requestId, maxAttentionWindow)[0];
    std::vector<SizeType32> pinnedBlockIds(allBlockIds.begin(), allBlockIds.end());
    for (auto blockId : pinnedIds)
    {
        EXPECT_NE(std::find(allBlockIds.begin(), allBlockIds.end(), blockId), allBlockIds.end());
    }
This test still does not validate the multi-window unpin regression.
At Line 4227, the assertion only checks maxAttentionWindow, and the test setup uses a single window. That means the bug fixed by this PR (unpinning only the first window manager) would not be caught here. Please make this test configure at least two windows and validate pin/unpin across all returned map entries.
✅ Suggested test strengthening
 auto constexpr maxAttentionWindow = tokensPerBlock * blocksInPrimaryPool;
+auto constexpr minAttentionWindow = maxAttentionWindow / 2;
-BlocksPerWindow const blocksPerWindow{{maxAttentionWindow, {blocksInPrimaryPool, blocksInSecondaryPool}}};
+BlocksPerWindow const blocksPerWindow{
+    {maxAttentionWindow, {blocksInPrimaryPool, blocksInSecondaryPool}},
+    {minAttentionWindow, {blocksInPrimaryPool, blocksInSecondaryPool}},
+};
 KVCacheManager kvCacheManager(numLayers, numKvHeads, sizePerHead, tokensPerBlock, blocksPerWindow, maxNumSequences,
-    beamWidth, std::vector<BlockManager::SizeType32>{maxAttentionWindow}, std::nullopt, nvinfer1::DataType::kHALF,
+    beamWidth, std::vector<BlockManager::SizeType32>{maxAttentionWindow, minAttentionWindow}, std::nullopt,
+    nvinfer1::DataType::kHALF,
     0, stream, maxAttentionWindow, true, onboardBlocks);

 auto const pinnedBlocks = kvCacheManager.pinBlocks(requestId);
 EXPECT_FALSE(pinnedBlocks.empty());
-ASSERT_TRUE(pinnedBlocks.count(maxAttentionWindow));
-auto const& pinnedIds = pinnedBlocks.at(maxAttentionWindow);
-auto const& allBlockIds = kvCacheManager.getCacheBlockIds(requestId, maxAttentionWindow)[0];
-for (auto blockId : pinnedIds)
+ASSERT_THAT(pinnedBlocks, ::testing::Contains(::testing::Key(maxAttentionWindow)));
+ASSERT_THAT(pinnedBlocks, ::testing::Contains(::testing::Key(minAttentionWindow)));
+for (auto const& [windowSize, pinnedIds] : pinnedBlocks)
 {
-    EXPECT_NE(std::find(allBlockIds.begin(), allBlockIds.end(), blockId), allBlockIds.end());
+    auto const& allBlockIds = kvCacheManager.getCacheBlockIds(requestId, windowSize)[0];
+    for (auto blockId : pinnedIds)
+    {
+        EXPECT_NE(std::find(allBlockIds.begin(), allBlockIds.end(), blockId), allBlockIds.end());
+    }
 }

Also applies to: 4238
… API
Update resource_manager.py and py_executor.py to match the updated C++ signatures where pinBlocks returns unordered_map<windowSize, blockIds> and unpinBlocksById accepts the same map type.
- KVCacheManager.pin_blocks now returns dict[int, list[int]] instead of None
- KVCacheManager.unpin_blocks_by_id parameter changed from int to dict[int, list[int]]
- AsyncTransferManager stores the pinned_blocks map (not a bare block_id) and passes it back to unpin_blocks_by_id on transfer completion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@eopXD could you review this? thanks
Summary
- pinBlocks now returns unordered_map<SizeType32, vector<KVCacheBlock::IdType>>, mapping each window size to the IDs of blocks pinned in that window
- unpinBlocksById updated to accept the same map type, routing each block to the correct WindowBlockManager for unpinning (see the sketch after this list)
- Previously, BlockManager::unpinBlocksById only unpinned blocks from the first window manager, leaving blocks in other windows permanently pinned after sequence removal
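A minimal sketch of what that per-window routing could look like; the mWindowBlockManagers member and the unpinBlockById helper are assumed names for illustration, not the actual TensorRT-LLM code:

```cpp
void BlockManager::unpinBlocksById(
    std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> const& pinnedBlocks)
{
    for (auto const& [windowSize, blockIds] : pinnedBlocks)
    {
        // Look up the manager for this window instead of always using the
        // first one, which was the root cause of the leaked pins.
        auto& windowManager = mWindowBlockManagers.at(windowSize);
        for (auto blockId : blockIds)
        {
            windowManager.unpinBlockById(blockId);
        }
    }
}
```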
Test plan
- Extended the PinAndUnpinBlocksById unit test to verify the returned map contains the expected window size and block IDs
- Verified pinned blocks survive removeSequence when pinned and are released by unpinBlocksById
- cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp requires no code changes; nanobind auto-converts unordered_map to dict[int, list[int]] (the headers already include nanobind/stl/unordered_map.h and nanobind/stl/vector.h)

🤖 Generated with Claude Code
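To illustrate why the bindings stay untouched, a nanobind module along these lines (module and names are illustrative, not the actual binding file) would already expose the map as dict[int, list[int]]:

```cpp
#include <nanobind/nanobind.h>
#include <nanobind/stl/unordered_map.h> // unordered_map <-> dict conversion
#include <nanobind/stl/vector.h>        // vector <-> list conversion

namespace nb = nanobind;

NB_MODULE(kv_cache_example, m)
{
    nb::class_<KVCacheManager>(m, "KVCacheManager")
        // With the STL headers above, the C++ return type
        // unordered_map<SizeType32, vector<IdType>> surfaces in Python as
        // dict[int, list[int]] without any custom conversion code.
        .def("pin_blocks", &KVCacheManager::pinBlocks)
        .def("unpin_blocks_by_id", &KVCacheManager::unpinBlocksById);
}
```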
Summary by CodeRabbit
Refactor
Tests