
Fix pinBlocks/unpinBlocksById to correctly handle multiple window sizes #12325

Closed

thorjohnsen wants to merge 3 commits into NVIDIA:main from thorjohnsen:thorjohnsen/pinblocks_multiple_windowsizes


Conversation

Collaborator

@thorjohnsen thorjohnsen commented Mar 18, 2026

Summary

  • pinBlocks now returns unordered_map<SizeType32, vector<KVCacheBlock::IdType>> mapping each window size to the IDs of blocks pinned in that window
  • unpinBlocksById updated to accept the same map type, routing each block to the correct WindowBlockManager for unpinning
  • Fixes a bug where BlockManager::unpinBlocksById only unpinned blocks from the first window manager, leaving blocks in other windows permanently pinned after sequence removal
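The per-window routing described above can be sketched with a toy model. This is illustrative only: the type aliases follow the PR, but the pin-count bookkeeping and method shapes here are simplified stand-ins for the real classes in kvCacheManager.h/.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy model of the fixed control flow; names mirror the PR, but the
// bookkeeping is invented for illustration.
using SizeType32 = std::int32_t;
using IdType = std::int32_t;
using PinnedBlocksMap = std::unordered_map<SizeType32, std::vector<IdType>>;

struct WindowBlockManager
{
    std::unordered_map<IdType, int> pinCount; // blockId -> reference count

    void pin(IdType id) { ++pinCount[id]; }
    void unpin(IdType id) { --pinCount.at(id); }
};

struct BlockManager
{
    // One manager per attention window size.
    std::unordered_map<SizeType32, WindowBlockManager> windowManagers;

    // pinBlocks reports which block IDs were pinned in *each* window, so the
    // caller can later unpin them through the right manager.
    PinnedBlocksMap pinBlocks(std::unordered_map<SizeType32, std::vector<IdType>> const& blocksPerWindow)
    {
        PinnedBlocksMap pinned;
        for (auto const& [windowSize, ids] : blocksPerWindow)
        {
            auto& mgr = windowManagers.at(windowSize);
            for (auto id : ids)
            {
                mgr.pin(id);
            }
            pinned[windowSize] = ids;
        }
        return pinned;
    }

    // The fix: iterate the per-window map and route each entry to its own
    // WindowBlockManager, instead of unpinning only through the first one.
    void unpinBlocksById(PinnedBlocksMap const& pinnedBlocks)
    {
        for (auto const& [windowSize, ids] : pinnedBlocks)
        {
            auto& mgr = windowManagers.at(windowSize);
            for (auto id : ids)
            {
                mgr.unpin(id);
            }
        }
    }
};
```

With two window sizes, unpinning via the returned map drops every pin count back to zero in both windows, which is exactly the invariant the buggy version violated for all windows after the first.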

Test plan

  • Updated PinAndUnpinBlocksById unit test to verify returned map contains expected window size and block IDs
  • Verify blocks are not freed after removeSequence when pinned
  • Verify all blocks are freed after unpinBlocksById
  • Python bindings in cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp require no code changes; nanobind auto-converts unordered_map to dict[int, list[int]] (headers already include nanobind/stl/unordered_map.h and nanobind/stl/vector.h)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactor

    • Enhanced KV cache block management with improved tracking of pinned blocks per window.
    • Updated cache block pinning and unpinning operations to provide more granular block allocation information.
  • Tests

    • Updated unit tests to validate new cache block tracking behavior.

pinBlocks now returns an unordered_map<windowSize, blockIds> so callers
know exactly which blocks were pinned per window. unpinBlocksById accepts
the same map type and correctly routes each block to its window manager,
fixing a bug where only the first window manager's blocks were unpinned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thorjohnsen thorjohnsen requested a review from a team as a code owner March 18, 2026 17:02

coderabbitai Bot commented Mar 18, 2026

📝 Walkthrough

The changes modify KV cache block management APIs to track pinned blocks per attention window. The pinBlocks() method now returns an unordered_map<SizeType32, vector<KVCacheBlock::IdType>> instead of void, and unpinBlocksById() accepts the same structure instead of a flat block ID vector.
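Concretely, the interface change amounts to the following shape (a compilable stub, not the real header: SizeType32, KVCacheBlock, and LlmRequest are reduced to placeholder definitions, and the exact RequestIdType width is an assumption):

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Placeholder types standing in for the real ones in kvCacheManager.h.
using SizeType32 = std::int32_t;
struct KVCacheBlock { using IdType = std::int32_t; };
struct LlmRequest { using RequestIdType = std::uint64_t; };

struct BaseKVCacheManager
{
    virtual ~BaseKVCacheManager() = default;

    // Before: virtual void pinBlocks(LlmRequest::RequestIdType requestId) = 0;
    virtual std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> pinBlocks(
        LlmRequest::RequestIdType requestId) = 0;

    // Before: virtual void unpinBlocksById(std::vector<KVCacheBlock::IdType> const& blockIds) = 0;
    virtual void unpinBlocksById(
        std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> const& pinnedBlocks) = 0;
};
```

The caller is expected to hold on to the map returned by pinBlocks and pass it back verbatim to unpinBlocksById, so each window size's IDs reach the matching WindowBlockManager.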

Changes

  • KV Cache Manager Interface — cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
    Updated method signatures for WindowBlockManager, BlockManager, BaseKVCacheManager, and KVCacheManager: pinBlocks() now returns per-window block ID mappings; unpinBlocksById() updated to accept unordered_map<SizeType32, vector<KVCacheBlock::IdType>> instead of a flat vector.
  • KV Cache Manager Implementation — cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp
    Implementation updated to collect and propagate per-window pinned block IDs through WindowBlockManager::pinBlocks(), BlockManager::pinBlocks(), and KVCacheManager::pinBlocks(). unpinBlocksById() methods refactored to iterate over per-window mappings with validation checks for window size presence.
  • Unit Tests — cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp
    Test assertions updated to validate the new pinBlocks() return structure; verifies the returned map contains the expected window size keys and pinned block IDs. unpinBlocksById() call sites updated to pass the per-window mapping from the pinBlocks() result.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 12.50%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed. The title clearly and specifically describes the main change: fixing pinBlocks/unpinBlocksById to correctly handle multiple window sizes, which is the core focus of all modifications across the three files.
  • Description check — ✅ Passed. The description provides a clear summary of changes, explains the bug fix, and includes a test plan with checkboxes. However, it does not follow the required template format with explicit sections for Description, Test Coverage, and PR Checklist completion.


coderabbitai Bot left a comment
Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h`:
- Line 1564: Re-run LLVM clang-format (configured for 120-column width) on the
header and commit the resulting formatting changes; ensure the declaration of
virtual std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>>
pinBlocks(LlmRequest::RequestIdType requestId) and the nearby declarations (the
symbols around the pinBlocks method and the other affected declarations near the
KVCacheManager interface) are reformatted to match the project's clang-format
rules and line-length limits, then stage and commit the updated header so CI no
longer reports formatting diffs.

In `@cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp`:
- Around line 2676-2683: The file fails clang-format checks; run LLVM
clang-format over the implementation (including the KVCacheManager::pinBlocks
and KVCacheManager::unpinBlocksById regions) to apply the project's formatting
rules, verify the formatting changes (e.g., spacing/brace alignment) are
correct, and commit the formatted file so the CI formatting hook passes.

In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp`:
- Around line 4225-4233: The test only checks the pinnedBlocks entry for
maxAttentionWindow so it misses the multi-window unpin regression; update the
test to configure at least two attention windows in the setup, then after
calling kvCacheManager.pinBlocks(requestId) iterate over every entry in the
returned pinnedBlocks map (not just pinnedBlocks.at(maxAttentionWindow)) and for
each window key verify each blockId is present in the corresponding result of
kvCacheManager.getCacheBlockIds(requestId, windowKey) (or the appropriate index)
so pin/unpin behavior is validated across all windows.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8e3eaf46-4cfa-42c6-8e16-40dd3e8cced9

📥 Commits

Reviewing files that changed from the base of the PR and between d37dd82 and b821a06.

📒 Files selected for processing (3)
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp
  • cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp

     /// @brief Pin blocks associated with a request to prevent eviction.
     /// @param requestId The ID of the request whose blocks should be pinned.
-    virtual void pinBlocks(LlmRequest::RequestIdType requestId) = 0;
+    virtual std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> pinBlocks(LlmRequest::RequestIdType requestId) = 0;

⚠️ Potential issue | 🟡 Minor

Re-run clang-format for updated declarations in this header.

CI reports formatting changes were auto-applied, and this file is listed as affected. Please format and commit the resulting diff before merge (notably around Lines 1564, 1738, and 2037-2039).

Suggested formatting shape (example)
-    virtual std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> pinBlocks(LlmRequest::RequestIdType requestId) = 0;
+    virtual std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> pinBlocks(
+        LlmRequest::RequestIdType requestId)
+        = 0;

-    virtual void unpinBlocksById(std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> const& pinnedBlocks) = 0;
+    virtual void unpinBlocksById(
+        std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> const& pinnedBlocks)
+        = 0;
As per coding guidelines: "Use the LLVM clang-format tool for formatting C++ code changes prior to submitting the PR" and "Use a maximum of 120 characters per line in C++ code".

Also applies to: 1738-1738, 2037-2039


Comment on lines +2676 to +2683

 std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> KVCacheManager::pinBlocks(RequestIdType requestId)
 {
     auto& sequence = getSequence(requestId);
-    mBlockManager.pinBlocks(sequence);
+    return mBlockManager.pinBlocks(sequence);
 }

-void KVCacheManager::unpinBlocksById(std::vector<KVCacheBlock::IdType> const& blockIds)
+void KVCacheManager::unpinBlocksById(
+    std::unordered_map<SizeType32, std::vector<KVCacheBlock::IdType>> const& pinnedBlocks)

⚠️ Potential issue | 🟡 Minor

Format this implementation file before merge (CI is failing on clang-format).

Release checks indicate formatting hooks modified this file. Please run clang-format and commit the result.

As per coding guidelines: "Use the LLVM clang-format tool for formatting C++ code changes prior to submitting the PR".


Comment on lines +4225 to +4233

    auto const pinnedBlocks = kvCacheManager.pinBlocks(requestId);
    EXPECT_FALSE(pinnedBlocks.empty());
    ASSERT_TRUE(pinnedBlocks.count(maxAttentionWindow));
    auto const& pinnedIds = pinnedBlocks.at(maxAttentionWindow);
    auto const& allBlockIds = kvCacheManager.getCacheBlockIds(requestId, maxAttentionWindow)[0];
    std::vector<SizeType32> pinnedBlockIds(allBlockIds.begin(), allBlockIds.end());
    for (auto blockId : pinnedIds)
    {
        EXPECT_NE(std::find(allBlockIds.begin(), allBlockIds.end(), blockId), allBlockIds.end());
    }

⚠️ Potential issue | 🟠 Major

This test still does not validate the multi-window unpin regression.

At Line 4227, the assertion only checks maxAttentionWindow, and the test setup uses a single window. That means the bug fixed by this PR (unpinning only the first window manager) would not be caught here. Please make this test configure at least two windows and validate pin/unpin across all returned map entries.

✅ Suggested test strengthening
-    auto constexpr maxAttentionWindow = tokensPerBlock * blocksInPrimaryPool;
+    auto constexpr maxAttentionWindow = tokensPerBlock * blocksInPrimaryPool;
+    auto constexpr minAttentionWindow = maxAttentionWindow / 2;

-    BlocksPerWindow const blocksPerWindow{{maxAttentionWindow, {blocksInPrimaryPool, blocksInSecondaryPool}}};
+    BlocksPerWindow const blocksPerWindow{
+        {maxAttentionWindow, {blocksInPrimaryPool, blocksInSecondaryPool}},
+        {minAttentionWindow, {blocksInPrimaryPool, blocksInSecondaryPool}},
+    };

     KVCacheManager kvCacheManager(numLayers, numKvHeads, sizePerHead, tokensPerBlock, blocksPerWindow, maxNumSequences,
-        beamWidth, std::vector<BlockManager::SizeType32>{maxAttentionWindow}, std::nullopt, nvinfer1::DataType::kHALF,
+        beamWidth, std::vector<BlockManager::SizeType32>{maxAttentionWindow, minAttentionWindow}, std::nullopt,
+        nvinfer1::DataType::kHALF,
         0, stream, maxAttentionWindow, true, onboardBlocks);

     auto const pinnedBlocks = kvCacheManager.pinBlocks(requestId);
     EXPECT_FALSE(pinnedBlocks.empty());
-    ASSERT_TRUE(pinnedBlocks.count(maxAttentionWindow));
-    auto const& pinnedIds = pinnedBlocks.at(maxAttentionWindow);
-    auto const& allBlockIds = kvCacheManager.getCacheBlockIds(requestId, maxAttentionWindow)[0];
-    for (auto blockId : pinnedIds)
+    ASSERT_THAT(pinnedBlocks, ::testing::Contains(::testing::Key(maxAttentionWindow)));
+    ASSERT_THAT(pinnedBlocks, ::testing::Contains(::testing::Key(minAttentionWindow)));
+    for (auto const& [windowSize, pinnedIds] : pinnedBlocks)
     {
-        EXPECT_NE(std::find(allBlockIds.begin(), allBlockIds.end(), blockId), allBlockIds.end());
+        auto const& allBlockIds = kvCacheManager.getCacheBlockIds(requestId, windowSize)[0];
+        for (auto blockId : pinnedIds)
+        {
+            EXPECT_NE(std::find(allBlockIds.begin(), allBlockIds.end(), blockId), allBlockIds.end());
+        }
     }

Also applies to: 4238-4238


@thorjohnsen thorjohnsen requested a review from a team as a code owner March 18, 2026 17:22
@thorjohnsen thorjohnsen requested a review from byshiue March 18, 2026 17:22
… API

Update resource_manager.py and py_executor.py to match the updated C++
signatures where pinBlocks returns unordered_map<windowSize, blockIds>
and unpinBlocksById accepts the same map type.

- KVCacheManager.pin_blocks now returns dict[int, list[int]] instead of None
- KVCacheManager.unpin_blocks_by_id parameter changed from int to dict[int, list[int]]
- AsyncTransferManager stores the pinned_blocks map (not a bare block_id)
  and passes it back to unpin_blocks_by_id on transfer completion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@nvpohanh
Collaborator

@eopXD could you review this? thanks
