refactor(tests): modernise kvCacheManagerTest readability#12329
refactor(tests): modernise kvCacheManagerTest readability#12329thorjohnsen wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
Change A – add assertReuseOccurred / assertNoReuseOccurred helpers Centralise the block-id reuse assertion pattern so each call site expresses *intent* rather than raw ElementsAreArray comparisons. Change B – replace sleep_for with bounded polling in getEvents() The original helper slept for a fixed 100 ms regardless of whether events had already arrived. Replace it with a tight loop that returns as soon as the expected number of events is available, with a time-out to prevent hangs in CI. Change C – convert 42 EXPECT_THAT / ElementsAreArray calls in reuse tests All six BlockManagerReuse* test functions now use the new helpers (assertReuseOccurred / assertNoReuseOccurred) and capture named snapshots (priorBlockIds*) before releaseBlocks() so the assertion reads like a sentence. Change D – split monolithic BlockManagerTest into five focused sub-tests BlockManagerTest_InitialState, _BasicAllocation, _SharedLastContextBlock, _DuplicateRequestIdThrows, _OutOfBlocksThrows. Each sub-test sets up its own state, making failures trivially locatable. Change E – add SCOPED_TRACE to KVCacheManagerLeafBlock* tests Four assertion groups in KVCacheManagerLeafBlockTest and KVCacheManagerLeafBlockWithDependentTest now carry a descriptive trace label so GTest output identifies which group failed. Change F – convert nine TEST_F PartialCopy wrappers to TYPED_TEST Introduce PartialCopyParam<StorageT, DType, Mask> and PartialCopyTypes (with preprocessor guards for BF16 / FP8) so a single TYPED_TEST(KVCacheManagerTypedTest, BlockManagerTestPartialCopy) replaces the nine near-identical TEST_F bodies. Change G – remove stale DISABLED tests, leave actionable TODOs DISABLED_KVCacheManagerAllocationTest and DISABLED_KVCacheManagerSinkTokenLengthTest were permanently skipped with no owners. Delete them and leave a one-line TODO so the intent is not lost. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThe changes comprehensively enhance the KVCacheManager unit test suite by improving test determinism with a bounded-time polling mechanism, introducing reusable assertion helpers for block reuse validation, and significantly expanding test coverage with new test cases and typed test variants across block allocation, reuse patterns, and eviction scenarios. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment Tip You can disable the changed files summary in the walkthrough.Disable the |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp (1)
1-16:⚠️ Potential issue | 🟡 MinorUpdate the SPDX header year to 2026.
This file is modified in this PR, but the copyright line still ends at 2025. As per coding guidelines,
Add NVIDIA copyright header on ALL new files, and update year on modified files.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp` around lines 1 - 16, Update the SPDX header in cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp by changing the copyright year range in the top comment block from "2023-2025" to "2023-2026" so the file header reflects the current modification year; edit the SPDX header comment (the block starting with "SPDX-FileCopyrightText") accordingly.
🧹 Nitpick comments (1)
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp (1)
152-175: Extract the commonBlockManagersetup.The split tests are easier to read, but this constructor/setup block now exists in five places. A small factory/helper would keep the new single-purpose cases readable without duplicating all of the knob values.
Also applies to: 184-205, 228-249, 271-292, 308-329
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp` around lines 152 - 175, Extract the repeated BlockManager construction into a helper (e.g., a static factory method or test fixture helper) used by KVCacheManagerTest so the five tests reuse the same setup; move the common constants (numLayers, numKvHeads, sizePerHead, tokensPerBlock, blocksInPrimaryPool, blocksInSecondaryPool, maxNumSequences, stream, onboardBlocks, beamWidth, maxAttentionWindow, blocksPerWindow) into that helper and have it return a configured BlockManager instance (or a struct containing the BlockManager and related derived values) so tests call something like MakeBlockManager(...) and then call allocatePools(false) on the returned manager, replacing the duplicated constructor blocks in BlockManagerTest_InitialState and the other four test cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp`:
- Around line 643-649: The comment indicates a codespell and formatting failure;
update the comment text to replace "re-uses" with "reuses" and then run
clang-format on the file to fix style issues so CI passes; the affected area is
the comment above the KVCacheManagerTypedTest class which inherits from
KVCacheManagerTest and is instantiated with PartialCopyTypes.
- Around line 121-136: The helper assertReuseOccurred currently only checks a
minimum reuse count which can hide over-reuse or the wrong subset; update it to
assert the exact reused set and count: compute the set of reused IDs from
reuseIds, assert that reusedSet.size() == numExpectedReused, and assert every
element of reusedSet is contained in priorIds (priorSet) and/or change the
signature to accept an explicit expectedReusedIds parameter
(std::vector<KVCacheBlock::IdType> expectedReused) and assert reusedSet ==
expectedReusedSet; reference function assertReuseOccurred, parameters priorIds,
reuseIds, numExpectedReused (or new expectedReused) and ensure the test failure
message includes the expected vs actual reused id lists and context.
- Around line 108-118: getEvents currently spins up to 5s whenever there are no
events because the loop condition checks result.empty(), causing zero-event
assertions to wait the full deadline; change the strategy in getEvents: use a
short bounded wait for the empty-path (e.g., total empty timeout ~50–100ms) by
polling getLatestEvents with a small per-call timeout, and once a non-empty
result is observed perform a brief quiet-period drain (e.g., an additional
100–200ms loop) to collect any burst-split events before returning; keep calls
to KVCacheManager::flushIterationEvents() and use
KVCacheManager::getLatestEvents(...) but adjust the loop logic to separate the
initial empty timeout and the post-first-event quiet drain to avoid long waits
on zero-event paths and to aggregate bursts.
---
Outside diff comments:
In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp`:
- Around line 1-16: Update the SPDX header in
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp by changing the
copyright year range in the top comment block from "2023-2025" to "2023-2026" so
the file header reflects the current modification year; edit the SPDX header
comment (the block starting with "SPDX-FileCopyrightText") accordingly.
---
Nitpick comments:
In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp`:
- Around line 152-175: Extract the repeated BlockManager construction into a
helper (e.g., a static factory method or test fixture helper) used by
KVCacheManagerTest so the five tests reuse the same setup; move the common
constants (numLayers, numKvHeads, sizePerHead, tokensPerBlock,
blocksInPrimaryPool, blocksInSecondaryPool, maxNumSequences, stream,
onboardBlocks, beamWidth, maxAttentionWindow, blocksPerWindow) into that helper
and have it return a configured BlockManager instance (or a struct containing
the BlockManager and related derived values) so tests call something like
MakeBlockManager(...) and then call allocatePools(false) on the returned
manager, replacing the duplicated constructor blocks in
BlockManagerTest_InitialState and the other four test cases.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bae92cbf-a21c-4a0f-86a3-560cd07c67af
📒 Files selected for processing (1)
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp
| std::deque<tle::KVCacheEvent> getEvents(KVCacheManager& kvCacheManager) | ||
| { | ||
| kvCacheManager.flushIterationEvents(); | ||
| std::this_thread::sleep_for(std::chrono::milliseconds(100)); | ||
| return kvCacheManager.getLatestEvents(std::chrono::milliseconds(100)); | ||
| // Poll with a bounded timeout instead of a fixed sleep to avoid CI flakiness. | ||
| auto const deadline = std::chrono::steady_clock::now() + std::chrono::seconds(5); | ||
| std::deque<tle::KVCacheEvent> result; | ||
| while (result.empty() && std::chrono::steady_clock::now() < deadline) | ||
| { | ||
| result = kvCacheManager.getLatestEvents(std::chrono::milliseconds(50)); | ||
| } | ||
| return result; |
There was a problem hiding this comment.
getEvents() now makes the no-event path wait the full 5 seconds.
Every zero-event assertion that uses this helper now spins until deadline because result.empty() keeps the loop alive. The non-empty path also returns on the first batch, so one logical burst can still be split across reads. A short empty-path timeout plus a brief quiet-period drain after the first event would keep the bounded wait without slowing or destabilizing the event-stream tests.
Possible fix
std::deque<tle::KVCacheEvent> getEvents(KVCacheManager& kvCacheManager)
{
kvCacheManager.flushIterationEvents();
- // Poll with a bounded timeout instead of a fixed sleep to avoid CI flakiness.
- auto const deadline = std::chrono::steady_clock::now() + std::chrono::seconds(5);
- std::deque<tle::KVCacheEvent> result;
- while (result.empty() && std::chrono::steady_clock::now() < deadline)
- {
- result = kvCacheManager.getLatestEvents(std::chrono::milliseconds(50));
- }
- return result;
+ auto constexpr kPollInterval = std::chrono::milliseconds{50};
+ auto constexpr kQuietPeriod = std::chrono::milliseconds{100};
+ auto constexpr kTimeout = std::chrono::seconds{5};
+
+ auto result = kvCacheManager.getLatestEvents(kPollInterval);
+ if (result.empty())
+ {
+ return result;
+ }
+
+ auto const deadline = std::chrono::steady_clock::now() + kTimeout;
+ auto lastEventAt = std::chrono::steady_clock::now();
+ while (std::chrono::steady_clock::now() < deadline
+ && std::chrono::steady_clock::now() - lastEventAt < kQuietPeriod)
+ {
+ auto next = kvCacheManager.getLatestEvents(kPollInterval);
+ if (!next.empty())
+ {
+ lastEventAt = std::chrono::steady_clock::now();
+ result.insert(result.end(), next.begin(), next.end());
+ }
+ }
+ return result;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp` around lines 108 -
118, getEvents currently spins up to 5s whenever there are no events because the
loop condition checks result.empty(), causing zero-event assertions to wait the
full deadline; change the strategy in getEvents: use a short bounded wait for
the empty-path (e.g., total empty timeout ~50–100ms) by polling getLatestEvents
with a small per-call timeout, and once a non-empty result is observed perform a
brief quiet-period drain (e.g., an additional 100–200ms loop) to collect any
burst-split events before returning; keep calls to
KVCacheManager::flushIterationEvents() and use
KVCacheManager::getLatestEvents(...) but adjust the loop logic to separate the
initial empty timeout and the post-first-event quiet drain to avoid long waits
on zero-event paths and to aggregate bursts.
| /// @brief Asserts that at least numExpectedReused blocks from priorIds appear in reuseIds. | ||
| inline void assertReuseOccurred( | ||
| std::vector<KVCacheBlock::IdType> const& priorIds, | ||
| std::vector<KVCacheBlock::IdType> const& reuseIds, | ||
| int numExpectedReused, | ||
| std::string const& context = "") | ||
| { | ||
| std::set<KVCacheBlock::IdType> priorSet(priorIds.begin(), priorIds.end()); | ||
| int reusedCount = 0; | ||
| for (auto id : reuseIds) | ||
| if (priorSet.count(id)) | ||
| ++reusedCount; | ||
| EXPECT_GE(reusedCount, numExpectedReused) | ||
| << context << ": expected at least " << numExpectedReused | ||
| << " blocks reused, got " << reusedCount; | ||
| } |
There was a problem hiding this comment.
assertReuseOccurred() weakens the old reuse assertions.
The converted tests used to pin the exact block-id layout. EXPECT_GE(reusedCount, numExpectedReused) only checks a minimum, so over-reuse or the wrong subset can still pass even though the PR says coverage should be unchanged. Please make the helper assert the exact reused set/count, or accept the expected reused IDs explicitly.
Minimal tightening
- std::set<KVCacheBlock::IdType> priorSet(priorIds.begin(), priorIds.end());
- int reusedCount = 0;
- for (auto id : reuseIds)
- if (priorSet.count(id))
- ++reusedCount;
- EXPECT_GE(reusedCount, numExpectedReused)
- << context << ": expected at least " << numExpectedReused
- << " blocks reused, got " << reusedCount;
+ auto const priorSet = std::set<KVCacheBlock::IdType>{priorIds.begin(), priorIds.end()};
+ auto reusedSet = std::set<KVCacheBlock::IdType>{};
+ for (auto const id : reuseIds)
+ {
+ if (priorSet.count(id) != 0)
+ {
+ reusedSet.insert(id);
+ }
+ }
+ EXPECT_EQ(static_cast<int>(reusedSet.size()), numExpectedReused)
+ << context << ": expected exactly " << numExpectedReused
+ << " reused blocks, got " << reusedSet.size();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp` around lines 121 -
136, The helper assertReuseOccurred currently only checks a minimum reuse count
which can hide over-reuse or the wrong subset; update it to assert the exact
reused set and count: compute the set of reused IDs from reuseIds, assert that
reusedSet.size() == numExpectedReused, and assert every element of reusedSet is
contained in priorIds (priorSet) and/or change the signature to accept an
explicit expectedReusedIds parameter (std::vector<KVCacheBlock::IdType>
expectedReused) and assert reusedSet == expectedReusedSet; reference function
assertReuseOccurred, parameters priorIds, reuseIds, numExpectedReused (or new
expectedReused) and ensure the test failure message includes the expected vs
actual reused id lists and context.
| // KVCacheManagerTypedTest re-uses the same fixture as KVCacheManagerTest but | ||
| // is instantiated once per entry in PartialCopyTypes. | ||
| template <typename P> | ||
| class KVCacheManagerTypedTest : public KVCacheManagerTest | ||
| { | ||
| runPartialCopyTest<std::uint8_t, nvinfer1::DataType::kUINT8, 255, KvCacheTransferMode::DRAM>(); | ||
| runPartialCopyTest<std::uint8_t, nvinfer1::DataType::kUINT8, 255, KvCacheTransferMode::GDS>(); | ||
| } | ||
| }; | ||
| TYPED_TEST_SUITE(KVCacheManagerTypedTest, PartialCopyTypes); |
There was a problem hiding this comment.
This block still trips CI.
Line 643 has the re-uses → reuses codespell failure, and the file still needs a clang-format pass before the checks will go green.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp` around lines 643 -
649, The comment indicates a codespell and formatting failure; update the
comment text to replace "re-uses" with "reuses" and then run clang-format on the
file to fix style issues so CI passes; the affected area is the comment above
the KVCacheManagerTypedTest class which inherits from KVCacheManagerTest and is
instantiated with PartialCopyTypes.
|
@eopXD could you review this? thanks |
|
@eopXD could you review this? Thanks |
|
Lets have basic checkmarks pass first before a reviewal. |
Summary
Seven focused changes to
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cppto improve readability, debuggability, and maintainability without altering test coverage:assertReuseOccurred/assertNoReuseOccurredhelpers that centralise the block-id reuse assertion pattern usingKVCacheBlock::IdTypeandstd::set.std::this_thread::sleep_for(100ms)ingetEvents()with a bounded polling loop so the helper returns as soon as events arrive (CI speed-up, no phantom timeouts).EXPECT_THAT(…, ElementsAreArray(…))calls across the sixBlockManagerReuse*tests to namedassertReuseOccurred/assertNoReuseOccurredcall-sites with capturedpriorBlockIds*snapshots.BlockManagerTestinto five single-responsibility sub-tests (_InitialState,_BasicAllocation,_SharedLastContextBlock,_DuplicateRequestIdThrows,_OutOfBlocksThrows).SCOPED_TRACElabels to four assertion groups inKVCacheManagerLeafBlockTestandKVCacheManagerLeafBlockWithDependentTestso GTest output pinpoints the failing group.TEST_FPartialCopy wrappers with aTYPED_TEST_SUITE(KVCacheManagerTypedTest) parameterised overPartialCopyParam<StorageT, DType, Mask>; preprocessor guards forENABLE_BF16/ENABLE_FP8are preserved.DISABLED_KVCacheManagerAllocationTestandDISABLED_KVCacheManagerSinkTokenLengthTest(permanently skipped, no owner); replace each with a one-line// TODO: re-enable … once …comment.Test plan
cmake --buildtargetingkvCacheManagerTestctest -R kvCacheManagerTest— all previously-passing tests should still passPartialCopyParamentries inPartialCopyTypes🤖 Generated with Claude Code
Summary by CodeRabbit