Fix Linux CI test failures: disk_cache timing + docid mapping diagnostics#149
Fix Linux CI test failures: disk_cache timing + docid mapping diagnostics#149rbrenton wants to merge 3 commits into
Conversation
The 5 failing Linux CI tests used sleep(100ms) to wait for the async background writer to complete, which was insufficient on the Linux runner. Replace all sleep-based synchronization with flush_blocking(), which properly waits for all pending writes — matching the pattern already used by the newer tests in this file.
…lure The test_multi_segment_docid_mapping_correctness test fails intermittently on Linux with a file slice bounds error (19139 > 18320) when opening the split bundle. This suggests the bundle footer records offsets that exceed the actual file size — possibly a non-deterministic flush ordering issue in BundleDirectory under the 15MB minimum heap. Changes: - Add split file size logging and bundle footer hex dump on failure to capture the exact state when the error occurs - Enable RUST_BACKTRACE=1 in CI Rust test step for full stack traces - Remove --lib --tests flags from cargo test (run all tests)
|
Claude finished @rbrenton's task —— View job Code Review AnalysisReview Checklist
Review Findings✅ Overall Assessment: LGTM - Well-Targeted CI FixesThis PR correctly addresses the root cause of CI failures with appropriate diagnostic improvements. The changes are minimal, focused, and follow good practices. 📝 Issue 1: Timing-Dependent Test Fixes (disk_cache tests) - EXCELLENTProblem Identified: The disk_cache tests were using Solution Applied: Replace all Review: ✅ Perfect fix
📝 Issue 2: Intermittent Test Diagnostics (multi-segment docid mapping) - APPROPRIATEProblem Identified: Intermittent failure with Solution Applied: Add diagnostic logging and error details without changing test logic. Review: ✅ Good diagnostic approach
📝 Issue 3: CI Infrastructure Improvements - GOODChanges:
Review: ✅ Reasonable improvements
❌ No Issues Found✅ Correctness: All logic changes are sound - replacing timing dependencies with proper synchronization 🎯 Recommendations
✅ ConclusionThis is a well-executed fix that properly addresses both deterministic CI failures (disk_cache timing) and provides appropriate diagnostics for intermittent failures. The changes are minimal, focused, and follow testing best practices. Recommendation: APPROVE ✅ |
cargo test without --lib --tests tries to run doc tests via rustdoc, which fails on the self-hosted macOS runner with "No such file or directory". Restore --lib --tests to skip doc tests while keeping RUST_BACKTRACE=1 for the intermittent failure diagnostics.
|
Claude finished @rbrenton's task —— View job Claude finished @rbrenton's task —— View job Code Review AnalysisReview Checklist
Review Findings✅ Overall Assessment: LGTM - Well-Targeted CI FixesThis PR correctly addresses the root cause of CI failures with appropriate diagnostic improvements. The changes are minimal, focused, and follow good practices. 📝 Issue 1: Timing-Dependent Test Fixes (disk_cache tests) - EXCELLENTProblem Identified: The disk_cache tests were using Solution Applied: Replace all Review: ✅ Perfect fix
📝 Issue 2: Intermittent Test Diagnostics (multi-segment docid mapping) - APPROPRIATEProblem Identified: Intermittent failure with Solution Applied: Add diagnostic logging and error details without changing test logic. Review: ✅ Good diagnostic approach
📝 Issue 3: CI Infrastructure Improvements - GOODChanges:
Review: ✅ Reasonable improvements
❌ No Issues Found✅ Correctness: All logic changes are sound - replacing timing dependencies with proper synchronization 🎯 Recommendations
✅ ConclusionThis is a well-executed fix that properly addresses both deterministic CI failures (disk_cache timing) and provides appropriate diagnostics for intermittent failures. The changes are minimal, focused, and follow testing best practices. Recommendation: APPROVE ✅ |
Summary
Fixes 5 deterministic Linux CI test failures and adds diagnostics for 1 intermittent failure.
disk_cache tests (5 fixes)
Replace
sleep(100ms)withflush_blocking()in all disk_cache tests. The async background writer doesn't complete within 100ms on the Linux runner. The newer tests in the same file already useflush_blocking()correctly — this aligns the older tests.multi_segment_docid_mapping (diagnostics)
The
test_multi_segment_docid_mapping_correctnesstest fails intermittently on Linux withend of requested range exceeds the fileslice length (19139 > 18320). This suggests theBundleDirectorybundle footer records offsets exceeding the actual file size under the 15MB minimum heap.Added:
RUST_BACKTRACE=1in CI for full stack traces on the next failureTest plan