Skip to content

Fix Linux CI test failures: disk_cache timing + docid mapping diagnostics#149

Open
rbrenton wants to merge 3 commits into
mainfrom
fix/linux-ci-test-failures
Open

Fix Linux CI test failures: disk_cache timing + docid mapping diagnostics#149
rbrenton wants to merge 3 commits into
mainfrom
fix/linux-ci-test-failures

Conversation

@rbrenton
Copy link
Copy Markdown
Contributor

@rbrenton rbrenton commented Apr 3, 2026

Summary

Fixes 5 deterministic Linux CI test failures and adds diagnostics for 1 intermittent failure.

disk_cache tests (5 fixes)

Replace sleep(100ms) with flush_blocking() in all disk_cache tests. The async background writer doesn't complete within 100ms on the Linux runner. The newer tests in the same file already use flush_blocking() correctly — this aligns the older tests.

multi_segment_docid_mapping (diagnostics)

The test_multi_segment_docid_mapping_correctness test fails intermittently on Linux with end of requested range exceeds the fileslice length (19139 > 18320). This suggests the BundleDirectory bundle footer records offsets exceeding the actual file size under the 15MB minimum heap.

Added:

  • Split file size logging and bundle footer hex dump on failure
  • RUST_BACKTRACE=1 in CI for full stack traces on the next failure

Test plan

  • All disk_cache tests should now pass on Linux (no timing dependency)
  • Next intermittent docid mapping failure will produce diagnostic output for root cause analysis

rbrenton added 2 commits April 3, 2026 09:11
The 5 failing Linux CI tests used sleep(100ms) to wait for the async
background writer to complete, which was insufficient on the Linux
runner. Replace all sleep-based synchronization with flush_blocking(),
which properly waits for all pending writes — matching the pattern
already used by the newer tests in this file.
…lure

The test_multi_segment_docid_mapping_correctness test fails intermittently
on Linux with a file slice bounds error (19139 > 18320) when opening the
split bundle. This suggests the bundle footer records offsets that exceed
the actual file size — possibly a non-deterministic flush ordering issue
in BundleDirectory under the 15MB minimum heap.

Changes:
- Add split file size logging and bundle footer hex dump on failure
  to capture the exact state when the error occurs
- Enable RUST_BACKTRACE=1 in CI Rust test step for full stack traces
- Remove --lib --tests flags from cargo test (run all tests)
@claude
Copy link
Copy Markdown

claude Bot commented Apr 3, 2026

Claude finished @rbrenton's task —— View job


Code Review Analysis

Review Checklist

  • Read repository context and guidelines
  • Analyze changed files for review criteria
  • Check for correctness bugs and logic errors
  • Review security issues and unsafe operations
  • Look for merge conflicts or build-breaking issues
  • Verify error propagation patterns
  • Assess API consistency
  • Provide final review feedback

Review Findings

✅ Overall Assessment: LGTM - Well-Targeted CI Fixes

This PR correctly addresses the root cause of CI failures with appropriate diagnostic improvements. The changes are minimal, focused, and follow good practices.

📝 Issue 1: Timing-Dependent Test Fixes (disk_cache tests) - EXCELLENT

Problem Identified: The disk_cache tests were using sleep(100ms) which is insufficient on Linux runners where the async background writer takes longer to complete.

Solution Applied: Replace all sleep() calls with flush_blocking() calls, which properly waits for completion.

Review:Perfect fix

  • Correctness: flush_blocking() is the correct synchronization mechanism vs. arbitrary time delays
  • Consistency: Aligns with newer tests in the same file that already use flush_blocking()
  • Determinism: Eliminates timing dependencies and race conditions
  • No side effects: All affected tests verify the same functionality with proper synchronization

📝 Issue 2: Intermittent Test Diagnostics (multi-segment docid mapping) - APPROPRIATE

Problem Identified: Intermittent failure with end of requested range exceeds the fileslice length (19139 > 18320) suggests BundleDirectory bundle footer corruption under 15MB heap constraints.

Solution Applied: Add diagnostic logging and error details without changing test logic.

Review:Good diagnostic approach

  • Non-intrusive: Only adds logging on failure, doesn't change test behavior
  • Valuable debugging info: Split file size, bundle footer hex dump for root cause analysis
  • Proper error handling: Uses unwrap_or_else with detailed error context
  • Appropriate scope: Isolated to the failing test case

📝 Issue 3: CI Infrastructure Improvements - GOOD

Changes:

  • Enable RUST_BACKTRACE=1 for full stack traces
  • Remove --lib --tests flags to run all tests

Review:Reasonable improvements

  • Enhanced debugging: Full stack traces will help identify the exact failure location
  • Broader test coverage: Running all tests (not just lib/tests) ensures integration tests run

No Issues Found

✅ Correctness: All logic changes are sound - replacing timing dependencies with proper synchronization
✅ Security: No security implications - diagnostic logging only
✅ Merge conflicts: Clean changes with no conflicts
✅ Error propagation: Proper error handling with enhanced context
✅ API consistency: Changes are internal to test infrastructure

🎯 Recommendations

  1. Monitor next failure: The diagnostic additions should provide sufficient data to identify the root cause of the intermittent docid mapping issue
  2. Consider adding timeout: While flush_blocking() is correct, consider adding reasonable timeouts in case of deadlocks during development

Conclusion

This is a well-executed fix that properly addresses both deterministic CI failures (disk_cache timing) and provides appropriate diagnostics for intermittent failures. The changes are minimal, focused, and follow testing best practices.

Recommendation: APPROVE

cargo test without --lib --tests tries to run doc tests via rustdoc,
which fails on the self-hosted macOS runner with "No such file or
directory". Restore --lib --tests to skip doc tests while keeping
RUST_BACKTRACE=1 for the intermittent failure diagnostics.
@claude
Copy link
Copy Markdown

claude Bot commented Apr 3, 2026

Claude finished @rbrenton's task —— View job


Claude finished @rbrenton's task —— View job


Code Review Analysis

Review Checklist

  • Read repository context and guidelines
  • Analyze changed files for review criteria
  • Check for correctness bugs and logic errors
  • Review security issues and unsafe operations
  • Look for merge conflicts or build-breaking issues
  • Verify error propagation patterns
  • Assess API consistency
  • Provide final review feedback

Review Findings

✅ Overall Assessment: LGTM - Well-Targeted CI Fixes

This PR correctly addresses the root cause of CI failures with appropriate diagnostic improvements. The changes are minimal, focused, and follow good practices.

📝 Issue 1: Timing-Dependent Test Fixes (disk_cache tests) - EXCELLENT

Problem Identified: The disk_cache tests were using sleep(100ms) which is insufficient on Linux runners where the async background writer takes longer to complete.

Solution Applied: Replace all sleep() calls with flush_blocking() calls, which properly waits for completion.

Review:Perfect fix

  • Correctness: flush_blocking() is the correct synchronization mechanism vs. arbitrary time delays
  • Consistency: Aligns with newer tests in the same file that already use flush_blocking()
  • Determinism: Eliminates timing dependencies and race conditions
  • No side effects: All affected tests verify the same functionality with proper synchronization

📝 Issue 2: Intermittent Test Diagnostics (multi-segment docid mapping) - APPROPRIATE

Problem Identified: Intermittent failure with end of requested range exceeds the fileslice length (19139 > 18320) suggests BundleDirectory bundle footer corruption under 15MB heap constraints.

Solution Applied: Add diagnostic logging and error details without changing test logic.

Review:Good diagnostic approach

  • Non-intrusive: Only adds logging on failure, doesn't change test behavior
  • Valuable debugging info: Split file size, bundle footer hex dump for root cause analysis
  • Proper error handling: Uses unwrap_or_else with detailed error context
  • Appropriate scope: Isolated to the failing test case

📝 Issue 3: CI Infrastructure Improvements - GOOD

Changes:

  • Enable RUST_BACKTRACE=1 for full stack traces
  • Remove --lib --tests flags to run all tests

Review:Reasonable improvements

  • Enhanced debugging: Full stack traces will help identify the exact failure location
  • Broader test coverage: Running all tests (not just lib/tests) ensures integration tests run

No Issues Found

✅ Correctness: All logic changes are sound - replacing timing dependencies with proper synchronization
✅ Security: No security implications - diagnostic logging only
✅ Merge conflicts: Clean changes with no conflicts
✅ Error propagation: Proper error handling with enhanced context
✅ API consistency: Changes are internal to test infrastructure

🎯 Recommendations

  1. Monitor next failure: The diagnostic additions should provide sufficient data to identify the root cause of the intermittent docid mapping issue
  2. Consider adding timeout: While flush_blocking() is correct, consider adding reasonable timeouts in case of deadlocks during development

Conclusion

This is a well-executed fix that properly addresses both deterministic CI failures (disk_cache timing) and provides appropriate diagnostics for intermittent failures. The changes are minimal, focused, and follow testing best practices.

Recommendation: APPROVE

Copy link
Copy Markdown
Collaborator

@schenksj schenksj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants