Fix Linux CI test failures: disk_cache timing + docid mapping diagnostics by rbrenton · Pull Request #149 · indextables/tantivy4java

rbrenton · 2026-04-03T16:00:12Z

Summary

Fixes 5 deterministic Linux CI test failures and adds diagnostics for 1 intermittent failure.

disk_cache tests (5 fixes)

Replace sleep(100ms) with flush_blocking() in all disk_cache tests. The async background writer doesn't complete within 100ms on the Linux runner. The newer tests in the same file already use flush_blocking() correctly — this aligns the older tests.

multi_segment_docid_mapping (diagnostics)

The test_multi_segment_docid_mapping_correctness test fails intermittently on Linux with end of requested range exceeds the fileslice length (19139 > 18320). This suggests the BundleDirectory bundle footer records offsets exceeding the actual file size under the 15MB minimum heap.

Added:

Split file size logging and bundle footer hex dump on failure
RUST_BACKTRACE=1 in CI for full stack traces on the next failure

Test plan

All disk_cache tests should now pass on Linux (no timing dependency)
Next intermittent docid mapping failure will produce diagnostic output for root cause analysis

The 5 failing Linux CI tests used sleep(100ms) to wait for the async background writer to complete, which was insufficient on the Linux runner. Replace all sleep-based synchronization with flush_blocking(), which properly waits for all pending writes — matching the pattern already used by the newer tests in this file.

…lure The test_multi_segment_docid_mapping_correctness test fails intermittently on Linux with a file slice bounds error (19139 > 18320) when opening the split bundle. This suggests the bundle footer records offsets that exceed the actual file size — possibly a non-deterministic flush ordering issue in BundleDirectory under the 15MB minimum heap. Changes: - Add split file size logging and bundle footer hex dump on failure to capture the exact state when the error occurs - Enable RUST_BACKTRACE=1 in CI Rust test step for full stack traces - Remove --lib --tests flags from cargo test (run all tests)

claude · 2026-04-03T16:00:27Z

Claude finished @rbrenton's task —— View job

Code Review Analysis

Review Checklist

Read repository context and guidelines
Analyze changed files for review criteria
Check for correctness bugs and logic errors
Review security issues and unsafe operations
Look for merge conflicts or build-breaking issues
Verify error propagation patterns
Assess API consistency
Provide final review feedback

Review Findings

✅ Overall Assessment: LGTM - Well-Targeted CI Fixes

This PR correctly addresses the root cause of CI failures with appropriate diagnostic improvements. The changes are minimal, focused, and follow good practices.

📝 Issue 1: Timing-Dependent Test Fixes (disk_cache tests) - EXCELLENT

Problem Identified: The disk_cache tests were using sleep(100ms) which is insufficient on Linux runners where the async background writer takes longer to complete.

Solution Applied: Replace all sleep() calls with flush_blocking() calls, which properly waits for completion.

Review: ✅ Perfect fix

Correctness: flush_blocking() is the correct synchronization mechanism vs. arbitrary time delays
Consistency: Aligns with newer tests in the same file that already use flush_blocking()
Determinism: Eliminates timing dependencies and race conditions
No side effects: All affected tests verify the same functionality with proper synchronization

📝 Issue 2: Intermittent Test Diagnostics (multi-segment docid mapping) - APPROPRIATE

Problem Identified: Intermittent failure with end of requested range exceeds the fileslice length (19139 > 18320) suggests BundleDirectory bundle footer corruption under 15MB heap constraints.

Solution Applied: Add diagnostic logging and error details without changing test logic.

Review: ✅ Good diagnostic approach

Non-intrusive: Only adds logging on failure, doesn't change test behavior
Valuable debugging info: Split file size, bundle footer hex dump for root cause analysis
Proper error handling: Uses unwrap_or_else with detailed error context
Appropriate scope: Isolated to the failing test case

📝 Issue 3: CI Infrastructure Improvements - GOOD

Changes:

Enable RUST_BACKTRACE=1 for full stack traces
Remove --lib --tests flags to run all tests

Review: ✅ Reasonable improvements

Enhanced debugging: Full stack traces will help identify the exact failure location
Broader test coverage: Running all tests (not just lib/tests) ensures integration tests run

❌ No Issues Found

✅ Correctness: All logic changes are sound - replacing timing dependencies with proper synchronization
✅ Security: No security implications - diagnostic logging only
✅ Merge conflicts: Clean changes with no conflicts
✅ Error propagation: Proper error handling with enhanced context
✅ API consistency: Changes are internal to test infrastructure

🎯 Recommendations

Monitor next failure: The diagnostic additions should provide sufficient data to identify the root cause of the intermittent docid mapping issue
Consider adding timeout: While flush_blocking() is correct, consider adding reasonable timeouts in case of deadlocks during development

✅ Conclusion

This is a well-executed fix that properly addresses both deterministic CI failures (disk_cache timing) and provides appropriate diagnostics for intermittent failures. The changes are minimal, focused, and follow testing best practices.

Recommendation: APPROVE ✅

cargo test without --lib --tests tries to run doc tests via rustdoc, which fails on the self-hosted macOS runner with "No such file or directory". Restore --lib --tests to skip doc tests while keeping RUST_BACKTRACE=1 for the intermittent failure diagnostics.

claude · 2026-04-03T16:23:53Z

Claude finished @rbrenton's task —— View job

Code Review Analysis

Review Checklist

Read repository context and guidelines
Analyze changed files for review criteria
Check for correctness bugs and logic errors
Review security issues and unsafe operations
Look for merge conflicts or build-breaking issues
Verify error propagation patterns
Assess API consistency
Provide final review feedback

Review Findings

✅ Overall Assessment: LGTM - Well-Targeted CI Fixes

This PR correctly addresses the root cause of CI failures with appropriate diagnostic improvements. The changes are minimal, focused, and follow good practices.

📝 Issue 1: Timing-Dependent Test Fixes (disk_cache tests) - EXCELLENT

Problem Identified: The disk_cache tests were using sleep(100ms) which is insufficient on Linux runners where the async background writer takes longer to complete.

Solution Applied: Replace all sleep() calls with flush_blocking() calls, which properly waits for completion.

Review: ✅ Perfect fix

Correctness: flush_blocking() is the correct synchronization mechanism vs. arbitrary time delays
Consistency: Aligns with newer tests in the same file that already use flush_blocking()
Determinism: Eliminates timing dependencies and race conditions
No side effects: All affected tests verify the same functionality with proper synchronization

📝 Issue 2: Intermittent Test Diagnostics (multi-segment docid mapping) - APPROPRIATE

Problem Identified: Intermittent failure with end of requested range exceeds the fileslice length (19139 > 18320) suggests BundleDirectory bundle footer corruption under 15MB heap constraints.

Solution Applied: Add diagnostic logging and error details without changing test logic.

Review: ✅ Good diagnostic approach

Non-intrusive: Only adds logging on failure, doesn't change test behavior
Valuable debugging info: Split file size, bundle footer hex dump for root cause analysis
Proper error handling: Uses unwrap_or_else with detailed error context
Appropriate scope: Isolated to the failing test case

📝 Issue 3: CI Infrastructure Improvements - GOOD

Changes:

Enable RUST_BACKTRACE=1 for full stack traces
Remove --lib --tests flags to run all tests

Review: ✅ Reasonable improvements

Enhanced debugging: Full stack traces will help identify the exact failure location
Broader test coverage: Running all tests (not just lib/tests) ensures integration tests run

❌ No Issues Found

✅ Correctness: All logic changes are sound - replacing timing dependencies with proper synchronization
✅ Security: No security implications - diagnostic logging only
✅ Merge conflicts: Clean changes with no conflicts
✅ Error propagation: Proper error handling with enhanced context
✅ API consistency: Changes are internal to test infrastructure

🎯 Recommendations

Monitor next failure: The diagnostic additions should provide sufficient data to identify the root cause of the intermittent docid mapping issue
Consider adding timeout: While flush_blocking() is correct, consider adding reasonable timeouts in case of deadlocks during development

✅ Conclusion

This is a well-executed fix that properly addresses both deterministic CI failures (disk_cache timing) and provides appropriate diagnostics for intermittent failures. The changes are minimal, focused, and follow testing best practices.

Recommendation: APPROVE ✅

schenksj

lgtm

rbrenton added 2 commits April 3, 2026 09:11

rbrenton temporarily deployed to test-runners April 3, 2026 16:00 — with GitHub Actions Inactive

rbrenton had a problem deploying to test-runners April 3, 2026 16:00 — with GitHub Actions Failure

rbrenton temporarily deployed to test-runners April 3, 2026 16:23 — with GitHub Actions Inactive

schenksj approved these changes Apr 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Linux CI test failures: disk_cache timing + docid mapping diagnostics#149

Fix Linux CI test failures: disk_cache timing + docid mapping diagnostics#149
rbrenton wants to merge 3 commits into
mainfrom
fix/linux-ci-test-failures

rbrenton commented Apr 3, 2026

Uh oh!

claude Bot commented Apr 3, 2026 •

edited

Loading

Uh oh!

claude Bot commented Apr 3, 2026 •

edited

Loading

Uh oh!

schenksj left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rbrenton commented Apr 3, 2026

Summary

disk_cache tests (5 fixes)

multi_segment_docid_mapping (diagnostics)

Test plan

Uh oh!

claude Bot commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Analysis

Review Checklist

Review Findings

✅ Overall Assessment: LGTM - Well-Targeted CI Fixes

📝 Issue 1: Timing-Dependent Test Fixes (disk_cache tests) - EXCELLENT

📝 Issue 2: Intermittent Test Diagnostics (multi-segment docid mapping) - APPROPRIATE

📝 Issue 3: CI Infrastructure Improvements - GOOD

❌ No Issues Found

🎯 Recommendations

✅ Conclusion

Uh oh!

claude Bot commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Analysis

Review Checklist

Review Findings

✅ Overall Assessment: LGTM - Well-Targeted CI Fixes

📝 Issue 1: Timing-Dependent Test Fixes (disk_cache tests) - EXCELLENT

📝 Issue 2: Intermittent Test Diagnostics (multi-segment docid mapping) - APPROPRIATE

📝 Issue 3: CI Infrastructure Improvements - GOOD

❌ No Issues Found

🎯 Recommendations

✅ Conclusion

Uh oh!

schenksj left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude Bot commented Apr 3, 2026 •

edited

Loading

claude Bot commented Apr 3, 2026 •

edited

Loading