P0: atomic manifest commit helper — clone-mutate-save-publish#27
Open
pbudzik wants to merge 1 commit into
Open
P0: atomic manifest commit helper — clone-mutate-save-publish#27pbudzik wants to merge 1 commit into
pbudzik wants to merge 1 commit into
Conversation
Before: every manifest mutation took the write lock, mutated the manifest in place, then called `save`. If `save` failed, the in-memory state held the mutation but disk did not. Queries saw writes that weren't durable; cleanup-on-save-failure paths (which unlink the segment files they just wrote) left the in-memory manifest pointing at files that no longer existed. Fix: `AppStateInner::commit_manifest` clones the manifest, runs the caller's mutation on the clone, saves the clone to disk, and only publishes the clone in-memory after the save succeeds. On failure both disk and memory are unchanged. `commit_manifest_if` is the variant that lets the closure decide under the write lock whether to commit (used by close-period / reopen-period for race-safe re-checks). Refactored every call site to use the helper: - flusher: raw_segments + last_sealed_wal_id - rollup tick: rollup_segments + watermark - rollup rebuild: drop overlapping rollups + rewind watermark - compaction commit: raw_segments swap + replacement record - compaction sweep: drop finalized replacements - close-period: append ClosedPeriod - reopen-period: remove ClosedPeriod The only mutate-then-save sites left in the tree are the two startup paths in main.rs and manifest.rs::migrate_legacy — both run before the manifest is shared, so the publish-in-memory step is moot. Regression test sabotages manifest writes by replacing the manifest/ directory with a regular file, then verifies the in-memory manifest is unchanged after a failed save. Two more tests verify the happy path and the commit_manifest_if early-exit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the P0 manifest-atomicity finding from external review (the "mutate-then-save" anti-pattern across flusher, rollup, compaction, and period close/reopen).
Before: every mutation looked like
If `save` failed, the in-memory state held the mutation but disk did not. Queries would see writes that weren't durable. Worse, every cleanup-on-save-failure path unlinks the segment files it just wrote — leaving the in-memory manifest pointing at deleted files until process restart.
After: `AppStateInner::commit_manifest(|m| …)` clones the manifest, mutates the clone, saves the clone, and only publishes it in-memory after the save succeeds. `commit_manifest_if` is the variant that lets the closure decide under the write lock whether to commit (used by close-period / reopen-period for race-safe re-checks).
Changes
Test plan
Notes
This is the second of several P0 fixes from the external review (#26 fixed the flusher shutdown deadlock). Next up:
🤖 Generated with Claude Code