feat(snapshot): dedupe rebuild work, lock per key, and prune the store by cleak · Pull Request #45 · cleak/tempyr

cleak · 2026-05-03T00:30:50Z

Summary

Eliminates three sources of waste in the shared snapshot index store at <git-common-dir>/tempyr/snapshots/<key>/. On a 30-worktree user the snapshot dir grew to 1.6 GB / 884 dirs because every tempyr index rebuild rewrote bytes that another worktree had already produced, and there was no GC.

What changes

tempyr index rebuild short-circuits when the snapshot already exists.
index_cmd::run_rebuild now seeds from snapshots/<key>/index.db and returns instead of re-tokenizing the corpus, re-running FTS5, and re-publishing identical bytes. New --force flag preserves the old behavior for corruption recovery.
Per-snapshot-key build lock. New SnapshotBuildLock (same File::try_lock pattern as the existing PublisherLock) takes an exclusive file lock at snapshots/.locks/<key>.build.lock. A second worktree racing on the same never-seen graph state sees the lock contended, waits for the snapshot to land, then short-circuits through path 1 — never double-builds.
tempyr snapshot {prune,list} with Nix-style hybrid retention:
- Pinned: every live worktree's <shared_root>/worktrees/<wt-id>/snapshot-key.txt cursor (cross-checked against git worktree list --porcelain so dangling cursors from deleted worktrees don't keep dead snapshots alive).
- Recent buffer: --keep-recent newest by mtime (default 20) above pinned.
- Soft cap: --max-size (default 500 MB) — newest unpinned win the headroom; older oversized ones evict.
- Two-phase delete: rename to .gc-<key>-<pid>-<ts>/ then remove_dir_all, with sweep of orphaned stubs from prior runs (handles Windows EBUSY when a long-running query holds the file open).
- Auto-runs at the tail of index rebuild and index update (best-effort, errors swallowed).
Plus tempyr doctor now reports snapshot-store dir count + total bytes with a hint:
```
snapshot store: 886 dirs, 1.54 GB (consider \`tempyr snapshot prune\`)
```

Live verification on the originating repo

tempyr index rebuild (no flag): `Index reused from existing snapshot: 354 nodes, 872 edges, 354 FTS entries`. Embeddings: 0 fetched, 354 cached. Zero work.
`tempyr snapshot prune --dry-run`: "would evict 671 of 886 snapshots, kept 215 (7 pinned, 20 buffer, 188 under cap)". Would shrink 1.6 GB → ~80 MB without losing any embedding spend (embeddings are content-keyed in the separate shared store).

Behavior notes

tempyr index rebuild no longer always rewrites. This is the headline behavior change. Existing callers that ran rebuild expecting fresh bytes get a no-op when the snapshot is already on disk. Use `--force` to recover from corruption (help text calls this out).
Embedding cost: zero impact. Embeddings live in a separate content-keyed store (`embeddings/.db`); deleting a snapshot doesn't lose any embedding.
Concurrency: WAL-mode SQLite already supports many readers + occasional writer. The build lock only serializes writers (rebuild). Reads from active queries are unaffected.

Test plan

`cargo fmt --check` clean
`cargo clippy --workspace --all-targets -- -D warnings` clean
`cargo test --workspace` — 656 passing, 0 failing, 12 ignored (was 583 on master; +73 includes 6 new `negotiate_rebuild_slot` tests covering both wait-then-short-circuit and blocking-acquire paths, 5 new prune-policy tests, plus build-lock and snapshot-key-validity tests)
Live smoke: `tempyr index rebuild` short-circuits on unchanged graph; `tempyr snapshot prune --dry-run` plans correctly against the originating 886-snapshot store; `tempyr doctor` shows the new snapshot-store summary line.
Reviewer: spot-check the `negotiate_rebuild_slot` enum-based control flow in `crates/tempyr-cli/src/commands/index_cmd.rs` and the two-phase delete in `crates/tempyr-cli/src/commands/snapshot_cmd.rs::two_phase_remove`.
On a non-Windows host: confirm the build-lock blocking-acquire timeout test passes (it polls a real `File::try_lock`).

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added snapshot management: tempyr snapshot prune (size/recency retention) and tempyr snapshot list (JSON or human).
- Rebuild gains a --force option; rebuild/update JSON now includes a source field (rebuilt vs reused).
Improvements
- Rebuild/update coordinate concurrent rebuilds to prefer reuse when possible and run best-effort pruning afterward.
- doctor now reports snapshot store usage with pruning hints.
- Health reporting includes snapshot store counts and sizes when available.
Chores
- Updated build/dev dependencies.

Eliminates three sources of waste in the shared snapshot index store under `<git-common-dir>/tempyr/snapshots/<key>/`: 1. `tempyr index rebuild` short-circuits to a free seed when the shared snapshot for the current graph state already exists, instead of rebuilding from scratch and producing identical bytes. Adds `--force` for corruption recovery. 2. New `SnapshotBuildLock` (per snapshot key, exclusive file lock under `snapshots/.locks/`) serializes concurrent rebuilds. A second worktree racing on the same never-seen graph state waits for the first to publish, then short-circuits through path 1. 3. New `tempyr snapshot {prune,list}` commands implement Nix-style hybrid retention: pinned (live worktree cursors cross-checked against `git worktree list`) ∪ recent buffer (`--keep-recent`, default 20) ∪ soft size cap (`--max-size`, default 500 MB). Two-phase rename-then- remove avoids open-file races on Windows. Auto-runs at the tail of `index rebuild` and `index update`. `tempyr doctor` now reports snapshot-store dir count + total bytes with a `consider \`tempyr snapshot prune\`` hint above the configurable thresholds. `is_snapshot_key` and `short_path_hash` are exposed from `tempyr-core::project` so health and prune use identical filtering. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-03T00:30:59Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 94d2bb06-4379-4f1e-8039-0eb8a1f166bd

📥 Commits

Reviewing files that changed from the base of the PR and between b7406e8 and fe40e73.

📒 Files selected for processing (2)

crates/tempyr-core/src/project.rs
crates/tempyr-index/src/health.rs

📝 Walkthrough

Walkthrough

Adds snapshot-store probing and CLI snapshot commands (prune/list), introduces per-snapshot build locks and bounded rebuild-slot negotiation to reuse or rebuild shared snapshots, wires snapshot metrics into health/doctor output, and updates CLI and ProjectContext wiring to support forced rebuilds and post-operation pruning.

Changes

Snapshot store management, rebuild coordination & CLI wiring

Layer / File(s)	Summary
Core primitives / Layout `crates/tempyr-core/src/project.rs`	Adds `SnapshotBuildLock` with non-blocking `try_acquire`, path helpers on `CacheLayout` (`snapshots_root`, `snapshot_locks_dir`, `snapshot_build_lock_path`, `worktrees_root`), `IndexLayout::try_acquire_snapshot_build_lock`, makes `short_path_hash` public, and adds `is_snapshot_key`.
Health probe & Index section `crates/tempyr-index/src/health.rs`	Adds `snapshot_store_count: Option<usize>` and `snapshot_store_bytes: Option<u64>` to `IndexSection`; introduces `probe_snapshot_store` and wires it into `build_index_section` on both success and error paths.
CLI snapshot commands `crates/tempyr-cli/src/commands/snapshot_cmd.rs`	New `run_prune` and `run_list` with `parse_size`, `PruneOptions`/`PruneReport`/`EvictedEntry`/`EvictionFailure`, three-tier retention (pinned, recent buffer, size cap), two-phase `.gc-*` removal, orphan sweep, enumeration and pin-collection helpers, and unit tests.
Index rebuild negotiation & behavior `crates/tempyr-cli/src/commands/index_cmd.rs`	`run_rebuild` gains `force`; adds `negotiate_rebuild_slot` with bounded publish polling and timed blocking lock acquisition, `wait_for_snapshot` and `acquire_build_lock_blocking` helpers, `seed_and_report` reuse path, scratch rebuild path, emits `"source"` in JSON, performs best-effort snapshot pruning, and adds unit tests covering slot-selection and edge cases.
ProjectContext / CLI wiring `crates/tempyr-cli/src/config.rs`, `crates/tempyr-cli/src/main.rs`, `crates/tempyr-cli/src/commands/mod.rs`, `crates/tempyr-cli/src/commands/init.rs`	`ProjectContext` adds `shared_snapshot_index_path()`, `try_acquire_snapshot_build_lock()`, `cache_layout()` and removes `refresh_index_for_current_snapshot()`. CLI adds `Snapshot` subcommand and `SnapshotAction` (Prune/List); `IndexAction::Rebuild` gains `force`; callsites updated.
Doctor / Health output `crates/tempyr-cli/src/commands/doctor.rs`	Renders snapshot-store metrics when present as `{count} dirs, {human_bytes(bytes)}` with threshold-based prune hint; adds `human_bytes` helper and tests; health fixture defaults snapshot fields to `None`.
Dependencies `crates/tempyr-cli/Cargo.toml`, `crates/tempyr-index/Cargo.toml`	Adds `filetime = "0.2"` to CLI dev-dependencies and `walkdir = { workspace = true }` to index dependencies.

Sequence Diagram

sequenceDiagram
    actor User as User / CLI
    participant Main as CLI Dispatcher
    participant Neg as negotiate_rebuild_slot
    participant Lock as SnapshotBuildLock
    participant Snapshot as Snapshot Store
    participant Reuse as seed_and_report
    participant Rebuild as rebuild_from_scratch
    participant Prune as run_prune

    User->>Main: tempyr index rebuild [--force]
    Main->>Neg: negotiate_rebuild_slot(ctx, force)

    alt shared snapshot exists & not forced
        Neg->>Snapshot: check shared snapshot presence
        Snapshot-->>Neg: found
        Neg-->>Main: RebuildSlot::UseExisting
        Main->>Reuse: seed_and_report()
        Reuse->>Snapshot: open index, read stats
        Reuse->>Reuse: optionally embed, publish key
        Reuse-->>Main: complete (source: reused)
    else needs rebuild
        Neg->>Neg: poll for late publish (bounded)
        Neg->>Lock: try_acquire_snapshot_build_lock()
        alt lock acquired
            Lock-->>Neg: Ok(Some(lock))
            Neg-->>Main: RebuildSlot::Build(lock)
            Main->>Rebuild: rebuild_from_scratch()
            Rebuild->>Rebuild: rebuild index, maybe embed
            Rebuild->>Snapshot: write/publish snapshot key
            Rebuild-->>Main: complete (source: rebuilt)
        else lock unavailable (timeout)
            Lock-->>Neg: Ok(None) / error
            Neg-->>Main: error (lock timeout)
        end
    end

    Main->>Prune: run_prune(..., best-effort)
    Prune->>Snapshot: enumerate snapshots, compute pinned set
    Prune->>Snapshot: evict per retention, two-phase remove, sweep `.gc-*`
    Prune-->>Main: done
    Main-->>User: print result / JSON

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Add worktree-aware graph cache layout #1: Related — touches snapshot/shared-cache and rebuild/lock coordination primitives used here.
feat: add tempyr doctor CLI + system_doctor MCP tool #19: Related — modifies doctor/health reporting and health types that this PR extends.
[codex] Warm worktree indices and rebuild structural snapshots on cache miss #9: Related — changes index rebuild/control surfaces and ProjectContext index APIs similar to this PR.

Poem

🐰
I nibbled logs by moonlit beam,
Locked a snapshot, shared the dream,
Pruned the burrows, kept the best,
Reused a build and saved the rest,
A tidy warren, fast and clean!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the three main changes: deduping rebuild work (short-circuit when snapshot exists), per-key locking (exclusive build lock), and snapshot store pruning (garbage collection).
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/snapshot-store-fixes

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get your free trial and get 200 agent minutes per Slack user (a $50 value).

_{Review rate limit: 4/5 reviews remaining, refill in 12 minutes.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…re-fixes # Conflicts: # crates/tempyr-cli/src/commands/mod.rs

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-cli/src/commands/index_cmd.rs`:
- Around line 217-222: The code opens the local queryable index
(ctx.queryable_index_path() + Index::open) before ensuring the worktree has
copied the latest published snapshot; call the routine that seeds or syncs the
local queryable index from the shared snapshot (e.g., a
ctx.seed_queryable_from_shared_snapshot(snapshot_key) or similar existing
helper) immediately after computing index_path and before calling Index::open so
the local DB exists and stats are current; leave
ctx.write_active_snapshot_key(snapshot_key) and maybe_embed calls in their
current logical order after the seeding step.
- Around line 44-56: The current `if let RebuildSlot::UseExisting { .. } = slot`
drops `slot` (and its `Build(lock)`) before calling `rebuild_from_scratch()`,
releasing the build lock prematurely; replace the `if let` with a `match` on the
result of `negotiate_rebuild_slot(ctx, &shared, force, BUILD_LOCK_WAIT)?` so you
can bind the `Build(lock)` arm to a local name (e.g., `Build(lock)`) and keep
that binding in scope while you call `rebuild_from_scratch(ctx, &graph,
&snapshot_key, json, skip_embeddings)?`, and have the `UseExisting {
built_by_other }` arm immediately return via `seed_and_report(...)`; this
ensures the lock value lives until after `rebuild_from_scratch()` completes.

In `@crates/tempyr-core/src/project.rs`:
- Around line 794-796: is_snapshot_key currently accepts uppercase A-F because
it uses is_ascii_hexdigit(); tighten it to only accept lowercase hex to match
graph_snapshot_key() by validating each byte against '0'..'9' or 'a'..'f'.
Update the is_snapshot_key function (refer to is_snapshot_key and
graph_snapshot_key) so the predicate becomes something like checking each byte
matches b'0'..=b'9' or b'a'..=b'f' (instead of is_ascii_hexdigit()), and
run/update relevant tests expecting lowercase-only snapshot keys.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f0eada8a-40e2-4a0d-840a-3f03955151b4

📥 Commits

Reviewing files that changed from the base of the PR and between 65c62e0 and 5a53322.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (11)

crates/tempyr-cli/Cargo.toml
crates/tempyr-cli/src/commands/doctor.rs
crates/tempyr-cli/src/commands/index_cmd.rs
crates/tempyr-cli/src/commands/init.rs
crates/tempyr-cli/src/commands/mod.rs
crates/tempyr-cli/src/commands/snapshot_cmd.rs
crates/tempyr-cli/src/config.rs
crates/tempyr-cli/src/main.rs
crates/tempyr-core/src/project.rs
crates/tempyr-index/Cargo.toml
crates/tempyr-index/src/health.rs

coderabbitai · 2026-05-03T00:37:52Z

+    let slot = negotiate_rebuild_slot(ctx, &shared, force, BUILD_LOCK_WAIT)?;
+    if let RebuildSlot::UseExisting { built_by_other } = slot {
+        return seed_and_report(
+            ctx,
+            &graph,
+            &snapshot_key,
+            json,
+            skip_embeddings,
+            built_by_other,
+        );
+    }
+
+    rebuild_from_scratch(ctx, &graph, &snapshot_key, json, skip_embeddings)?;


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find and examine the RebuildSlot enum definition rg -t rs "enum RebuildSlot" -A 10

Repository: cleak/tempyr

Length of output: 83

🏁 Script executed:

# Read the actual code at lines 44-56 in index_cmd.rs sed -n '40,65p' crates/tempyr-cli/src/commands/index_cmd.rs

Repository: cleak/tempyr

Length of output: 934

🏁 Script executed:

# Check the broader context of the run_rebuild function sed -n '1,100p' crates/tempyr-cli/src/commands/index_cmd.rs | head -80

Repository: cleak/tempyr

Length of output: 3077

🏁 Script executed:

rg "enum RebuildSlot" -A 12 --type-list | head -50

Repository: cleak/tempyr

Length of output: 446

🏁 Script executed:

# Search correctly across rust files find . -name "*.rs" -path "*/commands/index_cmd.rs" | head -1 | xargs grep -A 12 "enum RebuildSlot"

Repository: cleak/tempyr

Length of output: 608

🏁 Script executed:

# Get the full context of the run_rebuild function sed -n '30,75p' crates/tempyr-cli/src/commands/index_cmd.rs

Repository: cleak/tempyr

Length of output: 1508

🏁 Script executed:

# Check how SnapshotBuildLock is defined - this is critical for understanding the lock lifetime rg "struct SnapshotBuildLock|pub struct SnapshotBuildLock" -A 5

Repository: cleak/tempyr

Length of output: 412

🏁 Script executed:

# Check if there are any tests for run_rebuild to understand expected behavior rg "run_rebuild|negotiate_rebuild_slot" -B 2 -A 8 | head -100

Repository: cleak/tempyr

Length of output: 7185

🏁 Script executed:

# Verify the return type of negotiate_rebuild_slot rg "fn negotiate_rebuild_slot" -A 10

Repository: cleak/tempyr

Length of output: 824

Keep the build lock alive through rebuild_from_scratch().

When if let RebuildSlot::UseExisting { .. } = slot doesn't match, Rust drops the entire slot value (including RebuildSlot::Build(lock)) at the end of the if let statement. The build lock is released before rebuild_from_scratch() runs, allowing competing worktrees to acquire the lock and rebuild the same snapshot concurrently.

Use a match statement to extract the lock into a binding, keeping it alive until after rebuild_from_scratch() completes.

Suggested fix

- let slot = negotiate_rebuild_slot(ctx, &shared, force, BUILD_LOCK_WAIT)?; - if let RebuildSlot::UseExisting { built_by_other } = slot { - return seed_and_report( - ctx, - &graph, - &snapshot_key, - json, - skip_embeddings, - built_by_other, - ); - } - - rebuild_from_scratch(ctx, &graph, &snapshot_key, json, skip_embeddings)?; + let build_lock = match negotiate_rebuild_slot(ctx, &shared, force, BUILD_LOCK_WAIT)? { + RebuildSlot::UseExisting { built_by_other } => { + return seed_and_report( + ctx, + &graph, + &snapshot_key, + json, + skip_embeddings, + built_by_other, + ); + } + RebuildSlot::Build(lock) => lock, + }; + + rebuild_from_scratch(ctx, &graph, &snapshot_key, json, skip_embeddings)?; + drop(build_lock);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/tempyr-cli/src/commands/index_cmd.rs` around lines 44 - 56, The current `if let RebuildSlot::UseExisting { .. } = slot` drops `slot` (and its `Build(lock)`) before calling `rebuild_from_scratch()`, releasing the build lock prematurely; replace the `if let` with a `match` on the result of `negotiate_rebuild_slot(ctx, &shared, force, BUILD_LOCK_WAIT)?` so you can bind the `Build(lock)` arm to a local name (e.g., `Build(lock)`) and keep that binding in scope while you call `rebuild_from_scratch(ctx, &graph, &snapshot_key, json, skip_embeddings)?`, and have the `UseExisting { built_by_other }` arm immediately return via `seed_and_report(...)`; this ensures the lock value lives until after `rebuild_from_scratch()` completes.

coderabbitai · 2026-05-03T00:37:53Z

+    let kept_bytes: u64 = entries
+        .iter()
+        .filter(|e| kept.contains(&e.snapshot_key))
+        .map(|e| e.bytes)
+        .sum();
+    let mut running = kept_bytes;
+    let mut kept_under_cap = 0;
+    let mut evict: Vec<&SnapshotEntry> = Vec::new();
+    let mut unkept: Vec<&SnapshotEntry> = entries
+        .iter()
+        .filter(|e| !kept.contains(&e.snapshot_key))
+        .collect();
+    unkept.sort_by_key(|e| std::cmp::Reverse(e.modified_secs));
+    for entry in &unkept {
+        if running + entry.bytes <= opts.max_size_bytes {


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Exclude pinned snapshots from cap accounting.

Line 217 initializes the cap budget from all kept entries, so pinned snapshots still consume --max-size headroom. That contradicts the CLI contract for snapshot prune and will evict unpinned history too aggressively when several live worktrees have pins.

Suggested fix

- let kept_bytes: u64 = entries + let pinned_bytes: u64 = entries + .iter() + .filter(|e| pinned.contains(&e.snapshot_key)) + .map(|e| e.bytes) + .sum(); + let counted_bytes: u64 = entries .iter() - .filter(|e| kept.contains(&e.snapshot_key)) + .filter(|e| kept.contains(&e.snapshot_key) && !pinned.contains(&e.snapshot_key)) .map(|e| e.bytes) .sum(); - let mut running = kept_bytes; + let mut running = counted_bytes; @@ - total_bytes_after_estimate: running, + total_bytes_after_estimate: pinned_bytes + running,

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-index/src/health.rs`:
- Around line 442-469: The current traversal in health.rs silently ignores IO
errors (using read.flatten() and multiple continue paths) which yields
misleading partial snapshot totals; change the logic so any failure reading the
snapshots_root entries, calling entry.file_type(), iterating walkdir::WalkDir,
or obtaining sub.metadata() causes an early return of (None, None) (rather than
continue), and only accumulate count/bytes when all operations succeed; update
the iteration over read (the result of std::fs::read_dir(snapshots_root)),
checks around entry.file_type(), the WalkDir iteration, and the meta =
sub.metadata()/meta.is_file() branch to detect and propagate errors back as
(None, None) while keeping the snapshots_root, read, entry, walkdir::WalkDir,
meta.is_file(), bytes and count variables as referenced.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 830f8a67-400b-40eb-9efc-c19c93b3ee35

📥 Commits

Reviewing files that changed from the base of the PR and between 5a53322 and 1338884.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (4)

crates/tempyr-cli/Cargo.toml
crates/tempyr-cli/src/commands/mod.rs
crates/tempyr-index/Cargo.toml
crates/tempyr-index/src/health.rs

Three fixes from review on PR #45, plus a regression test pinning the if-let drop semantics that one piece of feedback misidentified as a bug. - `seed_and_report` now re-seeds the local active.db from the shared snapshot before opening it. Without this, a `built_by_other = true` outcome (concurrent builder published during our wait) would update the worktree cursor to the new key while leaving a stale local index on disk. A subsequent query that fell through to the active path (e.g. after the shared snapshot was pruned) would then read stale bytes against a cursor that claimed they were current. Narrow window but real. - `project::is_snapshot_key` now rejects uppercase A–F. Snapshot keys are produced by BLAKE3 `to_hex()` which always emits lowercase, so any uppercase-named directory under `snapshots/` was created by some foreign process. Treating it as a snapshot would let `tempyr snapshot prune` evict it. Added a paired test that the predicate accepts whatever `graph_snapshot_key()` produces, so the two can't drift. - `health::probe_snapshot_store` now returns `(None, None)` on any IO failure during the walk instead of silently undercounting via `read.flatten()` and per-entry `continue`s. An undercount that hides a multi-GB snapshot store from the doctor's prune hint is worse than a clear "unavailable" signal. Concurrent prune (rename-then-remove) can briefly trigger this; a re-run after prune completes succeeds. - New `build_lock_survives_if_let_match_against_use_existing_arm` test empirically pins the Rust drop semantics in `run_rebuild`: the `if let RebuildSlot::UseExisting { .. } = slot { return ... }` pattern leaves `slot` (and any `Build(SnapshotBuildLock)` it carries) alive for the rest of the function, so the lock is held through `rebuild_from_scratch`. A future refactor that moves out of `slot` in the matched arm would fail this test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The .agents/ dir was already untracked before this branch — it's local agent state, not part of the snapshot store work. Got swept in by a `git add -A` in the prior commit; removing here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-core/src/project.rs`:
- Around line 98-100: The snapshot_build_lock_path public helper must reject
non-canonical keys: call is_snapshot_key(snapshot_key) at the top of
snapshot_build_lock_path and return an error instead of blindly joining
unvalidated input (or make the helper non-public); change the signature to
return Result<PathBuf, E> (use the crate's existing error type) and return Err
when validation fails, and apply the same validation/signature change to the
other public lock/path helpers referenced around the same area (the functions
that call snapshot_locks_dir and build lock filenames) so no unvalidated
snapshot_key can produce a path outside the locks directory.

In `@crates/tempyr-index/src/health.rs`:
- Around line 443-446: The probe_snapshot_store function currently returns
(None, None) when snapshots_root.is_dir() is false, which makes a missing
snapshots directory look like a probe failure; change probe_snapshot_store to
treat a missing snapshots_root as an empty store and return (Some(0), Some(0))
instead of (None, None). Locate the probe_snapshot_store function and update the
early-return branch that checks snapshots_root.is_dir() so it returns counts of
zero (0 dirs and 0 bytes) for a non-existent directory while keeping the
existing logic for when the directory exists.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9ed1bebc-c260-426e-8285-2f465803e338

📥 Commits

Reviewing files that changed from the base of the PR and between 1338884 and b7406e8.

📒 Files selected for processing (3)

crates/tempyr-cli/src/commands/index_cmd.rs
crates/tempyr-core/src/project.rs
crates/tempyr-index/src/health.rs

coderabbitai · 2026-05-03T00:55:16Z

+fn negotiate_rebuild_slot(
+    ctx: &ProjectContext,
+    shared: &Path,
+    force: bool,
+    wait_budget: Duration,
+) -> anyhow::Result<RebuildSlot> {
+    if !force && shared.exists() {
+        return Ok(RebuildSlot::UseExisting {
+            built_by_other: false,
+        });
+    }
+
+    if let Some(lock) = ctx.try_acquire_snapshot_build_lock()? {
+        if !force && shared.exists() {
+            // Another process won the race and published while we were
+            // taking the lock. Release and use what they built.
+            drop(lock);
+            return Ok(RebuildSlot::UseExisting {
+                built_by_other: true,
+            });
+        }
+        return Ok(RebuildSlot::Build(lock));
+    }
+
+    // Lock is held by another rebuild. Wait for them to publish.
+    wait_for_snapshot(shared, wait_budget);
+    if !force && shared.exists() {
+        return Ok(RebuildSlot::UseExisting {
+            built_by_other: true,
+        });
+    }
+
+    // Other builder didn't publish in time. Block on the lock so the work
+    // is still serialized — better to wait than to double-build.
+    let lock = acquire_build_lock_blocking(ctx, wait_budget)?;
+    if !force && shared.exists() {
+        drop(lock);
+        return Ok(RebuildSlot::UseExisting {
+            built_by_other: true,
+        });
+    }
+    Ok(RebuildSlot::Build(lock))


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

--force still leaves the old shared snapshot in place.

The new force path correctly bypasses reuse, but it still publishes through the normal snapshot publisher. In crates/tempyr-core/src/project.rs, publish_snapshot_from() returns immediately when the shared snapshot already exists, so this rebuild only repairs the worktree-local DB. The shared snapshot that other worktrees reuse remains unchanged, which breaks the advertised corruption-recovery behavior.

You need a force-aware publish/replace path under the held per-key lock so the rebuilt index actually replaces the existing shared snapshot atomically.

Also applies to: 128-149

Two more fixes from review on PR #45. - `SnapshotBuildLock::try_acquire` now validates `snapshot_key` against `is_snapshot_key()` before doing IO. The path helpers (`snapshot_locks_dir`, `snapshot_build_lock_path`) are pure path construction and would happily produce a path outside the locks directory if given input like `"../../etc/passwd"`. All current internal callers source the key from `IndexLayout::snapshot_key()` (always BLAKE3 hex), but the function is `pub` so external code could pass arbitrary input. Validation at the IO boundary closes that surface without splaying `Result` returns across every path helper. Test: `snapshot_build_lock_rejects_path_traversal_keys`. - `health::probe_snapshot_store` now distinguishes "no snapshot store yet" from "probe failed". A missing snapshots directory returns `(Some(0), Some(0))` so `tempyr doctor` shows `snapshot store: 0 dirs, 0 B (ok)` on a fresh project instead of silently suppressing the line. Real anomalies (path exists but is not a directory, IO errors mid-walk) still return `(None, None)` so doctor's render can signal "unavailable". Tests: missing-dir, non-directory, and a fresh test that the canonical-snapshot filter excludes `.locks` and `.gc-*`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/master' into claude/snapshot-sto…

1338884

…re-fixes # Conflicts: # crates/tempyr-cli/src/commands/mod.rs

coderabbitai Bot reviewed May 3, 2026

View reviewed changes

Comment thread crates/tempyr-index/src/health.rs Outdated

cleak and others added 2 commits May 2, 2026 17:46

coderabbitai Bot reviewed May 3, 2026

View reviewed changes

cleak merged commit ef0d861 into master May 3, 2026
5 checks passed

cleak deleted the claude/snapshot-store-fixes branch May 3, 2026 04:29

Conversation

cleak commented May 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changes

Live verification on the originating repo

Behavior notes

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cleak commented May 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 3, 2026 •

edited

Loading