Skip to content

feat(snapshot): dedupe rebuild work, lock per key, and prune the store#45

Merged
cleak merged 5 commits into
masterfrom
claude/snapshot-store-fixes
May 3, 2026
Merged

feat(snapshot): dedupe rebuild work, lock per key, and prune the store#45
cleak merged 5 commits into
masterfrom
claude/snapshot-store-fixes

Conversation

@cleak
Copy link
Copy Markdown
Owner

@cleak cleak commented May 3, 2026

Summary

Eliminates three sources of waste in the shared snapshot index store at <git-common-dir>/tempyr/snapshots/<key>/. On a 30-worktree user the snapshot dir grew to 1.6 GB / 884 dirs because every tempyr index rebuild rewrote bytes that another worktree had already produced, and there was no GC.

What changes

  1. tempyr index rebuild short-circuits when the snapshot already exists.
    index_cmd::run_rebuild now seeds from snapshots/<key>/index.db and returns instead of re-tokenizing the corpus, re-running FTS5, and re-publishing identical bytes. New --force flag preserves the old behavior for corruption recovery.

  2. Per-snapshot-key build lock. New SnapshotBuildLock (same File::try_lock pattern as the existing PublisherLock) takes an exclusive file lock at snapshots/.locks/<key>.build.lock. A second worktree racing on the same never-seen graph state sees the lock contended, waits for the snapshot to land, then short-circuits through path 1 — never double-builds.

  3. tempyr snapshot {prune,list} with Nix-style hybrid retention:

    • Pinned: every live worktree's <shared_root>/worktrees/<wt-id>/snapshot-key.txt cursor (cross-checked against git worktree list --porcelain so dangling cursors from deleted worktrees don't keep dead snapshots alive).
    • Recent buffer: --keep-recent newest by mtime (default 20) above pinned.
    • Soft cap: --max-size (default 500 MB) — newest unpinned win the headroom; older oversized ones evict.
    • Two-phase delete: rename to .gc-<key>-<pid>-<ts>/ then remove_dir_all, with sweep of orphaned stubs from prior runs (handles Windows EBUSY when a long-running query holds the file open).
    • Auto-runs at the tail of index rebuild and index update (best-effort, errors swallowed).

    Plus tempyr doctor now reports snapshot-store dir count + total bytes with a hint:

    snapshot store: 886 dirs, 1.54 GB (consider \`tempyr snapshot prune\`)
    

Live verification on the originating repo

  • tempyr index rebuild (no flag): `Index reused from existing snapshot: 354 nodes, 872 edges, 354 FTS entries`. Embeddings: 0 fetched, 354 cached. Zero work.
  • `tempyr snapshot prune --dry-run`: "would evict 671 of 886 snapshots, kept 215 (7 pinned, 20 buffer, 188 under cap)". Would shrink 1.6 GB → ~80 MB without losing any embedding spend (embeddings are content-keyed in the separate shared store).

Behavior notes

  • tempyr index rebuild no longer always rewrites. This is the headline behavior change. Existing callers that ran rebuild expecting fresh bytes get a no-op when the snapshot is already on disk. Use `--force` to recover from corruption (help text calls this out).
  • Embedding cost: zero impact. Embeddings live in a separate content-keyed store (`embeddings/.db`); deleting a snapshot doesn't lose any embedding.
  • Concurrency: WAL-mode SQLite already supports many readers + occasional writer. The build lock only serializes writers (rebuild). Reads from active queries are unaffected.

Test plan

  • `cargo fmt --check` clean
  • `cargo clippy --workspace --all-targets -- -D warnings` clean
  • `cargo test --workspace` — 656 passing, 0 failing, 12 ignored (was 583 on master; +73 includes 6 new `negotiate_rebuild_slot` tests covering both wait-then-short-circuit and blocking-acquire paths, 5 new prune-policy tests, plus build-lock and snapshot-key-validity tests)
  • Live smoke: `tempyr index rebuild` short-circuits on unchanged graph; `tempyr snapshot prune --dry-run` plans correctly against the originating 886-snapshot store; `tempyr doctor` shows the new snapshot-store summary line.
  • Reviewer: spot-check the `negotiate_rebuild_slot` enum-based control flow in `crates/tempyr-cli/src/commands/index_cmd.rs` and the two-phase delete in `crates/tempyr-cli/src/commands/snapshot_cmd.rs::two_phase_remove`.
  • On a non-Windows host: confirm the build-lock blocking-acquire timeout test passes (it polls a real `File::try_lock`).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added snapshot management: tempyr snapshot prune (size/recency retention) and tempyr snapshot list (JSON or human).
    • Rebuild gains a --force option; rebuild/update JSON now includes a source field (rebuilt vs reused).
  • Improvements

    • Rebuild/update coordinate concurrent rebuilds to prefer reuse when possible and run best-effort pruning afterward.
    • doctor now reports snapshot store usage with pruning hints.
    • Health reporting includes snapshot store counts and sizes when available.
  • Chores

    • Updated build/dev dependencies.

Eliminates three sources of waste in the shared snapshot index store
under `<git-common-dir>/tempyr/snapshots/<key>/`:

1. `tempyr index rebuild` short-circuits to a free seed when the shared
   snapshot for the current graph state already exists, instead of
   rebuilding from scratch and producing identical bytes. Adds `--force`
   for corruption recovery.

2. New `SnapshotBuildLock` (per snapshot key, exclusive file lock under
   `snapshots/.locks/`) serializes concurrent rebuilds. A second worktree
   racing on the same never-seen graph state waits for the first to
   publish, then short-circuits through path 1.

3. New `tempyr snapshot {prune,list}` commands implement Nix-style
   hybrid retention: pinned (live worktree cursors cross-checked against
   `git worktree list`) ∪ recent buffer (`--keep-recent`, default 20) ∪
   soft size cap (`--max-size`, default 500 MB). Two-phase rename-then-
   remove avoids open-file races on Windows. Auto-runs at the tail of
   `index rebuild` and `index update`.

`tempyr doctor` now reports snapshot-store dir count + total bytes with
a `consider \`tempyr snapshot prune\`` hint above the configurable
thresholds. `is_snapshot_key` and `short_path_hash` are exposed from
`tempyr-core::project` so health and prune use identical filtering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 94d2bb06-4379-4f1e-8039-0eb8a1f166bd

📥 Commits

Reviewing files that changed from the base of the PR and between b7406e8 and fe40e73.

📒 Files selected for processing (2)
  • crates/tempyr-core/src/project.rs
  • crates/tempyr-index/src/health.rs

📝 Walkthrough

Walkthrough

Adds snapshot-store probing and CLI snapshot commands (prune/list), introduces per-snapshot build locks and bounded rebuild-slot negotiation to reuse or rebuild shared snapshots, wires snapshot metrics into health/doctor output, and updates CLI and ProjectContext wiring to support forced rebuilds and post-operation pruning.

Changes

Snapshot store management, rebuild coordination & CLI wiring

Layer / File(s) Summary
Core primitives / Layout
crates/tempyr-core/src/project.rs
Adds SnapshotBuildLock with non-blocking try_acquire, path helpers on CacheLayout (snapshots_root, snapshot_locks_dir, snapshot_build_lock_path, worktrees_root), IndexLayout::try_acquire_snapshot_build_lock, makes short_path_hash public, and adds is_snapshot_key.
Health probe & Index section
crates/tempyr-index/src/health.rs
Adds snapshot_store_count: Option<usize> and snapshot_store_bytes: Option<u64> to IndexSection; introduces probe_snapshot_store and wires it into build_index_section on both success and error paths.
CLI snapshot commands
crates/tempyr-cli/src/commands/snapshot_cmd.rs
New run_prune and run_list with parse_size, PruneOptions/PruneReport/EvictedEntry/EvictionFailure, three-tier retention (pinned, recent buffer, size cap), two-phase .gc-* removal, orphan sweep, enumeration and pin-collection helpers, and unit tests.
Index rebuild negotiation & behavior
crates/tempyr-cli/src/commands/index_cmd.rs
run_rebuild gains force; adds negotiate_rebuild_slot with bounded publish polling and timed blocking lock acquisition, wait_for_snapshot and acquire_build_lock_blocking helpers, seed_and_report reuse path, scratch rebuild path, emits "source" in JSON, performs best-effort snapshot pruning, and adds unit tests covering slot-selection and edge cases.
ProjectContext / CLI wiring
crates/tempyr-cli/src/config.rs, crates/tempyr-cli/src/main.rs, crates/tempyr-cli/src/commands/mod.rs, crates/tempyr-cli/src/commands/init.rs
ProjectContext adds shared_snapshot_index_path(), try_acquire_snapshot_build_lock(), cache_layout() and removes refresh_index_for_current_snapshot(). CLI adds Snapshot subcommand and SnapshotAction (Prune/List); IndexAction::Rebuild gains force; callsites updated.
Doctor / Health output
crates/tempyr-cli/src/commands/doctor.rs
Renders snapshot-store metrics when present as {count} dirs, {human_bytes(bytes)} with threshold-based prune hint; adds human_bytes helper and tests; health fixture defaults snapshot fields to None.
Dependencies
crates/tempyr-cli/Cargo.toml, crates/tempyr-index/Cargo.toml
Adds filetime = "0.2" to CLI dev-dependencies and walkdir = { workspace = true } to index dependencies.

Sequence Diagram

sequenceDiagram
    actor User as User / CLI
    participant Main as CLI Dispatcher
    participant Neg as negotiate_rebuild_slot
    participant Lock as SnapshotBuildLock
    participant Snapshot as Snapshot Store
    participant Reuse as seed_and_report
    participant Rebuild as rebuild_from_scratch
    participant Prune as run_prune

    User->>Main: tempyr index rebuild [--force]
    Main->>Neg: negotiate_rebuild_slot(ctx, force)

    alt shared snapshot exists & not forced
        Neg->>Snapshot: check shared snapshot presence
        Snapshot-->>Neg: found
        Neg-->>Main: RebuildSlot::UseExisting
        Main->>Reuse: seed_and_report()
        Reuse->>Snapshot: open index, read stats
        Reuse->>Reuse: optionally embed, publish key
        Reuse-->>Main: complete (source: reused)
    else needs rebuild
        Neg->>Neg: poll for late publish (bounded)
        Neg->>Lock: try_acquire_snapshot_build_lock()
        alt lock acquired
            Lock-->>Neg: Ok(Some(lock))
            Neg-->>Main: RebuildSlot::Build(lock)
            Main->>Rebuild: rebuild_from_scratch()
            Rebuild->>Rebuild: rebuild index, maybe embed
            Rebuild->>Snapshot: write/publish snapshot key
            Rebuild-->>Main: complete (source: rebuilt)
        else lock unavailable (timeout)
            Lock-->>Neg: Ok(None) / error
            Neg-->>Main: error (lock timeout)
        end
    end

    Main->>Prune: run_prune(..., best-effort)
    Prune->>Snapshot: enumerate snapshots, compute pinned set
    Prune->>Snapshot: evict per retention, two-phase remove, sweep `.gc-*`
    Prune-->>Main: done
    Main-->>User: print result / JSON
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰
I nibbled logs by moonlit beam,
Locked a snapshot, shared the dream,
Pruned the burrows, kept the best,
Reused a build and saved the rest,
A tidy warren, fast and clean!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the three main changes: deduping rebuild work (short-circuit when snapshot exists), per-key locking (exclusive build lock), and snapshot store pruning (garbage collection).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/snapshot-store-fixes

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get your free trial and get 200 agent minutes per Slack user (a $50 value).


Review rate limit: 4/5 reviews remaining, refill in 12 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

…re-fixes

# Conflicts:
#	crates/tempyr-cli/src/commands/mod.rs
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-cli/src/commands/index_cmd.rs`:
- Around line 217-222: The code opens the local queryable index
(ctx.queryable_index_path() + Index::open) before ensuring the worktree has
copied the latest published snapshot; call the routine that seeds or syncs the
local queryable index from the shared snapshot (e.g., a
ctx.seed_queryable_from_shared_snapshot(snapshot_key) or similar existing
helper) immediately after computing index_path and before calling Index::open so
the local DB exists and stats are current; leave
ctx.write_active_snapshot_key(snapshot_key) and maybe_embed calls in their
current logical order after the seeding step.
- Around line 44-56: The current `if let RebuildSlot::UseExisting { .. } = slot`
drops `slot` (and its `Build(lock)`) before calling `rebuild_from_scratch()`,
releasing the build lock prematurely; replace the `if let` with a `match` on the
result of `negotiate_rebuild_slot(ctx, &shared, force, BUILD_LOCK_WAIT)?` so you
can bind the `Build(lock)` arm to a local name (e.g., `Build(lock)`) and keep
that binding in scope while you call `rebuild_from_scratch(ctx, &graph,
&snapshot_key, json, skip_embeddings)?`, and have the `UseExisting {
built_by_other }` arm immediately return via `seed_and_report(...)`; this
ensures the lock value lives until after `rebuild_from_scratch()` completes.

In `@crates/tempyr-core/src/project.rs`:
- Around line 794-796: is_snapshot_key currently accepts uppercase A-F because
it uses is_ascii_hexdigit(); tighten it to only accept lowercase hex to match
graph_snapshot_key() by validating each byte against '0'..'9' or 'a'..'f'.
Update the is_snapshot_key function (refer to is_snapshot_key and
graph_snapshot_key) so the predicate becomes something like checking each byte
matches b'0'..=b'9' or b'a'..=b'f' (instead of is_ascii_hexdigit()), and
run/update relevant tests expecting lowercase-only snapshot keys.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f0eada8a-40e2-4a0d-840a-3f03955151b4

📥 Commits

Reviewing files that changed from the base of the PR and between 65c62e0 and 5a53322.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • crates/tempyr-cli/Cargo.toml
  • crates/tempyr-cli/src/commands/doctor.rs
  • crates/tempyr-cli/src/commands/index_cmd.rs
  • crates/tempyr-cli/src/commands/init.rs
  • crates/tempyr-cli/src/commands/mod.rs
  • crates/tempyr-cli/src/commands/snapshot_cmd.rs
  • crates/tempyr-cli/src/config.rs
  • crates/tempyr-cli/src/main.rs
  • crates/tempyr-core/src/project.rs
  • crates/tempyr-index/Cargo.toml
  • crates/tempyr-index/src/health.rs

Comment on lines +44 to +56
let slot = negotiate_rebuild_slot(ctx, &shared, force, BUILD_LOCK_WAIT)?;
if let RebuildSlot::UseExisting { built_by_other } = slot {
return seed_and_report(
ctx,
&graph,
&snapshot_key,
json,
skip_embeddings,
built_by_other,
);
}

rebuild_from_scratch(ctx, &graph, &snapshot_key, json, skip_embeddings)?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find and examine the RebuildSlot enum definition
rg -t rs "enum RebuildSlot" -A 10

Repository: cleak/tempyr

Length of output: 83


🏁 Script executed:

# Read the actual code at lines 44-56 in index_cmd.rs
sed -n '40,65p' crates/tempyr-cli/src/commands/index_cmd.rs

Repository: cleak/tempyr

Length of output: 934


🏁 Script executed:

# Check the broader context of the run_rebuild function
sed -n '1,100p' crates/tempyr-cli/src/commands/index_cmd.rs | head -80

Repository: cleak/tempyr

Length of output: 3077


🏁 Script executed:

rg "enum RebuildSlot" -A 12 --type-list | head -50

Repository: cleak/tempyr

Length of output: 446


🏁 Script executed:

# Search correctly across rust files
find . -name "*.rs" -path "*/commands/index_cmd.rs" | head -1 | xargs grep -A 12 "enum RebuildSlot"

Repository: cleak/tempyr

Length of output: 608


🏁 Script executed:

# Get the full context of the run_rebuild function
sed -n '30,75p' crates/tempyr-cli/src/commands/index_cmd.rs

Repository: cleak/tempyr

Length of output: 1508


🏁 Script executed:

# Check how SnapshotBuildLock is defined - this is critical for understanding the lock lifetime
rg "struct SnapshotBuildLock|pub struct SnapshotBuildLock" -A 5

Repository: cleak/tempyr

Length of output: 412


🏁 Script executed:

# Check if there are any tests for run_rebuild to understand expected behavior
rg "run_rebuild|negotiate_rebuild_slot" -B 2 -A 8 | head -100

Repository: cleak/tempyr

Length of output: 7185


🏁 Script executed:

# Verify the return type of negotiate_rebuild_slot
rg "fn negotiate_rebuild_slot" -A 10

Repository: cleak/tempyr

Length of output: 824


Keep the build lock alive through rebuild_from_scratch().

When if let RebuildSlot::UseExisting { .. } = slot doesn't match, Rust drops the entire slot value (including RebuildSlot::Build(lock)) at the end of the if let statement. The build lock is released before rebuild_from_scratch() runs, allowing competing worktrees to acquire the lock and rebuild the same snapshot concurrently.

Use a match statement to extract the lock into a binding, keeping it alive until after rebuild_from_scratch() completes.

Suggested fix
-    let slot = negotiate_rebuild_slot(ctx, &shared, force, BUILD_LOCK_WAIT)?;
-    if let RebuildSlot::UseExisting { built_by_other } = slot {
-        return seed_and_report(
-            ctx,
-            &graph,
-            &snapshot_key,
-            json,
-            skip_embeddings,
-            built_by_other,
-        );
-    }
-
-    rebuild_from_scratch(ctx, &graph, &snapshot_key, json, skip_embeddings)?;
+    let build_lock = match negotiate_rebuild_slot(ctx, &shared, force, BUILD_LOCK_WAIT)? {
+        RebuildSlot::UseExisting { built_by_other } => {
+            return seed_and_report(
+                ctx,
+                &graph,
+                &snapshot_key,
+                json,
+                skip_embeddings,
+                built_by_other,
+            );
+        }
+        RebuildSlot::Build(lock) => lock,
+    };
+
+    rebuild_from_scratch(ctx, &graph, &snapshot_key, json, skip_embeddings)?;
+    drop(build_lock);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/tempyr-cli/src/commands/index_cmd.rs` around lines 44 - 56, The
current `if let RebuildSlot::UseExisting { .. } = slot` drops `slot` (and its
`Build(lock)`) before calling `rebuild_from_scratch()`, releasing the build lock
prematurely; replace the `if let` with a `match` on the result of
`negotiate_rebuild_slot(ctx, &shared, force, BUILD_LOCK_WAIT)?` so you can bind
the `Build(lock)` arm to a local name (e.g., `Build(lock)`) and keep that
binding in scope while you call `rebuild_from_scratch(ctx, &graph,
&snapshot_key, json, skip_embeddings)?`, and have the `UseExisting {
built_by_other }` arm immediately return via `seed_and_report(...)`; this
ensures the lock value lives until after `rebuild_from_scratch()` completes.

Comment thread crates/tempyr-cli/src/commands/index_cmd.rs
Comment on lines +217 to +231
let kept_bytes: u64 = entries
.iter()
.filter(|e| kept.contains(&e.snapshot_key))
.map(|e| e.bytes)
.sum();
let mut running = kept_bytes;
let mut kept_under_cap = 0;
let mut evict: Vec<&SnapshotEntry> = Vec::new();
let mut unkept: Vec<&SnapshotEntry> = entries
.iter()
.filter(|e| !kept.contains(&e.snapshot_key))
.collect();
unkept.sort_by_key(|e| std::cmp::Reverse(e.modified_secs));
for entry in &unkept {
if running + entry.bytes <= opts.max_size_bytes {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Exclude pinned snapshots from cap accounting.

Line 217 initializes the cap budget from all kept entries, so pinned snapshots still consume --max-size headroom. That contradicts the CLI contract for snapshot prune and will evict unpinned history too aggressively when several live worktrees have pins.

Suggested fix
-    let kept_bytes: u64 = entries
+    let pinned_bytes: u64 = entries
+        .iter()
+        .filter(|e| pinned.contains(&e.snapshot_key))
+        .map(|e| e.bytes)
+        .sum();
+    let counted_bytes: u64 = entries
         .iter()
-        .filter(|e| kept.contains(&e.snapshot_key))
+        .filter(|e| kept.contains(&e.snapshot_key) && !pinned.contains(&e.snapshot_key))
         .map(|e| e.bytes)
         .sum();
-    let mut running = kept_bytes;
+    let mut running = counted_bytes;
@@
-        total_bytes_after_estimate: running,
+        total_bytes_after_estimate: pinned_bytes + running,

Comment thread crates/tempyr-core/src/project.rs
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-index/src/health.rs`:
- Around line 442-469: The current traversal in health.rs silently ignores IO
errors (using read.flatten() and multiple continue paths) which yields
misleading partial snapshot totals; change the logic so any failure reading the
snapshots_root entries, calling entry.file_type(), iterating walkdir::WalkDir,
or obtaining sub.metadata() causes an early return of (None, None) (rather than
continue), and only accumulate count/bytes when all operations succeed; update
the iteration over read (the result of std::fs::read_dir(snapshots_root)),
checks around entry.file_type(), the WalkDir iteration, and the meta =
sub.metadata()/meta.is_file() branch to detect and propagate errors back as
(None, None) while keeping the snapshots_root, read, entry, walkdir::WalkDir,
meta.is_file(), bytes and count variables as referenced.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 830f8a67-400b-40eb-9efc-c19c93b3ee35

📥 Commits

Reviewing files that changed from the base of the PR and between 5a53322 and 1338884.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • crates/tempyr-cli/Cargo.toml
  • crates/tempyr-cli/src/commands/mod.rs
  • crates/tempyr-index/Cargo.toml
  • crates/tempyr-index/src/health.rs

Comment thread crates/tempyr-index/src/health.rs Outdated
cleak and others added 2 commits May 2, 2026 17:46
Three fixes from review on PR #45, plus a regression test pinning the
if-let drop semantics that one piece of feedback misidentified as a bug.

- `seed_and_report` now re-seeds the local active.db from the shared
  snapshot before opening it. Without this, a `built_by_other = true`
  outcome (concurrent builder published during our wait) would update
  the worktree cursor to the new key while leaving a stale local index
  on disk. A subsequent query that fell through to the active path
  (e.g. after the shared snapshot was pruned) would then read stale
  bytes against a cursor that claimed they were current. Narrow window
  but real.

- `project::is_snapshot_key` now rejects uppercase A–F. Snapshot keys
  are produced by BLAKE3 `to_hex()` which always emits lowercase, so
  any uppercase-named directory under `snapshots/` was created by some
  foreign process. Treating it as a snapshot would let `tempyr snapshot
  prune` evict it. Added a paired test that the predicate accepts
  whatever `graph_snapshot_key()` produces, so the two can't drift.

- `health::probe_snapshot_store` now returns `(None, None)` on any IO
  failure during the walk instead of silently undercounting via
  `read.flatten()` and per-entry `continue`s. An undercount that hides
  a multi-GB snapshot store from the doctor's prune hint is worse than
  a clear "unavailable" signal. Concurrent prune (rename-then-remove)
  can briefly trigger this; a re-run after prune completes succeeds.

- New `build_lock_survives_if_let_match_against_use_existing_arm` test
  empirically pins the Rust drop semantics in `run_rebuild`: the
  `if let RebuildSlot::UseExisting { .. } = slot { return ... }`
  pattern leaves `slot` (and any `Build(SnapshotBuildLock)` it carries)
  alive for the rest of the function, so the lock is held through
  `rebuild_from_scratch`. A future refactor that moves out of `slot`
  in the matched arm would fail this test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The .agents/ dir was already untracked before this branch — it's local
agent state, not part of the snapshot store work. Got swept in by a
`git add -A` in the prior commit; removing here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-core/src/project.rs`:
- Around line 98-100: The snapshot_build_lock_path public helper must reject
non-canonical keys: call is_snapshot_key(snapshot_key) at the top of
snapshot_build_lock_path and return an error instead of blindly joining
unvalidated input (or make the helper non-public); change the signature to
return Result<PathBuf, E> (use the crate's existing error type) and return Err
when validation fails, and apply the same validation/signature change to the
other public lock/path helpers referenced around the same area (the functions
that call snapshot_locks_dir and build lock filenames) so no unvalidated
snapshot_key can produce a path outside the locks directory.

In `@crates/tempyr-index/src/health.rs`:
- Around line 443-446: The probe_snapshot_store function currently returns
(None, None) when snapshots_root.is_dir() is false, which makes a missing
snapshots directory look like a probe failure; change probe_snapshot_store to
treat a missing snapshots_root as an empty store and return (Some(0), Some(0))
instead of (None, None). Locate the probe_snapshot_store function and update the
early-return branch that checks snapshots_root.is_dir() so it returns counts of
zero (0 dirs and 0 bytes) for a non-existent directory while keeping the
existing logic for when the directory exists.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9ed1bebc-c260-426e-8285-2f465803e338

📥 Commits

Reviewing files that changed from the base of the PR and between 1338884 and b7406e8.

📒 Files selected for processing (3)
  • crates/tempyr-cli/src/commands/index_cmd.rs
  • crates/tempyr-core/src/project.rs
  • crates/tempyr-index/src/health.rs

Comment on lines +84 to +125
fn negotiate_rebuild_slot(
ctx: &ProjectContext,
shared: &Path,
force: bool,
wait_budget: Duration,
) -> anyhow::Result<RebuildSlot> {
if !force && shared.exists() {
return Ok(RebuildSlot::UseExisting {
built_by_other: false,
});
}

if let Some(lock) = ctx.try_acquire_snapshot_build_lock()? {
if !force && shared.exists() {
// Another process won the race and published while we were
// taking the lock. Release and use what they built.
drop(lock);
return Ok(RebuildSlot::UseExisting {
built_by_other: true,
});
}
return Ok(RebuildSlot::Build(lock));
}

// Lock is held by another rebuild. Wait for them to publish.
wait_for_snapshot(shared, wait_budget);
if !force && shared.exists() {
return Ok(RebuildSlot::UseExisting {
built_by_other: true,
});
}

// Other builder didn't publish in time. Block on the lock so the work
// is still serialized — better to wait than to double-build.
let lock = acquire_build_lock_blocking(ctx, wait_budget)?;
if !force && shared.exists() {
drop(lock);
return Ok(RebuildSlot::UseExisting {
built_by_other: true,
});
}
Ok(RebuildSlot::Build(lock))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

--force still leaves the old shared snapshot in place.

The new force path correctly bypasses reuse, but it still publishes through the normal snapshot publisher. In crates/tempyr-core/src/project.rs, publish_snapshot_from() returns immediately when the shared snapshot already exists, so this rebuild only repairs the worktree-local DB. The shared snapshot that other worktrees reuse remains unchanged, which breaks the advertised corruption-recovery behavior.

You need a force-aware publish/replace path under the held per-key lock so the rebuilt index actually replaces the existing shared snapshot atomically.

Also applies to: 128-149

Comment thread crates/tempyr-core/src/project.rs
Comment thread crates/tempyr-index/src/health.rs
Two more fixes from review on PR #45.

- `SnapshotBuildLock::try_acquire` now validates `snapshot_key` against
  `is_snapshot_key()` before doing IO. The path helpers (`snapshot_locks_dir`,
  `snapshot_build_lock_path`) are pure path construction and would happily
  produce a path outside the locks directory if given input like
  `"../../etc/passwd"`. All current internal callers source the key from
  `IndexLayout::snapshot_key()` (always BLAKE3 hex), but the function is
  `pub` so external code could pass arbitrary input. Validation at the IO
  boundary closes that surface without splaying `Result` returns across
  every path helper. Test: `snapshot_build_lock_rejects_path_traversal_keys`.

- `health::probe_snapshot_store` now distinguishes "no snapshot store yet"
  from "probe failed". A missing snapshots directory returns
  `(Some(0), Some(0))` so `tempyr doctor` shows
  `snapshot store: 0 dirs, 0 B (ok)` on a fresh project instead of silently
  suppressing the line. Real anomalies (path exists but is not a directory,
  IO errors mid-walk) still return `(None, None)` so doctor's render can
  signal "unavailable". Tests: missing-dir, non-directory, and a fresh test
  that the canonical-snapshot filter excludes `.locks` and `.gc-*`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cleak cleak merged commit ef0d861 into master May 3, 2026
5 checks passed
@cleak cleak deleted the claude/snapshot-store-fixes branch May 3, 2026 04:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant