Skip to content

fix(storage): ServerCredentialStore put-race + vault orphan on rollback failure (sec)#540

Merged
sroussey merged 4 commits into
mainfrom
claude/beautiful-mayer-iDxau
May 28, 2026
Merged

fix(storage): ServerCredentialStore put-race + vault orphan on rollback failure (sec)#540
sroussey merged 4 commits into
mainfrom
claude/beautiful-mayer-iDxau

Conversation

@sroussey
Copy link
Copy Markdown
Collaborator

Summary

Closes two correctness/safety bugs in ServerCredentialStore (packages/storage/src/credentials/ServerCredentialStore.ts).

Bug 1 — put-during-get race for new entries

The previous put() wrote the vault first, then the metadata row. A concurrent get() for a brand-new key could observe metadata.get(id) === undefined and return undefined, while has() would shortly afterwards report the same key as present — but more importantly, for an UPDATE concurrent with a get(), the metadata row pointed at the OLD logical value while the vault had ALREADY been overwritten with the NEW value, so get() could return the new ciphertext under the old updatedAt — a torn read. For a brand-new entry, the inverse hole existed: the vault had bytes but no metadata, so the value was unreachable and orphaned if the metadata write then failed.

Bug 2 — silent vault orphan on rollback failure

The previous rollback path used await this.vault.deleteSecret(id).catch(() => undefined), silently swallowing rollback errors. The vault could be left holding plaintext for an id with no metadata pointing at it, with no signal to operators.

Chosen scheme

  1. Metadata-first put() for new entries, with a pending: true flag on the metadata row:
    • metadata.put(id, { ...row, pending: true })
    • await vault.setSecret(id, value)
    • On success, clear the flag: metadata.put(id, { ...row, pending: false }).
    • All readers (get, has, listMetadata, keys) treat pending === true as absent, so the in-flight row is invisible until committed.
  2. Surface rollback failure. If vault.setSecret throws, attempt metadata.delete(id); if THAT also throws, persist { pending: true, orphanedAt: new Date().toISOString() } so the row remains invisible to readers but is discoverable by an operator/reconciler, and throw a wrapped Error("ServerCredentialStore.put: vault write failed and rollback may have orphaned vault id <id>", { cause: vaultError }).
  3. Updates are best-effort, non-atomic, documented in a code comment. A concurrent get() may observe either the old or new value, but never a torn read of vault+metadata (get() reads metadata first, then the vault id — both writes are for the same id).
  4. deleteAll() scope-tightened. The internal scopedIds() helper now filters via entry.key.startsWith(this.prefix) first (cheap string check) and re-asserts userId/projectId from the row as defence-in-depth. Pending/orphaned rows in the current scope are picked up by deleteAll().

Compatibility

No migration needed — pending and orphanedAt are optional fields, absent on existing rows; readers treat absent/false as committed.

Deferred follow-ups

  • Versioned vault ids to close the update-during-get window for the rare case where readers need a strict snapshot. The current scheme already prevents torn reads (get() resolves a single id and reads the value for that id), but a strict snapshot would require write-new-id-then-flip-metadata semantics. Out of scope here.
  • Prefix-scan primitive on IKvStorage so deleteAll() is O(scope) instead of O(all rows) across all KV implementations. Today scopedIds() calls getAll() and filters in memory; that's fine for the in-memory and small-row backends but suboptimal for SQL-backed KVs.

Test plan

  • bun test packages/storage/src/credentials/ServerCredentialStore.test.ts — existing 10 cases plus 5 new ones (concurrent put, metadata-failure rollback, vault+rollback failure with orphan marker, pending-row invisibility, deleteAll scope isolation).
  • Manual: verify no consumer outside packages/storage/src/credentials/ was touched (constraint).
  • Manual: verify no IKvStorage or SecretVault.ts change (constraint).

Generated by Claude Code

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 28, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 61.97% 24614 / 39717
🔵 Statements 61.83% 25468 / 41187
🔵 Functions 62.93% 4662 / 7408
🔵 Branches 50.67% 12065 / 23809
File CoverageNo changed files found.
Generated in workflow #2452 for commit 4aedc14 by the Vitest Coverage Report Action

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses correctness/safety issues in ServerCredentialStore by preventing visibility of mid-flight new credential writes and by improving rollback/orphan detection paths when vault writes fail. It also tightens deleteAll() scoping behavior and adds tests for the new semantics.

Changes:

  • Add pending / orphanedAt metadata fields and treat pending rows as absent in readers (get, has, listMetadata, keys).
  • Rework put() for new entries to be metadata-first with a pending flag, plus an orphan-marker fallback when rollback metadata deletion fails.
  • Tighten deleteAll() to filter by key prefix first and add new tests for concurrency, pending invisibility, orphan marker behavior, and scope isolation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
packages/storage/src/credentials/ServerCredentialStore.ts Introduces pending/orphan metadata, reorders new-entry put() to metadata-first, adjusts readers and scope filtering.
packages/storage/src/credentials/ServerCredentialStore.test.ts Adds tests for pending-row invisibility, rollback/orphan marker behavior, concurrency, and deleteAll scope isolation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +140 to +141
// Commit: clear the pending flag.
await this.metadata.put(id, { ...baseRow, pending: false });
Comment on lines +145 to +150
// Update is not atomic; concurrent get() may observe either old or new
// value but never a torn read. We overwrite the vault first; if the
// metadata update then fails we rethrow without rolling the vault back
// because the prior metadata row still points at the same vault id.
await this.vault.setSecret(id, value);
await this.metadata.put(id, baseRow);
Comment on lines +188 to +192
// Defensive: the rollback path went through vault.deleteSecret OR the
// vault was never written (depending on which write failed first).
// The current implementation writes metadata first, so vault is never
// touched if metadata.put throws — accept either outcome.
expect(deleteSpy.mock.calls.length >= 0).toBe(true);
Comment on lines +154 to +156
// committed value or undefined — never the not-yet-committed "v2".
const observed = await store.get("k");
expect(observed === "v1" || observed === undefined).toBe(true);
sroussey added 2 commits May 28, 2026 09:25
…ate-path comment

Addresses Copilot review on PR #540:

- New-entry put(): if the commit-step `metadata.put({pending:false})` throws
  after a successful vault.setSecret, the row stayed `pending:true` (invisible
  to readers) while the vault held bytes — same orphan failure mode as the
  vault-rollback-fails path, with no operator signal. Wrap the commit write,
  best-effort persist an orphan marker, and throw a wrapped error.

- Update path comment overclaimed "never a torn read". The actual window is a
  stale-metadata read: a concurrent get() between vault.setSecret and
  metadata.put returns the new vault value alongside the OLD metadata
  (updatedAt, expiresAt, label, provider). Vault reads themselves are atomic.
  Comment rewritten to describe this honestly and to point at versioned vault
  ids as the clean fix.
…t-step orphan path

Addresses Copilot review on PR #540:

- Rename and tighten the in-flight concurrency test to assert the prior
  committed value EXACTLY (the second put() is an update, so metadata stays
  non-pending and get() must return "v1"). The old wording allowed undefined,
  which would mask a regression that treats committed rows as absent.

- Replace the trivially-true `deleteSpy.mock.calls.length >= 0` with explicit
  `setSpy.not.toHaveBeenCalled()` / `deleteSpy.not.toHaveBeenCalled()` to
  pin the metadata-first behavior (vault is never touched when the pending
  metadata write throws).

- Add a regression test for the commit-step orphan path: first metadata.put
  (pending:true) succeeds, vault.setSecret succeeds, second metadata.put
  (commit pending:false) throws. Assert the wrapped "commit failed" error,
  that the vault retains the bytes, and that the metadata row is a sticky
  orphan marker.
@sroussey sroussey merged commit 3db52f6 into main May 28, 2026
13 checks passed
@sroussey sroussey deleted the claude/beautiful-mayer-iDxau branch May 28, 2026 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants