Skip to content

feat(driver-web-cache): optimistic concurrency support (putIf + onChange)#27348

Draft
anthony-murphy wants to merge 9 commits into
microsoft:mainfrom
anthony-murphy:optimistic-concurrency-driver-cache
Draft

feat(driver-web-cache): optimistic concurrency support (putIf + onChange)#27348
anthony-murphy wants to merge 9 commits into
microsoft:mainfrom
anthony-murphy:optimistic-concurrency-driver-cache

Conversation

@anthony-murphy
Copy link
Copy Markdown
Contributor

@anthony-murphy anthony-murphy commented May 19, 2026

Description

Adds two related primitives to FluidCache so that multiple browser tabs sharing the
same origin's IndexedDB can coordinate on persisted cache state. The motivating
scenario is offline pending-state storage where multiple tabs may race to persist
their version of the same key and the host wants to favor the active/dirty tab.

1. putIf(entry, value, shouldWrite) -- compare-and-swap write

The read of the existing entry and the conditional write happen in a single IndexedDB
readwrite transaction, so the decision sees a consistent view across tabs.

  • shouldWrite(existing, proposed) is synchronous by design. IndexedDB
    transactions close automatically on any non-IDB await, which would silently break the
    atomicity that makes the compare-and-swap correct. This is documented in the JSDoc.
  • existing is undefined when no entry exists for the key, when the existing
    entry belongs to a different partition, or when the entry is older than
    maxCacheItemAge -- matching the semantics of get.
  • Returns true iff the write committed. Errors are logged and not thrown,
    matching put.
  • Predicate exceptions are logged under a dedicated FluidCachePutIfPredicateError
    telemetry event (distinct from IDB-write failures under FluidCachePutError) and
    surfaced as false, after the transaction is aborted so the existing row is preserved.

2. events.on("change", listener) => unsubscribe -- cross-instance notifications

FluidCache instances now broadcast their mutations over a BroadcastChannel
(fluid-driver-cache:<guid>), so other FluidCache instances in the same browsing context
(typically other tabs) can observe changes. events is a standard Fluid Framework
Listenable; subscribe via cache.events.on("change", listener).

Event shape:

type FluidCacheChangeEvent =
    | { type: "put" | "remove"; partitionKey: string | null; fileId: string; entryType: string; cacheItemId: string }
    | { type: "removeFile"; fileId: string };
  • put/remove events are filtered to the listener's partition (consistent with get).
  • removeFile events are delivered unconditionally because removeEntries deletes
    rows regardless of partition.
  • Cross-partition broadcast routing: the IDB key omits partitionKey, so a
    partition-A put/putIf/removeEntry can displace a partition-B row.
    put/putIf/removeEntry now read the existing record's partitionKey
    inside the same readwrite transaction, and when the row belonged to a different
    partition, also broadcast a remove event stamped with the displaced
    partition's key so its listeners are notified.
  • BroadcastChannel does not echo to the sender, so a write does not invoke its own
    cache's listeners. Other instances (including in the same tab) do.
  • If BroadcastChannel is unavailable, the events subscription is a no-op and writes
    do not broadcast.

3. dispose() -- lifecycle (strict contract)

Closes the BroadcastChannel, any open IndexedDB connection, and clears the close timer.
Idempotent. After dispose returns:

  • Every public method (get, put, putIf, removeEntry, removeEntries)
    throws UsageError.
  • Operations already in flight when dispose was called also reject with UsageError.
  • The underlying IndexedDB connection is not lazily reopened by any in-flight call.
  • events subscription remains valid but no events fire.

The two constructor scheduleIdleTask callbacks (storage estimate + delete-old-entries
maintenance) also short-circuit on disposed so a construct-then-immediately-dispose
sequence does not open IDB or perform maintenance writes.

The production BroadcastChannel is unref()-ed so an idle FluidCache does not
keep Node's event loop alive (no-op in browsers). This lets the package's mocha
config keep PR #27188's exit: true-free invariant.

All API surface is additive on the existing @legacy @beta class. Per-package type
tests mark Class_FluidCache forwardCompat: false to acknowledge that adding
methods to a class is a forward-compat type-break by definition.

Reviewer Guidance

The review process is outlined on this wiki page.

Specific things I'd appreciate eyes on:

  • The IDB-transaction atomicity argument for putIf. The crucial invariant is that
    shouldWrite runs synchronously between tx.store.get and tx.store.put, so the
    transaction never yields to the event loop between the read and the decision. If you
    see a way that any await sneaks in there (e.g. via the predicate, the idb library's
    promise wrapping, or the existing?.partitionKey access), please flag it.
  • The cross-partition broadcast policy (put/putIf/removeEntry emit an extra
    remove event stamped with the displaced partition's key). Confirm this matches the
    expectations of hosts that run multiple partitions per process.
  • dispatchChangeEvent partition filtering: removeFile is delivered unconditionally,
    everything else gates on event.partitionKey === this.partitionKey. null === null
    is intentional.

anthony-murphy and others added 2 commits May 19, 2026 15:23
Add FluidCache.putIf(entry, value, shouldWrite) which performs the
read of the existing entry and the conditional write inside a single
IndexedDB readwrite transaction. This gives callers compare-and-swap
semantics across consumers sharing the same underlying IndexedDB
instance (e.g. multiple browser tabs racing to persist offline
pending state, where the caller wants to favor the active/dirty tab).

The predicate receives (existing, proposed) and must be synchronous:
IndexedDB transactions auto-close on any non-IDB await, which would
silently break the atomicity that makes the compare-and-swap correct.
Cross-partition existing entries surface as undefined to the predicate,
consistent with the semantics of get(). Errors are logged and swallowed,
matching put().

- Refactor put() to share a buildRecord helper with putIf.
- Mark Class_FluidCache forwardCompat:false in typeValidation.broken
  (adding a method is a forward-compat break for the type test).
- Regenerate api-report and the type-test baseline.
- Document the new method in the README and add a changeset.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nstances

Add cross-instance change notifications backed by BroadcastChannel.
FluidCache instances in the same browsing context (typically other
tabs of the same origin) now observe each other's mutations through
the new FluidCache.onChange(listener) API, returning an unsubscribe
function.

Event shape (FluidCacheChangeEvent):
- { op: "put" | "remove", partitionKey, fileId, type, cacheItemId }
  fired from put, a successful putIf, and removeEntry. Listeners
  only receive events whose partitionKey matches their FluidCache's
  partition, consistent with the semantics of get().
- { op: "removeFile", fileId } fired from removeEntries. Delivered to
  all listeners regardless of partition because removeEntries deletes
  rows regardless of partition.

A single BroadcastChannel name ("fluid-driver-cache") is used; partition
scoping is applied by the receiving listener. BroadcastChannel does not
echo a message back to the poster, so self-writes do not invoke their
own listeners — other instances (including those in the same tab) do.
If BroadcastChannel is unavailable, onChange becomes a no-op subscription
and writes simply do not broadcast.

Also add FluidCache.dispose() which closes the BroadcastChannel, the
open IndexedDB connection, and clears the close timer. dispose is
idempotent; calling onChange after dispose throws UsageError. In Node,
unref() is called on the channel when available so test processes and
short-lived scripts can exit without explicit disposal.

- Update test afterEach to use the new dispose() entry point instead
  of poking at private fields.
- Add FluidCacheBroadcastError and FluidCacheChangeListenerError
  telemetry events for the broadcast and listener-invocation paths.
- Regenerate api-report.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

Hi! Thank you for opening this PR. Want me to review it?

Based on the diff (1348 lines, 11 files), I've queued these reviewers:

  • Correctness — logic errors, race conditions, lifecycle issues
  • Security — vulnerabilities, secret exposure, injection
  • API Compatibility — breaking changes, release tags, type design
  • Performance — algorithmic regressions, memory leaks
  • Testing — coverage gaps, hollow tests

How this works

  • Adjust the reviewer set by ticking/unticking boxes above. Reviewer toggles alone don't trigger anything.

  • Tick Start review below to dispatch the review fleet.

  • After review finishes, tick Start review again to request another run — it auto-resets after each dispatch.

  • This comment updates as new commits land; your reviewer selections are preserved.

  • Start review

Comment thread packages/drivers/driver-web-cache/src/FluidCache.ts Outdated
Address PR microsoft#27348 deep-review feedback:

1. Resurrection race in openDb past dispose. After awaiting
   getFluidCacheIndexedDbInstance, check this.disposed; if so, close the
   just-acquired connection and throw UsageError so the in-flight call
   does not assign this.db or arm a fresh close timer.

2. Post-dispose contract is now uniform. Add throwIfDisposed guards at
   the top of get, put, putIf, removeEntry, removeEntries (joining
   onChange). The catch blocks rethrow UsageError on dispose-mid-flight
   so callers always see a clean rejection rather than a silent
   undefined/void/false. Update dispose() JSDoc to document the
   contract.

3. putIf staleness parity with get. putIf now treats existing rows
   older than maxCacheItemAge as undefined to the predicate, matching
   the view get applies. JSDoc and README updated to call this out
   explicitly, along with the cross-partition overwrite behavior.

4. Tests for every public method's post-dispose throw (both
   immediateClose modes), the in-flight dispose race against an
   awaiting put, and putIf's stale-row handling. 74 passing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@anthony-murphy
Copy link
Copy Markdown
Contributor Author

Pushed d4e7a62 addressing the deep-review feedback.

Decisions made on the open questions:

  1. Unified post-dispose contract (S2): all public methods (get, put, putIf, removeEntry, removeEntries, onChange) now throw UsageError after dispose. I went strict on get rather than lenient ΓÇö returning a sentinel undefined from a disposed cache is hard to distinguish from a normal miss, and the symmetric throw makes use-after-dispose bugs visible.

  2. Resurrection race (S1): openDb checks this.disposed immediately after awaiting getFluidCacheIndexedDbInstance (both in the closeDbImmediately and persistent-db branches) and closes the just-acquired connection + throws. Combined with throwIfDisposed at the top of each public method and a re-check in the catch block, an in-flight put racing against dispose now always rejects with UsageError and the DB is never lazily reopened. Test added.

  3. putIf cross-partition overwrite (Q3): kept the current ΓÇ£predicate sees undefined, true overwrites unconditionallyΓÇ¥ semantics ΓÇö putIf is a guarded put, not a more conservative one. put already overwrites foreign-partition rows; making putIf more cautious would create an asymmetry where consumers couldnΓÇÖt reliably write through putIf after a cross-partition cache pollution (e.g. user account switch). JSDoc and README now call this out explicitly.

  4. putIf maxCacheItemAge (Q4): filtered to match get. Existing rows older than maxCacheItemAge are now surfaced to the predicate as undefined, so the predicateΓÇÖs view of ΓÇ£is there an existing value?ΓÇ¥ is consistent with get+put. Test added.

  5. JSDoc at dispose(): updated to state the actual contract (throw on every other method, in-flight calls reject, no resurrection).

Build + lint + tests all green (74 passing, both immediateClose modes).

@anthony-murphy
Copy link
Copy Markdown
Contributor Author

Deep Review: Thanks for the d4e7a62 push — the unified throwIfDisposed + the post-await re-check inside openDb() cleanly close the resurrection race and the asymmetric-contract concern, and the four answers (strict get, putIf cross-partition overwrite, maxCacheItemAge filtering, JSDoc rewrite) all make sense.

Two Tier 2 gaps in the same dispose-contract family survive on this SHA — posted as new inline threads rather than as replies to the resolved thread because they're different code anchors:

  • S1 (success path): put / removeEntry / removeEntries set wrote/removed = true and broadcast after the awaited IDB write succeeds with no post-await throwIfDisposed. The catch-block rethrow you added covers errors, but a dispose() that lands mid-await db.put(...) lets the write commit and emits a "put"/"remove" event after teardown. The new "in-flight dispose race against an awaiting put" test exercises the erroring path, not the success path.
  • S2 (constructor idle task): the existing scheduleIdleTask cleanup callback calls getFluidCacheIndexedDbInstance(this.logger) directly, bypassing the new openDb() dispose guard. Construct → immediate-dispose() → wait-for-idle still triggers an IDB open + maintenance write, contradicting the "not lazily reopened" guarantee.

Both are localized — throwIfDisposed() immediately after the awaited write in S1, and an if (this.disposed) return; guard (or routing through openDb()) for S2. Details and suggested patches inline.

Address second round of PR microsoft#27348 deep-review feedback:

1. Success-path dispose race in put / putIf / removeEntry /
   removeEntries (S1). After the awaited IDB write resolved
   successfully, the methods set wrote=true / removed=true and emitted
   a broadcast even if dispose had landed mid-await. With
   closeDbImmediately=true the local db handle isn't closed by
   dispose, so the write committed and a 'put' / 'remove' / 'removeFile'
   event leaked past teardown. throwIfDisposed is now also called
   immediately after each successful await, so the call rejects with
   UsageError and no broadcast is emitted. broadcast() also short-
   circuits on disposed for defense in depth.

2. Constructor scheduleIdleTask cleanup task bypassed the openDb
   dispose guard by calling getFluidCacheIndexedDbInstance directly,
   contradicting the documented 'not lazily reopened' invariant. The
   storage-estimate idle task and the delete-old-entries idle task
   now both early-return on this.disposed (top of callback and after
   every await).

3. New test 'does not broadcast a change event when dispose runs
   mid-flight' verifies the success-path race no longer leaks events
   to other instances. 76 passing total.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@anthony-murphy
Copy link
Copy Markdown
Contributor Author

Pushed 4b389ab addressing both Tier 2 gaps from the d4e7a62 deep review:

S1 (success-path dispose race) ΓÇö after each successful awaited IDB write in put, putIf, removeEntry, and removeEntries we now call throwIfDisposed() before setting wrote / removed and before the bottom-of-method broadcast. The previously-leaky path was specifically closeDbImmediately=true: dispose() does not close the local db handle in that mode, so the IDB write committed and the change event leaked past teardown. broadcast() itself now also early-returns on this.disposed for defence in depth. New test does not broadcast a change event when dispose runs mid-flight verifies no event is delivered to other instances.

S2 (constructor idle cleanup task) ΓÇö the scheduleIdleTask cleanup callback calls getFluidCacheIndexedDbInstance directly, bypassing the new openDb() guard. Both idle callbacks (storage estimate + delete-old-entries) now check this.disposed at the top and after every await, so a construct-then-immediately-dispose() sequence no longer opens IDB or performs maintenance writes ΓÇö honoring the "not lazily reopened" invariant documented on dispose().

76 passing across both immediateClose modes. Build and lint green.

@anthony-murphy
Copy link
Copy Markdown
Contributor Author

Deep Review: Thanks for the 4b389ab push — the post-await throwIfDisposed() on put/putIf-success/removeEntry/removeEntries and the this.disposed guards on both idle callbacks close the previous two Tier 2 gaps cleanly. The new does not broadcast a change event when dispose runs mid-flight test covers the success-path race well.

Two new Tier 2 gaps in the same dispose-contract family survive on this SHA — posting as new inline threads since they're at different code anchors:

  • putIf rejected-write branch (FluidCache.ts:641-643) — after await transaction.done in the !shouldWrite(...) branch, the method returns false with no throwIfDisposed(). A caller whose predicate returned false while dispose() ran in another microtask sees false (a valid contract value meaning predicate rejected) instead of the promised UsageError. The symmetric successful-putIf branch already calls throwIfDisposed() after await transaction.done.
  • get/getItemFromCache success path (FluidCache.ts:480-549)get() checks throwIfDisposed() at entry and then await this.getItemFromCache(cacheEntry), but never re-checks before returning cachedItem?.cachedObject. Inside getItemFromCache, only the catch block calls throwIfDisposed(); the success returns do not. If dispose() lands during await this.openDb() or await db.get(...), get resolves with the cached value rather than rejecting. Every other IDB-resolving await in the file now has the post-await guard — the read success path is the lone holdout.

Details and recommended fixes inline.

Comment thread packages/drivers/driver-web-cache/src/FluidCache.ts
Two Tier 2 gaps in the dispose-contract family survived the previous
round of fixes:

- `putIf`'s rejected-write branch awaited `transaction.done` and
  returned `false` without checking `throwIfDisposed()`. A caller whose
  predicate returned `false` while `dispose()` ran in another microtask
  saw `false` (a valid contract value meaning "predicate rejected")
  instead of the promised `UsageError`.
- `get` / `getItemFromCache` checked `throwIfDisposed()` at entry but
  never re-checked after `await db.get(...)`. The success path resolved
  with the cached value rather than rejecting when dispose landed
  during the read. Every other IDB-resolving await in the file already
  had a post-await guard; the read path was the lone holdout.

Both fixes are a one-line `throwIfDisposed()` immediately after the
awaited IDB operation, mirroring the post-await guards already present
on `put` / `putIf`-success / `removeEntry` / `removeEntries`.

New tests cover both races by starting the in-flight call and
synchronously calling `dispose()` to land mid-flight:

- `putIf rejects with UsageError when dispose runs mid-flight and the
  predicate would reject` — predicate returns `false`, must throw, not
  resolve `false`.
- `get rejects with UsageError when dispose runs mid-flight` — seeds a
  value with a separate cache so the read success path is reached,
  must throw, not resolve the cached value.

Also expanded the `existing === undefined` enumeration in the `putIf`
changeset to call out the `maxCacheItemAge` staleness case alongside
the "no entry" and "cross-partition" cases, matching the actual filter
applied to `existingValue` and the JSDoc on `putIf`.

80 passing across both `immediateClose` modes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread packages/drivers/driver-web-cache/api-report/driver-web-cache.legacy.beta.api.md Outdated
Comment thread packages/drivers/driver-web-cache/api-report/driver-web-cache.legacy.beta.api.md Outdated
Comment thread packages/drivers/driver-web-cache/src/FluidCache.ts Outdated
Comment thread packages/drivers/driver-web-cache/src/FluidCache.ts Outdated
Comment thread packages/drivers/driver-web-cache/src/FluidCache.ts Outdated
Comment thread packages/drivers/driver-web-cache/src/FluidCache.ts Outdated
Comment thread packages/drivers/driver-web-cache/src/FluidCache.ts Outdated
Comment thread packages/drivers/driver-web-cache/src/FluidCache.ts
Comment thread packages/drivers/driver-web-cache/src/FluidCache.ts
Comment thread packages/drivers/driver-web-cache/src/FluidCache.ts
Comment thread packages/drivers/driver-web-cache/src/FluidCache.ts Outdated
Human comments:
- Rename FluidCacheChangeEvent discriminator `op` -> `type`; rename inner
  cache-entry field `type` -> `entryType` to avoid clash with the union
  discriminator. Tests, README, changeset, and api-report regenerated.
- Append a static UUID suffix to the BroadcastChannel name to harden
  against name collisions with other libraries on the same origin.
- Drop the `.unref()` cast from the constructor. Tests already dispose
  every FluidCache via afterEach; the only remaining concern was Node's
  BroadcastChannel keeping the event loop alive, which is now handled by
  setting `exit: true` in .mocharc.cjs (test-only, prod code is clean).
- `onChange` uses `throwIfDisposed()` for symmetry with other public
  methods.
- `throwIfDisposed(cleanup?)` accepts an optional cleanup lambda; the
  in-flight `openDb` callers now use it to close the just-acquired DB
  handle if dispose ran during open.
- Catch arms in get/put/putIf/removeEntry/removeEntries now log the
  original IDB error *before* calling throwIfDisposed(), preserving
  diagnostics on dispose-race paths.
- putIf wraps the predicate in its own try/catch: predicate exceptions
  abort the transaction and are logged under a dedicated
  FluidCachePutIfPredicateError event so host-callback bugs do not get
  conflated with IDB write failures.
- JSDoc on putIf explains the transaction-vs-cursor choice for the
  single-key compare-and-swap pattern.

Deep Review:
- One-shot FluidCacheBroadcastChannelUnavailable telemetry event when
  BroadcastChannel is missing so hosts can detect the degraded mode.

Deferred (will respond in PR comment):
- Listenable / createEmitter refactor (would change public API).
- Cross-partition broadcast routing for `put` that overwrites a row
  written by another partition (needs design/perf discussion).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread packages/drivers/driver-web-cache/.mocharc.cjs Outdated
anthony-murphy and others added 2 commits May 20, 2026 15:04
…r change events

The internal eventing library (createEmitter / Listenable) replaces the
hand-rolled Set<Listener> + onChange method, addressing the PR reviewer's
suggestion to re-use the standard rather than rolling our own.

Public API change (PR is still in review; no shipped consumers):
- Replace `FluidCache.onChange(listener): Off` with the standard
  `FluidCache.events: Listenable<FluidCacheEvents>` getter.
- Subscribe via `cache.events.on("change", listener)` -> Off.
- Add `FluidCacheEvents` to the public API surface.

Behavior changes follow from using the standard emitter:
- Drop per-listener try/catch isolation and the
  `FluidCacheChangeListenerError` telemetry event. Listener exceptions
  follow standard emitter semantics; consumers are expected to wrap their
  own handlers if needed (matches every other Listenable user in the
  framework).
- Subscribing on a disposed cache is now permitted; no events will fire
  because dispose closes the BroadcastChannel and the disposed flag is
  checked in `broadcast()` before any local dispatch.

Other changes:
- Add `@fluid-internal/client-utils: workspace:~` dependency.
- Rename changeset to `driver-web-cache-events.md` to reflect the new
  API name.
- Update README, changeset, tests, and api-report.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses third round of deep-review feedback on microsoft#27348:

* Cross-partition broadcast routing: put/putIf/removeEntry now read the
  existing record's partitionKey inside their readwrite transaction and,
  when the row belonged to a different partition than this cache, emit
  an additional remove event stamped with the displaced partition's key.
  This fixes the silent-invalidation gap where a partition-A write would
  overwrite a partition-B row in IDB without notifying B's listeners.
  removeEntry also stops broadcasting when no row was actually deleted.

* Production BroadcastChannel unref(): the production channel now calls
  unref() so Node's event loop does not stay alive on account of an idle
  FluidCache. config.exit = true has been removed from .mocharc.cjs,
  restoring PR microsoft#27188's deliberate test-hygiene invariant. Tests still
  pass (88 total) and the process exits cleanly.

* Vale prose-lint: .changeset/optimistic-driver-web-cache.md and
  .changeset/driver-web-cache-events.md rewritten with shorter sentences,
  no spaced em-dashes, no e.g./i.e. abbreviations, and no auto-close
  hyphenation. Unblocks the failing vale check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@anthony-murphy
Copy link
Copy Markdown
Contributor Author

Pushed 9ea3fa2912 addressing the third round of deep-review feedback and the vale failure on aa8ed9f. Per-thread replies posted on each inline thread; high-level summary:

  • Cross-partition broadcast routing (FluidCache.ts:493). put / putIf / removeEntry each now wrap the read+write/delete in a single readwrite transaction, capture existing.partitionKey when it differs from this.partitionKey, and emit an additional remove event stamped with the displaced partition's key after the primary broadcast. removeEntry also no longer broadcasts when no row existed at the key (previously over-broadcast). Three new tests cover the routing for all three call sites plus a no-row-no-broadcast test.
  • Production BroadcastChannel.unref() (.mocharc.cjs:13). Option (1) from the suggested fixes: production channel is now unref()-ed via a typed cast (browsers no-op via the optional chain). config.exit = true dropped from .mocharc.cjs, restoring test(driver-web-cache): Migrate from jest to mocha #27188's invariant. 88 tests pass and mocha exits cleanly without the flag.
  • Cursor vs. transaction (FluidCache.ts:673). Sticking with the readwrite transaction; rationale captured in the putIf JSDoc (get/put pair inside one transaction has the same atomicity as a cursor for a single-key CAS, simpler to reason about).
  • Catch-arm error preservation (FluidCache.ts:728). Already addressed on aa8ed9f — all five catch arms call logger.sendErrorEvent(..., error) before throwIfDisposed(), so the original IDB error survives the dispose-race path via telemetry.
  • Vale. .changeset/optimistic-driver-web-cache.md + .changeset/driver-web-cache-events.md rewritten with shorter sentences, no spaced em-dashes, no e.g./i.e. abbreviations, and no auto-close hyphenation. CI vale check should now pass.
  • PR description. dispose() semantics bullet replaced with the settled contract; added a cross-partition broadcast policy bullet for reviewer attention.

88 passing across both immediateClose modes. Build, lint, biome all green locally.

Comment thread packages/drivers/driver-web-cache/src/FluidCache.ts Outdated
…rop mis-categorized telemetry

When closeDbImmediately is true (the default), every getItemFromCache call
opens a fresh per-call IDBPDatabase that the cache must close itself. The
catch arm previously logged the error then re-threw via throwIfDisposed
before reaching closeDb, leaking the connection on every disposed-mid-get
race. Refactor getItemFromCache to try/catch/finally so closeDb always
runs, and drop the redundant inline closeDb calls in the success returns.

Also reorder the catch arms of get/put/putIf/removeEntry/removeEntries to
throw the UsageError before logging when this.disposed is set. The error
in that branch is the UsageError thrown by our own post-await
throwIfDisposed -- it is not an IDB read/write/delete failure, so logging
it under FluidCache*Error mis-categorizes the dispose race in telemetry.
Real IDB errors still log normally.

Adds a regression test that races dispose against several gets and then
calls indexedDB.deleteDatabase, which fake-indexeddb resolves only when
no open connections remain.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@anthony-murphy
Copy link
Copy Markdown
Contributor Author

Round 4 — fixed in 53a815d.

Tier 2: getItemFromCache IDB connection leak on dispose race

  • Refactored getItemFromCache to try / catch / finally. closeDb(db) now lives in finally and always runs. Removed four redundant inline closeDb(db) calls in the success returns and one in the catch arm.
  • Under closeDbImmediately: true (the default), every get opens a fresh per-call IDBPDatabase. The previous ordering (catch → log → throwIfDisposedcloseDb) short-circuited at throwIfDisposed, leaking the connection on every disposed-mid-get race.

Tier 3: catch-arm telemetry mis-categorization

  • All five catch arms (get / put / putIf / removeEntry / removeEntries) now call throwIfDisposed() before sendErrorEvent. The error in that branch is the UsageError our own post-await throwIfDisposed threw — not an IDB failure — so logging it under FluidCache*Error mis-categorized the dispose race. Real IDB errors still log normally (post-throwIfDisposed is a no-op when disposed is false).

New test

  • get rejected mid-flight closes its IDB connection (no leak) — seeds a row, races dispose() against five back-to-back get calls, then calls indexedDB.deleteDatabase. fake-indexeddb resolves deleteDatabase only when no open connections remain (a leak fires blocked). Passes in closeDbImmediately: true, skipped in shared-connection mode (dispose() closes this.db there).

89 passing, 1 properly skipped. Build, biome, eslint all clean.

@github-actions
Copy link
Copy Markdown
Contributor

🔗 No broken links found! ✅

Your attention to detail is admirable.

linkcheck output


> fluid-framework-docs-site@0.0.0 ci:check-links /home/runner/work/FluidFramework/FluidFramework/docs
> start-server-and-test "npm run serve -- --no-open" 3000 check-links

1: starting server using command "npm run serve -- --no-open"
and when url "[ 'http://127.0.0.1:3000' ]" is responding with HTTP status code 200
running tests using command "npm run check-links"


> fluid-framework-docs-site@0.0.0 serve
> docusaurus serve --no-open

[SUCCESS] Serving "build" directory at: http://localhost:3000/

> fluid-framework-docs-site@0.0.0 check-links
> linkcheck http://localhost:3000 --skip-file skipped-urls.txt

Crawling...

Stats:
  288859 links
    1925 destination URLs
    2175 URLs ignored
       0 warnings
       0 errors


@anthony-murphy
Copy link
Copy Markdown
Contributor Author

Deep Review

Reviewed commit 53a815d on 2026-05-21.

Readiness: 8/10 — ALMOST READY

The getItemFromCache catch-arm IDB-connection leak that drove the prior 5/10 has landed cleanly on 53a815dtry / catch / finally ownership of closeDb(db) matches the established put / putIf / removeEntry / removeEntries shape, and the new get rejected mid-flight closes its IDB connection (no leak) regression test pins the contract. No Tier 1–2 issues survive on this SHA. Three Tier 3 polish items remain (one one-line JSDoc fix with a suggested-change block, one design-discussion item on putIf's CAS shape, one low-probability BroadcastChannel.unref() test-harness gap). Settling at 8 (not 9) anchors conservatively against the 5 → 5 → 5 → 7 → 6 → 7 → 5 → 8 trajectory — putIf shape concerns have historically come back on this file.

Path to Ready

  • Resolve inline threads
  • Decide on putIf CAS shape (see inline): either add an additive update(entry, updater) overload now while the Class_FluidCache forwardCompat: false break is already being taken, or document in the putIf JSDoc that the predicate-only shape is intentional and an updater overload is a planned follow-up. Both Phase 3 blind designs picked the updater shape independently — worth surfacing before the @legacy @beta surface solidifies.

Context for Reviewers

For human reviewer
  • Needs human judgmentputIf CAS shape: keep narrow predicate or add transform-shape overload before shipping @legacy @beta. Owners: shubhi1092 / markfields.
  • Needs human judgment — Cross-partition overwrite policy on putIf (settled in inline discussion as "guarded put, unconditional overwrite") and the cross-partition broadcast-routing semantic (extra synthesized remove event stamped with the displaced partition's key). Author explicitly asks domain owners (vladsud / christiango) to confirm against hosts running multiple partitions per process.
  • Needs human judgmentdispatchChangeEvent partition-filter semantics: removeFile unconditional, others gated on event.partitionKey === this.partitionKey with null === null intentional. Author explicitly asks reviewers to confirm.
  • Needs human judgment — Whether to lift dispose? (and possibly putIf / events) onto IPersistedCache as optional members in a follow-up, matching the removeEntry? precedent from Add api to remove a single entry from FluidCache #26065. Phase 3 evaluators converged that dispose? is the universal one worth lifting; putIf / events are IDB-specific and reasonable to keep class-local.
  • Needs human judgment — Owner sign-off from Josmithr that the new BroadcastChannel.unref() + afterEach-dispose() test hygiene preserves PR test(driver-web-cache): Migrate from jest to mocha #27188's exit: true-free invariant.
  • Cannot be assessed by the pipelineidb same-microtask-tick atomicity for putIf's transaction.store.get(key) → predicate → transaction.store.put(...) chain. Author explicitly asks reviewers to verify against idb source that no await sneaks in between read and write. Best fit: vladsud / jatgarg.
  • Cannot be assessed by the pipeline — Confirm the test process exits cleanly under mocha now that BroadcastChannel.unref() is in production code and config.exit = true has been dropped, including parallel-run.
Review history (7 prior reviews)
  • 9ea3fa2 2026-05-21 · 5/10 — IDB connection leak in getItemFromCache catch arm flagged
  • aa8ed9f 2026-05-20 · 7/10 — Tier 1–2 dispose-contract concerns closed; .mocharc.cjs config.exit = true only Tier 3 holdout
  • a54d602 2026-05-20 · 6/10.mocharc.cjs config.exit = true reverses test(driver-web-cache): Migrate from jest to mocha #27188's leak-clean invariant
  • 658765a 2026-05-20 · 7/10 — both Tier 2 dispose-contract gaps closed; six polish items remained
  • 4b389ab 2026-05-19 · 5/10putIf reject branch + get success path flagged as new Tier 2 dispose-contract holdouts
  • d4e7a62 2026-05-19 · 5/10 — uniform throwIfDisposed + openDb() post-await re-check landed; success-path race + constructor idle task flagged
  • 0bd441e 2026-05-19 · 5/10 — in-flight openDb resurrection + asymmetric post-dispose contract flagged

* tool if we needed to iterate or range-scan; for a known key we don't.
*
* @param entry - cache entry; identifies the file and the key within that file.
* @param value - the proposed JSON-serializable value to write if `shouldWrite` returns true.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: putIf's @param value JSDoc narrows the value contract to "JSON-serializable", but the actual storage path stores it via IDB structured clone — same requirements as put.

The JSDoc reads:

* @param value - the proposed JSON-serializable value to write if shouldWrite returns true.

But value: unknown flows through buildRecord (FluidCache.ts:712-716, 776-777) into an IDB schema where cachedObject: any (FluidCacheIndexedDb.ts:111-114). That is structured-clone storage with no JSON constraint, and matches put exactly. The README's putIf section (README.md:69-92) describes the proposed value with no JSON-serializability restriction, which is consistent with the actual code but contradicts this line.

Historical context: PR #15550 (vladsud, 2023-05-12) explicitly pushed back on the JSON-serializable framing for FluidCache values, citing MFS-reported non-serializable content (array buffers, blobs). The same framing is wrong for the same reason here.

Suggested change
* @param value - the proposed JSON-serializable value to write if `shouldWrite` returns true.
* @param value - the proposed value to write if `shouldWrite` returns true. Stored via IndexedDB structured clone, with the same value requirements as {@link FluidCache.put}.

* @remarks
* The implementation uses `transaction.store.get` + `transaction.store.put` rather
* than an IDB cursor. Both run inside the same `readwrite` transaction, so the
* atomicity guarantee is identical, and the get/put pair is materially simpler
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: putIf's CAS API shape cannot express read-modify-write — flagging as a design discussion before this @legacy @beta surface solidifies, not as a blocking concern.

The current signature is putIf(entry, value, shouldWrite(existing, proposed): boolean): Promise<boolean> — the proposed value is fixed at call time and the predicate can only accept or reject it. That covers the revision-gating use case in this PR ("active tab wins") but cannot express the canonical CAS shape where the new value is derived from the persisted value — e.g. a monotonic sequence-number bump, or merging the existing record into the new one. The two independent blind redesigns run during review both landed on an updater-callback shape (update(entry, updater) / updateUsingPrevious(entry, update(previous): unknown | noUpdate)) where the callback returns the value to write inside the transaction, and the cross-evaluator scoring put the updater shape at least equal to the predicate on expressiveness (5 vs. 3 on edge cases).

This matters now because the class is @legacy @beta and Class_FluidCache forwardCompat: false is already being taken in validateDriverWebCachePrevious.generated.ts. Reshaping after ship costs both backward- and forward-compat; reshaping while the break is already being taken is much cheaper. An additive update(entry, updater) overload alongside the existing predicate putIf remains forward-compat-compatible as a later addition, so this is not a now-or-never decision — but the next CAS consumer requirement is the natural time to take it, and it is worth surfacing before that consumer needs to work around a missing shape.

Resolutions either way close this:

  • (a) Add a sibling update(entry, updater) overload now while the type-validation break is already being taken; or
  • (b) Document in the putIf JSDoc that the predicate-only shape is intentional and an updater variant is a planned follow-up if needed.

Flagging shubhi1092 / markfields for the human design call.

// `BroadcastChannel` interface has no `unref`; Node's implementation does, and
// the cast is intentional. In browsers this is a no-op (the property is absent
// and the optional-chained call is skipped).
(this.broadcastChannel as unknown as { unref?: () => void }).unref?.();
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: The BroadcastChannel.unref() reliance on an optional chain is not asserted anywhere — if Node ever stops exposing unref on BroadcastChannel, the call silently no-ops and the package's mocha config tips back into needing exit: true.

The call is (this.broadcastChannel as unknown as { unref?: () => void }).unref?.(); — the optional chain is necessary in browsers, but it also masks regressions on Node. PR #27188 (Josmithr, 2026-04-30) migrated driver-web-cache from jest to mocha and explicitly preserved an exit: true-free invariant — "This was needed in order to not specify exit: true in the mocha config." The PR description acknowledges that dependency. The new test file adds 647 lines that construct many FluidCache instances under mocha-on-Node; the afterEach cleanup at FluidCache.test.ts:73-88 disposes every cache (closing each BroadcastChannel via dispose), which mitigates the symptom but does not lock in the contract.

Resolutions either way close this:

  • (a) Add a Node-only typeof (channel as { unref?: unknown }).unref === "function" check around the call that asserts/logs when absent — locks in the contract PR test(driver-web-cache): Migrate from jest to mocha #27188 deliberately preserved; or
  • (b) Add a smoke test asserting the mocha process exits cleanly without exit: true after constructing and disposing a representative FluidCache.

Low-probability gap on this SHA — flagging Josmithr (PR #27188 owner) as the minimum reviewer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant