feat(driver-web-cache): optimistic concurrency support (putIf + onChange)#27348
feat(driver-web-cache): optimistic concurrency support (putIf + onChange)#27348anthony-murphy wants to merge 9 commits into
Conversation
Add FluidCache.putIf(entry, value, shouldWrite) which performs the read of the existing entry and the conditional write inside a single IndexedDB readwrite transaction. This gives callers compare-and-swap semantics across consumers sharing the same underlying IndexedDB instance (e.g. multiple browser tabs racing to persist offline pending state, where the caller wants to favor the active/dirty tab). The predicate receives (existing, proposed) and must be synchronous: IndexedDB transactions auto-close on any non-IDB await, which would silently break the atomicity that makes the compare-and-swap correct. Cross-partition existing entries surface as undefined to the predicate, consistent with the semantics of get(). Errors are logged and swallowed, matching put(). - Refactor put() to share a buildRecord helper with putIf. - Mark Class_FluidCache forwardCompat:false in typeValidation.broken (adding a method is a forward-compat break for the type test). - Regenerate api-report and the type-test baseline. - Document the new method in the README and add a changeset. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nstances
Add cross-instance change notifications backed by BroadcastChannel.
FluidCache instances in the same browsing context (typically other
tabs of the same origin) now observe each other's mutations through
the new FluidCache.onChange(listener) API, returning an unsubscribe
function.
Event shape (FluidCacheChangeEvent):
- { op: "put" | "remove", partitionKey, fileId, type, cacheItemId }
fired from put, a successful putIf, and removeEntry. Listeners
only receive events whose partitionKey matches their FluidCache's
partition, consistent with the semantics of get().
- { op: "removeFile", fileId } fired from removeEntries. Delivered to
all listeners regardless of partition because removeEntries deletes
rows regardless of partition.
A single BroadcastChannel name ("fluid-driver-cache") is used; partition
scoping is applied by the receiving listener. BroadcastChannel does not
echo a message back to the poster, so self-writes do not invoke their
own listeners — other instances (including those in the same tab) do.
If BroadcastChannel is unavailable, onChange becomes a no-op subscription
and writes simply do not broadcast.
Also add FluidCache.dispose() which closes the BroadcastChannel, the
open IndexedDB connection, and clears the close timer. dispose is
idempotent; calling onChange after dispose throws UsageError. In Node,
unref() is called on the channel when available so test processes and
short-lived scripts can exit without explicit disposal.
- Update test afterEach to use the new dispose() entry point instead
of poking at private fields.
- Add FluidCacheBroadcastError and FluidCacheChangeListenerError
telemetry events for the broadcast and listener-invocation paths.
- Regenerate api-report.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Hi! Thank you for opening this PR. Want me to review it? Based on the diff (1348 lines, 11 files), I've queued these reviewers:
How this works
|
Address PR microsoft#27348 deep-review feedback: 1. Resurrection race in openDb past dispose. After awaiting getFluidCacheIndexedDbInstance, check this.disposed; if so, close the just-acquired connection and throw UsageError so the in-flight call does not assign this.db or arm a fresh close timer. 2. Post-dispose contract is now uniform. Add throwIfDisposed guards at the top of get, put, putIf, removeEntry, removeEntries (joining onChange). The catch blocks rethrow UsageError on dispose-mid-flight so callers always see a clean rejection rather than a silent undefined/void/false. Update dispose() JSDoc to document the contract. 3. putIf staleness parity with get. putIf now treats existing rows older than maxCacheItemAge as undefined to the predicate, matching the view get applies. JSDoc and README updated to call this out explicitly, along with the cross-partition overwrite behavior. 4. Tests for every public method's post-dispose throw (both immediateClose modes), the in-flight dispose race against an awaiting put, and putIf's stale-row handling. 74 passing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Pushed d4e7a62 addressing the deep-review feedback. Decisions made on the open questions:
Build + lint + tests all green (74 passing, both |
|
Deep Review: Thanks for the d4e7a62 push — the unified Two Tier 2 gaps in the same dispose-contract family survive on this SHA — posted as new inline threads rather than as replies to the resolved thread because they're different code anchors:
Both are localized — |
Address second round of PR microsoft#27348 deep-review feedback: 1. Success-path dispose race in put / putIf / removeEntry / removeEntries (S1). After the awaited IDB write resolved successfully, the methods set wrote=true / removed=true and emitted a broadcast even if dispose had landed mid-await. With closeDbImmediately=true the local db handle isn't closed by dispose, so the write committed and a 'put' / 'remove' / 'removeFile' event leaked past teardown. throwIfDisposed is now also called immediately after each successful await, so the call rejects with UsageError and no broadcast is emitted. broadcast() also short- circuits on disposed for defense in depth. 2. Constructor scheduleIdleTask cleanup task bypassed the openDb dispose guard by calling getFluidCacheIndexedDbInstance directly, contradicting the documented 'not lazily reopened' invariant. The storage-estimate idle task and the delete-old-entries idle task now both early-return on this.disposed (top of callback and after every await). 3. New test 'does not broadcast a change event when dispose runs mid-flight' verifies the success-path race no longer leaks events to other instances. 76 passing total. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Pushed 4b389ab addressing both Tier 2 gaps from the d4e7a62 deep review: S1 (success-path dispose race) ΓÇö after each successful awaited IDB write in S2 (constructor idle cleanup task) ΓÇö the 76 passing across both |
|
Deep Review: Thanks for the 4b389ab push — the post-await Two new Tier 2 gaps in the same dispose-contract family survive on this SHA — posting as new inline threads since they're at different code anchors:
Details and recommended fixes inline. |
Two Tier 2 gaps in the dispose-contract family survived the previous round of fixes: - `putIf`'s rejected-write branch awaited `transaction.done` and returned `false` without checking `throwIfDisposed()`. A caller whose predicate returned `false` while `dispose()` ran in another microtask saw `false` (a valid contract value meaning "predicate rejected") instead of the promised `UsageError`. - `get` / `getItemFromCache` checked `throwIfDisposed()` at entry but never re-checked after `await db.get(...)`. The success path resolved with the cached value rather than rejecting when dispose landed during the read. Every other IDB-resolving await in the file already had a post-await guard; the read path was the lone holdout. Both fixes are a one-line `throwIfDisposed()` immediately after the awaited IDB operation, mirroring the post-await guards already present on `put` / `putIf`-success / `removeEntry` / `removeEntries`. New tests cover both races by starting the in-flight call and synchronously calling `dispose()` to land mid-flight: - `putIf rejects with UsageError when dispose runs mid-flight and the predicate would reject` — predicate returns `false`, must throw, not resolve `false`. - `get rejects with UsageError when dispose runs mid-flight` — seeds a value with a separate cache so the read success path is reached, must throw, not resolve the cached value. Also expanded the `existing === undefined` enumeration in the `putIf` changeset to call out the `maxCacheItemAge` staleness case alongside the "no entry" and "cross-partition" cases, matching the actual filter applied to `existingValue` and the JSDoc on `putIf`. 80 passing across both `immediateClose` modes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Human comments: - Rename FluidCacheChangeEvent discriminator `op` -> `type`; rename inner cache-entry field `type` -> `entryType` to avoid clash with the union discriminator. Tests, README, changeset, and api-report regenerated. - Append a static UUID suffix to the BroadcastChannel name to harden against name collisions with other libraries on the same origin. - Drop the `.unref()` cast from the constructor. Tests already dispose every FluidCache via afterEach; the only remaining concern was Node's BroadcastChannel keeping the event loop alive, which is now handled by setting `exit: true` in .mocharc.cjs (test-only, prod code is clean). - `onChange` uses `throwIfDisposed()` for symmetry with other public methods. - `throwIfDisposed(cleanup?)` accepts an optional cleanup lambda; the in-flight `openDb` callers now use it to close the just-acquired DB handle if dispose ran during open. - Catch arms in get/put/putIf/removeEntry/removeEntries now log the original IDB error *before* calling throwIfDisposed(), preserving diagnostics on dispose-race paths. - putIf wraps the predicate in its own try/catch: predicate exceptions abort the transaction and are logged under a dedicated FluidCachePutIfPredicateError event so host-callback bugs do not get conflated with IDB write failures. - JSDoc on putIf explains the transaction-vs-cursor choice for the single-key compare-and-swap pattern. Deep Review: - One-shot FluidCacheBroadcastChannelUnavailable telemetry event when BroadcastChannel is missing so hosts can detect the degraded mode. Deferred (will respond in PR comment): - Listenable / createEmitter refactor (would change public API). - Cross-partition broadcast routing for `put` that overwrites a row written by another partition (needs design/perf discussion). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r change events
The internal eventing library (createEmitter / Listenable) replaces the
hand-rolled Set<Listener> + onChange method, addressing the PR reviewer's
suggestion to re-use the standard rather than rolling our own.
Public API change (PR is still in review; no shipped consumers):
- Replace `FluidCache.onChange(listener): Off` with the standard
`FluidCache.events: Listenable<FluidCacheEvents>` getter.
- Subscribe via `cache.events.on("change", listener)` -> Off.
- Add `FluidCacheEvents` to the public API surface.
Behavior changes follow from using the standard emitter:
- Drop per-listener try/catch isolation and the
`FluidCacheChangeListenerError` telemetry event. Listener exceptions
follow standard emitter semantics; consumers are expected to wrap their
own handlers if needed (matches every other Listenable user in the
framework).
- Subscribing on a disposed cache is now permitted; no events will fire
because dispose closes the BroadcastChannel and the disposed flag is
checked in `broadcast()` before any local dispatch.
Other changes:
- Add `@fluid-internal/client-utils: workspace:~` dependency.
- Rename changeset to `driver-web-cache-events.md` to reflect the new
API name.
- Update README, changeset, tests, and api-report.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses third round of deep-review feedback on microsoft#27348: * Cross-partition broadcast routing: put/putIf/removeEntry now read the existing record's partitionKey inside their readwrite transaction and, when the row belonged to a different partition than this cache, emit an additional remove event stamped with the displaced partition's key. This fixes the silent-invalidation gap where a partition-A write would overwrite a partition-B row in IDB without notifying B's listeners. removeEntry also stops broadcasting when no row was actually deleted. * Production BroadcastChannel unref(): the production channel now calls unref() so Node's event loop does not stay alive on account of an idle FluidCache. config.exit = true has been removed from .mocharc.cjs, restoring PR microsoft#27188's deliberate test-hygiene invariant. Tests still pass (88 total) and the process exits cleanly. * Vale prose-lint: .changeset/optimistic-driver-web-cache.md and .changeset/driver-web-cache-events.md rewritten with shorter sentences, no spaced em-dashes, no e.g./i.e. abbreviations, and no auto-close hyphenation. Unblocks the failing vale check. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Pushed
88 passing across both |
…rop mis-categorized telemetry When closeDbImmediately is true (the default), every getItemFromCache call opens a fresh per-call IDBPDatabase that the cache must close itself. The catch arm previously logged the error then re-threw via throwIfDisposed before reaching closeDb, leaking the connection on every disposed-mid-get race. Refactor getItemFromCache to try/catch/finally so closeDb always runs, and drop the redundant inline closeDb calls in the success returns. Also reorder the catch arms of get/put/putIf/removeEntry/removeEntries to throw the UsageError before logging when this.disposed is set. The error in that branch is the UsageError thrown by our own post-await throwIfDisposed -- it is not an IDB read/write/delete failure, so logging it under FluidCache*Error mis-categorizes the dispose race in telemetry. Real IDB errors still log normally. Adds a regression test that races dispose against several gets and then calls indexedDB.deleteDatabase, which fake-indexeddb resolves only when no open connections remain. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Round 4 — fixed in 53a815d. Tier 2:
Tier 3: catch-arm telemetry mis-categorization
New test
89 passing, 1 properly skipped. Build, biome, eslint all clean. |
|
🔗 No broken links found! ✅ Your attention to detail is admirable. linkcheck output |
Deep ReviewReviewed commit Readiness: 8/10 — ALMOST READY The Path to Ready
Context for Reviewers
For human reviewer
Review history (7 prior reviews)
|
| * tool if we needed to iterate or range-scan; for a known key we don't. | ||
| * | ||
| * @param entry - cache entry; identifies the file and the key within that file. | ||
| * @param value - the proposed JSON-serializable value to write if `shouldWrite` returns true. |
There was a problem hiding this comment.
Deep Review: putIf's @param value JSDoc narrows the value contract to "JSON-serializable", but the actual storage path stores it via IDB structured clone — same requirements as put.
The JSDoc reads:
* @param value - the proposed JSON-serializable value to write if shouldWrite returns true.
But value: unknown flows through buildRecord (FluidCache.ts:712-716, 776-777) into an IDB schema where cachedObject: any (FluidCacheIndexedDb.ts:111-114). That is structured-clone storage with no JSON constraint, and matches put exactly. The README's putIf section (README.md:69-92) describes the proposed value with no JSON-serializability restriction, which is consistent with the actual code but contradicts this line.
Historical context: PR #15550 (vladsud, 2023-05-12) explicitly pushed back on the JSON-serializable framing for FluidCache values, citing MFS-reported non-serializable content (array buffers, blobs). The same framing is wrong for the same reason here.
| * @param value - the proposed JSON-serializable value to write if `shouldWrite` returns true. | |
| * @param value - the proposed value to write if `shouldWrite` returns true. Stored via IndexedDB structured clone, with the same value requirements as {@link FluidCache.put}. |
| * @remarks | ||
| * The implementation uses `transaction.store.get` + `transaction.store.put` rather | ||
| * than an IDB cursor. Both run inside the same `readwrite` transaction, so the | ||
| * atomicity guarantee is identical, and the get/put pair is materially simpler |
There was a problem hiding this comment.
Deep Review: putIf's CAS API shape cannot express read-modify-write — flagging as a design discussion before this @legacy @beta surface solidifies, not as a blocking concern.
The current signature is putIf(entry, value, shouldWrite(existing, proposed): boolean): Promise<boolean> — the proposed value is fixed at call time and the predicate can only accept or reject it. That covers the revision-gating use case in this PR ("active tab wins") but cannot express the canonical CAS shape where the new value is derived from the persisted value — e.g. a monotonic sequence-number bump, or merging the existing record into the new one. The two independent blind redesigns run during review both landed on an updater-callback shape (update(entry, updater) / updateUsingPrevious(entry, update(previous): unknown | noUpdate)) where the callback returns the value to write inside the transaction, and the cross-evaluator scoring put the updater shape at least equal to the predicate on expressiveness (5 vs. 3 on edge cases).
This matters now because the class is @legacy @beta and Class_FluidCache forwardCompat: false is already being taken in validateDriverWebCachePrevious.generated.ts. Reshaping after ship costs both backward- and forward-compat; reshaping while the break is already being taken is much cheaper. An additive update(entry, updater) overload alongside the existing predicate putIf remains forward-compat-compatible as a later addition, so this is not a now-or-never decision — but the next CAS consumer requirement is the natural time to take it, and it is worth surfacing before that consumer needs to work around a missing shape.
Resolutions either way close this:
- (a) Add a sibling
update(entry, updater)overload now while the type-validation break is already being taken; or - (b) Document in the
putIfJSDoc that the predicate-only shape is intentional and an updater variant is a planned follow-up if needed.
Flagging shubhi1092 / markfields for the human design call.
| // `BroadcastChannel` interface has no `unref`; Node's implementation does, and | ||
| // the cast is intentional. In browsers this is a no-op (the property is absent | ||
| // and the optional-chained call is skipped). | ||
| (this.broadcastChannel as unknown as { unref?: () => void }).unref?.(); |
There was a problem hiding this comment.
Deep Review: The BroadcastChannel.unref() reliance on an optional chain is not asserted anywhere — if Node ever stops exposing unref on BroadcastChannel, the call silently no-ops and the package's mocha config tips back into needing exit: true.
The call is (this.broadcastChannel as unknown as { unref?: () => void }).unref?.(); — the optional chain is necessary in browsers, but it also masks regressions on Node. PR #27188 (Josmithr, 2026-04-30) migrated driver-web-cache from jest to mocha and explicitly preserved an exit: true-free invariant — "This was needed in order to not specify exit: true in the mocha config." The PR description acknowledges that dependency. The new test file adds 647 lines that construct many FluidCache instances under mocha-on-Node; the afterEach cleanup at FluidCache.test.ts:73-88 disposes every cache (closing each BroadcastChannel via dispose), which mitigates the symptom but does not lock in the contract.
Resolutions either way close this:
- (a) Add a Node-only
typeof (channel as { unref?: unknown }).unref === "function"check around the call that asserts/logs when absent — locks in the contract PR test(driver-web-cache): Migrate fromjesttomocha#27188 deliberately preserved; or - (b) Add a smoke test asserting the mocha process exits cleanly without
exit: trueafter constructing and disposing a representativeFluidCache.
Low-probability gap on this SHA — flagging Josmithr (PR #27188 owner) as the minimum reviewer.
Description
Adds two related primitives to
FluidCacheso that multiple browser tabs sharing thesame origin's IndexedDB can coordinate on persisted cache state. The motivating
scenario is offline pending-state storage where multiple tabs may race to persist
their version of the same key and the host wants to favor the active/dirty tab.
1.
putIf(entry, value, shouldWrite)-- compare-and-swap writeThe read of the existing entry and the conditional write happen in a single IndexedDB
readwritetransaction, so the decision sees a consistent view across tabs.shouldWrite(existing, proposed)is synchronous by design. IndexedDBtransactions close automatically on any non-IDB await, which would silently break the
atomicity that makes the compare-and-swap correct. This is documented in the JSDoc.
existingisundefinedwhen no entry exists for the key, when the existingentry belongs to a different partition, or when the entry is older than
maxCacheItemAge-- matching the semantics ofget.trueiff the write committed. Errors are logged and not thrown,matching
put.FluidCachePutIfPredicateErrortelemetry event (distinct from IDB-write failures under
FluidCachePutError) andsurfaced as
false, after the transaction is aborted so the existing row is preserved.2.
events.on("change", listener) => unsubscribe-- cross-instance notificationsFluidCacheinstances now broadcast their mutations over aBroadcastChannel(
fluid-driver-cache:<guid>), so otherFluidCacheinstances in the same browsing context(typically other tabs) can observe changes.
eventsis a standard Fluid FrameworkListenable; subscribe viacache.events.on("change", listener).Event shape:
put/removeevents are filtered to the listener's partition (consistent withget).removeFileevents are delivered unconditionally becauseremoveEntriesdeletesrows regardless of partition.
partitionKey, so apartition-A
put/putIf/removeEntrycan displace a partition-B row.put/putIf/removeEntrynow read the existing record'spartitionKeyinside the same
readwritetransaction, and when the row belonged to a differentpartition, also broadcast a
removeevent stamped with the displacedpartition's key so its listeners are notified.
BroadcastChanneldoes not echo to the sender, so a write does not invoke its owncache's listeners. Other instances (including in the same tab) do.
BroadcastChannelis unavailable, theeventssubscription is a no-op and writesdo not broadcast.
3.
dispose()-- lifecycle (strict contract)Closes the
BroadcastChannel, any open IndexedDB connection, and clears the close timer.Idempotent. After
disposereturns:get,put,putIf,removeEntry,removeEntries)throws
UsageError.disposewas called also reject withUsageError.eventssubscription remains valid but no events fire.The two constructor
scheduleIdleTaskcallbacks (storage estimate + delete-old-entriesmaintenance) also short-circuit on
disposedso a construct-then-immediately-disposesequence does not open IDB or perform maintenance writes.
The production
BroadcastChannelisunref()-ed so an idleFluidCachedoes notkeep Node's event loop alive (no-op in browsers). This lets the package's mocha
config keep PR #27188's
exit: true-free invariant.All API surface is additive on the existing
@legacy @betaclass. Per-package typetests mark
Class_FluidCacheforwardCompat: falseto acknowledge that addingmethods to a class is a forward-compat type-break by definition.
Reviewer Guidance
The review process is outlined on this wiki page.
Specific things I'd appreciate eyes on:
putIf. The crucial invariant is thatshouldWriteruns synchronously betweentx.store.getandtx.store.put, so thetransaction never yields to the event loop between the read and the decision. If you
see a way that any await sneaks in there (e.g. via the predicate, the
idblibrary'spromise wrapping, or the
existing?.partitionKeyaccess), please flag it.put/putIf/removeEntryemit an extraremoveevent stamped with the displaced partition's key). Confirm this matches theexpectations of hosts that run multiple partitions per process.
dispatchChangeEventpartition filtering:removeFileis delivered unconditionally,everything else gates on
event.partitionKey === this.partitionKey.null === nullis intentional.