
CS-11077: in-memory parsed module-definition cache#4690

Closed
habdelra wants to merge 3 commits into cs-10952-cross-process-invalidation-broadcast from cs-11077-parsed-module-cache

Conversation


habdelra (Contributor) commented May 6, 2026

Summary

Today, every warm-cache loadLinks GET against a card whose link tree adopts from a large software-factory module re-fetches and re-parses the same modules.definitions jsonb row 5–11 times per request. This PR adds a per-process, parsed-form working set above readFromDatabaseCache so each realm-server fetches+parses each row at most once between invalidations. The shared DB row stays the authoritative source of truth — nothing is being duplicated, and no per-process state is being introduced that peers need to know about.

Where state lives — and what this PR does and does not change

It's worth being careful about the term "in-memory cache" given the direction of the DB-Authoritative Realm Registry project. So, concretely:

Authoritative state, shared across all realm-server processes (DB):

  • modules.definitions jsonb — the canonical persisted output of a module prerender. Written once, read by every realm-server process. This PR does not change how it's written, what's stored in it, or who reads it.
  • boxel_index, realm_registry, queue tables — unchanged.

Per-process state, already in CachingDefinitionLookup today:

  • #moduleGenerations / #realmGenerations / #globalGeneration — three counters bumped synchronously by invalidate / clearRealmCache / clearAllModules before the DB DELETE. Used today only to discard mid-flight prerender results that started against a now-stale view.
  • #inFlight — coalesces concurrent same-key callers within a process so a single SQL round-trip is shared by all waiters; evicts on completion.

Per-process state added by this PR:

  • #parsedCache — a Map of parsed ModuleCacheEntry keyed by (canonicalModuleURL, cacheScope, cacheUserId, resolvedRealmURL). Each entry holds the result of one readFromDatabaseCache call along with the generation-counter snapshot that was current at insert time. Bounded by total parsed-bytes (PARSED_CACHE_BUDGET_CHARS = 256 MB JSON-text proxy for ~0.5–0.75 GB heap). LRU via Map insertion order.

The cache is purely a parsed-form working set of the DB row. It is reconcilable from the DB at any time: drop the entire Map and the next lookup re-reads + re-parses the DB row, exactly like today. No process needs to know what's in any other process's #parsedCache. There is no new shared state.
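
To make the shape concrete, here is a minimal TypeScript sketch of the structures described above. `ModuleCacheEntry` is a stand-in for the real type, and any field name not quoted in this PR is illustrative; the key layout is inferred from the eviction prefixes discussed in the review threads below.

```ts
// Stand-in for the real parsed type; only the shape matters for the sketch.
type ModuleCacheEntry = { definitions: Record<string, unknown> };

// Snapshot of the three pre-existing per-process counters.
interface GenerationSnapshot {
  module: number; // #moduleGenerations value for this module URL
  realm: number;  // #realmGenerations value for this realm URL
  global: number; // #globalGeneration
}

interface ParsedCacheEntry {
  entry: ModuleCacheEntry;      // parsed form of one modules.definitions row
  snapshot: GenerationSnapshot; // counter values captured at insert time
  sizeChars: number;            // JSON-text length; the unit of the LRU budget
}

// Assumed key layout (realm first, then module, scope, user):
// `${resolvedRealmURL}|${canonicalModuleURL}|${cacheScope}|${cacheUserId}`
const parsedCache = new Map<string, ParsedCacheEntry>();
let parsedCacheChars = 0;
const PARSED_CACHE_BUDGET_CHARS = 256 * 1024 * 1024; // 256 MB of JSON text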

Findings — what we measured

Instrumented warm-cache GET against e2e-1/Validations/eval_sticky-note-1. Every relevant software-factory module was already populated in the modules table; no prerender fired during the measured GET. The realm-server process held a fresh CachingDefinitionLookup (in-memory generation counters all zero, #inFlight empty between layers).

total=28.1s  idxWait=0ms  cardDoc=28.1s
  layer 0 (Validations card)         layerMs=2.9s
  layer 1 (Issues/sticky-note)       layerMs=10.1s
  layer 2 (Project + 3 KAs)          layerMs=15.1s

darkfactory.gts                      ~11 reads, ~22s of read+parse total
eval-result.gts                       1 read,    1.3s
loadModuleCacheEntry events          MISS=14   COALESCE=7   (≈ 33% coalesce rate)
readFromDatabaseCache (per call)     sql ~1.5s   parse ~0.4s   blob ~60 MB

#inFlight does its job for concurrent same-key callers (layer 2's 4-card Promise.all collapses to 1 read), but evicts on completion. Sequential calls within a single GET — across layer boundaries, after each layer's Promise.all settles, and inside executeQueryForField follow-ups — each re-fetch and re-parse the same 60 MB DB row from scratch. Two-thirds of loadModuleCacheEntry invocations did not coalesce.
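
For context, the pre-existing `#inFlight` coalescing works roughly like this (a sketch under assumed names, reusing the `ModuleCacheEntry` stand-in from the sketch above; the real signature may differ):

```ts
// Concurrent same-key callers share one SQL round-trip; the entry evicts on
// completion, which is why sequential calls within one GET don't benefit.
const inFlight = new Map<string, Promise<ModuleCacheEntry | undefined>>();

function coalesce(
  key: string,
  work: () => Promise<ModuleCacheEntry | undefined>,
): Promise<ModuleCacheEntry | undefined> {
  let pending = inFlight.get(key);
  if (pending) {
    return pending; // join the in-flight read
  }
  let p = work().finally(() => inFlight.delete(key)); // evict on completion
  inFlight.set(key, p);
  return p;
}
```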

The DB row sizes are large by design (the dominant module re-exports the entire factory schema, and one-time-cost trimming is tracked separately):

eval-result.gts        ~41 MB    instantiate-result.gts ~41 MB
lint-result.gts        ~41 MB    parse-result.gts       ~41 MB
test-results.gts       ~41 MB    darkfactory.gts        ~62 MB

A direct DB-only bench (no realm-server, single PG connection) put the floor at 8.7s of read+parse work per such GET; observed end-to-end was 20–28s in production-shape conditions. Five user/cs-11003-e2e-{1..5} realms exhibited the same behavior:

e2e-1  eval/instantiate         20–22s   (subtype's module already cached)
e2e-1  lint/parse/test (first)  40–43s   (subtype's module is the only cold one)
e2e-1  lint/parse/test (next)   19–23s   (warm)
e2e-2..3                        20–30s
e2e-4..5                        11–22s   (PG + ts-node fully warm)

Eliminating the redundancy collapses 28s → ~5s. The residual is executeQueryForField issues (4.7s in this GET) + cloneDeep + per-resource search — separate concerns, tracked elsewhere.

Fix

Front-end readFromDatabaseCache with #parsedCache. Both lookupDefinition and lookupCachedDefinition paths benefit because both flow through readFromDatabaseCache. A sketch of the combined flow follows the list.

  • On lookup: snapshot the three generation counters as the first synchronous step. Look up the key in #parsedCache. If a hit and the cached snapshot equals the current snapshot, LRU-touch it (delete + re-insert at end of Map iteration order) and return the parsed entry. If snapshots disagree (a local invalidation or peer NOTIFY has bumped a counter since insert), drop the entry, decrement byte-count, fall through to the SQL+parse path.
  • On insert: re-snapshot before writing. If a counter bumped between the lookup snapshot and the post-SQL snapshot — i.e. an invalidation arrived during the SQL round-trip — skip the cache write rather than insert an entry that's stale at birth. Subtract any existing entry's sizeChars before overwriting (see thread for the concurrent-replace race).
  • Errors are not cached: the existing isExpiredErrorEntry TTL path already handles them in DB; in-memory caching errors would either return a stale-but-still-cached error or require a TTL-aware in-memory eviction path. Not worth either.
  • Bounded by total parsed-bytes. Map insertion order = iteration order in JS, so re-inserting on hit gives LRU. Eviction loops while bytes exceed budget.
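
Putting the bullets together, the lookup/insert flow is roughly the following. This is a sketch continuing the stand-ins from "Where state lives" above; `currentSnapshot` and `readAndParseRow` are assumed helpers standing in for the real counter reads and the SQL+parse path, and the zero-size guard reflects the later 3eb6b3a fix discussed below.

```ts
declare function currentSnapshot(key: string): GenerationSnapshot;
declare function snapshotsEqual(a: GenerationSnapshot, b: GenerationSnapshot): boolean;
declare function readAndParseRow(
  key: string,
): Promise<{ entry: ModuleCacheEntry; sizeChars: number }>;

async function readThroughParsedCache(key: string): Promise<ModuleCacheEntry> {
  let before = currentSnapshot(key); // first synchronous step
  let hit = parsedCache.get(key);
  if (hit) {
    if (snapshotsEqual(hit.snapshot, before)) {
      parsedCache.delete(key); // LRU touch: delete + re-insert moves the key
      parsedCache.set(key, hit); // to the end of the Map's iteration order
      return hit.entry;
    }
    // A local invalidation or a peer NOTIFY bumped a counter since insert.
    parsedCache.delete(key);
    parsedCacheChars -= hit.sizeChars;
  }
  let { entry, sizeChars } = await readAndParseRow(key);
  let after = currentSnapshot(key);
  // If a bump landed during the SQL round-trip, the entry is stale at birth:
  // return it, but skip the cache write. sizeChars === 0 means "size
  // unknown", which also skips the write (the 3eb6b3a guard).
  if (snapshotsEqual(before, after) && sizeChars > 0) {
    let replaced = parsedCache.get(key);
    if (replaced) {
      // Subtract the replaced entry first (the double-count race from the
      // review thread below).
      parsedCacheChars -= replaced.sizeChars;
    }
    parsedCache.set(key, { entry, snapshot: after, sizeChars });
    parsedCacheChars += sizeChars;
    // Evict oldest-inserted (least recently used) entries until under budget.
    for (let [k, v] of parsedCache) {
      if (parsedCacheChars <= PARSED_CACHE_BUDGET_CHARS) break;
      parsedCache.delete(k);
      parsedCacheChars -= v.sizeChars;
    }
  }
  return entry;
}
```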

Why this stacks on cs-10952-cross-process-invalidation-broadcast (#4644)

CS-10952 is the piece that makes a per-process parsed-form cache safe to add at all in a multi-process realm-server topology. Two reasons, both load-bearing:

1. The bump helpers it makes public are exactly what my snapshot mechanism keys on. CS-10952 promotes bumpModuleGeneration / bumpRealmGeneration / bumpGlobalGeneration from private to public on CachingDefinitionLookup so a sibling ModuleCacheInvalidationListener can call them. My fix uses those same three counters as its invalidation contract. Landing this PR on top of CS-10952 means I get the public API for free; landing it independently would mean either duplicating the promotion or merging out of order.

2. The cross-process invalidation channel it adds is what closes the multi-process correctness gap. At N=1 (today's prod), local invalidate / clearRealmCache / clearAllModules already bump counters synchronously before the DB DELETE — so my snapshot check is sufficient: the next lookup sees the bump and self-evicts. At N>1, that's no longer true. Process A's local bump doesn't reach process B's counters. Without CS-10952, B's #parsedCache would happily serve a parsed entry that A has already deleted from the DB.

CS-10952 closes that gap end-to-end:

  • Each invalidation path (invalidate / clearRealmCache / clearAllModules) emits NOTIFY module_cache_invalidated with a structured payload (URL fan-out / realm scope / global) inside the same transaction as the DB DELETE, so peers only see the signal on commit.
  • Each realm-server runs a ModuleCacheInvalidationListener (subscribed via the multiplexed PgAdapter.subscribe from PgAdapter.subscribe: multiplex LISTEN over a shared client #4660) that parses incoming payloads and replays them by calling the now-public bump helpers on the local CachingDefinitionLookup.

So with both PRs in place, the chain on a peer invalidation is: A invalidates → A bumps locally → A's DELETE commits → NOTIFY fires → B's listener wakes → B's listener calls bumpModuleGeneration (etc.) on B's CachingDefinitionLookup → B's #parsedCache snapshot for the affected entry is now stale → B's next readFromDatabaseCache lookup observes the snapshot mismatch on its in-memory entry → B drops the entry and falls through to the (now-empty) DB → returns undefined (or the freshly-prerendered row if a peer raced to re-populate). Same persist-after-invalidate protection that already exists for in-flight prerenders today, just extended to longer-lived parsed entries.
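
Concretely, the peer-side replay looks something like this. A hedged sketch only: CS-10952's actual payload shape and helper signatures live in that PR and may differ from what is assumed here.

```ts
// Assumed payload shape for the module_cache_invalidated NOTIFY channel.
type InvalidationPayload =
  | { kind: 'module'; realmURL: string; moduleURLs: string[] }
  | { kind: 'realm'; realmURL: string }
  | { kind: 'global' };

// Minimal stand-in for the now-public bump helpers on CachingDefinitionLookup.
interface BumpHelpers {
  bumpModuleGeneration(realmURL: string, moduleURL: string): void;
  bumpRealmGeneration(realmURL: string): void;
  bumpGlobalGeneration(): void;
}

// What ModuleCacheInvalidationListener does when a peer NOTIFY arrives.
function replayInvalidation(rawPayload: string, lookup: BumpHelpers): void {
  let payload: InvalidationPayload = JSON.parse(rawPayload);
  switch (payload.kind) {
    case 'module':
      for (let url of payload.moduleURLs) {
        lookup.bumpModuleGeneration(payload.realmURL, url);
      }
      break;
    case 'realm':
      lookup.bumpRealmGeneration(payload.realmURL);
      break;
    case 'global':
      lookup.bumpGlobalGeneration();
      break;
  }
}
```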

End result: this cache ships safely under multi-process from day one, layered cleanly above the shared DB row without introducing any new cross-process state. Landing CS-11077 directly on main (without CS-10952 first) would be the failure mode worth being concerned about — that would be a per-process cache without a cross-process invalidation contract. With the stack, the contract is intact.

Out of scope, tracked separately

  • The definitions jsonb is read whole even though populateQueryFields only needs the fields slice for one specific card type. A jsonb path read or a sibling slim column would shrink the read from ~60 MB to a few KB and would help every realm-server, not just ones with a warm #parsedCache (see the sketch after this list).
  • darkfactory.gts's definitions['…/Issue' | '…/Project' | '…/DarkFactory'] are 20+ MB each because of transitive schema embedding in definition.fields. Trimming would help every lookup of that module.
  • executeQueryForField for Project.issues linksToMany was 4.7s on its own in the measured GET — separate index-search question, not module-cache-related.
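
For the first bullet, the slim read could look something like this. A sketch only: the table and column come from this PR, but the query shape, the `url` predicate, and the adapter signature are assumptions.

```ts
// Read just the fields slice for one card type instead of the whole blob.
// jsonb `#>` extracts a nested path server-side, so only the slice crosses
// the wire. `sql` stands in for whatever query adapter the realm-server uses.
async function readFieldsSlice(
  sql: (query: string, params: unknown[]) => Promise<Record<string, unknown>[]>,
  canonicalModuleURL: string,
  cardTypeKey: string,
): Promise<unknown> {
  let rows = await sql(
    `SELECT definitions #> ARRAY[$2::text, 'fields'] AS fields_slice
       FROM modules
      WHERE url = $1`,
    [canonicalModuleURL, cardTypeKey],
  );
  return rows[0]?.fields_slice;
}
```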

Test plan

  • pnpm test-module for definition-lookup-test.ts — all 28 tests green, including 3 new ones:
    • parsed-entry cache short-circuits the DB read on repeat lookups
    • bumpModuleGeneration invalidates the parsed-entry cache for that module (the cross-process invalidation arrival from CS-10952's listener is structurally identical to a direct call here)
    • bumpGlobalGeneration invalidates the parsed-entry cache for every entry
  • pnpm lint clean for both runtime-common and realm-server
  • End-to-end check: re-run the warm-cache Validations GET measurement against a local stack with this branch — expect 28s → ~5s

Front-end readFromDatabaseCache with a parsed-entry cache so warm-cache
loadLinks GETs stop re-fetching and re-parsing the same modules.definitions
jsonb on every populateQueryFields call. Each cached entry snapshots the
three generation counters at insertion; lookups validate against current
counters and self-evict on mismatch — so a local invalidate / clearRealmCache
/ clearAllModules, or a remote bump arriving via the cross-process NOTIFY
listener, drops the entry on next access.

Bounded by total parsed-bytes (PARSED_CACHE_BUDGET_CHARS, 256 MB JSON-text
proxy for ~0.5–0.75 GB of resident heap). Map insertion order = iteration
order, so re-inserting on hit gives LRU.

Error entries are not cached: they have a TTL and the SQL path already
handles them correctly. The insert path re-snapshots before writing, so a
generation bump during the SQL round-trip skips the cache write.

Tests:
- parsed-entry cache short-circuits the DB read on repeat lookups
- bumpModuleGeneration invalidates the parsed-entry cache for that module
- bumpGlobalGeneration invalidates the parsed-entry cache for every entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 557b7e4e0d


Comment on lines +1103 to +1108
```ts
this.#parsedCache.set(cacheKey, {
  entry,
  snapshot: snapshotAtPersist,
  sizeChars,
});
this.#parsedCacheBytes += sizeChars;
```

P2: Subtract replaced parsed-cache entries before adding new ones

When concurrent cache-only lookups for the same module both miss #parsedCache, they can each finish the SQL read and execute this set for the same cacheKey; Map#set replaces the first entry, but #parsedCacheBytes is incremented again without subtracting the replaced entry. I checked lookupCachedDefinition, and it calls readFromDatabaseCache directly rather than going through #inFlight, so this is reachable for parallel cache-only definition reads. Once the inflated counter crosses the budget, eviction subtracts only entries still present in the map and can leave #parsedCacheBytes positive even with an empty map, causing unnecessary eviction/cache thrash thereafter.


habdelra (Contributor, Author) replied:

Real bug — confirmed reachable via lookupCachedDefinition, which calls readFromDatabaseCache directly and bypasses #inFlight, so two concurrent same-key cache-only readers can both finish the SQL round-trip and both reach the set. Map#set overwrites the value, but the bytes counter is incremented a second time without subtracting the replaced entry's size; over time it drifts above the actual map weight and forces spurious evictions.

Fixed in af9405c: subtract the replaced entry's sizeChars before the overwrite. Comment in the code points back to this thread's diagnosis.

When two concurrent cache-only callers both miss `#parsedCache` for the
same key and both finish the SQL round-trip, the second `Map#set`
overwrites the first but `#parsedCacheBytes` was incremented twice. Over
time the counter drifted above the actual map weight and forced
spurious LRU evictions.

Reachable via `lookupCachedDefinition`, which calls `readFromDatabaseCache`
directly — bypassing `#inFlight`, so two concurrent same-key cache-only
reads don't coalesce.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Adds an in-memory, LRU-bounded cache of parsed ModuleCacheEntry objects in CachingDefinitionLookup to avoid repeated DB fetch + JSON parsing of large modules.definitions rows during warm-cache requests, with generation-snapshot validation to self-evict on invalidations (local or cross-process via generation bumps).

Changes:

  • Add #parsedCache in CachingDefinitionLookup that short-circuits readFromDatabaseCache when generation snapshots still match, plus LRU eviction by approximate entry size.
  • Track per-entry generation snapshots (module/realm/global) and drop cache entries on mismatch before falling back to SQL.
  • Add tests covering cache short-circuiting and invalidation via bumpModuleGeneration / bumpGlobalGeneration.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

  • packages/runtime-common/definition-lookup.ts: introduces the in-memory parsed-entry cache in front of DB reads with snapshot validation and LRU eviction.
  • packages/realm-server/tests/definition-lookup-test.ts: adds regression tests for parsed-cache hits and generation-bump invalidation behavior.


Comment on lines +150 to +164
```ts
// amortized over every subsequent hit. JSON.stringify can throw on
// unexpected shapes (cycles, BigInts); fall back to 0 so a single
// pathological insert can't poison the cache.
function approximateEntrySizeChars(
  entry: ModuleCacheEntry,
  rawDefinitions: unknown,
): number {
  if (typeof rawDefinitions === 'string') {
    return rawDefinitions.length;
  }
  try {
    return JSON.stringify(entry.definitions).length;
  } catch (_err) {
    return 0;
  }
}
```
habdelra (Contributor, Author) replied:

Fair — the safety hatch was leaky. Fixed in 3eb6b3a: approximateEntrySizeChars still returns 0 as the "size unknown" sentinel, but the caller now treats 0 as "skip caching" so a pathological insert can't bypass the budget cap. Realistic prerender output won't trip this (module definitions are plain JSON-shaped), but the safety hatch needed to stay closed.
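
In sketch form, the caller-side guard described here looks roughly like this, with `approximateEntrySizeChars` from the quoted snippet and the `ModuleCacheEntry` stand-in from earlier; the wrapper name is illustrative:

```ts
// After 3eb6b3a: 0 is the "size unknown" sentinel. Serve the entry but skip
// the cache write so an unweighable insert can't slip past the budget.
function maybeCacheParsedEntry(
  key: string,
  entry: ModuleCacheEntry,
  rawDefinitions: unknown,
): void {
  let sizeChars = approximateEntrySizeChars(entry, rawDefinitions);
  if (sizeChars === 0) {
    return; // don't cache; the caller still returns the parsed entry
  }
  // ...normal snapshot-checked insert, as in the Fix-section sketch...
}
```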

Comment on lines +283 to +292
```ts
// Parsed-entry cache. Front-ends `readFromDatabaseCache` so successful
// cache reads avoid re-running the SQL fetch + JSON.parse of the
// `modules.definitions` jsonb column on every lookup. Each entry
// captures the three-counter generation snapshot at insertion; lookups
// validate against the current snapshot and drop stale entries — so a
// local invalidate/clearRealmCache/clearAllModules (or a remote bump
// arriving via the cross-process NOTIFY listener) self-evicts on next
// access. Map insertion order is iteration order in JS, so re-inserting
// on hit gives LRU eviction. Bounded by `PARSED_CACHE_BUDGET_CHARS`
// (JSON-text-size proxy for heap).
```
habdelra (Contributor, Author) replied:

Agreed — snapshot validation alone leaves stale entries resident until they're touched (or LRU-pressured out under unrelated work). Fixed in 3eb6b3a by adding eager-evict inside the bump helpers themselves:

  • bumpGlobalGeneration clears the entire #parsedCache and resets #parsedCacheChars.
  • bumpRealmGeneration evicts every entry whose key starts with ${resolvedRealmURL}|.
  • bumpModuleGeneration evicts every entry whose key starts with ${resolvedRealmURL}|${moduleURL}| (narrows to the affected module across all (scope, user) pairs).

Doing it inside the bump helpers covers both directions: local invalidate / clearRealmCache / clearAllModules already funnel through these, and so does CS-10952's ModuleCacheInvalidationListener when it replays a peer NOTIFY. So a peer's wholesale invalidation now frees this process's parsed-cache memory promptly without waiting for access pressure.
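
In sketch form (key layout from this thread; the counter bumps themselves and the realm-wide variant are elided, and the class is trimmed to the relevant slice):

```ts
class CachingDefinitionLookupSketch {
  #parsedCache = new Map<string, { sizeChars: number }>();
  #parsedCacheChars = 0;

  // Evict every parsed entry for one module across all (scope, user) pairs.
  bumpModuleGeneration(resolvedRealmURL: string, moduleURL: string): void {
    // ...bump the per-module counter as before, then eager-evict:
    let prefix = `${resolvedRealmURL}|${moduleURL}|`;
    for (let [key, value] of this.#parsedCache) {
      if (key.startsWith(prefix)) {
        this.#parsedCache.delete(key); // safe while iterating a JS Map
        this.#parsedCacheChars -= value.sizeChars;
      }
    }
  }

  // bumpRealmGeneration is the same loop with the shorter
  // `${resolvedRealmURL}|` prefix; global eviction clears everything:
  bumpGlobalGeneration(): void {
    // ...bump the global counter as before...
    this.#parsedCache.clear();
    this.#parsedCacheChars = 0;
  }
}
```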

```ts
    sizeChars: number;
  }
>();
#parsedCacheBytes = 0;
```
habdelra (Contributor, Author) replied:

Renamed in 3eb6b3a: #parsedCacheBytes → #parsedCacheChars. Matches the units of sizeChars and PARSED_CACHE_BUDGET_CHARS and avoids the implication that we're tracking a JS-heap byte count (we're not — it's the JSON-text character count, used as a rough proxy for heap weight).

…t on bump; rename chars counter

Three Copilot review threads on the parsed-entry cache:

1. `approximateEntrySizeChars` returns 0 when JSON.stringify throws on a
   pathological shape (cycles, BigInts). The previous code still cached
   the entry, so a zero-weight insert wouldn't count against the
   eviction budget — silently disabling the cap for any pathological
   row that landed first. Skip the insert when sizeChars === 0 instead.
2. `bumpGlobalGeneration` / `bumpRealmGeneration` / `bumpModuleGeneration`
   leave snapshot-stale parsed entries resident until they're touched
   (or LRU-pressured out). Eager-evict matching entries inside the bump
   methods so both local invalidations and CS-10952's cross-process
   NOTIFY-replay free memory immediately.
3. `#parsedCacheBytes` actually accumulates `sizeChars` (character
   counts) and is compared against `PARSED_CACHE_BUDGET_CHARS`. Renamed
   to `#parsedCacheChars` so the unit is obvious.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

habdelra commented May 7, 2026

Closing in favor of CS-11079 + a request-scoped memoization pass.

The work in this PR is correct, but post-instrumentation we found the underlying modules.definitions blob is ~99.96% redundant — Issue's definition.fields has 74,977 entries that resolve to only 28 distinct values, with the most common (a StringField contains field-def) literally byte-identical at 19,290 different keys. Fixing that at the source (CS-11079) shrinks darkfactory.gts from ~62 MB → ~12 MB and the per-blob read+parse from ~2s → ~0.4s.

Once that lands, the per-GET cost of the original problem (28s) is dominated not by the per-blob parse but by re-reading the same module 11× within one GET. That's better solved with a request-scoped Map<canonicalModuleURL, Promise<ModuleCacheEntry>> passed through loadLinks / populateQueryFields — no long-lived in-process state, no cross-process invalidation contract, and steady-state perf equivalent to this PR.
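
A sketch of that alternative (the Map type is taken from this comment; everything else, including the function name, is illustrative):

```ts
type ModuleCacheEntry = { definitions: Record<string, unknown> }; // stand-in

// One Map per GET, threaded through loadLinks / populateQueryFields. Storing
// the Promise rather than the resolved value also coalesces concurrent
// callers within the request.
type RequestModuleCache = Map<
  string /* canonicalModuleURL */,
  Promise<ModuleCacheEntry>
>;

function loadModuleOnce(
  cache: RequestModuleCache,
  canonicalModuleURL: string,
  load: (url: string) => Promise<ModuleCacheEntry>,
): Promise<ModuleCacheEntry> {
  let pending = cache.get(canonicalModuleURL);
  if (!pending) {
    pending = load(canonicalModuleURL); // first caller does the read+parse
    cache.set(canonicalModuleURL, pending);
  }
  return pending; // later callers in any layer share the same result
}
```

Because the Map dies with the request, no generation snapshots, NOTIFY replay, or byte budget are needed.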

Net result: the long-lived parsed-entry cache adds complexity (cross-process bump-via-NOTIFY contract, byte-bounded LRU) for a benefit that disappears once CS-11079 makes parsing cheap. Strictly aligned with the DB-Authoritative Realm Registry direction to keep per-process state minimal.

Tracking the new approach at CS-11079.

habdelra closed this May 7, 2026