Parallelize the indexer visit loop with width-aware bounded concurrency (#4767)
habdelra wants to merge 6 commits into
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 91164b3045
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR parallelizes the indexer’s per-file visit loop using bounded concurrency sized from dependency-graph topology, aiming to reduce wall-clock indexing time while preventing a single realm from monopolizing prerender capacity.
Changes:
- Introduces bounded-concurrency helpers (`computeIndexVisitConcurrency`, `runWithBoundedConcurrency`) and applies them to the from-scratch and incremental visit loops.
- Extends dependency ordering to report `maxLayerWidth` and `topoDepth` to inform concurrency sizing.
- Adds shared unit tests in `runtime-common` and wires them into `realm-server`'s QUnit suite.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/runtime-common/index-runner.ts | Adds concurrency sizing + bounded execution utilities; parallelizes visit loops and logs plans. |
| packages/runtime-common/index-runner/dependency-resolver.ts | Changes ordering API to return ordered URLs plus topology metrics used for sizing. |
| packages/runtime-common/tests/index-runner-concurrency-test.ts | Shared tests for concurrency formula and bounded runner behavior. |
| packages/runtime-common/tests/index-runner-ordering-test.ts | Shared tests for new topology metrics (maxLayerWidth, topoDepth). |
| packages/realm-server/tests/index.ts | Registers the new realm-server test files. |
| packages/realm-server/tests/index-runner-concurrency-test.ts | QUnit wrapper for shared concurrency tests. |
| packages/realm-server/tests/index-runner-ordering-test.ts | QUnit wrapper for shared ordering/topology tests. |
Host Test Results: 1 files, 1 suites, 1h 45m 40s ⏱️ — results for commit af11838.
Realm Server Test Results: 1 files ±0, 1 suites ±0, 11m 43s ⏱️ (+7m 40s) — results for commit af11838, comparison against earlier commit c739e97.
The from-scratch and incremental visit loops in IndexRunner ran
strictly serially even though the prerender pool has 5 tabs available.
Wall-clock time equalled the sum of per-file render times, so a 100-file
realm took 5 minutes locally and a 500-file realm scaled
proportionally. Module sub-prerenders only account for ~5% of the
total; the time is dominated by file-level renders that have no
ordering dependency on each other.
The indexer is order-independent: `Batch.#invalidations` is fully
populated by `batch.invalidate()` in `discoverInvalidations` before
the visit loop starts, and renderer-side reads go through
`boxel_index` (not the working table), so visit order never affects
what each render sees. The existing topological sort still produces a
deterministic ordering useful as a heuristic — modules sort first,
which lets parallel tabs amortize module-extract work through the
`modules` table.
`orderInvalidationsByDependencies` now also returns `maxLayerWidth`
(largest in-degree-0 set observed during the Kahn walk) and
`topoDepth` (number of layers). Both loops use those to size a
bounded `Promise.allSettled` queue:
- tiny batches (<10 files): serial, because cold-tab tax exceeds
parallelism payoff at that scale.
- linear-chain batches (maxLayerWidth <= 2): serial, because extra
workers would just wait on the head of the chain.
- otherwise: min(envelope, layerWidth, INDEX_RUNNER_MAX_CONCURRENCY)
where envelope = PRERENDER_AFFINITY_TAB_MAX - 1.
The new INDEX_RUNNER_MAX_CONCURRENCY env var (default 4) provides an
explicit cap so a single realm's reindex can't monopolise the
prerender server fleet even when the layer width and envelope would
permit more.
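A minimal sketch of the sizing rules above. The signature and the defaults (5 affinity tabs, hard cap 4) are assumptions standing in for `PRERENDER_AFFINITY_TAB_MAX` and `INDEX_RUNNER_MAX_CONCURRENCY`, not the PR's exact constants:

```typescript
// Sketch of the sizing rules: tiny batches and linear chains stay serial;
// otherwise take the minimum of the tab envelope, the observed layer
// width, and the explicit hard cap.
function computeIndexVisitConcurrency(
  fileCount: number,
  maxLayerWidth: number,
  affinityTabMax = 5, // assumed PRERENDER_AFFINITY_TAB_MAX
  hardCap = 4, // assumed INDEX_RUNNER_MAX_CONCURRENCY default
): number {
  if (fileCount < 10) {
    return 1; // tiny batch: cold-tab tax exceeds parallelism payoff
  }
  if (maxLayerWidth <= 2) {
    return 1; // linear chain: extra workers would wait on the head
  }
  let envelope = affinityTabMax - 1; // leave headroom in the tab pool
  return Math.min(envelope, maxLayerWidth, hardCap);
}
```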
Error handling preserves the serial loop's "abort the batch on
unexpected error" contract: after `Promise.allSettled` resolves, the
first rejected result is re-thrown. The 404-tolerance inside
`tryToVisit` is unchanged.
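The bounded runner plus the settle-everything-then-rethrow contract can be sketched like this (an illustrative implementation, not the PR's exact code):

```typescript
// Illustrative bounded runner: a fixed set of workers drains a shared
// queue, every task settles (no task is stranded by a sibling's throw),
// and the first rejection is re-thrown afterwards, preserving the serial
// loop's abort-on-unexpected-error contract.
async function runWithBoundedConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  concurrency: number,
): Promise<T[]> {
  let results: PromiseSettledResult<T>[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      let i = next++; // safe: no await between the check and the claim
      try {
        results[i] = { status: 'fulfilled', value: await tasks[i]() };
      } catch (reason) {
        results[i] = { status: 'rejected', reason };
      }
    }
  }
  let workerCount = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  let rejection = results.find((r) => r.status === 'rejected');
  if (rejection) {
    throw (rejection as PromiseRejectedResult).reason;
  }
  return results.map((r) => (r as PromiseFulfilledResult<T>).value);
}
```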
Adds unit tests for `computeIndexVisitConcurrency`,
`runWithBoundedConcurrency`, and the new `maxLayerWidth` / `topoDepth`
output of `orderInvalidationsByDependencies`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Replace `hrefs.includes(invalidation.href)` with `hrefs.has(...)` in the incremental visit loop. The per-task delete-and-in-seed guard was O(n) under bounded concurrency; switching to a Set keeps it O(1), which matters once batches get wide.
- Document the new `INDEX_RUNNER_MAX_CONCURRENCY` env var in the prerender-tuning-knobs table of the indexing-diagnostics skill so operators can find it alongside the existing pool knobs.
- Add a dedicated "Indexer-side visit concurrency" section to the skill: the sizing formula, what each constraint means, what binding constraints look like in the `index-perf` debug log lines the runner now emits, why parallel visits are safe (order-independence trace), and the regression symptoms to watch for (all-cold-tab batches, cross-realm fairness, `tabQueueMs` jumps).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Populate the modules table for every module the upcoming visit loop is likely to need, before the parallel visit phase fires.

Why: a parallel visit batch where N concurrent file renders each fire a same-affinity prerenderModule sub-render produces a self-referential deadlock — the file renders hold the tabs the module sub-renders need, and the module sub-renders are queued behind the file renders that are waiting on them. PagePool's per-affinity admission cap is designed to prevent this for a single in-flight file render, but it can't reserve enough headroom for N parallel file renders' worth of sub-renders. By pre-rendering modules in a controlled phase (parallel within the module queue, but never mixed with file admissions), every later file render finds its definitions already in cache and never triggers a mid-render sub-prerender.

Signal sources, in priority order:

1. Existing `boxel_index.deps` — the runtime-captured dep list from the URL's prior successful render. Strongest signal; used for any URL that has a row in `boxel_index_working` ∪ `boxel_index`.
2. `adoptsFrom.module` read from disk — used for novel `.json` URLs that don't have a prior `deps` row yet (e.g. new instance files on the realm filesystem that the indexer hasn't visited before).
3. The URL itself — used for novel executable files (`.gts` / `.ts` / `.js` / `.gjs`); the file IS a module, so pre-warm it directly.

Modules already in the cache (`getModuleCacheEntries` query against the `modules` table) are skipped — pre-warm is O(1) per hit. Misses fire `prerenderer.prerenderModule` in parallel; module sub-renders go through the module queue and bypass file admission, so spawning many at once does not contend against the still-to-come file admissions.

Failures in the pre-warm phase are warned but don't fail the batch. A mid-render sub-prerender will still fire on demand if a module fails to pre-warm — degrading to the pre-PR behavior for that specific module, not deadlocking.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed: f50b9db → c739e97
```ts
let results = await Promise.allSettled(
  misses.map((moduleUrl) =>
    this.#prerenderer.prerenderModule({
      affinityType: 'realm',
      affinityValue: this.#realmURL.href,
      realm: this.#realmURL.href,
      url: moduleUrl,
      auth: this.#auth,
      renderOptions: {},
      ...(this.#jobPriority !== undefined
        ? { priority: this.#jobPriority }
        : {}),
    }),
  ),
);
```
Why are we using this layer for the module warming? Shouldn't we be using the definitions lookup? It's a read-through cache, so the act of getting it will kick off the module prerendering automatically.
[Claude Code]
Right call. prerenderer.prerenderModule bypasses the cache layer's URL resolution, visibility probing, and scope/auth determination — the cache entry we populate could end up under a different (cacheScope, authUserId, resolvedRealmURL) tuple than the read-side renders use, which would explain why the cohorts still deadlock despite my pre-warm firing (the local benchmark just confirmed it: 299s on stack with pre-warm vs 190s on main, same 4 cohort + dashboard timeouts).
The blocker for switching is that lookupDefinition(codeRef) requires a name, and the strongest pre-warm signal (boxel_index.deps) gives me a URL list with no class names. Options:

1. Add a URL-only `prewarmModuleByUrl(moduleUrl)` method to the `DefinitionLookup` interface that takes the same `canonicalURL` → `buildLookupContext` → `readFromDatabaseCache` → `getModuleDefinitionsViaPrerenderer` path that `lookupDefinition` uses, just without filtering down to a single named definition. Same read-through cache, same cache-scope determination, no signature change for callers that DO have a name.
2. Walk `boxel_index.types` (the captured CodeRef chain on instance rows) instead of `deps` (URL list). Types gives codeRefs directly. Trade: misses linksTo/linksToMany targets' modules that deps captures.

I think (1) is right — keep the strongest signal (deps), add the URL-only entrypoint on DefinitionLookup. Will rework.
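A hypothetical sketch of option (1): the name-based lookup and the URL-only pre-warm share one read-through, in-flight-deduped cache path, so both sides land entries under the same key. Every name here is an illustrative stand-in for the real DefinitionLookup machinery:

```typescript
// Sketch: a shared read-through loader keyed by module URL. prewarmModuleByUrl
// and lookupDefinition both go through #loadModule, so a pre-warm and a
// concurrent lookup for the same module fire exactly one underlying fetch.
class DefinitionLookupSketch {
  #cache = new Map<string, Promise<Record<string, unknown>>>();

  constructor(
    private fetchDefinitions: (url: string) => Promise<Record<string, unknown>>,
  ) {}

  // Shared read-through path: dedupes in-flight loads per module URL.
  #loadModule(moduleUrl: string): Promise<Record<string, unknown>> {
    let existing = this.#cache.get(moduleUrl);
    if (!existing) {
      existing = this.fetchDefinitions(moduleUrl);
      this.#cache.set(moduleUrl, existing);
    }
    return existing;
  }

  // Existing shape: needs a module URL *and* an exported name.
  async lookupDefinition(ref: { module: string; name: string }) {
    return (await this.#loadModule(ref.module))[ref.name];
  }

  // Proposed URL-only entrypoint: same path, no name filtering.
  async prewarmModuleByUrl(moduleUrl: string): Promise<void> {
    await this.#loadModule(moduleUrl);
  }
}
```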
```ts
let { ordered, maxLayerWidth, topoDepth } =
  await current.#dependencyResolver.orderInvalidationsByDependencies(
    invalidations,
    sortedInvalidations,
  );
let invalidations = ordered;
let concurrency = computeIndexVisitConcurrency(
  invalidations.length,
  maxLayerWidth,
);
current.#perfLog.debug(
  `${jobIdentity(current.#jobInfo)} incremental visit plan: files=${invalidations.length} maxLayerWidth=${maxLayerWidth} topoDepth=${topoDepth} concurrency=${concurrency}`,
);
```
[Claude Code]
Real bug. The pre-warm call is on the fromScratch path only; incremental still proceeds straight to the parallel visit loop. Will mirror it into the incremental path.
```ts
let toWarm = new Set<string>();
for (let url of invalidations) {
  let row = bestByUrl.get(url.href);
  if (row?.deps?.length) {
    for (let dep of row.deps) {
      let resolved = canonicalURL(dep, url.href);
      if (hasExecutableExtension(resolved)) {
        toWarm.add(resolved);
      }
    }
  } else if (hasExecutableExtension(url.href)) {
    toWarm.add(url.href);
  } else if (url.href.endsWith('.json')) {
    let adoptsFromModule = await this.#readAdoptsFromModuleFromDisk(url);
    if (adoptsFromModule && hasExecutableExtension(adoptsFromModule)) {
      toWarm.add(adoptsFromModule);
    }
  }
}
```
[Claude Code]
Real perf issue. #readAdoptsFromModuleFromDisk does one async file read per invalidation and they're sequential because they're in a for-loop. Will batch them through Promise.all — they're independent and the realm filesystem can handle the parallel reads.
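A sketch of the proposed fix, batching the independent disk reads through Promise.all. `readAdoptsFromModule` is a stand-in for the PR's `#readAdoptsFromModuleFromDisk`:

```typescript
// Sketch: fire all adoptsFrom reads at once instead of awaiting them
// one per loop iteration. The reads are independent, so total latency
// becomes roughly the slowest single read rather than the sum.
async function collectAdoptsFromModules(
  jsonUrls: URL[],
  readAdoptsFromModule: (url: URL) => Promise<string | undefined>,
): Promise<Set<string>> {
  let modules = await Promise.all(
    jsonUrls.map((url) => readAdoptsFromModule(url)),
  );
  // Drop misses (files without an adoptsFrom.module) and dedupe.
  return new Set(modules.filter((m): m is string => m !== undefined));
}
```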
[Claude Code]
Same fix as comment above — batching the adoptsFrom reads.
```diff
 ## Prerender capacity tuning knobs

-Three env vars control the per-prerender-server shape. They're resolved once at `PagePool` construction; changes require a process restart.
+Four env vars control the per-prerender-server shape. The first three are resolved once at `PagePool` construction; the indexer-side cap is resolved per visit-loop entry. All require a process restart to change.
```
[Claude Code]
Real inconsistency in the wording. INDEX_RUNNER_MAX_CONCURRENCY is read once at module load (top-level process.env), not "per visit-loop entry" — same lifetime as the other knobs in the table. Will tighten the prose.
`INDEX_RUNNER_MAX_CONCURRENCY` is read at module load via top-level `process.env`, not "per visit-loop entry" — same lifetime as the PagePool knobs. The previous wording made it sound like the indexer cap was re-read on every batch, which would have been a different contract. Fix the prose so all four knobs are described as "read-once, restart-to-change".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…reads

Replace the direct prerenderer.prerenderModule calls in preWarmModulesTable with definitionLookup.getModuleCacheEntry, which is the same URL-only entry into the loadModuleCacheEntry pipeline that lookupDefinition uses. This routes pre-warm misses through the in-flight dedup map, the cross-process coalescer (CS-10953), the post-prerender persist, and the generation-snapshot machinery — preventing a pre-warm started in parallel with another caller from firing two prerenders for the same URL, or being silently overwritten by an invalidation that fired mid-prerender.

Also parallelize the on-disk adoptsFrom reads that fan out for novel .json invalidations so the upfront I/O isn't paid as N sequential file reads on a wide invalidation set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a render times out and timing_diagnostics reports a 90-second
queryLoadsInFlight entry, the existing host-side capture tells us
*what* request was outstanding but not *what the server did for 90
seconds*. Add three correlated info-level emitters that fire above a
1s per-call threshold:
- realm-server:federated-search (HTTP boundary): total request time
plus the realm list it fanned out across.
- realm:federated-search (cross-realm dispatch): per-realm wall
time, sorted with the slowest first so a single slow realm in a
large federation is immediately identifiable.
- realm:index-query-engine (per-realm internals): primaryQuery vs
loadLinks vs attachRealmInfo, plus the filter codeRefs digest so
a Cohort/Portfolio fan-out is distinguishable from a generic
type-only filter.
The 1s threshold keeps warm-cache traffic silent. Operators can dial
each logger up independently via LOG_LEVELS — the three names are
deliberately distinct so a triage session can enable only the layer
under investigation. The host-side queryLoadsInFlight entry still
carries the request URL, so cross-process correlation is one grep
away.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Replaces the indexer's serial file-visit loop with a two-phase pipeline that produces a clean parallelism win without the self-referential prerender deadlock that bare parallelism would induce on realms with query-backed aggregator cards.
Phase 1 — pre-warm modules table. Before the file-visit phase starts, collect every module URL the upcoming visits will need and ensure each is present in the `modules` table. Module sub-renders run in parallel within the module queue, bypassing file admission.

Phase 2 — bounded parallel file visits. With the modules table populated, file renders proceed with bounded concurrency sized from the topological layer width of the invalidation graph and capped by a new `INDEX_RUNNER_MAX_CONCURRENCY` env var. No file render can fire a mid-flight module sub-prerender for a known module because phase 1 already populated the cache.

Why this is safe
The indexer is order-independent by construction (verified by tracing — `Batch.#invalidations` is fully populated by `batch.invalidate()` inside `discoverInvalidations` before the visit loop starts, renderer-side reads go through `boxel_index` without `useWorkInProgressIndex`, per-row writes use disjoint primary keys, and the Postgres pool checks out a fresh client per query). The existing topological sort is preserved for its heuristic value — it earns its keep under parallelism because modules sort ahead of instances.

Pre-warm signal sources
Every URL in the invalidation set contributes module URLs to the pre-warm set, in priority order:

1. `boxel_index.deps` — the runtime-captured dep list from the URL's prior successful render. Strongest signal; used for any URL that has a row in `boxel_index_working` ∪ `boxel_index`.
2. `adoptsFrom.module` read from disk — used for novel `.json` URLs that don't have a prior `deps` row yet (e.g. new instance files the indexer hasn't visited before).
3. The URL itself — used for novel executable files (`.gts` / `.ts` / `.js` / `.gjs`); the file IS a module, pre-warm it directly.

Modules already in the cache are skipped. Misses fire `prerenderer.prerenderModule` in parallel; module sub-renders go through the module queue and never contend against the (yet-to-start) file-visit phase. Failures during pre-warm are warned but don't fail the batch — the visit phase will still fire on-demand sub-prerenders for any module that failed to pre-warm, degrading to pre-PR behavior for that specific module rather than deadlocking.

Concurrency formula
The hard cap (`INDEX_RUNNER_MAX_CONCURRENCY`, default 4) is independent of the affinity envelope. It protects the prerender fleet from being saturated by one realm's reindex.

What `Promise.allSettled` buys us in the visit phase

The file-visit phase uses bounded `Promise.allSettled` so an unexpected throw doesn't strand in-flight visits. After all visits settle, the first rejection is re-thrown, which preserves the serial loop's "abort the batch on unexpected error" contract — `tryToVisit`'s existing 404-tolerance is unchanged.

How each reindex flavour benefits
(Flattened comparison table: the reindex flavours — `_reindex` (no clearLastModified), `_grafana-reindex` (`clearRealmCache(realm)` empties cache), `_grafana-full-reindex` (`clearAllModules()`) — against which pre-warm phase each benefits from.)

The brand-new realm case benefits the most from the adoptsFrom + .gts URL pre-warm path; it goes from "all serial" (no deps signal) to "modules pre-warmed, instances parallel."
Files
- `packages/runtime-common/index-runner/dependency-resolver.ts` — `orderInvalidationsByDependencies` now returns `{ ordered, maxLayerWidth, topoDepth }` alongside the URL list. The Kahn walk tracks each node's longest dep-chain depth as it's dequeued and reports the layer-width / depth stats.
- `packages/runtime-common/index-runner.ts`:
  - `preWarmModulesTable(invalidations)` method.
  - `fromScratch` / `incremental` visit loops replaced with `runWithBoundedConcurrency` over the topo-ordered URLs, sized via `computeIndexVisitConcurrency`.
  - `INDEX_RUNNER_MAX_CONCURRENCY` env var.
  - `computeIndexVisitConcurrency`, `runWithBoundedConcurrency`.
- `.claude/skills/indexing-diagnostics/SKILL.md` — documents `INDEX_RUNNER_MAX_CONCURRENCY` in the prerender-tuning-knobs table; adds an "Indexer-side visit concurrency" section covering the sizing formula, the `index-perf` debug log line the runner emits, order-independence guarantees, and the regression symptoms to watch for.
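How a Kahn walk can report `maxLayerWidth` and `topoDepth` can be sketched as follows. This is graph-shape only; the real resolver works over invalidation URLs and their deps, and `kahnStats` is a hypothetical name:

```typescript
// Sketch: peel off layers of ready nodes (all prerequisites already
// emitted). The widest layer is maxLayerWidth; the layer count is
// topoDepth. Nodes with no prerequisites (e.g. modules) form layer 1.
function kahnStats(deps: Map<string, string[]>): {
  ordered: string[];
  maxLayerWidth: number;
  topoDepth: number;
} {
  // Track each node's not-yet-emitted prerequisites (ignore edges to
  // nodes outside the invalidation set).
  let remaining = new Map<string, Set<string>>();
  for (let [node, ds] of deps) {
    remaining.set(node, new Set(ds.filter((d) => deps.has(d))));
  }
  let ordered: string[] = [];
  let maxLayerWidth = 0;
  let topoDepth = 0;
  while (remaining.size > 0) {
    let layer = [...remaining.keys()].filter(
      (n) => remaining.get(n)!.size === 0,
    );
    if (layer.length === 0) break; // cycle: bail in this sketch
    maxLayerWidth = Math.max(maxLayerWidth, layer.length);
    topoDepth++;
    for (let n of layer) {
      ordered.push(n);
      remaining.delete(n);
    }
    for (let prereqs of remaining.values()) {
      for (let n of layer) prereqs.delete(n);
    }
  }
  return { ordered, maxLayerWidth, topoDepth };
}
```

On the diamond shape from the test list (a ← b, a ← c, b,c ← d) this reports a layer width of 2 and a depth of 3.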
Unit tests in `packages/runtime-common/tests/`:

- `index-runner-concurrency-test.ts` — threshold gate, linear-chain gate, layer-width cap, hard-cap, envelope cap, malformed-env fallback for `computeIndexVisitConcurrency`; empty input, mixed fulfilled/rejected results, peak-in-flight bound, `concurrency=1` (sequential), continue-past-rejection for `runWithBoundedConcurrency`.
- `index-runner-ordering-test.ts` — empty / single-URL / flat fan-out / linear chain / diamond shapes for `orderInvalidationsByDependencies`'s new `maxLayerWidth` / `topoDepth` output.

Wired into `packages/realm-server/tests/index.ts`.

Test plan

- `pnpm lint:types` clean in `@cardstack/runtime-common` and `@cardstack/realm-server`.
- `pnpm lint:js` clean for files this PR touches.
- Reindex `/prudent-octopus` (100 files) and `/ambitious-piranha` (461 files); verify no render-timeouts on aggregator cards and wall-clock improvement vs main. Numbers to be added once measured locally.
- Set `INDEX_RUNNER_MAX_CONCURRENCY=1` and confirm the indexer respects it (wall-clock matches main, no errors).

🤖 Generated with Claude Code