
Parallelize the indexer visit loop with width-aware bounded concurrency #4767

Draft

habdelra wants to merge 6 commits into main from indexer-perf/parallel-visit-loop

Conversation

@habdelra habdelra commented May 11, 2026

Summary

Replaces the indexer's serial file-visit loop with a two-phase pipeline that produces a clean parallelism win without the self-referential prerender deadlock that bare parallelism would induce on realms with query-backed aggregator cards.

Phase 1 — pre-warm modules table. Before the file-visit phase starts, collect every module URL the upcoming visits will need and ensure each is present in the modules table. Module sub-renders run in parallel within the module queue, bypassing file admission.

Phase 2 — bounded parallel file visits. With the modules table populated, file renders proceed with bounded concurrency sized from the topological layer width of the invalidation graph and capped by a new INDEX_RUNNER_MAX_CONCURRENCY env var. No file render can fire a mid-flight module sub-prerender for a known module because phase 1 already populated the cache.
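
As a rough sketch, the two phases slot together like the following. The helper names come from this PR, but the wiring and signatures shown here are illustrative only, using the sizing and bounded-run helpers described later in this description:

// Illustrative orchestration of the two-phase visit pipeline, not the actual
// IndexRunner code; `runner` stands in for the real class's private methods.
async function visitInvalidations(
  runner: {
    preWarmModulesTable(urls: URL[]): Promise<void>;
    orderInvalidationsByDependencies(
      urls: URL[],
    ): Promise<{ ordered: URL[]; maxLayerWidth: number }>;
    tryToVisit(url: URL): Promise<void>;
  },
  invalidations: URL[],
): Promise<void> {
  // Phase 1: populate the modules table so no file render has to fire a
  // mid-flight module sub-prerender for a known module.
  await runner.preWarmModulesTable(invalidations);

  // Phase 2: topologically order the files, size the worker pool from the
  // graph's layer width, then visit with bounded concurrency.
  let { ordered, maxLayerWidth } =
    await runner.orderInvalidationsByDependencies(invalidations);
  let concurrency = computeIndexVisitConcurrency(ordered.length, maxLayerWidth);
  let results = await runWithBoundedConcurrency(
    ordered.map((url) => () => runner.tryToVisit(url)),
    concurrency,
  );

  // Preserve the serial loop's contract: once everything in flight has
  // settled, surface the first unexpected rejection so the batch aborts.
  let firstRejection = results.find(
    (r): r is PromiseRejectedResult => r.status === 'rejected',
  );
  if (firstRejection) throw firstRejection.reason;
}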

Why this is safe

The indexer is order-independent by construction (verified by tracing — Batch.#invalidations is fully populated by batch.invalidate() inside discoverInvalidations before the visit loop starts, renderer-side reads go through boxel_index without useWorkInProgressIndex, per-row writes use disjoint primary keys, and the Postgres pool checks out a fresh client per query). The existing topological sort is preserved for its heuristic value — it earns its keep under parallelism because modules sort ahead of instances.

Pre-warm signal sources

Every URL in the invalidation set contributes module URLs to the pre-warm set, in priority order:

  1. Existing boxel_index.deps — the runtime-captured dep list from the URL's prior successful render. Strongest signal; used for any URL that has a row in boxel_index_working ∪ boxel_index.
  2. adoptsFrom.module read from disk — used for novel .json URLs that don't have a prior deps row yet (e.g. new instance files the indexer hasn't visited before).
  3. The URL itself — used for novel executable files (.gts / .ts / .js / .gjs); the file IS a module, pre-warm it directly.

Modules already in the cache are skipped. Misses fire prerenderer.prerenderModule in parallel; module sub-renders go through the module queue and never contend against the (yet-to-start) file-visit phase. Failures during pre-warm are warned but don't fail the batch — the visit phase will still fire on-demand sub-prerenders for any module that failed to pre-warm, degrading to pre-PR behavior for that specific module rather than deadlocking.

Concurrency formula

// totalWork = invalidations.length; maxLayerWidth comes from the topological sort
function computeIndexVisitConcurrency(totalWork: number, maxLayerWidth: number): number {
  let envelopeMax = Math.max(1, PRERENDER_AFFINITY_TAB_MAX - 1);
  let hardCap = parseInt(process.env.INDEX_RUNNER_MAX_CONCURRENCY ?? '4', 10) || 4;

  if (totalWork < 10) return 1;      // overhead-dominated, stays serial
  if (maxLayerWidth <= 2) return 1;  // near-linear chain, parallelism can't help
  return Math.min(envelopeMax, maxLayerWidth, hardCap);
}
  • Below 10 files → serial. Cold-tab tax exceeds parallelism payoff at that scale.
  • Linear-chain dep graphs → serial. Extra workers would all wait on the head of the chain.
  • Otherwise → bounded by the per-affinity envelope, the widest topological layer, and the hard cap.

The hard cap (INDEX_RUNNER_MAX_CONCURRENCY, default 4) is independent of the affinity envelope. It protects the prerender fleet from being saturated by one realm's reindex.
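
For example, assuming the default INDEX_RUNNER_MAX_CONCURRENCY of 4 and a five-tab prerender pool (so an envelope of 4 via PRERENDER_AFFINITY_TAB_MAX - 1), the formula resolves like this for a few hypothetical batches:

computeIndexVisitConcurrency(8, 6);    // 1: under 10 files, stays serial
computeIndexVisitConcurrency(200, 2);  // 1: near-linear chain, extra workers would idle
computeIndexVisitConcurrency(200, 3);  // 3: the widest topological layer binds
computeIndexVisitConcurrency(200, 50); // 4: the hard cap binds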

What Promise.allSettled buys us in the visit phase

The file-visit phase uses bounded Promise.allSettled so an unexpected throw doesn't strand in-flight visits. After all visits settle, the first rejection is re-thrown, which preserves the serial loop's "abort the batch on unexpected error" contract — tryToVisit's existing 404-tolerance is unchanged.
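
For concreteness, a bounded-concurrency runner with these semantics could look roughly like the sketch below. This is not the PR's actual runWithBoundedConcurrency, but the continue-past-rejection and peak-in-flight behaviors match what the tests described later exercise:

// Sketch: run `tasks` with at most `limit` in flight, never stopping on a
// rejection, and report results in input order like Promise.allSettled.
async function runWithBoundedConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<PromiseSettledResult<T>[]> {
  let results: PromiseSettledResult<T>[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      let i = next++; // claimed synchronously, so no two workers share an index
      try {
        results[i] = { status: 'fulfilled', value: await tasks[i]() };
      } catch (reason) {
        results[i] = { status: 'rejected', reason };
      }
    }
  }
  let workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

Pulling indices from a shared counter keeps result order stable while letting fast tasks backfill idle workers; the caller then scans the settled array and re-throws the first rejection.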

How each reindex flavour benefits

Trigger | Modules-table state | Effect of phase 1
Incremental on warm realm | Mostly populated | Pre-warm is trivial cache hits
_reindex (no clearLastModified) | Populated | Same — trivial
_grafana-reindex | clearRealmCache(realm) empties the cache | Pre-warm fires parallel module renders ahead of the visit phase, exactly what would have happened mid-render anyway, but on our terms
_grafana-full-reindex | clearAllModules() | Same as above, per realm
First-ever from-scratch on a brand-new realm | Empty | Pre-warm uses adoptsFrom + .gts file URLs; the modules table is populated before the parallel .json phase

The brand-new realm case benefits the most from the adoptsFrom + .gts URL pre-warm path; it goes from "all serial" (no deps signal) to "modules pre-warmed, instances parallel."

Files

  • packages/runtime-common/index-runner/dependency-resolver.ts: orderInvalidationsByDependencies now returns { ordered, maxLayerWidth, topoDepth } alongside the URL list. The Kahn walk tracks each node's longest dep-chain depth as it's dequeued and reports the layer-width / depth stats.
  • packages/runtime-common/index-runner.ts
    • New preWarmModulesTable(invalidations) method.
    • fromScratch / incremental visit loops replaced with runWithBoundedConcurrency over the topo-ordered URLs, sized via computeIndexVisitConcurrency.
    • New INDEX_RUNNER_MAX_CONCURRENCY env var.
    • Two new exported helpers: computeIndexVisitConcurrency, runWithBoundedConcurrency.
  • .claude/skills/indexing-diagnostics/SKILL.md — documents INDEX_RUNNER_MAX_CONCURRENCY in the prerender-tuning-knobs table; adds an "Indexer-side visit concurrency" section covering the sizing formula, the index-perf debug log line the runner emits, order-independence guarantees, and the regression symptoms to watch for.

Tests

Unit tests in packages/runtime-common/tests/:

  • index-runner-concurrency-test.ts — threshold gate, linear-chain gate, layer-width cap, hard-cap, envelope cap, malformed-env fallback for computeIndexVisitConcurrency; empty input, mixed fulfilled/rejected results, peak-in-flight bound, concurrency=1 (sequential), continue-past-rejection for runWithBoundedConcurrency.
  • index-runner-ordering-test.ts — empty / single-URL / flat fan-out / linear chain / diamond shapes for orderInvalidationsByDependencies's new maxLayerWidth / topoDepth output.

Wired into packages/realm-server/tests/index.ts.
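
For illustration, the two default-independent sizing gates could be asserted roughly as follows (the module name, import path, and test wording are assumptions, not the actual test file contents):

import { module, test } from 'qunit';
import { computeIndexVisitConcurrency } from '@cardstack/runtime-common';

module('index-runner visit concurrency', function () {
  test('batches under 10 files stay serial regardless of layer width', function (assert) {
    assert.strictEqual(computeIndexVisitConcurrency(9, 8), 1);
  });

  test('near-linear chains stay serial regardless of batch size', function (assert) {
    assert.strictEqual(computeIndexVisitConcurrency(500, 2), 1);
  });
});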

Test plan

  • pnpm lint:types clean in @cardstack/runtime-common and @cardstack/realm-server.
  • pnpm lint:js clean for files this PR touches.
  • Unit tests cover the sizing helpers, the concurrency primitive, and the dep-resolver layer stats.
  • Full from-scratch reindex on /prudent-octopus (100 files) and /ambitious-piranha (461 files); verify no render-timeouts on aggregator cards and wall-clock improvement vs main. Numbers to be added once measured locally.
  • Lower INDEX_RUNNER_MAX_CONCURRENCY=1 and confirm the indexer respects it (wall-clock matches main, no errors).

🤖 Generated with Claude Code

@habdelra habdelra requested a review from Copilot May 11, 2026 19:33

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91164b3045



Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR parallelizes the indexer’s per-file visit loop using bounded concurrency sized from dependency-graph topology, aiming to reduce wall-clock indexing time while preventing a single realm from monopolizing prerender capacity.

Changes:

  • Introduces bounded-concurrency helpers (computeIndexVisitConcurrency, runWithBoundedConcurrency) and applies them to from-scratch and incremental visit loops.
  • Extends dependency ordering to report maxLayerWidth and topoDepth to inform concurrency sizing.
  • Adds shared unit tests in runtime-common and wires them into realm-server’s QUnit suite.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

File | Description
packages/runtime-common/index-runner.ts | Adds concurrency sizing + bounded execution utilities; parallelizes visit loops and logs plans.
packages/runtime-common/index-runner/dependency-resolver.ts | Changes ordering API to return ordered URLs plus topology metrics used for sizing.
packages/runtime-common/tests/index-runner-concurrency-test.ts | Shared tests for concurrency formula and bounded runner behavior.
packages/runtime-common/tests/index-runner-ordering-test.ts | Shared tests for new topology metrics (maxLayerWidth, topoDepth).
packages/realm-server/tests/index.ts | Registers the new realm-server test files.
packages/realm-server/tests/index-runner-concurrency-test.ts | QUnit wrapper for shared concurrency tests.
packages/realm-server/tests/index-runner-ordering-test.ts | QUnit wrapper for shared ordering/topology tests.



github-actions Bot commented May 11, 2026

Host Test Results

    1 files      1 suites   1h 45m 40s ⏱️
2 654 tests 2 639 ✅ 15 💤 0 ❌
2 673 runs  2 658 ✅ 15 💤 0 ❌

Results for commit af11838.

Realm Server Test Results

    1 files  ±  0      1 suites  ±0   11m 43s ⏱️ + 7m 40s
1 337 tests +880  1 328 ✅ +871  0 💤 ±0  9 ❌ +9 
1 416 runs  +953  1 407 ✅ +944  0 💤 ±0  9 ❌ +9 

Results for commit af11838. ± Comparison against earlier commit c739e97.

For more details on these errors, see this check.

habdelra and others added 3 commits May 11, 2026 18:06
The from-scratch and incremental visit loops in IndexRunner ran
strictly serially even though the prerender pool has 5 tabs available.
Wall-clock equalled the sum of per-file render times, so a 100-file
realm took 5 minutes locally and a 500-file realm scaled
proportionally. Module sub-prerenders only account for ~5% of the
total; the time is dominated by file-level renders that have no
ordering dependency on each other.

The indexer is order-independent: `Batch.#invalidations` is fully
populated by `batch.invalidate()` in `discoverInvalidations` before
the visit loop starts, and renderer-side reads go through
`boxel_index` (not the working table), so visit order never affects
what each render sees. The existing topological sort still produces a
deterministic ordering useful as a heuristic — modules sort first,
which lets parallel tabs amortize module-extract work through the
`modules` table.

`orderInvalidationsByDependencies` now also returns `maxLayerWidth`
(largest in-degree-0 set observed during the Kahn walk) and
`topoDepth` (number of layers). Both loops use those to size a
bounded `Promise.allSettled` queue:

  - tiny batches (<10 files): serial, because cold-tab tax exceeds
    parallelism payoff at that scale.
  - linear-chain batches (maxLayerWidth <= 2): serial, because extra
    workers would just wait on the head of the chain.
  - otherwise: min(envelope, layerWidth, INDEX_RUNNER_MAX_CONCURRENCY)
    where envelope = PRERENDER_AFFINITY_TAB_MAX - 1.

The new INDEX_RUNNER_MAX_CONCURRENCY env var (default 4) provides an
explicit cap so a single realm's reindex can't monopolise the
prerender server fleet even when the layer width and envelope would
permit more.

Error handling preserves the serial loop's "abort the batch on
unexpected error" contract: after `Promise.allSettled` resolves, the
first rejected result is re-thrown. The 404-tolerance inside
`tryToVisit` is unchanged.

Adds unit tests for `computeIndexVisitConcurrency`,
`runWithBoundedConcurrency`, and the new `maxLayerWidth` / `topoDepth`
output of `orderInvalidationsByDependencies`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Replace `hrefs.includes(invalidation.href)` with `hrefs.has(...)` in
  the incremental visit loop. The per-task `delete-and-in-seed` guard
  was O(n) under bounded concurrency; switching to a Set keeps it
  O(1) which matters once batches get wide.
- Document the new `INDEX_RUNNER_MAX_CONCURRENCY` env var in the
  prerender-tuning-knobs table of the indexing-diagnostics skill so
  operators can find it alongside the existing pool knobs.
- Add a dedicated "Indexer-side visit concurrency" section to the
  skill: the sizing formula, what each constraint means, what
  binding constraints look like in the `index-perf` debug log lines
  the runner now emits, why parallel visits are safe (order
  independence trace), and the regression symptoms to watch for
  (all-cold-tab batches, cross-realm fairness, `tabQueueMs` jumps).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Populate the modules table for every module the upcoming visit loop
is likely to need, before the parallel visit phase fires.

Why: a parallel visit batch where N concurrent file renders each
fire a same-affinity prerenderModule sub-render produces a self-
referential deadlock — the file renders hold the tabs the module
sub-renders need, the module sub-renders are queued behind the file
renders that are waiting on them. PagePool's per-affinity admission
cap is designed to prevent this for a single in-flight file render,
but it can't reserve enough headroom for N parallel file renders'
worth of sub-renders.

By pre-rendering modules in a controlled phase (parallel within the
module queue, but never mixed with file admissions), every later
file render finds its definitions already in cache and never
triggers a mid-render sub-prerender.

Signal sources, in priority order:

1. Existing `boxel_index.deps` — the runtime-captured dep list from
   the URL's prior successful render. Strongest signal; used for any
   URL that has a row in `boxel_index_working` ∪ `boxel_index`.
2. `adoptsFrom.module` read from disk — used for novel `.json` URLs
   that don't have a prior `deps` row yet (e.g. new instance files
   on the realm filesystem that the indexer hasn't visited before).
3. The URL itself — used for novel executable files (`.gts` / `.ts`
   / `.js` / `.gjs`); the file IS a module, so pre-warm it directly.

Modules already in the cache (`getModuleCacheEntries` query against
the `modules` table) are skipped — pre-warm is O(1) per hit. Misses
fire `prerenderer.prerenderModule` in parallel; module sub-renders
go through the module queue and bypass file admission, so spawning
many at once does not contend against the still-to-come file
admissions.

Failures in the pre-warm phase are warned but don't fail the batch.
A mid-render sub-prerender will still fire on demand if a module
fails to pre-warm — degrading to the pre-PR behavior for that
specific module, not deadlocking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@habdelra habdelra force-pushed the indexer-perf/parallel-visit-loop branch from f50b9db to c739e97 Compare May 11, 2026 22:11
@habdelra habdelra requested a review from Copilot May 11, 2026 22:14
Comment on lines +677 to +691
let results = await Promise.allSettled(
  misses.map((moduleUrl) =>
    this.#prerenderer.prerenderModule({
      affinityType: 'realm',
      affinityValue: this.#realmURL.href,
      realm: this.#realmURL.href,
      url: moduleUrl,
      auth: this.#auth,
      renderOptions: {},
      ...(this.#jobPriority !== undefined
        ? { priority: this.#jobPriority }
        : {}),
    }),
  ),
);

@habdelra habdelra May 11, 2026


Why are we using this layer for the module warming -- shouldn't we be using the definitions lookup? It's a read-through cache, so the act of getting it will kick off the module prerendering automatically.


[Claude Code]

Right call. prerenderer.prerenderModule bypasses the cache layer's URL resolution, visibility probing, and scope/auth determination — the cache entry we populate could end up under a different (cacheScope, authUserId, resolvedRealmURL) tuple than the read-side renders use, which would explain why the cohorts still deadlock despite my pre-warm firing (the local benchmark just confirmed it: 299s on stack with pre-warm vs 190s on main, same 4 cohort + dashboard timeouts).

The blocker for switching is that lookupDefinition(codeRef) requires a name, and the strongest pre-warm signal (boxel_index.deps) gives me a URL list with no class names. Options:

  1. Add a URL-only prewarmModuleByUrl(moduleUrl) method to the DefinitionLookup interface that takes the same canonicalURL → buildLookupContext → readFromDatabaseCache → getModuleDefinitionsViaPrerenderer path that lookupDefinition uses, just without filtering down to a single named definition. Same read-through cache, same cache-scope determination, no signature change for callers that DO have a name.
  2. Walk boxel_index.types (the captured CodeRef chain on instance rows) instead of deps (URL list). Types gives codeRefs directly. Trade: misses linksTo/linksToMany targets' modules that deps captures.

I think (1) is right — keep the strongest signal (deps), add the URL-only entrypoint on DefinitionLookup. Will rework.


Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Comment on lines +362 to +373
let { ordered, maxLayerWidth, topoDepth } =
await current.#dependencyResolver.orderInvalidationsByDependencies(
invalidations,
sortedInvalidations,
);
let invalidations = ordered;
let concurrency = computeIndexVisitConcurrency(
invalidations.length,
maxLayerWidth,
);
current.#perfLog.debug(
`${jobIdentity(current.#jobInfo)} incremental visit plan: files=${invalidations.length} maxLayerWidth=${maxLayerWidth} topoDepth=${topoDepth} concurrency=${concurrency}`,
);

[Claude Code]

Real bug. The pre-warm call is on the fromScratch path only; incremental still proceeds straight to the parallel visit loop. Will mirror it into the incremental path.

Comment on lines +621 to +639
let toWarm = new Set<string>();
for (let url of invalidations) {
  let row = bestByUrl.get(url.href);
  if (row?.deps?.length) {
    for (let dep of row.deps) {
      let resolved = canonicalURL(dep, url.href);
      if (hasExecutableExtension(resolved)) {
        toWarm.add(resolved);
      }
    }
  } else if (hasExecutableExtension(url.href)) {
    toWarm.add(url.href);
  } else if (url.href.endsWith('.json')) {
    let adoptsFromModule = await this.#readAdoptsFromModuleFromDisk(url);
    if (adoptsFromModule && hasExecutableExtension(adoptsFromModule)) {
      toWarm.add(adoptsFromModule);
    }
  }
}

[Claude Code]

Real perf issue. #readAdoptsFromModuleFromDisk does one async file read per invalidation and they're sequential because they're in a for-loop. Will batch them through Promise.all — they're independent and the realm filesystem can handle the parallel reads.

Comment on lines +677 to +691

[Claude Code]

Same fix as comment above — batching the adoptsFrom reads.

## Prerender capacity tuning knobs

- Three env vars control the per-prerender-server shape. They're resolved once at `PagePool` construction; changes require a process restart.
+ Four env vars control the per-prerender-server shape. The first three are resolved once at `PagePool` construction; the indexer-side cap is resolved per visit-loop entry. All require a process restart to change.

[Claude Code]

Real inconsistency in the wording. INDEX_RUNNER_MAX_CONCURRENCY is read once at module load (top-level process.env), not "per visit-loop entry" — same lifetime as the other knobs in the table. Will tighten the prose.

habdelra and others added 3 commits May 11, 2026 18:28
`INDEX_RUNNER_MAX_CONCURRENCY` is read at module load via top-level
`process.env`, not "per visit-loop entry" — same lifetime as the
PagePool knobs. The previous wording made it sound like the indexer
cap was re-read on every batch, which would have been a different
contract. Fix the prose so all four knobs are described as
"read-once, restart-to-change".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…reads

Replace the direct prerenderer.prerenderModule calls in
preWarmModulesTable with definitionLookup.getModuleCacheEntry, which
is the same URL-only entry into the loadModuleCacheEntry pipeline
that lookupDefinition uses. This routes pre-warm misses through the
in-flight dedup map, the cross-process coalescer (CS-10953), the
post-prerender persist, and the generation-snapshot machinery —
preventing a pre-warm started in parallel with another caller from
firing two prerenders for the same URL, or being silently overwritten
by an invalidation that fired mid-prerender.

Also parallelize the on-disk adoptsFrom reads that fan out for novel
.json invalidations so the upfront I/O isn't paid N sequential file
reads on a wide invalidation set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a render times out and timing_diagnostics reports a 90-second
queryLoadsInFlight entry, the existing host-side capture tells us
*what* request was outstanding but not *what the server did for 90
seconds*. Add three correlated info-level emitters that fire above a
1s per-call threshold:

  - realm-server:federated-search (HTTP boundary): total request time
    plus the realm list it fanned out across.
  - realm:federated-search (cross-realm dispatch): per-realm wall
    time, sorted with the slowest first so a single slow realm in a
    large federation is immediately identifiable.
  - realm:index-query-engine (per-realm internals): primaryQuery vs
    loadLinks vs attachRealmInfo, plus the filter codeRefs digest so
    a Cohort/Portfolio fan-out is distinguishable from a generic
    type-only filter.

The 1s threshold keeps warm-cache traffic silent. Operators can dial
each logger up independently via LOG_LEVELS — the three names are
deliberately distinct so a triage session can enable only the layer
under investigation. The host-side queryLoadsInFlight entry still
carries the request URL, so cross-process correlation is one grep
away.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>