Skip to content

Realm-server: parallelize cross-realm fetches in loadLinks (CS-11039)#4670

Merged
habdelra merged 7 commits intomainfrom
cs-11039-parallelize-loadlinks
May 6, 2026
Merged

Realm-server: parallelize cross-realm fetches in loadLinks (CS-11039)#4670
habdelra merged 7 commits intomainfrom
cs-11039-parallelize-loadlinks

Conversation

@habdelra
Copy link
Copy Markdown
Contributor

@habdelra habdelra commented May 5, 2026

Summary

Closes CS-11039.

Builds on the BFS rewrite from #4668 (CS-11038) — that work is included in this PR's diff.

RealmIndexQueryEngine.loadLinks was the dominant cost behind _federated-search 90s timeouts on staging. PR #4668 collapsed in-realm relationship resolution to one batched DB query per BFS depth and ran the in-realm instance + file-meta queries concurrently. Cross-realm fetches were still serialized — Step 4 of the BFS awaited this.#fetch(entry.linkURL) per entry, so a layer with N cross-realm links did N sequential network round-trips on the realm-server's event loop while every other request waited.

This PR folds cross-realm fetches into Step 3's existing Promise.all so they run concurrently with the batched in-realm DB queries. Step 4 then resolves cross-realm linkResources from a prefetched map. Wall-clock cross-realm resolution at one BFS depth collapses from N sequential fetches to one event-loop tick.

Why this matters (staging data anchor, 2026-04-28 → 2026-05-05)

The 30+ concurrent _federated-search flood at UTC 13:24:25 produced response times spanning <500ms to >90s. The long-tail responses are the ones that issued N×M sequential awaits while the event loop serviced other requests. With #4668 alone, in-realm link traversal collapses; with this PR, cross-realm link traversal also collapses. Together they're multiplicative:

Per BFS layer Before #4668 With #4668 alone With #4668 + this PR
In-realm DB lookups N×M sequential 1 + 1 batched 1 + 1 batched
Cross-realm fetches K sequential K sequential K concurrent (Promise.all)
Wall-clock per layer sum of all (1 + 1) + Σ cross-realm (1 + 1) ∥ all-cross-realm

For a search whose results reference K unique cross-realm cards at one BFS depth, this PR turns K sequential awaits into one event-loop tick, bounded by HTTP keep-alive on the cross-realm fetch path. The PG connection pool and HTTP keep-alive pool are the natural backpressure surfaces — no explicit semaphore needed.

What changes

All in packages/runtime-common/realm-index-query-engine.ts:

  1. Step 2 (realm-index-query-engine.ts:1004, :1096) — while walking each entry's relationships, in addition to growing inRealmCardURLs / inRealmFileURLs, also grow a deduped crossRealmURLs Set. Dedup is strictly better than the prior code, which would re-fetch the same cross-realm URL once per entry that referenced it (sequentially!).

  2. Step 3 (realm-index-query-engine.ts:1117-1137) — the existing Promise.all over instance + file-meta DB queries gains a third arm:

    crossRealmURLs.size > 0
      ? this.fetchCrossRealmLinks([...crossRealmURLs])
      : Promise.resolve(new Map<string, CardResource<Saved>>())
  3. fetchCrossRealmLinks(urls) (realm-index-query-engine.ts:878) — new private helper. Issues this.#fetch per unique URL inside Promise.all, validates with isSingleCardDocument, surfaces failures via CardError.fromFetchResponse (same error path as before), and returns Map<urlHref, CardResource<Saved>>.

  4. Step 4 (realm-index-query-engine.ts:1160) — the cross-realm branch becomes a synchronous map lookup: linkResource = crossRealmMap.get(entry.linkURL.href). The rest of Step 4 (visited, descendStack, included push, relationship.data rewrite) is unchanged.

Relationship to sister tickets

Acceptance criteria (from ticket)

  • loadLinks outer caller issues per-resource link resolution in parallel — already true after Realm-server: batch link resolution in loadLinks (CS-11038) #4668's BFS; the outer for...await over doc.data was removed when loadLinks switched to taking rootResources: doc.data.
  • Cross-realm link fetches inside loadLinks issue in parallel within a single layer — this PR.
  • Recursion depth still bounded by visited; cycles still detected — visited semantics unchanged.
  • Cross-realm error path preserved — CardError.fromFetchResponse still thrown; Promise.all surfaces the first failure exactly as the original sequential code did.
  • No regression on existing test suite — pnpm lint clean on runtime-common; loadLinks: true integration tests in indexing-test.ts:2829, 2973 exercise cross-realm; load-links-batching-test.ts exercises in-realm batching.

Out of scope

  • Caching across requests.
  • Changing the loadLinks: true default.
  • A parallelization-specific assertion test (would require instrumenting the fetch hook to count concurrent in-flight calls; existing functional tests cover correctness).

Test plan

  • pnpm lint clean on runtime-common.
  • CI green on rebased SHA.

🤖 Generated with Claude Code

@habdelra habdelra force-pushed the cs-11039-parallelize-loadlinks branch from 860402e to c6d4f10 Compare May 5, 2026 23:46
@habdelra habdelra requested a review from Copilot May 5, 2026 23:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR parallelizes cross-realm relationship fetches during loadLinks BFS traversal so cross-realm link resolution happens concurrently with the batched in-realm DB queries, reducing per-layer network round-trips.

Changes:

  • Added fetchCrossRealmLinks() helper to fan out cross-realm card fetches and validate responses.
  • Collected cross-realm link URLs per BFS layer and fetched them in Step 3 via Promise.all.
  • Replaced per-entry cross-realm fetching in Step 4 with a synchronous lookup from a prefetched map.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/runtime-common/realm-index-query-engine.ts
Comment thread packages/runtime-common/realm-index-query-engine.ts Outdated
Comment thread packages/runtime-common/realm-index-query-engine.ts
Comment thread packages/runtime-common/realm-index-query-engine.ts
@habdelra habdelra requested a review from a team May 6, 2026 00:07
habdelra and others added 2 commits May 5, 2026 20:56
The batching BFS already collapses in-realm relationship resolution
to one batched DB query per recursion depth and runs the in-realm
instance + file-meta queries concurrently. Cross-realm links were
still serialized — Step 4 awaited each `this.#fetch(entry.linkURL)`
per entry.

Collect cross-realm URLs alongside the in-realm Sets in Step 2,
dedupe, and fold a `Promise.all`-fanout over those URLs into the
existing Step 3 `Promise.all` so they run concurrently with the
batched DB queries. Step 4 then resolves cross-realm linkResources
from a prefetched map rather than awaiting per-entry fetches.

Wall-clock cross-realm resolution at one BFS depth collapses from
N sequential fetches to one event-loop tick. Cycle / depth /
omit / data-rewrite semantics are preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`globalThis.fetch` accepts string URLs directly; the helper was
constructing a URL only to immediately consume one input that was
already a canonical `URL.href`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@habdelra habdelra force-pushed the cs-11039-parallelize-loadlinks branch from 7ab00c3 to aa222c7 Compare May 6, 2026 00:56
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

Host Test Results

    1 files  +    1      1 suites  +1   2h 2m 6s ⏱️ + 2h 2m 6s
2 573 tests +2 573  2 558 ✅ +2 558  15 💤 +15  0 ❌ ±0 
2 592 runs  +2 592  2 577 ✅ +2 577  15 💤 +15  0 ❌ ±0 

Results for commit 85ad876. ± Comparison against earlier commit 5864c8d.

Realm Server Test Results

    1 files  ±    0      1 suites  +1   17m 4s ⏱️ + 17m 4s
1 256 tests +1 256  1 256 ✅ +1 256  0 💤 ±0  0 ❌ ±0 
1 334 runs  +1 334  1 334 ✅ +1 334  0 💤 ±0  0 ❌ ±0 

Results for commit 85ad876. ± Comparison against earlier commit 5864c8d.

habdelra added a commit that referenced this pull request May 6, 2026
PR #4668 has been seeing intermittent host-test failures with
generic "A network error occurred" or "Failed to fetch dynamically
imported module" messages — both are runner-level / teardown-time
flake signatures, and the same tests pass on PR #4670 (which
includes all the BFS code from #4668), so they look like
infrastructure flakes rather than code regressions. Adding
diagnostics anyway so the next CI run captures concrete data:

- An invocation id at loadLinks entry, repeated on every layer log
  line — lets us correlate logs from one loadLinks call across
  layers when investigating CI failures.
- Layer summaries (size, entry count, in-realm/cross-realm breakdown).
- Batch index lookup result counts + ms timing.
- try/catch around `Promise.all(populateQueryFields)` and the batched
  index lookup that logs the failing context before re-throwing.
- try/catch around the cross-realm fetch logging the URL and HTTP
  status (or thrown error message) before re-throwing.
- Completion log with layers visited / included size.

All under `this.#log.debug` (or `.warn` on the catch arms), so it
follows the existing logger semantics and stays out of production
noise unless the channel is enabled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@habdelra habdelra changed the base branch from cs-11038-batch-load-links to main May 6, 2026 05:23
@habdelra habdelra merged commit c439b33 into main May 6, 2026
67 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants