Realm-server: batch link resolution in loadLinks (CS-11038)#4668
Realm-server: batch link resolution in loadLinks (CS-11038)#4668
Conversation
`RealmIndexQueryEngine.loadLinks` resolved every linked card with its own `getInstance(linkURL)` round-trip. For an N-result search with M average links per card, that produced N×M sequential `SELECT * FROM boxel_index WHERE i.url = $1` round-trips, which was the dominant cost behind `_federated-search` 90s timeouts on staging. Restructured loadLinks as a level-order BFS: - each layer collects every in-realm link URL into a Set - one batched `WHERE i.url IN (...)` query (and one for file-meta) per recursion depth resolves them all in one round-trip - per-entry hydration walks the resulting Map, with the same relationship.data rewrite + cycle/depth semantics as the original - cross-realm links keep the existing per-link fetch path (parallelizing those is a separate ticket) Added `IndexQueryEngine.getInstances` and `getFiles` as the batched counterparts of the existing single-URL methods, and refactored the single-URL methods to share row-mapping helpers. Regression test in `realm-server/tests/load-links-batching-test.ts` asserts that a 50-source/5-target search issues zero per-link lookups and ≤ 2 batched-link queries (1 per recursion depth). Closes CS-11038. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f20dc0201a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The original recursive `loadLinks` checked `stack.length <= maxLinkDepth` BEFORE pushing the current resource onto the stack for the recursive call, so at the boundary (stack.length == maxLinkDepth) it still recursed once more — depth-(maxLinkDepth+1) resources ended up in `included[]` with their `relationship.data` rewritten. The BFS rewrite was checking `descendStack.length` (which already includes the current parent), cutting traversal off one level early. Use `entry.item.stack.length` (ancestors only) for the gate; keep `descendStack` for the nextLayer push so children carry the right ancestor chain. Caught by Codex review on PR #4668. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR targets the _federated-search/loadLinks hot path in runtime-common by replacing per-link instance lookups with batched, breadth-first resolution. It fits into the realm/indexing layer where search results are hydrated into JSON:API documents with included[] relationships.
Changes:
- Refactors
RealmIndexQueryEngine.loadLinksfrom recursive per-resource traversal to level-order BFS across all root resources. - Adds batched
IndexQueryEngine.getInstances()andgetFiles()helpers to fetch many in-realm links in a single query per depth. - Adds a realm-server regression test intended to assert batched lookup behavior for a large
searchCards(..., { loadLinks: true })case.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
packages/runtime-common/realm-index-query-engine.ts |
Reworks link hydration to BFS/batched traversal and updates callers to pass root resource arrays. |
packages/runtime-common/index-query-engine.ts |
Introduces batched instance/file lookup helpers and shared row-mapping helpers. |
packages/realm-server/tests/load-links-batching-test.ts |
Adds a regression test that counts DB query shapes during loadLinks resolution. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Three fixes from Copilot review on PR #4668: * `getInstances` and `getFiles` now chunk URL lookups (450 sqlite / 2500 pg) so a search with many unique links cannot blow past the sqlite parameter limit. Each chunk emits 2*N+1 placeholders (url IN list + file_alias IN list + i.type), matching the existing index-writer.ts chunking discipline. * The regression test's SQL matcher was filtering on the literal string "'instance'" appearing in the rendered SQL, but `param('instance')` is parameterized as `$N`. Switched to inspecting `opts.bind` for the `'instance'` value — the matcher now correctly attributes link-resolution queries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Use a local `dbExecute` alias instead of the multi-line IIFE-style cast; prettier wanted the closing paren on the same line as the assignment. - Split compound `assert.ok(A && B, ...)` into separate asserts; qunit/no-assert-logical-expression flagged the && operator inside assertion arguments. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
qunit/no-assert-logical-expression also rejects `??` inside an assertion argument, not just `&&`. Compute the count first. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Host Test Results 1 files ± 0 1 suites ±0 2h 1m 45s ⏱️ + 1h 8m 35s Results for commit faba169. ± Comparison against earlier commit 914a5f0. Realm Server Test Results 1 files ± 0 1 suites ±0 16m 4s ⏱️ + 8m 38s Results for commit faba169. ± Comparison against earlier commit 914a5f0. |
The original recursive `loadLinks` cloned a child resource only AFTER
the recursive call had mutated its `relationship.data` fields — the
caller's `for (let includedResource of subInc) { ... cloneDeep ... }`
loop ran post-recursion. The BFS rewrite was cloning at discovery
time (in the parent's layer), so the snapshot in `included[]` saw
the pre-rewrite pristine_doc data. For a `Hassan ↔ Mango` cycle the
included Mango ended up with `friend.data` pointing back to whatever
the index had stored, not the rewritten `{ type, id }` from this
loadLinks invocation.
Defer the clone+absolutize+push to a Step 5 that runs at the END of
each layer's processing — once Step 4 has finished mutating
`item.resource.relationships.<rel>.data` for every entry in this
layer. The clone now captures those mutations.
Also: when a sibling references an already-visited target, the
original code's `included.find(target)` succeeded because the first
recursion had already added it; with deferred clones that's no
longer mid-layer-true. Mark `foundLinks = true` whenever
`linkResource` is `alreadyVisited` (it's either a root in `omit` or
queued for inclusion at its own layer's end — either way it's part
of our doc).
Caught by the `Integration | realm indexing: can index a field with
a cycle in the linksTo[Many] field` host tests on PR #4668.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Non-root layer items always originate from the batched index lookup or a cross-realm fetch, both of which produce CardResource<Saved> shapes — but the LayerItem.resource type is the wider LooseCardResource union (to also accept root resources). Cast at the push site to satisfy the included[] type signature. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Heads-up from CI investigation on the stack (#4668 → #4670): the loadLinks BFS rewrite in #4668 regresses two host integration tests that pass on main: Test File Notes Confirmed pre-existing: same exact tests fail on cs-11038's 6ddece7 (Host Tests 16/20 + 20/20). They pass on main (47e6884). #4670 inherits the failure since it's stacked on #4668; my change only touches the cross-realm branch and can't reach these in-realm cycle paths. This blocks #4668 → main, which in turn blocks #4670. The Codex maxLinkDepth boundary fix (e218d21) didn't catch this case. Likely candidates for the BFS bug: visited is added at root-resource enqueue and again at layer expansion; the else if (omitSet.has(linkResource.id) || included.find(...)) branch only sets foundLinks — it doesn't push to included[] when the cycle-target is the root (which is in omitSet, not included). The recursive original may have produced a different rewrite for that case. @habdelra Confirmed scope-isolation for #4667: Realm.search plumbing only added an opts.include arm; the existing loadLinks: true call path is byte-identical when include is absent. |
|
Thanks for the clean diagnosis — both candidate causes turned out to be real: Candidate #2 (clone happens before nested rewrites) — fixed in 427892c. The BFS now defers the Candidate #1 (cycle-target-is-root in the Lint type-check fix piggybacked in 914a5f0 (the LayerItem.resource union covers both root LooseCardResource and non-root CardResource; cast at the Will let CI cycle and check the cycle tests + the playground deepEqual on the new SHA. If anything subtler is still off, the per-SHA monitor will re-fire and I'll iterate. |
PR #4668 has been seeing intermittent host-test failures with generic "A network error occurred" or "Failed to fetch dynamically imported module" messages — both are runner-level / teardown-time flake signatures, and the same tests pass on PR #4670 (which includes all the BFS code from #4668), so they look like infrastructure flakes rather than code regressions. Adding diagnostics anyway so the next CI run captures concrete data: - An invocation id at loadLinks entry, repeated on every layer log line — lets us correlate logs from one loadLinks call across layers when investigating CI failures. - Layer summaries (size, entry count, in-realm/cross-realm breakdown). - Batch index lookup result counts + ms timing. - try/catch around `Promise.all(populateQueryFields)` and the batched index lookup that logs the failing context before re-throwing. - try/catch around the cross-realm fetch logging the URL and HTTP status (or thrown error message) before re-throwing. - Completion log with layers visited / included size. All under `this.#log.debug` (or `.warn` on the catch arms), so it follows the existing logger semantics and stays out of production noise unless the channel is enabled. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes CS-11038.
RealmIndexQueryEngine.loadLinkswas the dominant cost behind_federated-search90s timeouts on staging — it resolved every linked card with its owngetInstance(linkURL)round-trip. For an N-result search with M average links per card, that produced N×M sequentialSELECT * FROM boxel_index WHERE i.url = $1round-trips, all serialized on Node's event loop and contending for the shared PG pool with every other concurrent request.This PR replaces the recursive per-link traversal with a level-order BFS: each recursion depth issues at most one batched
WHERE i.url IN (...)query (plus one for file-meta) regardless of how many siblings reference links at that depth.Live data anchor (staging, 2026-04-28 → 2026-05-05)
_federated-search90s timeouts in 7d traced to N×M sequential link fan-out inloadLinks_federated-searchrequests piled up in one second; per-realm cards-grid latency varied by ~300× under the same load — the variance correlates with result-set size, consistent with N×M sequential awaits dominating response time_federated-search90s timeouts: ~2.75 worker-hours / 7dRelationship to PR #4667 (CS-11037)
Orthogonal — both ship independently:
loadLinksentirely (include: []). It's the cards-grid fast path: when the consumer doesn't traverse relationships, skip link loading altogether.No file overlap (#4667 doesn't touch
realm-index-query-engine.ts). Branched offmain.Implementation
Level-order BFS in
loadLinksThe original recursive
loadLinks(resource)was called once per root resource, and each invocation awaitedgetInstance(linkURL)per relationship. Restructured the function so each layer batches across siblings:omit/visitedURLs.await Promise.all([getInstances(cardURLs), getFiles(fileURLs)]). One round-trip per kind, regardless of how many siblings asked.linkResourcefrom the resulting Maps for in-realm; cross-realm still goes through the existing per-link fetch path (out of scope to parallelize per the ticket).linkResource; repeat.searchCardsnow passes the entiredoc.dataarray as the root layer (previously it looped per-resource), which is what enables the first layer to batch across all roots' relationships in one query.Preserved semantics
visited: Set<string>shared across all layers; eachlinkResourceis added tovisitedon first encounter; later encounters are short-circuited.descendStack.length <= maxLinkDepthmatches the originalstack.length <= maxLinkDepthcheck; nothing is added toincluded[]or scheduled for descent past the limit.linkFieldsfilter — applied only at the root layer (matching the original "linkFields only constrains the immediate relationships of the root card"). Carried via a per-layer-itemapplyLinkFieldsbool.fieldExpectsFileMetais still called when the relationship'sdata.typeis missing or stale, so file-meta links incorrectly typed as'card'still resolve correctly.relationship.datarewrite — same three-way condition (just-found / in-omit / in-included → rewrite with resolved type; otherwise fallback type whenlinkResourceis unavailable).cloneDeep+visitInstanceURLs+visitModuleDepsapplied to each newly-pushedincluded[]entry, identical to the original.New batch primitives in
IndexQueryEnginegetInstances(urls, opts)— returnsMap<lookupURL, InstanceOrError>. Handles bothi.url IN (...)andi.file_alias IN (...)matches in a single query so a row found via either column is keyed by whichever lookup URL the caller passed.getFiles(urls, opts)— analogous batch for file-meta lookups.getInstance/getFilewere refactored to share extracted row-mapping helpers (#rowToInstanceOrError,#rowToIndexedFile) with their batched siblings.Acceptance criteria
loadLinksfor a cards-grid-shape query issues 1 batched DB query per recursion depth, not N×M separate onesOut of scope
Promise.allon outbound HTTP fetches)Realm.search's defaultloadLinks: truebehavior (covered by Realm-server: make _federated-search JSON:API-compliant via include #4667)populateQueryFieldsfan-out (separate searchCards queries per query-field) — out of scope hereTest plan
realm-server/tests/load-links-batching-test.ts— wrapsdbAdapter.executeto count instance lookups againstboxel_index, runs a 50-source × 5-targetsearchCardswithloadLinks: true, asserts:i.url = \$Nper-link lookups: 0 (the old N×M path is dead)i.url IN (…)batched lookups: ≥ 1, ≤ 2 (one per recursion depth)result.data.length === 50andresult.included.length === 5loadLinks: truecoverage inindexing-test.ts(lines 2829, 2973) still passes under semantic equivalenceCode pointers
packages/runtime-common/realm-index-query-engine.ts—loadLinksrewritten as a BFS driver;searchCards/cardDocument/loadLinksForResourceupdated to passrootResources: [...]packages/runtime-common/index-query-engine.ts— newgetInstancesandgetFilesplus extracted#rowToInstanceOrError/#rowToIndexedFilerow-mapping helperspackages/realm-server/tests/load-links-batching-test.ts— regression test🤖 Generated with Claude Code