fix(retriever): applyMMRDiversity O(n³)→O(n²) Map lookup optimization (#763) by jlin53882 · Pull Request #776 · CortexReach/memory-lancedb-pro

jlin53882 · 2026-05-09T15:03:12Z

Issue

Issue #763 P2 — applyMMRDiversity O(n³)→O(n²) optimization

Problem

applyMMRDiversity() was calling Array.from() inside the inner loop on every similarity comparison:

Outer loop: O(n) candidates
Inner loop: O(n) selected items × O(n) Array.from() twice per check = O(n²) per candidate
Total: O(n³)

Fix

Three optimizations applied in applyMMRDiversity():

Pre-convert vectors once at entry — eliminates O(n) Array.from() cost from the inner loop; now O(n²) total for all conversions
Duplicate ID upfront detection via Set — O(n) single-pass scan; detects duplicates and routes to fallback before any similarity computation
Map-based O(1) id lookup — pre-converted vectors stored in Map<id, number[]>; replaces O(n) selected.findIndex() with O(1) Map.get()

When duplicate IDs are detected, applyMMRDiversity_Fallback() (findIndex-based O(n²)) is called to preserve original semantics for the ambiguous duplicate-ID case.

Complexity: O(n³) → O(n²)

Changes

src/retriever.ts (+81/-10): applyMMRDiversity() optimized + new applyMMRDiversity_Fallback()
test/mmr-tiny.test.mjs (+286/-0): 17 tests covering: edge cases (empty/single/null/undefined vectors), diversity filtering (orthogonal, identical, similar, mixed with null), Fallback routing for duplicate IDs, threshold boundaries (0.85/0.9/1.0), cross-validation of optimized vs Fallback paths, and performance sanity (n=50, n=100)
scripts/ci-test-manifest.mjs (+4/-3): registers mmr-tiny.test.mjs in core-regression + deduplicate issue-690 entry from merge conflict
index.ts: merge conflict resolution (upstream Phase 1 code retained, isCliRegistrationMode(api) preserved)
package.json, package-lock.json: upstream version updates

Verification

applyMMRDiversity — O(1) Map.get() for vector lookup, pre-conversion loop O(n²) total
applyMMRDiversity_Fallback — called only when duplicate IDs detected; preserves original findIndex semantics
Items without vectors (null/undefined) are always selected (no similarity comparison possible)
Cross-validation: optimized path produces identical results to Fallback for diverse, similar, and duplicate-ID inputs
Threshold boundaries tested: 0.85, 0.9, 1.0
Performance tests: n=50 < 1s, n=100 < 2s

…issue CortexReach#692) Adds tree-sitter-based chunking to prevent splitting code mid-function. Changes: - detectCodeLanguage(): identify JS/TS/Python/Go/Rust from code content - astChunk(): split code at declaration boundaries (function/class/method) - smartChunk(): route through astChunk when file is detected as code - 20 unit tests covering all destructive split scenarios Test: node --test test/ast-code-chunking.test.mjs (20/20 pass) Config: astAwareCodeSplit defaults to true

…apture architecture, upgrade isCliMode→isCliRegistrationMode(api)

…CortexReach#692)

TurboTheTurtle · 2026-05-11T21:11:57Z

Review finding:

P2: the new MMR tests do not exercise the production implementation

test/mmr-tiny.test.mjs copies the old and new MMR algorithms into local helper functions (applyMMRDiversity_NEW / applyMMRDiversity_OLD) instead of importing or invoking the retriever method changed in src/retriever.ts. That means the test can pass even if the production applyMMRDiversity() implementation diverges, regresses, or is not called at all. The test file also is not added to scripts/ci-test-manifest.mjs, so the current PR checks will not run it. Please wire the test into CI and exercise the actual retriever behavior, or expose a small test seam around the production method rather than testing a duplicate copy.

…I manifest Resolves PR CortexReach#776 reviewer comment: - test/mmr-tiny.test.mjs previously imported local copies of the MMR algorithm instead of exercising the production method, allowing the test to pass even if the actual retriever implementation diverged. Fix: rewrite test to call retriever.applyMMRDiversity() directly via jiti import, covering optimized path, fallback path (duplicate IDs), null/undefined vectors, and threshold boundaries. Also registers test/mmr-tiny.test.mjs in scripts/ci-test-manifest.mjs under core-regression so it runs in CI checks.

…nflicts - index.ts: keep upstream Phase 1 code, use isCliRegistrationMode(api) for logReg - verify-ci-test-manifest.mjs: use CI_TEST_MANIFEST variable (upstream refactor)

…CortexReach#763) Optimizations applied: 1. Pre-convert all vectors once at entry (eliminates O(n) Array.from() from inside the inner loop — was causing O(n³) total cost) 2. id→index Map for O(1) lookup — replaces O(n) selected.findIndex inside selected.some() 3. Duplicate ID detection — upfront Set check routes to applyMMRDiversity_Fallback() which preserves original semantics for duplicate-ID edge case applyMMRDiversity: O(n²) — pre-convert vectors + Map-based O(1) id lookup applyMMRDiversity_Fallback: O(n²) — findIndex-based (safe for small dup sets)

Merge of upstream/master introduced duplicate entry for test/issue-690-cross-call-batch.test.mjs. Remove second occurrence.

…Fallback Strengthen existing tests: - orthogonal vectors: add assert.deepStrictEqual for exact id order - similar vectors deferred: add explicit r1/r2/r3 position assertions + verify deferred portion contains r2 - null/undefined vector: assert returned id matches expected New cross-validation tests (3): - optimized path matches Fallback for diverse inputs (unique IDs) - optimized path matches Fallback for similar inputs (unique IDs) - optimized path routes to Fallback for duplicate IDs These tests directly verify that the O(n²) optimized path produces identical results to the O(n²) Fallback path, ensuring correctness of the Map-based optimization.

… Issue CortexReach#692 content)

…each#763)

rwmjhb

PR #776 Review: fix(retriever): applyMMRDiversity O(n³)→O(n²) Map lookup optimization (#763)

Verdict: REQUEST-CHANGES | 6 rounds completed | Value: 42% | Size: XL | Author: jlin53882

Value Assessment

Problem: applyMMRDiversity in the retriever performs repeated vector conversion and selected-result lookup inside nested loops, making larger retrieval result sets slower than necessary. The PR claims to reduce the MMR diversity pass from O(n³) to O(n²) while preserving behavior for duplicate IDs.

Dimension	Assessment
Value Score	42%
Value Verdict	review
Issue Linked	true
Project Aligned	true
Duplicate	false
AI Slop Score	1/6
User Impact	medium
Urgency	medium

Scope Drift: 3 flag(s)

src/chunker.ts and test/ast-code-chunking.test.mjs implement Issue #692 AST-based chunking, which is not justified by the #763 MMR optimization problem statement
package.json/package-lock.json add tree-sitter native dependencies for AST chunking, not for applyMMRDiversity
index.ts CLI registration-mode changes are unrelated to the retriever MMR performance fix

AI Slop Signals:

Claim/diff mismatch: PR title and body center on #763 applyMMRDiversity optimization, but the diff includes src/chunker.ts AST chunking and test/ast-code-chunking.test.mjs for #692.

Open Questions:

Is Issue #763 maintainer-acknowledged? The provided issue context has no labels, assignment, or maintainer comments visible.
Should the Issue #692 AST chunking changes and tree-sitter dependencies be split into a separate PR before reviewing #763?
Is the P3 event-loop-blocking portion of Issue #763 intentionally out of scope for this PR?

Summary

applyMMRDiversity in the retriever performs repeated vector conversion and selected-result lookup inside nested loops, making larger retrieval result sets slower than necessary. The PR claims to reduce the MMR diversity pass from O(n³) to O(n²) while preserving behavior for duplicate IDs.

Evaluation Signals

Signal	Value
Blockers	0
Warnings	1
PR Size	XL
Verdict Floor	approve
Risk Level	high
Value Model	codex
Primary Model	codex
Adversarial Model	claude

Must Fix

F3: Oversized code declarations violate chunk size limits

Nice to Have

F1: New any types in AST chunker
F2: JS and TS imports are classified as Python
F4: AST loader uses require in an ESM module
MR2: CLI mode detection semantic change rides along with no test coverage and no rationale tied to #763
MR3: No benchmark validates the claimed O(n³)→O(n²) improvement
MR4: Dead subChunk helper inside chunker.ts
MR5: detectCodeLanguage docstring/code mismatch (200 vs 400 chars)

Recommended Action

Author should address must-fix findings before merge.

Reviewed at 2026-05-12T17:31:38Z | 6 rounds | Value: codex | Primary: codex | Adversarial: claude

…R\n\n- Delete test/ast-code-chunking.test.mjs (not in upstream)\n- Remove ast-code-chunking.test.mjs from ci-test-manifest\n- Retain Issue CortexReach#763 MMR tests

jlin53882 added 3 commits May 5, 2026 00:34

merge origin/master: resolve conflict — take HEAD recallHistory/autoC…

44445dd

…apture architecture, upgrade isCliMode→isCliRegistrationMode(api)

fix(ci): register ast-code-chunking.test.mjs in verify baseline (issue …

6dcc894

…CortexReach#692)

jlin53882 mentioned this pull request May 9, 2026

perf(retriever): applyMMRDiversity O(n^3) to O(n^2) Map lookup optimization (#763 P2) #769

Closed

jlin53882 force-pushed the fix/mmr-diversity-optimization branch from 9cdc7cf to c26117b Compare May 12, 2026 11:16

jlin53882 added 6 commits May 12, 2026 21:04

Merge upstream/master into fix/mmr-diversity-optimization: resolve co…

09adb46

…nflicts - index.ts: keep upstream Phase 1 code, use isCliRegistrationMode(api) for logReg - verify-ci-test-manifest.mjs: use CI_TEST_MANIFEST variable (upstream refactor)

fix(ci): deduplicate issue-690 entry in ci-test-manifest.mjs

876537a

Merge of upstream/master introduced duplicate entry for test/issue-690-cross-call-batch.test.mjs. Remove second occurrence.

remove: drop docs/issue-692-ast-chunking-design.md from PR (unrelated…

0c9dc4d

… Issue CortexReach#692 content)

fix(ci): register mmr-tiny.test.mjs in core-regression group (CortexR…

e9aad9d

…each#763)

rwmjhb requested changes May 12, 2026

View reviewed changes

PR776 scope cleanup: remove Issue CortexReach#692 AST chunking from P…

b041f9c

…R\n\n- Delete test/ast-code-chunking.test.mjs (not in upstream)\n- Remove ast-code-chunking.test.mjs from ci-test-manifest\n- Retain Issue CortexReach#763 MMR tests

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(retriever): applyMMRDiversity O(n³)→O(n²) Map lookup optimization (#763)#776

fix(retriever): applyMMRDiversity O(n³)→O(n²) Map lookup optimization (#763)#776
jlin53882 wants to merge 11 commits into
CortexReach:masterfrom
jlin53882:fix/mmr-diversity-optimization

jlin53882 commented May 9, 2026 •

edited

Loading

Uh oh!

TurboTheTurtle commented May 11, 2026

Uh oh!

rwmjhb left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jlin53882 commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue

Problem

Fix

Changes

Verification

Uh oh!

TurboTheTurtle commented May 11, 2026

Uh oh!

rwmjhb left a comment

Choose a reason for hiding this comment

PR #776 Review: fix(retriever): applyMMRDiversity O(n³)→O(n²) Map lookup optimization (#763)

Value Assessment

Summary

Evaluation Signals

Must Fix

Nice to Have

Recommended Action

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jlin53882 commented May 9, 2026 •

edited

Loading