feat: add RRF score fusion and rewrite recall tool #47

Merged
BYK merged 1 commit into main from feat/rrf-fusion
Mar 22, 2026
BYK commented Mar 22, 2026

Phase 2 of search improvements (depends on #46)

Adds cross-source score fusion using Reciprocal Rank Fusion and rewrites the recall tool to produce a single ranked result list.

Changes

New in src/search.ts

  • reciprocalRankFusion<T>() — merges multiple ranked lists using RRF (k=60, Cormack et al. 2009). Rank-based, not score-based, so magnitude differences across FTS tables don't matter.
  • normalizeRank() — min-max normalization of FTS5 BM25 ranks to 0–1 (for display only)
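
The fusion step described above can be sketched as follows. This is a minimal illustration of RRF, not the code in src/search.ts: the `RankedList` and `Fused` shapes and the `key` callback are assumptions made for the example, since the PR only names `reciprocalRankFusion<T>()` and the constant k=60. Each item contributes 1 / (k + rank) for every list it appears in, with 1-based ranks.

```typescript
// Hypothetical input shape: each list is already sorted best-first.
interface RankedList<T> {
  items: T[];
  key: (item: T) => string; // stable identity used to dedup across lists
}

interface Fused<T> {
  item: T;
  score: number;
}

function reciprocalRankFusion<T>(lists: RankedList<T>[], k = 60): Fused<T>[] {
  const fused = new Map<string, Fused<T>>();
  for (const { items, key } of lists) {
    items.forEach((item, index) => {
      const id = key(item);
      const contribution = 1 / (k + index + 1); // index is 0-based, rank is 1-based
      const existing = fused.get(id);
      if (existing) {
        existing.score += contribution; // same item seen in another list
      } else {
        fused.set(id, { item, score: contribution });
      }
    });
  }
  // Highest fused score first.
  return Array.from(fused.values()).sort((a, b) => b.score - a.score);
}

const identity = (s: string): string => s;
const fusedDemo = reciprocalRankFusion<string>([
  { items: ["a", "b", "c"], key: identity },
  { items: ["b", "d"], key: identity },
]);
console.log(fusedDemo.map((r) => r.item));
```

Only positions enter the formula, so raw BM25 magnitudes from different FTS tables never need to be comparable. An item that appears in two lists, such as "b" here, accumulates two contributions and rises above items that appear in one.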

New scored search variants

  • ltm.searchScored() — returns KnowledgeEntry & { rank } with BM25 scores via bm25(knowledge_fts, 6, 2, 3)
  • temporal.searchScored() — returns TemporalMessage & { rank }
  • searchDistillationsScored() — returns Distillation & { rank }

All scored variants include AND→OR fallback (same as Phase 1 search functions).
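
A rough sketch of what an AND→OR fallback can look like is shown below. The helper names `buildMatchQuery` and `searchWithFallback` and the injected `run` callback are hypothetical, and the PR does not show how the real search functions build their MATCH expressions. The sketch assumes terms are quoted as FTS5 string literals and joined with the FTS5 `AND` / `OR` operators.

```typescript
// Hypothetical helper: quote each term as an FTS5 string, doubling embedded quotes.
function buildMatchQuery(terms: string[], operator: "AND" | "OR"): string {
  return terms.map((t) => `"${t.replace(/"/g, '""')}"`).join(` ${operator} `);
}

// Try the strict AND query first; if it returns nothing, retry with OR.
// `run` stands in for whatever executes the MATCH query against the database.
function searchWithFallback<T>(
  terms: string[],
  run: (matchQuery: string) => T[],
): T[] {
  if (terms.length === 0) return [];
  const strict = run(buildMatchQuery(terms, "AND"));
  if (strict.length > 0 || terms.length === 1) return strict;
  return run(buildMatchQuery(terms, "OR"));
}

// Fake runner for demonstration: only the OR query "finds" anything.
const seenQueries: string[] = [];
const fakeRun = (q: string): string[] => {
  seenQueries.push(q);
  return q.includes(" OR ") ? ["hit"] : [];
};
const fallbackHits = searchWithFallback(["rrf", "fusion"], fakeRun);
console.log(seenQueries, fallbackHits);
```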

Rewritten recall tool

  • Runs all 3 scored searches, tags results with source type
  • Fuses via RRF into a single ranked list
  • Output format: source-annotated list ([knowledge/category], [distilled], [temporal/role])
  • Most relevant results appear first regardless of which source they came from
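
To make the output format concrete, here is a small sketch of tagging and formatting. The `Tagged` union and its field names (`category`, `role`, `text`) are assumptions for illustration. Only the three label shapes come from the PR description, and the input is assumed to be already ordered by the fusion step.

```typescript
// Hypothetical tagged result; field names are illustrative only.
type Tagged =
  | { source: "knowledge"; category: string; text: string }
  | { source: "distilled"; text: string }
  | { source: "temporal"; role: string; text: string };

// Build the source annotation shown in front of each result.
function sourceLabel(r: Tagged): string {
  switch (r.source) {
    case "knowledge":
      return `[knowledge/${r.category}]`;
    case "distilled":
      return "[distilled]";
    case "temporal":
      return `[temporal/${r.role}]`;
  }
}

// One line per result, preserving the fused order.
function formatRecall(fusedResults: Tagged[]): string {
  return fusedResults.map((r) => `${sourceLabel(r)} ${r.text}`).join("\n");
}

const recallDemo = formatRecall([
  { source: "distilled", text: "Switched search to FTS5" },
  { source: "knowledge", category: "architecture", text: "Recall fuses three sources" },
  { source: "temporal", role: "user", text: "How does ranking work?" },
]);
console.log(recallDemo);
```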

Test coverage

  • 11 new tests for normalizeRank() and reciprocalRankFusion()
  • Tests cover: multi-list merge, dedup, empty lists, single list, custom k, score correctness

BYK enabled auto-merge (squash) March 22, 2026 14:29
BYK disabled auto-merge March 22, 2026 21:46

- Add reciprocalRankFusion() to search.ts — merges ranked lists using
  RRF (k=60) for cross-source relevance fusion
- Add normalizeRank() for min-max score normalization (display only)
- Add searchScored() to ltm.ts — returns BM25 rank with results
- Add searchScored() to temporal.ts — returns FTS5 rank with results
- Add searchDistillationsScored() to reflect.ts — scored distillation search
- Rewrite recall tool execute() to use scored search + RRF fusion
  - All 3 sources searched with scored variants
  - Results tagged with source type, fused via RRF
  - Single ranked list with source annotations: [knowledge/category],
    [distilled], [temporal/role]
- Add 11 new tests for normalizeRank and reciprocalRankFusion
BYK force-pushed the feat/rrf-fusion branch from 07f4eea to 9ff8bfa March 22, 2026 21:46
BYK merged commit 332f7b2 into main Mar 22, 2026
1 check passed
BYK deleted the feat/rrf-fusion branch March 22, 2026 21:48
BYK added a commit that referenced this pull request Mar 22, 2026
## Phase 3 of search improvements (depends on #47)

Replaces the coarse bag-of-words term-overlap scoring in `forSession()`
with FTS5 BM25-based scoring.

### Problem

`forSession()` used manual term-overlap counting: extract the top 30 words
longer than 3 characters, then count how many appear in each entry via
`string.includes()`. This ignored:
- Porter stemming ("configure" wouldn't match "configuration")
- TF-IDF weighting (all matching terms counted equally)
- Stopwords (common words inflated match counts)

### Solution

**New `scoreEntriesFTS()`** in ltm.ts:
- Runs session context terms against `knowledge_fts` using BM25
- Uses **OR** semantics (not AND-then-OR) because we're scoring all
candidates for ranking, not searching for exact matches — an entry
matching 1 of 40 terms should get a low score, not be excluded
- BM25 naturally weights entries matching more terms higher
- Scores normalized to 0–1 and multiplied by entry confidence
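
The normalization and confidence weighting in the last bullet might look roughly like this. It is a sketch under stated assumptions: the `ScoredRow` shape and the `scoreEntries` name are hypothetical, and it assumes `rank` holds the raw FTS5 `bm25()` value, where a lower (more negative) number means a better match, so the min-max step inverts the scale to put the best match at 1.

```typescript
// Hypothetical row shape: `rank` is the raw bm25() value (lower is better).
interface ScoredRow {
  id: string;
  rank: number;
  confidence: number; // assumed to be in the 0..1 range
}

// Min-max normalize so the best (lowest) rank maps to 1 and the worst to 0.
function normalizeRank(rank: number, min: number, max: number): number {
  if (max === min) return 1; // single row, or all rows tied
  return (max - rank) / (max - min);
}

function scoreEntries(rows: ScoredRow[]): Map<string, number> {
  const scores = new Map<string, number>();
  if (rows.length === 0) return scores;
  const ranks = rows.map((r) => r.rank);
  const min = Math.min(...ranks);
  const max = Math.max(...ranks);
  for (const row of rows) {
    scores.set(row.id, normalizeRank(row.rank, min, max) * row.confidence);
  }
  return scores;
}

const scoreDemo = scoreEntries([
  { id: "a", rank: -8, confidence: 0.5 },
  { id: "b", rank: -2, confidence: 1.0 },
  { id: "c", rank: -5, confidence: 0.8 },
]);
console.log(Array.from(scoreDemo.entries()));
```

With plain min-max scaling the weakest match scores 0 rather than being dropped, which fits the goal of ranking every candidate instead of filtering.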

**Improved `extractTopTerms()`** moved to `search.ts`:
- Now uses the same `STOPWORDS` set from Phase 1
- Drops only single-character tokens (instead of requiring more than 3
characters), which preserves "DB", "CI", "IO"
- Increased limit from 30 to 40 terms
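
A minimal sketch of term extraction along these lines follows. The stopword list is a tiny illustrative subset (the actual `STOPWORDS` set from Phase 1 is not shown in this PR), and lowercasing, ASCII-only tokenization, and frequency ordering are assumptions of the example rather than confirmed behavior.

```typescript
// Illustrative subset only; the real Phase 1 STOPWORDS set is not shown here.
const STOPWORDS = new Set([
  "the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "for", "with",
]);

function extractTopTerms(text: string, limit = 40): string[] {
  const counts = new Map<string, number>();
  // Assumption: lowercase and split on anything that is not an ASCII letter or digit.
  for (const token of text.toLowerCase().split(/[^a-z0-9]+/)) {
    // Drop empty strings, single characters, and stopwords; keep 2-char tokens.
    if (token.length < 2 || STOPWORDS.has(token)) continue;
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  // Most frequent first; ties keep first-seen order because sort is stable.
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([term]) => term);
}

const termsDemo = extractTopTerms("The DB is slow; the CI for the DB failed. A b c");
console.log(termsDemo);
```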

### Safety net preserved
Top 5 project entries by confidence are always included regardless of
FTS match, preventing the scoring change from accidentally excluding
critical project knowledge.

### Test coverage
- 8 new tests for `extractTopTerms()` (stopwords, 2-char tokens, limits,
punctuation)
- All 12 existing `forSession()` tests continue to pass