feat: replace forSession() scoring with FTS5 BM25 by BYK · Pull Request #48 · BYK/opencode-lore

BYK · 2026-03-22T14:32:38Z

Phase 3 of search improvements (depends on #47)

Replaces the coarse bag-of-words term-overlap scoring in forSession() with FTS5 BM25-based scoring.

Problem

forSession() used manual term-overlap counting: it extracted the top 30 words longer than 3 characters and counted how many of them appeared in each entry via string.includes(). This ignored:

Porter stemming ("configure" wouldn't match "configuration")
TF-IDF weighting (all matching terms counted equally)
Stopwords (common words inflated match counts)
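To make these shortcomings concrete, here is a minimal TypeScript sketch of the old term-overlap approach as described above. It is a hypothetical reconstruction, not the removed code: the names `oldExtractTerms` and `overlapScore` are illustrative, and ranking words by frequency is an assumption.

```typescript
// Hypothetical reconstruction of the old bag-of-words scoring, for illustration only.

// Take the most frequent words longer than 3 characters (top 30 by default).
function oldExtractTerms(text: string, limit = 30): string[] {
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (word.length <= 3) continue; // also drops short terms like "db", "ci", "io"
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent first
    .slice(0, limit)
    .map(([word]) => word);
}

// Count how many terms occur verbatim as substrings of the entry.
// No stemming and no IDF weighting: every hit counts as exactly 1.
function overlapScore(terms: string[], entry: string): number {
  const haystack = entry.toLowerCase();
  return terms.filter((t) => haystack.includes(t)).length;
}
```

For example, `overlapScore(oldExtractTerms("configure the database"), "Database configuration notes")` is 1, not 2: "database" matches, but "configuration" does not contain the substring "configure".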

Solution

New scoreEntriesFTS() in ltm.ts:

Runs session context terms against knowledge_fts using BM25
Uses OR semantics (not AND-then-OR) because the goal is to score every candidate for ranking, not to search for exact matches: an entry matching 1 of 40 terms should get a low score, not be excluded
BM25 naturally weights entries matching more terms higher
Scores normalized to 0–1 and multiplied by entry confidence
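The scoring steps above can be sketched as follows. The table name `knowledge_fts` and the weights (title=6, content=2, category=3) come from this PR; the column declaration order, the helper names `buildOrQuery` and `scoreRows`, and the normalization formula (dividing by the best match) are assumptions, since the PR only says scores are normalized to 0–1.

```typescript
// Sketch only: helper names, column order, and normalization are assumptions.

// FTS5's bm25() takes one weight per column, in declaration order.
// Assumes knowledge_fts was declared as (title, content, category).
const FTS_WEIGHTS = { title: 6.0, content: 2.0, category: 3.0 };

const SCORE_SQL = `
  SELECT rowid AS id,
         bm25(knowledge_fts, ${FTS_WEIGHTS.title}, ${FTS_WEIGHTS.content}, ${FTS_WEIGHTS.category}) AS rank
  FROM knowledge_fts
  WHERE knowledge_fts MATCH ?`;

// Quote each term as an FTS5 string (embedded quotes are doubled) and join
// with OR, so an entry matching even a single term stays a candidate.
function buildOrQuery(terms: string[]): string {
  return terms.map((t) => `"${t.replace(/"/g, '""')}"`).join(" OR ");
}

// Raw bm25() values are negative; more negative means a better match.
interface FtsRow {
  id: number;
  rank: number;
}

// Normalize to 0..1 relative to the best match, then multiply by confidence
// (entries without a known confidence default to 1 in this sketch).
function scoreRows(rows: FtsRow[], confidence: Map<number, number>): Map<number, number> {
  const scores = new Map<number, number>();
  const best = Math.max(0, ...rows.map((r) => -r.rank));
  if (best === 0) return scores; // no rows, or nothing with positive relevance
  for (const r of rows) {
    const normalized = Math.max(0, -r.rank) / best;
    scores.set(r.id, normalized * (confidence.get(r.id) ?? 1));
  }
  return scores;
}
```

With a SQLite driver, the string from `buildOrQuery(terms)` would be bound to the `?` in `SCORE_SQL`, and the returned rows passed to `scoreRows()`. Because BM25 sums per-term contributions, rows matching more of the OR'ed terms naturally end up with a more negative raw rank and therefore a higher normalized score.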

Improved extractTopTerms() moved to search.ts:

Now uses the same STOPWORDS set as Phase 1
Drops only single-character tokens (instead of the old >3-character threshold), so short terms like "DB", "CI", and "IO" are preserved
Increased limit from 30 to 40 terms
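A minimal sketch of the improved term extraction, under stated assumptions: the `STOPWORDS` set below is a small hypothetical stand-in for the real Phase 1 list, tokenization is ASCII-only, and terms are ranked by frequency. Only the single-character filter and the 40-term limit come from this PR.

```typescript
// Hypothetical stand-in for the shared STOPWORDS set from Phase 1 (the real list is larger).
const STOPWORDS = new Set(["the", "and", "for", "with", "this", "that", "is", "to", "of", "a", "in"]);

// Sketch: rank tokens by frequency, skipping stopwords and single characters.
function extractTopTerms(text: string, limit = 40): string[] {
  const counts = new Map<string, number>();
  for (const token of text.toLowerCase().split(/[^a-z0-9_]+/)) {
    if (token.length < 2 || STOPWORDS.has(token)) continue; // "db", "ci", "io" survive
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent first; ties keep first-seen order
    .slice(0, limit)
    .map(([token]) => token);
}
```

For instance, `extractTopTerms("The DB and the CI pipeline: DB, IO, x")` yields `["db", "ci", "pipeline", "io"]`: "the" and "and" are stopwords, and "x" is a single character.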

Safety net preserved

Top 5 project entries by confidence are always included regardless of FTS match, preventing the scoring change from accidentally excluding critical project knowledge.
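One way the safety net could be expressed, as a sketch only: the `Entry` shape, the `project` flag, the name `withSafetyNet`, and appending the extra entries after the FTS-ranked ones are all assumptions for illustration.

```typescript
// Hypothetical entry shape for this sketch.
interface Entry {
  id: number;
  confidence: number;
  project: boolean;
}

// Ensure the top-N project entries by confidence are present, whatever their FTS score.
function withSafetyNet(ranked: Entry[], all: Entry[], n = 5): Entry[] {
  const seen = new Set(ranked.map((e) => e.id));
  const safety = all
    .filter((e) => e.project) // filter() returns a copy, so sorting does not mutate `all`
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, n)
    .filter((e) => !seen.has(e.id)); // avoid duplicating entries FTS already ranked
  return [...ranked, ...safety];
}
```

Deduplicating after taking the top N means the safety net never adds more than N entries, and adds fewer when FTS already surfaced some of them.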

Test coverage

8 new tests for extractTopTerms() (stopwords, 2-char tokens, limits, punctuation)
All 12 existing forSession() tests continue to pass

- Move extractTopTerms() to search.ts (shared, uses STOPWORDS + single-char filter)
- Replace scoreEntries() bag-of-words with scoreEntriesFTS():
  - Uses FTS5 bm25() with column weights (title=6, content=2, category=3)
  - OR semantics: ranks all candidates (not AND-then-OR) since we're scoring for relevance, not searching for exact matches
  - BM25 naturally weights entries matching more terms higher
  - Normalized scores (0-1) multiplied by confidence for final ranking
- Safety net preserved: top-5 project entries by confidence always included
- Move FTS_WEIGHTS constant to single definition before forSession()
- Add 8 new tests for extractTopTerms (stopwords, 2-char tokens, limits)

## Phase 4 of search improvements (depends on #48)

Adds a configurable search section and optional LLM-based query expansion for the recall tool.

### New config: `search` section

```json
{
  "search": {
    "ftsWeights": { "title": 6.0, "content": 2.0, "category": 3.0 },
    "recallLimit": 10,
    "queryExpansion": false
  }
}
```

- **`ftsWeights`**: BM25 column weights for knowledge FTS5 search. Tune how much title, content, and category matches matter relative to each other.
- **`recallLimit`**: Maximum results per source in the recall tool before RRF fusion (default 10, range 1-50).
- **`queryExpansion`**: When enabled, the recall tool uses the configured LLM to generate 2-3 alternative query phrasings before searching, improving recall for ambiguous queries.

### Query expansion (`search.queryExpansion: true`)

When enabled:

1. The configured LLM generates 2-3 alternative phrasings of the user's query
2. FTS5 searches run for each variant (original + expansions)
3. All results are fused via RRF across all query variants
4. The original query naturally gets higher weight (it appears first in fusion)

Implementation:

- Uses the same worker session pattern as distillation/curation
- 3-second timeout: if the LLM is slow, search falls back to the original query only
- Errors are caught and logged, never surfaced; they never block search
- Registered as the `lore-query-expand` hidden agent

### Config wiring

- `ftsWeights` flows through to all BM25 searches in ltm.ts via the `ftsWeights()` function (reads from `config().search.ftsWeights`)
- `recallLimit` is used as the per-source limit in the recall tool
- `client` + `searchConfig` are passed through `createRecallTool()`

### Test coverage

- 7 new config tests: defaults, partial overrides, range validation, optional section
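The expansion flow above can be sketched in TypeScript as follows. This is a hedged illustration, not the shipped code: `rrfFuse` and `queryVariants` are hypothetical names, the `expand` callback stands in for the LLM worker session, and k = 60 is the conventional RRF constant, assumed here. In this sketch the original query's advantage shows up as a tie-break, because its results are inserted into the score map first.

```typescript
// Reciprocal Rank Fusion over several ranked id lists (k = 60 assumed).
function rrfFuse(resultLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of resultLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  // Map keeps first-insertion order and sort is stable, so ties favor ids
  // first seen in earlier lists (the original query's results come first).
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Race the (hypothetical) LLM expansion against a timeout. A timeout or an
// error falls back to the original query alone; errors are logged, not thrown.
async function queryVariants(
  query: string,
  expand: (q: string) => Promise<string[]>,
  timeoutMs = 3000,
): Promise<string[]> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<string[]>((resolve) => {
    timer = setTimeout(() => resolve([]), timeoutMs);
  });
  try {
    const expansions = await Promise.race([expand(query), timeout]);
    return [query, ...expansions.slice(0, 3)]; // original first, at most 3 extras
  } catch (err) {
    console.error("query expansion failed", err);
    return [query];
  } finally {
    clearTimeout(timer); // don't keep the event loop alive after a fast answer
  }
}
```

A caller would run one FTS5 search per string returned by `queryVariants()`, collect each result list in the same order, and pass them to `rrfFuse()`.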

BYK enabled auto-merge (squash) March 22, 2026 14:32

BYK mentioned this pull request Mar 22, 2026

feat: add search config surface and LLM query expansion #49

Merged

BYK disabled auto-merge March 22, 2026 21:46

BYK force-pushed the feat/fts-session-scoring branch from 3f8cdeb to 5c1792c March 22, 2026 21:46

BYK enabled auto-merge (squash) March 22, 2026 21:48

BYK force-pushed the feat/fts-session-scoring branch from 5c1792c to 5c522fe March 22, 2026 21:48

BYK merged commit 708f298 into main Mar 22, 2026
1 check passed

BYK deleted the feat/fts-session-scoring branch March 22, 2026 21:49


BYK merged 1 commit into main from feat/fts-session-scoring

BYK commented Mar 22, 2026


1 participant
