feat: replace forSession() scoring with FTS5 BM25 #48
Merged
3f8cdeb to 5c1792c
- Move extractTopTerms() to search.ts (shared, uses STOPWORDS + single-char filter)
- Replace scoreEntries() bag-of-words with scoreEntriesFTS():
  - Uses FTS5 bm25() with column weights (title=6, content=2, category=3)
  - OR semantics: ranks all candidates (not AND-then-OR) since we're
    scoring for relevance, not searching for exact matches
  - BM25 naturally weights entries matching more terms higher
  - Normalized scores (0-1) multiplied by confidence for final ranking
- Safety net preserved: top-5 project entries by confidence always included
- Move FTS_WEIGHTS constant to single definition before forSession()
- Add 8 new tests for extractTopTerms (stopwords, 2-char tokens, limits)
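A minimal sketch of the ranking steps the commit message lists: OR-joined terms, weighted `bm25()`, and 0-1 normalization multiplied by confidence. This is not the code from `ltm.ts`. The table layout `knowledge_fts(title, content, category)`, the helper names `buildOrQuery`, `scoringSql`, and `rankEntries`, and the choice to normalize by dividing by the best match are all assumptions. It only builds the SQL and ranks rows already fetched, so it runs without a database.

```typescript
// Sketch only: assumes an FTS5 table knowledge_fts(title, content, category)
// whose rowid matches the knowledge entry id. All names are hypothetical.

interface FtsWeights {
  title: number;
  content: number;
  category: number;
}

const FTS_WEIGHTS: FtsWeights = { title: 6.0, content: 2.0, category: 3.0 };

// Quote each term as an FTS5 string (doubling embedded quotes) so punctuation
// or bare keywords like AND/NOT are not parsed as query syntax, then OR them
// together so any entry matching at least one term becomes a candidate.
function buildOrQuery(terms: string[]): string {
  return terms.map((t) => `"${t.replace(/"/g, '""')}"`).join(" OR ");
}

// bm25() takes one weight per column, in the table's column order.
// Better matches get more negative values, so ascending order is best-first.
function scoringSql(w: FtsWeights): string {
  return (
    `SELECT rowid AS id, bm25(knowledge_fts, ${w.title}, ${w.content}, ${w.category}) AS bm25_score ` +
    `FROM knowledge_fts WHERE knowledge_fts MATCH ? ORDER BY bm25_score`
  );
}

interface FtsRow {
  id: number;
  bm25_score: number;
}

interface Entry {
  id: number;
  confidence: number;
}

// Flip the sign so higher is better, scale into 0-1 relative to the best
// match, then multiply by the entry's confidence for the final ordering.
function rankEntries(rows: FtsRow[], entries: Map<number, Entry>): { id: number; score: number }[] {
  const best = Math.max(0, ...rows.map((r) => -r.bm25_score));
  return rows
    .filter((r) => entries.has(r.id))
    .map((r) => {
      const relevance = best > 0 ? -r.bm25_score / best : 0;
      return { id: r.id, score: relevance * entries.get(r.id)!.confidence };
    })
    .sort((a, b) => b.score - a.score);
}
```

In a real query the `?` placeholder would be bound to `buildOrQuery(terms)`. Dividing by the best match keeps the top hit at 1.0 regardless of how many terms were extracted; min-max scaling would also work but pins the weakest candidate at 0, which then wipes out its confidence entirely.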
5c1792c to 5c522fe
BYK added a commit that referenced this pull request on Mar 22, 2026:
## Phase 4 of search improvements (depends on #48)

Adds a configurable `search` section and optional LLM-based query expansion for the recall tool.

### New config: `search` section

```json
{
  "search": {
    "ftsWeights": { "title": 6.0, "content": 2.0, "category": 3.0 },
    "recallLimit": 10,
    "queryExpansion": false
  }
}
```

- **`ftsWeights`**: BM25 column weights for knowledge FTS5 search. Tune how much title/content/category matches matter relative to each other.
- **`recallLimit`**: Max results per source in the recall tool before RRF fusion (default 10, range 1-50).
- **`queryExpansion`**: When enabled, the recall tool uses the configured LLM to generate 2-3 alternative query phrasings before search, improving recall for ambiguous queries.

### Query expansion (`search.queryExpansion: true`)

When enabled:

1. The configured LLM generates 2-3 alternative phrasings of the user's query.
2. FTS5 searches run for each variant (original + expansions).
3. All results are fused via RRF across all query variants.
4. The original query naturally gets higher weight (it appears first in fusion).

Implementation:

- Uses the same worker session pattern as distillation/curation.
- 3-second timeout: if the LLM is slow, falls back to the original query only.
- Errors are caught silently (logged) and never block search.
- Registered as `lore-query-expand` hidden agent.

### Config wiring

- `ftsWeights` flows through to all BM25 searches in `ltm.ts` via the `ftsWeights()` function (reads from `config().search.ftsWeights`).
- `recallLimit` is used as the per-source limit in the recall tool.
- `client` + `searchConfig` are passed through `createRecallTool()`.

### Test coverage

- 7 new config tests: defaults, partial overrides, range validation, optional section
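The expansion-then-fusion flow described for `search.queryExpansion` can be sketched as below. This is an illustration, not the plugin's code: `expand` is a hypothetical stand-in for the LLM worker call, `withTimeout`, `queryVariants`, and `rrf` are invented names, and `k = 60` is the constant commonly used for reciprocal rank fusion, assumed here rather than taken from the PR.

```typescript
// Sketch only: `expand` stands in for the LLM worker call; names are hypothetical.

// Resolve with `fallback` if `p` rejects or takes longer than `ms`.
async function withTimeout<T>(p: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  const guarded = p.catch((err: unknown) => {
    console.warn("query expansion failed:", err); // log, never throw
    return fallback;
  });
  try {
    return await Promise.race([guarded, timeout]);
  } finally {
    clearTimeout(timer); // don't keep the process alive after the race settles
  }
}

// Original query first, then up to three distinct, non-empty rephrasings.
async function queryVariants(
  query: string,
  expand: (q: string) => Promise<string[]>,
  timeoutMs = 3000,
): Promise<string[]> {
  const extra = await withTimeout(expand(query), timeoutMs, [] as string[]);
  const alts = extra.filter((q) => q.trim().length > 0 && q !== query).slice(0, 3);
  return [query, ...alts];
}

// Reciprocal rank fusion: each list adds 1 / (k + rank) per id, rank being 1-based.
function rrf(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, i) => scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1)));
  }
  // Array.prototype.sort is stable, so ties keep first-seen order.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```

Because the original query's result list is passed first and the sort is stable, documents from the original query win any exact score ties in this sketch; documents found by several variants rise on summed contributions.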
## Phase 3 of search improvements (depends on #47)
Replaces the coarse bag-of-words term-overlap scoring in `forSession()` with FTS5 BM25-based scoring.

### Problem

`forSession()` used manual term-overlap counting: it extracted the top 30 words longer than 3 characters, then counted how many appeared in each entry via `string.includes()`. This ignored:

### Solution
New `scoreEntriesFTS()` in `ltm.ts`:

- Queries `knowledge_fts` using BM25

Improved `extractTopTerms()`, moved to `search.ts`:

### Safety net preserved
Top 5 project entries by confidence are always included regardless of FTS match, preventing the scoring change from accidentally excluding critical project knowledge.
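One way to merge such a safety net with the FTS-ranked list is sketched below, assuming a hypothetical `KnowledgeEntry` shape with an `isProject` flag. The PR does not show how project entries are identified or how the two lists are combined; here the FTS ranking keeps its order and any missing top-5 project entries are appended.

```typescript
// Sketch only: entry shape and function name are hypothetical.
interface KnowledgeEntry {
  id: string;
  confidence: number;
  isProject: boolean; // hypothetical: how the real code marks project entries is not shown
}

const SAFETY_NET_SIZE = 5;

function applySafetyNet(ranked: KnowledgeEntry[], all: KnowledgeEntry[]): KnowledgeEntry[] {
  // Highest-confidence project entries, regardless of any FTS match.
  // filter() copies the array, so sorting does not mutate `all`.
  const safety = all
    .filter((e) => e.isProject)
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, SAFETY_NET_SIZE);
  const seen = new Set(ranked.map((e) => e.id));
  // Keep the FTS ranking intact, then append safety-net entries it missed.
  return [...ranked, ...safety.filter((e) => !seen.has(e.id))];
}
```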
### Test coverage
- 8 new tests for `extractTopTerms()` (stopwords, 2-char tokens, limits, punctuation)
- Existing `forSession()` tests continue to pass
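The `extractTopTerms()` behaviors these tests cover (stopword removal, keeping 2-char tokens, result limits, punctuation handling) can be illustrated with a short sketch. It is an assumption, not the code in `search.ts`: the `STOPWORDS` set is a tiny placeholder, the tokenizer is ASCII-only, terms are ranked by raw frequency, and the default `limit = 30` mirrors the old top-30 heuristic rather than anything the PR states.

```typescript
// Sketch only: placeholder STOPWORDS set and an ASCII-only tokenizer.
const STOPWORDS = new Set(["the", "and", "for", "with", "this", "that", "is", "a", "an", "of", "to", "in"]);

function extractTopTerms(text: string, limit = 30): string[] {
  const counts = new Map<string, number>();
  // Lowercase, then split on anything that is not a letter, digit, or underscore,
  // which also strips punctuation such as commas, semicolons, and periods.
  for (const token of text.toLowerCase().split(/[^a-z0-9_]+/)) {
    // Keep 2-char tokens like "db" or "ui"; drop empty/single-char tokens and stopwords.
    if (token.length < 2 || STOPWORDS.has(token)) continue;
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent first; stable sort keeps first-seen order on ties
    .slice(0, limit)
    .map(([term]) => term);
}
```

For example, `extractTopTerms("The DB cache, the DB index; a x y cache!")` keeps the 2-char `db` that the old "longer than 3 characters" rule would have dropped.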