Merged
9 changes: 9 additions & 0 deletions AGENTS.md
@@ -12,6 +12,9 @@
<!-- lore:019c8f4f-67c8-7cf4-b93b-c5ec46ed94b6 -->
* **Lore DB uses incremental auto_vacuum to prevent free-page bloat**: Lore's SQLite DB uses incremental `auto_vacuum` (schema version 3 migration) to prevent free-page bloat from deletions. The migration sets `PRAGMA auto_vacuum = INCREMENTAL` then `VACUUM` outside a transaction. `temporal_messages` is the primary storage consumer (~51MB); the knowledge table is tiny.
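A sketch of that migration in plain SQLite, assuming a connection with no open transaction (the statements come from the note above; the page count passed to `incremental_vacuum` is illustrative):

```sql
-- auto_vacuum changes are recorded immediately but only take effect
-- once the database file is rebuilt, hence the VACUUM that follows.
PRAGMA auto_vacuum = INCREMENTAL;
VACUUM;

-- Later, reclaim freed pages a chunk at a time instead of rebuilding:
PRAGMA incremental_vacuum(1000);  -- frees up to 1000 pages
```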

<!-- lore:019d15de-e2d6-7ff2-ab86-b78ca39688a7 -->
* **Lore search pipeline: FTS5 with AND-then-OR fallback and RRF fusion**: Lore's search overhaul (planned/in-progress) replaces three independent search systems with a unified pipeline in `src/search.ts`. Key design: `ftsQuery()` builds AND queries (primary), `ftsQueryOr()` builds OR queries (fallback only when AND returns zero results). Blanket OR was rejected empirically — it adds noise even with stopword filtering. The conservative stopword list excludes domain terms like 'handle', 'state', 'type'. FTS5 rank is negative (more negative = better); `ORDER BY rank` sorts best first. `bm25()` with column weights (title=6, content=2, category=3) verified working in Bun's SQLite. The recall tool uses Reciprocal Rank Fusion (k=60) across knowledge, temporal, and distillation sources. `forSession()` scoring uses OR (not AND-then-OR) because it's ranking all candidates, not searching for exact matches — BM25 naturally weights multi-term matches higher.
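The RRF step can be sketched in pure TypeScript. This is a sketch of the technique only — the actual fusion code in `src/search.ts` is not shown in this diff, and `rrfFuse` is a hypothetical helper name:

```typescript
// Reciprocal Rank Fusion: score(id) = Σ over sources of 1 / (k + rank),
// where rank is the 1-based position of id in that source's result list.
// k = 60 is the conventional constant, matching the note above.
function rrfFuse(rankedLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

An item ranked near the top of several sources beats one ranked first in a single source, which is why RRF works without normalizing the incompatible BM25 scores across tables.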

<!-- lore:019c8f8c-47c3-71a2-b5fd-248a2cfeba78 -->
* **Lore temporal pruning runs after distillation and curation on session.idle**: In src/index.ts, session.idle awaits backgroundDistill and backgroundCurate sequentially before running temporal.prune(). Ordering is critical: pruning must not delete unprocessed messages. Pruning defaults: 120-day retention, 1GB max storage (in .lore.json under pruning.retention and pruning.maxStorage). These generous defaults were chosen because the system was new — earlier proposals of 7d/200MB were based on insufficient data.
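A hypothetical `.lore.json` fragment with these defaults — the key names come from the note above, but the value formats (days as a number, bytes as a number) are assumptions:

```json
{
  "pruning": {
    "retention": 120,
    "maxStorage": 1073741824
  }
}
```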

@@ -23,6 +26,9 @@
<!-- lore:019c904b-7924-7187-8471-8ad2423b8946 -->
* **Curator prompt scoped to code-relevant knowledge only**: `CURATOR_SYSTEM` in `src/prompt.ts` now explicitly excludes: general ecosystem knowledge available online, business strategy and marketing positioning, product pricing models, third-party tool details not needed for development, and personal contact information. This was added after the curator extracted entries about OpenWork integration strategy (including an email address), Lore Cloud pricing tiers, and AGENTS.md ecosystem facts — none of which help an agent write code. The `curatorUser()` function also appends guidance to prefer updating existing entries over creating new ones for the same concept, reducing duplicate creation.

<!-- lore:019d15de-e2e4-777f-8e00-fe21198117ad -->
* **Lore plugin cannot use native Node addons — pure bun:sqlite only**: Lore is a Bun plugin library (`main: 'src/index.ts'`, Plugin type) running inside OpenCode's compiled Bun binary. It has no build step and cannot use native Node addons (no better-sqlite3, no node-llama-cpp, no sqlite-vec). Dependencies must be pure JS/TS or Bun built-ins. This rules out QMD as a library dependency (it requires better-sqlite3 + node-llama-cpp + sqlite-vec). QMD's search patterns (BM25 + vector + RRF + reranking) are adapted for pure FTS5 instead. Vector/embedding search would need to use OpenCode's existing chat providers rather than local GGUF models.

### Gotcha

<!-- lore:019c91d6-04af-7334-8374-e8bbf14cb43d -->
@@ -31,6 +37,9 @@
<!-- lore:019cb615-0b10-7bbc-a7db-50111118c200 -->
* **Lore auto-recovery can infinite-loop without re-entrancy guard**: Three v0.5.2 bugs causing excessive background LLM requests: (1) Auto-recovery loop — the session.error handler injected a recovery prompt → could overflow again → loop. Fix: `recoveringSessions` Set as a re-entrancy guard. (2) Curator ran every idle — `onIdle || afterTurns` short-circuited (onIdle=true). Fix: `||` → `&&`. Lesson: a boolean flag gating a numeric threshold needs AND, not OR. (3) `shouldSkip()` fell back to `session.list()` on unknown sessions. Fix: remove the list fallback, cache in `activeSessions`.
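The re-entrancy guard from fix (1) can be sketched as follows. The Set name mirrors the note; the handler shape and `recover` callback are hypothetical, since the real session.error plumbing is not shown here:

```typescript
// Guard against recursive recovery: injecting a recovery prompt can itself
// trigger session.error again, so track sessions with recovery in flight.
const recoveringSessions = new Set<string>();

async function onSessionError(
  sessionID: string,
  recover: (id: string) => Promise<void>,
): Promise<void> {
  if (recoveringSessions.has(sessionID)) return; // already recovering — bail out
  recoveringSessions.add(sessionID);
  try {
    await recover(sessionID);
  } finally {
    recoveringSessions.delete(sessionID); // allow future, non-nested recoveries
  }
}
```

The `finally` matters: without it, a recovery that throws would leave the session permanently marked as recovering.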

<!-- lore:019d15de-e2e1-7ea0-a0bb-ab59227422e8 -->
* **Lore knowledge FTS search was sorted by updated_at, not BM25 relevance**: In `ltm.search()`, knowledge FTS results were ordered by `k.updated_at DESC` instead of FTS5 BM25 rank — the most recently edited entry won over the most relevant. Fix: replace the `WHERE k.rowid IN (SELECT rowid FROM knowledge_fts ...)` subquery pattern with a JOIN that exposes `rank`, then `ORDER BY bm25(knowledge_fts, 6.0, 2.0, 3.0)`. Also: distillations had no FTS table at all (LIKE-only search), fixed by adding `distillation_fts` in schema migration v7 with backfill and sync triggers.

<!-- lore:019c8f4f-67ca-7212-a8c4-8a75b230ceea -->
* **Test DB isolation via LORE_DB_PATH and Bun test preload**: The Lore test suite uses an isolated temp DB via a `test/setup.ts` preload (bunfig.toml). The preload sets `LORE_DB_PATH` to a `mkdtempSync` path before any imports of `src/db.ts`; `afterAll` cleans up. `src/db.ts` checks `LORE_DB_PATH` first. `agents-file.test.ts` needs `beforeEach` cleanup for intra-file isolation and `TEST_UUIDS` cleanup in `afterAll` (shared with `ltm.test.ts`). Individual test files don't need `close()` calls — the preload handles the DB lifecycle.
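The core of that preload can be sketched as a small helper — a sketch only, assuming `src/db.ts` reads `process.env.LORE_DB_PATH` at import time; the `isolateTestDb` name is hypothetical:

```typescript
import { mkdtempSync, rmSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

// Point LORE_DB_PATH at a fresh temp directory. Because this runs as a
// bunfig.toml preload, it executes before any test file imports src/db.ts,
// so the module-level DB path resolution picks up the override.
export function isolateTestDb(): () => void {
  const dir = mkdtempSync(join(tmpdir(), "lore-test-"));
  process.env.LORE_DB_PATH = join(dir, "lore.db");
  // Returned cleanup is meant for afterAll in the real preload.
  return () => rmSync(dir, { recursive: true, force: true });
}
```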

31 changes: 30 additions & 1 deletion src/db.ts
@@ -2,7 +2,7 @@ import { Database } from "bun:sqlite";
import { join, dirname } from "path";
import { mkdirSync } from "fs";

const SCHEMA_VERSION = 6;
const SCHEMA_VERSION = 7;

const MIGRATIONS: string[] = [
`
@@ -179,6 +179,35 @@ const MIGRATIONS: string[] = [
DROP INDEX IF EXISTS idx_temporal_distilled;
DROP INDEX IF EXISTS idx_distillation_project;
`,
`
-- Version 7: FTS5 for distillations — enables ranked search instead of LIKE.
CREATE VIRTUAL TABLE IF NOT EXISTS distillation_fts USING fts5(
observations,
content=distillations,
content_rowid=rowid,
tokenize='porter unicode61'
);

-- Backfill existing data (skip empty observations from schema v1→v2 migration)
INSERT INTO distillation_fts(rowid, observations)
SELECT rowid, observations FROM distillations WHERE observations != '';

-- Sync triggers
CREATE TRIGGER IF NOT EXISTS distillation_fts_insert AFTER INSERT ON distillations BEGIN
INSERT INTO distillation_fts(rowid, observations) VALUES (new.rowid, new.observations);
END;

CREATE TRIGGER IF NOT EXISTS distillation_fts_delete AFTER DELETE ON distillations BEGIN
INSERT INTO distillation_fts(distillation_fts, rowid, observations)
VALUES('delete', old.rowid, old.observations);
END;

CREATE TRIGGER IF NOT EXISTS distillation_fts_update AFTER UPDATE ON distillations BEGIN
INSERT INTO distillation_fts(distillation_fts, rowid, observations)
VALUES('delete', old.rowid, old.observations);
INSERT INTO distillation_fts(rowid, observations) VALUES (new.rowid, new.observations);
END;
`,
];

function dataDir() {
72 changes: 42 additions & 30 deletions src/ltm.ts
@@ -1,6 +1,6 @@
import { uuidv7 } from "uuidv7";
import { db, ensureProject } from "./db";
import { ftsQuery } from "./temporal";
import { ftsQuery, ftsQueryOr, EMPTY_QUERY } from "./search";

// ~3 chars per token — validated as best heuristic against real API data.
function estimateTokens(text: string): number {
@@ -364,44 +364,56 @@ function searchLike(input: {
.all(...likeParams, input.limit) as KnowledgeEntry[];
}

/** BM25 column weights for knowledge_fts: title, content, category. */
const FTS_WEIGHTS = { title: 6.0, content: 2.0, category: 3.0 };

export function search(input: {
query: string;
projectPath?: string;
limit?: number;
}): KnowledgeEntry[] {
const limit = input.limit ?? 20;
const q = ftsQuery(input.query);
if (input.projectPath) {
const pid = ensureProject(input.projectPath);
try {
return db()
.query(
`SELECT k.* FROM knowledge k
WHERE k.rowid IN (SELECT rowid FROM knowledge_fts WHERE knowledge_fts MATCH ?)
AND (k.project_id = ? OR k.project_id IS NULL OR k.cross_project = 1)
AND k.confidence > 0.2
ORDER BY k.updated_at DESC LIMIT ?`,
)
.all(q, pid, limit) as KnowledgeEntry[];
} catch {
return searchLike({
query: input.query,
projectPath: input.projectPath,
limit,
});
}
}
if (q === EMPTY_QUERY) return [];

const pid = input.projectPath ? ensureProject(input.projectPath) : null;

const ftsSQL = pid
? `SELECT k.* FROM knowledge k
JOIN knowledge_fts f ON k.rowid = f.rowid
WHERE knowledge_fts MATCH ?
AND (k.project_id = ? OR k.project_id IS NULL OR k.cross_project = 1)
AND k.confidence > 0.2
ORDER BY bm25(knowledge_fts, ?, ?, ?) LIMIT ?`
: `SELECT k.* FROM knowledge k
JOIN knowledge_fts f ON k.rowid = f.rowid
WHERE knowledge_fts MATCH ?
AND k.confidence > 0.2
ORDER BY bm25(knowledge_fts, ?, ?, ?) LIMIT ?`;

const { title, content, category } = FTS_WEIGHTS;
const ftsParams = pid
? [q, pid, title, content, category, limit]
: [q, title, content, category, limit];

try {
return db()
.query(
`SELECT k.* FROM knowledge k
WHERE k.rowid IN (SELECT rowid FROM knowledge_fts WHERE knowledge_fts MATCH ?)
AND k.confidence > 0.2
ORDER BY k.updated_at DESC LIMIT ?`,
)
.all(q, limit) as KnowledgeEntry[];
const results = db().query(ftsSQL).all(...ftsParams) as KnowledgeEntry[];
if (results.length) return results;

// AND returned nothing — try OR fallback for broader recall
const qOr = ftsQueryOr(input.query);
if (qOr === EMPTY_QUERY) return [];

const ftsParamsOr = pid
? [qOr, pid, title, content, category, limit]
: [qOr, title, content, category, limit];
return db().query(ftsSQL).all(...ftsParamsOr) as KnowledgeEntry[];
} catch {
return searchLike({ query: input.query, limit });
return searchLike({
query: input.query,
projectPath: input.projectPath,
limit,
});
}
}

84 changes: 66 additions & 18 deletions src/reflect.ts
@@ -3,6 +3,7 @@ import * as temporal from "./temporal";
import * as ltm from "./ltm";
import * as log from "./log";
import { db, ensureProject } from "./db";
import { ftsQuery, ftsQueryOr, EMPTY_QUERY } from "./search";
import { serialize, inline, h, p, ul, lip, liph, t, root } from "./markdown";

type Distillation = {
@@ -13,41 +14,83 @@
session_id: string;
};

function searchDistillations(input: {
projectPath: string;
// LIKE-based fallback for when FTS5 fails unexpectedly on distillations.
function searchDistillationsLike(input: {
pid: string;
query: string;
sessionID?: string;
limit?: number;
limit: number;
}): Distillation[] {
const pid = ensureProject(input.projectPath);
const limit = input.limit ?? 10;
// Search distillation narratives and facts with LIKE since we don't have FTS on them
const terms = input.query
.toLowerCase()
.split(/\s+/)
.filter((t) => t.length > 2);
.filter((t) => t.length > 1);
if (!terms.length) return [];

const conditions = terms
.map(() => "LOWER(observations) LIKE ?")
.join(" AND ");
const params: string[] = [];
for (const term of terms) {
params.push(`%${term}%`);
}

const query = input.sessionID
const likeParams = terms.map((t) => `%${t}%`);
const sql = input.sessionID
? `SELECT id, observations, generation, created_at, session_id FROM distillations WHERE project_id = ? AND session_id = ? AND ${conditions} ORDER BY created_at DESC LIMIT ?`
: `SELECT id, observations, generation, created_at, session_id FROM distillations WHERE project_id = ? AND ${conditions} ORDER BY created_at DESC LIMIT ?`;
const allParams = input.sessionID
? [pid, input.sessionID, ...params, limit]
: [pid, ...params, limit];

? [input.pid, input.sessionID, ...likeParams, input.limit]
: [input.pid, ...likeParams, input.limit];
return db()
.query(query)
.query(sql)
.all(...allParams) as Distillation[];
}

function searchDistillations(input: {
projectPath: string;
query: string;
sessionID?: string;
limit?: number;
}): Distillation[] {
const pid = ensureProject(input.projectPath);
const limit = input.limit ?? 10;
const q = ftsQuery(input.query);
if (q === EMPTY_QUERY) return [];

const ftsSQL = input.sessionID
? `SELECT d.id, d.observations, d.generation, d.created_at, d.session_id
FROM distillations d
JOIN distillation_fts f ON d.rowid = f.rowid
WHERE distillation_fts MATCH ?
AND d.project_id = ? AND d.session_id = ?
ORDER BY rank LIMIT ?`
: `SELECT d.id, d.observations, d.generation, d.created_at, d.session_id
FROM distillations d
JOIN distillation_fts f ON d.rowid = f.rowid
WHERE distillation_fts MATCH ?
AND d.project_id = ?
ORDER BY rank LIMIT ?`;
const params = input.sessionID
? [q, pid, input.sessionID, limit]
: [q, pid, limit];

try {
const results = db().query(ftsSQL).all(...params) as Distillation[];
if (results.length) return results;

// AND returned nothing — try OR fallback
const qOr = ftsQueryOr(input.query);
if (qOr === EMPTY_QUERY) return [];
const paramsOr = input.sessionID
? [qOr, pid, input.sessionID, limit]
: [qOr, pid, limit];
return db().query(ftsSQL).all(...paramsOr) as Distillation[];
} catch {
// FTS5 failed — fall back to LIKE search
return searchDistillationsLike({
pid,
query: input.query,
sessionID: input.sessionID,
limit,
});
}
}

function formatResults(input: {
temporalResults: temporal.TemporalMessage[];
distillationResults: Distillation[];
@@ -115,6 +158,11 @@ export function createRecallTool(projectPath: string, knowledgeEnabled = true):
const scope = args.scope ?? "all";
const sid = context.sessionID;

// If the query is all stopwords / single chars, short-circuit with guidance
if (ftsQuery(args.query) === EMPTY_QUERY) {
return "Query too vague — try using specific keywords, file names, or technical terms.";
}

let temporalResults: temporal.TemporalMessage[] = [];
if (scope !== "knowledge") {
try {