feat: add Voyage AI embedding search by BYK · Pull Request #50 · BYK/opencode-lore

BYK · 2026-03-22T21:14:10Z

Phase 5: Vector embedding search (depends on #49)

Adds semantic vector search using Voyage AI's voyage-code-3 model, layered on top of the existing BM25 + RRF fusion pipeline.

How it works

On knowledge create/update: fire-and-forget embedding via Voyage AI API → stored as Float32Array BLOB in SQLite
On recall search: embed the query → brute-force cosine similarity over all knowledge BLOBs → feed vector results into existing RRF fusion as another ranked list
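
The storage half of this flow can be sketched as a Float32Array to BLOB round trip. This is a minimal illustration, not the code from src/embedding.ts: the helper names `toBlob` and `fromBlob` are hypothetical, and it assumes the SQLite driver returns BLOB columns as a `Uint8Array`.

```typescript
// Hypothetical helpers; names are illustrative and not taken from the PR.

// Copy the vector's bytes into a standalone Uint8Array suitable for a BLOB column.
function toBlob(vec: Float32Array): Uint8Array {
  const bytes = new Uint8Array(vec.byteLength);
  bytes.set(new Uint8Array(vec.buffer, vec.byteOffset, vec.byteLength));
  return bytes;
}

// Rebuild the vector from BLOB bytes. Copying first guarantees the 4-byte
// alignment that a Float32Array view requires, whatever the blob's byteOffset.
function fromBlob(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) {
    throw new RangeError("embedding BLOB length must be a multiple of 4 bytes");
  }
  const copy = new Uint8Array(blob.byteLength);
  copy.set(blob);
  return new Float32Array(copy.buffer);
}
```

A 1024-dimension vector occupies 4096 bytes in this layout. The bytes are written in the platform's native byte order, so this sketch assumes the database is read back on a machine with the same endianness.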

Why Voyage AI?

voyage-code-3 is code-optimized — best-in-class for technical/code text retrieval
200M free tokens — effectively free forever for <100 knowledge entries
Single API key (VOYAGE_API_KEY), OpenAI-compatible response format
Supports input_type: "document" vs "query" for retrieval-optimized embeddings
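
To make the document/query distinction concrete, here is a hedged sketch of how a request to the embeddings endpoint might be assembled and its response unpacked. The function names are hypothetical, and the body fields (`input`, `model`, `input_type`, `output_dimension`) reflect my reading of the Voyage AI API rather than anything confirmed by this PR. The sketch only builds the request, so it runs without a network call or an API key.

```typescript
// Sketch only: function names are hypothetical, field names are assumptions
// based on the public Voyage AI embeddings API.
type InputType = "document" | "query";

interface EmbeddingRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildEmbeddingRequest(
  texts: string[],
  inputType: InputType,
  apiKey: string,
  model: string = "voyage-code-3",
  dimensions: number = 1024
): EmbeddingRequest {
  return {
    url: "https://api.voyageai.com/v1/embeddings",
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      input: texts,
      model,
      input_type: inputType, // "document" when storing, "query" when searching
      output_dimension: dimensions,
    }),
  };
}

// Assumed OpenAI-compatible response shape: one item per input, tagged with its index.
interface EmbeddingResponse {
  data: Array<{ embedding: number[]; index: number }>;
}

// Return embeddings in input order, converted to Float32Array for storage.
function parseEmbeddings(res: EmbeddingResponse): Float32Array[] {
  return res.data
    .slice()
    .sort((a, b) => a.index - b.index)
    .map((d) => new Float32Array(d.embedding));
}
```

The returned object can be handed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Knowledge entries would be embedded with "document" and recall queries with "query".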

Why pure-JS cosine, not libSQL/sqlite-vec?

Tested both. bun:sqlite is standard SQLite — no vector_distance_cos() or DiskANN. The @libsql/client-wasm WASM package also lacks vector functions. Native libsql works but adds a ~15MB native dependency — overkill for <100 entries where brute-force cosine takes microseconds.
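
For a sense of how little code the brute-force approach needs, here is a minimal sketch of cosine similarity plus a linear scan. The `Entry` shape and the signatures are assumptions for illustration and may differ from what src/embedding.ts actually exports.

```typescript
// Illustrative sketch; the Entry shape and function signatures are assumptions.
interface Entry {
  id: string;
  embedding: Float32Array | null; // null until a background embedding has been stored
}

// Dot product divided by the product of magnitudes; zero vectors score 0.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  const denom = Math.sqrt(magA) * Math.sqrt(magB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every entry that has an embedding, then keep the top `limit` by similarity.
function vectorSearch(
  query: Float32Array,
  entries: Entry[],
  limit: number
): Array<{ id: string; similarity: number }> {
  const scored: Array<{ id: string; similarity: number }> = [];
  for (const e of entries) {
    if (e.embedding === null) continue; // entries without embeddings are skipped
    scored.push({ id: e.id, similarity: cosineSimilarity(query, e.embedding) });
  }
  scored.sort((x, y) => y.similarity - x.similarity);
  return scored.slice(0, limit);
}
```

With fewer than 100 entries at 1024 dimensions, the scan is on the order of 100,000 multiply-adds per query, which is why an index structure is unnecessary at this scale.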

Graceful degradation

Gated behind search.embeddings.enabled (default: false) + VOYAGE_API_KEY env var
If API key missing or API fails: FTS-only search continues working normally
embedKnowledgeEntry() is fire-and-forget — embedding failures are logged, never thrown
Entries without embeddings are simply excluded from vector search
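
The degradation rules above can be sketched as an availability gate plus a wrapper that never rejects. This is an illustration under assumptions: the parameter shapes, the `embedInBackground` name, and the injected logger are hypothetical, and the PR's own embedKnowledgeEntry() may be structured differently.

```typescript
// Sketch under assumptions: parameter shapes and the logger are hypothetical.

// Embeddings are used only when the config flag is on AND an API key is present.
function isAvailable(enabled: boolean, apiKey: string | undefined): boolean {
  return enabled && apiKey !== undefined && apiKey.length > 0;
}

// Fire-and-forget: callers do not await the returned promise. It never rejects,
// because any failure from embed() or store() is logged and swallowed.
function embedInBackground(
  embed: () => Promise<Float32Array>,
  store: (vec: Float32Array) => void,
  log: (message: string) => void
): Promise<void> {
  return Promise.resolve()
    .then(embed)
    .then(store)
    .catch((err: unknown) => {
      log(`embedding failed: ${String(err)}`);
    });
}
```

Starting the chain with `Promise.resolve()` means a synchronous throw inside `embed` is also routed to the `catch`, so the synchronous DB write that precedes the call cannot be disturbed by it.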

RRF integration

Vector results use the same k: key prefix as BM25 knowledge results → RRF merges rather than duplicates. An entry found by both BM25 and vector search gets a higher combined RRF score.
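
A small sketch of reciprocal rank fusion shows why sharing the key prefix matters. The function below is a generic RRF implementation, not the recall tool's code; the constant 60 is a commonly used default and an assumption here, and it is named `rrfK` to keep it distinct from the `k:` key prefix.

```typescript
// Generic reciprocal rank fusion; rrfK = 60 is an assumed default, not a value from the PR.
function rrfFuse(
  lists: string[][],
  rrfK: number = 60
): Array<{ key: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((key, i) => {
      // Rank is 1-based; each list contributes 1 / (rrfK + rank) to the key's total.
      scores.set(key, (scores.get(key) ?? 0) + 1 / (rrfK + i + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([key, score]) => ({ key, score }))
    .sort((a, b) => b.score - a.score);
}

// Both lists identify knowledge entries with the same "k:<id>" keys,
// so an entry present in both accumulates two contributions under one key.
const bm25Results = ["k:1", "k:2"];
const vectorResults = ["k:2", "k:3"];
const fused = rrfFuse([bm25Results, vectorResults]);
```

Here `k:2` ranks first because it collects a contribution from each list, while `k:1` and `k:3` each appear once. Had the vector list used a different prefix, `k:2` would have shown up as two separate, lower-scored results.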

Config

{
  "search": {
    "embeddings": {
      "enabled": true,
      "model": "voyage-code-3",
      "dimensions": 1024
    }
  }
}
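
As an illustration of how this section might be resolved against its defaults (disabled, voyage-code-3, 1024 dimensions), here is a minimal sketch. The `EmbeddingsConfig` type and `resolveEmbeddingsConfig` helper are hypothetical; the plugin's real config schema may validate differently.

```typescript
// Hypothetical config type and resolver; the defaults mirror those stated in this PR.
interface EmbeddingsConfig {
  enabled: boolean;
  model: string;
  dimensions: number;
}

const EMBEDDING_DEFAULTS: EmbeddingsConfig = {
  enabled: false,
  model: "voyage-code-3",
  dimensions: 1024,
};

// Fill in missing fields from the defaults and reject nonsensical dimensions.
function resolveEmbeddingsConfig(
  raw: Partial<EmbeddingsConfig> = {}
): EmbeddingsConfig {
  const merged: EmbeddingsConfig = { ...EMBEDDING_DEFAULTS, ...raw };
  if (!Number.isInteger(merged.dimensions) || merged.dimensions <= 0) {
    throw new Error("search.embeddings.dimensions must be a positive integer");
  }
  return merged;
}
```

With this shape, a config that sets only `"enabled": true` would pick up the default model and dimensions.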

Test coverage

15 new tests: cosine similarity (identical/opposite/orthogonal/zero vectors), BLOB round-trip (single/empty/1024-dim), vectorSearch (sorting/limit/null embeddings/confidence filter), isAvailable, config schema
281 total tests, 0 failures

- New src/embedding.ts: Voyage AI client, cosine similarity, vectorSearch
  - embed() calls POST https://api.voyageai.com/v1/embeddings
  - cosineSimilarity() pure-JS dot product / magnitude
  - vectorSearch() brute-force over knowledge BLOBs (<100 entries)
  - embedKnowledgeEntry() fire-and-forget (errors logged, never thrown)
  - backfillEmbeddings() batch-embeds entries missing embeddings
  - checkConfigChange() detects model/dimension changes and clears stale embeddings for re-embedding on next backfill
- Schema migration v8:
  - ADD COLUMN embedding BLOB to knowledge table
  - CREATE TABLE kv_meta (key-value store for plugin state)
- Config: search.embeddings section (enabled, model, dimensions)
  - Default: disabled, voyage-code-3, 1024 dims
  - Requires VOYAGE_API_KEY env var
- Hook embedding into ltm.create() and ltm.update()
  - Fire-and-forget after sync DB write
  - Re-embeds on content change
- Add vector search as additional RRF list in recall tool
  - Same k: key prefix as BM25 knowledge (RRF merges, not duplicates)
  - Entries found by both BM25 and vector get boosted score
- Startup backfill when embeddings first enabled
- Migration strategy: on startup, compare model+dimensions config fingerprint against stored value; if changed, clear all embeddings and re-embed in background
- 18 new tests: cosine similarity, BLOB round-trip, vectorSearch, isAvailable, config schema, config change detection
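
The fingerprint-based migration strategy can be sketched as follows. This is an assumption-laden illustration: a `Map` stands in for the kv_meta table, the key name and fingerprint format are hypothetical, and treating a missing stored value as "nothing to clear" is my reading of "startup backfill when embeddings first enabled" rather than confirmed behavior.

```typescript
// Hypothetical key name and fingerprint format; a Map stands in for the kv_meta table.
const FINGERPRINT_KEY = "embedding_config_fingerprint";

// Returns true when stale embeddings were cleared and a re-embed is needed.
function checkConfigChange(
  kv: Map<string, string>,
  model: string,
  dimensions: number,
  clearEmbeddings: () => void
): boolean {
  const fingerprint = `${model}:${dimensions}`;
  const stored = kv.get(FINGERPRINT_KEY);
  if (stored === fingerprint) return false; // config unchanged, keep embeddings

  // Assumption: on first run there is no stored fingerprint and nothing to clear.
  const hadPrevious = stored !== undefined;
  if (hadPrevious) clearEmbeddings();
  kv.set(FINGERPRINT_KEY, fingerprint);
  return hadPrevious;
}
```

Vectors produced by different models or dimensions are not comparable, so clearing everything on a fingerprint change avoids mixing them in a single cosine scan; the backfill then re-embeds the cleared entries.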

BYK enabled auto-merge (squash) March 22, 2026 21:14

BYK force-pushed the feat/voyage-embeddings branch from ac06f35 to 291aecc March 22, 2026 21:36

BYK disabled auto-merge March 22, 2026 21:46

BYK force-pushed the feat/voyage-embeddings branch from 291aecc to 84ed0a2 March 22, 2026 21:47

BYK enabled auto-merge (squash) March 22, 2026 21:48

BYK force-pushed the feat/voyage-embeddings branch from 84ed0a2 to 8ca4351 March 22, 2026 21:50

BYK merged commit b24141c into main Mar 22, 2026
1 check passed

BYK deleted the feat/voyage-embeddings branch March 22, 2026 21:50

BYK merged 1 commit into main from feat/voyage-embeddings
