
feat: add Voyage AI embedding provider — closes #1 (#222)

Open
TerminalGravity wants to merge 1 commit into main from feat/voyage-embeddings

Conversation

@TerminalGravity (Collaborator)

Adds Voyage AI as a third embedding provider option alongside local (Xenova) and OpenAI.

Changes

  • src/lib/embeddings.ts — new VoyageEmbeddingProvider class (voyage-3 model, 1024 dims, 128 texts/batch)
  • src/lib/config.ts — extended EmbeddingProvider type and config interface with voyage_api_key / voyage_model
  • src/lib/timeline-db.ts — TimelineConfig and getEmbedder() updated to support voyage
  • src/tools/onboard-project.ts — accepts voyage_api_key and voyage_model params
  • tests/lib/embeddings.test.ts — 3 new tests for voyage provider creation
  • README.md — documented VOYAGE_API_KEY, VOYAGE_MODEL env vars and config.yml options
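
For reference, the batching and ordering behavior this provider implements can be sketched roughly as below. This is a hypothetical illustration, not the PR's actual code — the helper names (`chunk`, `collectOrdered`, `VoyageItem`) are made up for this sketch:

```typescript
// Hypothetical sketch of the provider's batching/ordering logic.
// Voyage's embeddings endpoint accepts at most 128 inputs per request,
// and response items carry an `index` field since ordering is not
// guaranteed, so results are sorted by index before collection.

interface VoyageItem {
  index: number;
  embedding: number[];
}

// Split texts into chunks of at most `size` elements (128 for Voyage).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Restore input order by sorting on the returned `index` field.
function collectOrdered(items: VoyageItem[]): number[][] {
  return [...items]
    .sort((a, b) => a.index - b.index)
    .map((it) => it.embedding);
}
```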

Usage

export EMBEDDING_PROVIDER=voyage
export VOYAGE_API_KEY=pa-...
# optional: VOYAGE_MODEL=voyage-3-lite

Or in .preflight/config.yml:

embeddings:
  provider: voyage
  voyage_api_key: pa-...
  voyage_model: voyage-3

Closes #1


@TerminalGravity (Collaborator, Author) left a comment


Clean implementation. The batch-by-128 approach matches Voyage's API limits, and sorting by index before collecting is the right call since the API doesn't guarantee ordering.

A few things:

  1. Dimensions are model-dependent: voyage-3 is 1024-dim, but voyage-code-3 is also 1024 while voyage-3-lite is 512. Hardcoding dimensions = 1024 means swapping models via config could silently produce wrong-sized vectors in LanceDB. Consider fetching dimensions from a model→dim map or making it configurable.

  2. input_type: "document" — this is correct for indexing, but queries should use input_type: "query". The embed() method (used for search queries) goes through embedBatch which always sends "document". Voyage's docs say this asymmetry matters for retrieval quality.

  3. Rate limiting — no retry/backoff on 429s. The OpenAI provider doesn't have this either so it's consistent, but worth noting for heavy onboarding runs.
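
One possible shape for the model→dim map suggested in point 1 — a sketch only, not a prescribed fix; the dimensions are the ones discussed in this PR (voyage-3 and voyage-code-3 at 1024, voyage-3-lite at 512):

```typescript
// Hypothetical model→dimension map; failing loudly on an unknown model
// avoids silently writing wrong-sized vectors into LanceDB.
const VOYAGE_DIMENSIONS: Record<string, number> = {
  "voyage-3": 1024,
  "voyage-code-3": 1024,
  "voyage-3-lite": 512,
};

function dimensionsFor(model: string): number {
  const dims = VOYAGE_DIMENSIONS[model];
  if (dims === undefined) {
    throw new Error(`Unknown Voyage model "${model}"; configure dimensions explicitly`);
  }
  return dims;
}
```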

None of these are blockers for a first pass, but the input_type one is worth fixing before merge — it'll measurably impact search quality.
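
To make the input_type point concrete, here is a minimal illustration assuming a hypothetical request-body builder (the PR's real method signatures may differ): index-time batches send "document", while the search path sends "query":

```typescript
// Hypothetical request-body builder for Voyage's embeddings endpoint,
// illustrating the document/query asymmetry from point 2.
type VoyageInputType = "document" | "query";

interface VoyageRequest {
  model: string;
  input: string[];
  input_type: VoyageInputType;
}

function buildRequest(
  model: string,
  texts: string[],
  inputType: VoyageInputType,
): VoyageRequest {
  return { model, input: texts, input_type: inputType };
}

// Index time (embedBatch): embed documents as documents.
const indexBody = buildRequest("voyage-3", ["doc one", "doc two"], "document");

// Search time (embed): embed the user's query as a query.
const queryBody = buildRequest("voyage-3", ["find the config loader"], "query");
```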

- Add VoyageEmbeddingProvider class supporting voyage-3 (1024d) and voyage-3-lite (512d)
- Update EmbeddingConfig type to include 'voyage' provider option
- Update onboard-project, timeline-db, and CLI init to accept voyage
- Add 3 tests for voyage provider creation and dimension checks
- All 46 tests passing
@TerminalGravity force-pushed the feat/voyage-embeddings branch from 1ef342b to 59ccda1 on March 20, 2026 at 01:15


Development

Successfully merging this pull request may close these issues.

Add Voyage AI embedding provider