docs(base): canonical SearchResult score contract; align faiss l2 by thorwhalen · Pull Request #21 · i2mint/vd

thorwhalen · 2026-05-27T12:17:12Z

Refs #9.

Context

Issue #9 flagged that ChromaCollection.search returned a metric-blind score (1/(1+distance) for every metric), so the same cosine query landed on different score scales across backends. The headline Chroma fix has already landed via vd.backends._helpers.score_from_distance(distance, metric) (chroma.py:166 routes through it). This PR closes the rest of #9 — the cross-backend contract documentation, the audit of adapters that didn't route through the helper, and a regression test pinning the contract on the reference (memory) backend.
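To make the scale mismatch concrete, here is a minimal sketch comparing the metric-blind formula with the canonical cosine score. The function names are hypothetical and used only for illustration; they are not part of vd.

```python
def metric_blind_score(distance: float) -> float:
    """The old Chroma-style formula, applied regardless of metric."""
    return 1.0 / (1.0 + distance)


def canonical_cosine_score(cosine_distance: float) -> float:
    """Canonical cosine score: 1 - cosine_distance, i.e. cosine similarity."""
    return 1.0 - cosine_distance


# Walk through identical, orthogonal, and opposite vectors.
for cosine_similarity in (1.0, 0.0, -1.0):
    distance = 1.0 - cosine_similarity
    print(
        f"similarity={cosine_similarity:+.1f}  "
        f"metric_blind={metric_blind_score(distance):.3f}  "
        f"canonical={canonical_cosine_score(distance):+.3f}"
    )
```

For an orthogonal pair the metric-blind formula reports 0.5 while the canonical score is 0.0, which is the gap a backend using the old formula would show against one returning cosine similarity.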

The contract (now documented)

Every SearchResult["score"] is higher-is-better, per-metric canonical similarity:

| metric | canonical score | range |
|---|---|---|
| `cosine` | `1 - cosine_distance` | `[-1, 1]` |
| `dot` | raw inner product | `(-inf, +inf)` |
| `l2` | `1 / (1 + euclidean_distance)` | `(0, 1]` |
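The table can be read as a small dispatch on the metric. The sketch below is a standalone illustration, not the actual `vd.backends._helpers.score_from_distance`. The name `sketch_score_from_distance` is hypothetical, and the `dot` branch assumes the backend reports distance as the negated inner product, which is inferred from the doctest quoted in this PR (`score_from_distance(-0.7, 'dot') == 0.7`).

```python
def sketch_score_from_distance(distance: float, metric: str) -> float:
    """Map a backend distance to a higher-is-better score (illustrative only)."""
    if metric == "cosine":
        # cosine_distance = 1 - cosine_similarity, so this recovers similarity.
        return 1.0 - distance
    if metric == "dot":
        # Assumption: distance is the negated inner product.
        return -distance
    if metric == "l2":
        # Squash a non-negative Euclidean distance into (0, 1].
        return 1.0 / (1.0 + distance)
    raise ValueError(f"unsupported metric: {metric!r}")


print(sketch_score_from_distance(1.0, "cosine"))
print(sketch_score_from_distance(-0.7, "dot"))
print(sketch_score_from_distance(0.0, "l2"))
```

Keeping the three formulas in one helper is what lets distance-returning adapters stay on the same scale without each re-deriving the conversion inline.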

Adapters split cleanly into two camps:

Distance-returning (Chroma, DuckDB, LanceDB, Milvus L2, pgvector, Redis, sqlite-vec, Turbopuffer, Weaviate): already route through score_from_distance → canonical.
Native-similarity-returning (Elasticsearch kNN, MongoDB Atlas, Pinecone): backend hands back a higher-is-better number on its own per-metric scale; rescaling would change combined-ranking ties, so the adapter passes through and documents the deviation in its own docstring.

The two outliers — FAISS (used the inline formula 1/(1+v) for L2) and Qdrant (1/(1+score) for l2 is correct but undocumented) — are addressed below.

Changes

vd/base.py — new "Score semantics" section in the module preamble, with the contract table and rationale. Collection.search docstring's Yields block references it.
vd/backends/_helpers.py — score_from_distance docstring spells out the per-metric formula and which adapters route through it vs return native scores. New doctest for the dot branch (score_from_distance(-0.7, 'dot') == 0.7).
vd/backends/faiss.py — route the L2 path through score_from_distance(v, "l2") (was an inline 1.0 / (1.0 + value)). Add an inline comment explaining that the IndexFlatIP cosine path is canonical-by-construction (vectors are L2-normalized at write, so inner product is cosine similarity in [-1, 1]) and that the dot path returns the raw inner product, which is exactly vd's canonical dot score.
vd/backends/qdrant.py — behaviour unchanged; comment clarifies that the existing 1/(1+score) for L2 already matches vd's canonical formula and notes that the branch must be revisited if a future Qdrant client version flips Euclid to higher-is-better.
tests/test_contract.py — new "Score semantics" section pins the contract on the in-memory reference adapter:
- Identical-vector query → top score for each metric (cosine=1.0, l2=1.0; dot=1.0, which holds because the test vector is unit-norm, since dot itself is unbounded).
- Cosine-orthogonal pair → exactly 0.0 (regression against the previous chroma-style metric-blind formula, which would have produced 1/(1+1) = 0.5).
- The helper's documented formula table per metric.
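The checks pinned by the new tests can be sketched in pure Python. This is an illustration of the contract, not the code in tests/test_contract.py or the memory adapter; `reference_score` is a hypothetical helper that computes each canonical score directly from two vectors.

```python
import math


def reference_score(query, vector, metric):
    """Compute the canonical score for a vector pair (illustrative only)."""
    dot = sum(q * v for q, v in zip(query, vector))
    if metric == "dot":
        return dot
    if metric == "cosine":
        return dot / (math.hypot(*query) * math.hypot(*vector))
    if metric == "l2":
        return 1.0 / (1.0 + math.dist(query, vector))
    raise ValueError(f"unsupported metric: {metric!r}")


unit_x = [1.0, 0.0]
unit_y = [0.0, 1.0]

# Identical unit-norm vector: every metric reports 1.0.
for metric in ("cosine", "dot", "l2"):
    assert reference_score(unit_x, unit_x, metric) == 1.0

# Orthogonal pair under cosine: exactly 0.0, not the metric-blind 0.5.
assert reference_score(unit_x, unit_y, "cosine") == 0.0
```

Using unit-norm test vectors keeps the expected values exact, so the assertions do not need a floating-point tolerance.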

Verification

pytest tests/test_contract.py tests/test_core.py → 139 passed (+5 new score tests), 140 skipped (unchanged — provider-gated).
pytest --doctest-modules vd/base.py vd/backends/_helpers.py → 4 passed (includes the new dot doctest).
Local end-to-end check on FAISS with faiss 1.13.2: identical-vector L2 score = 1.0; cosine identical = 1.0; cosine orthogonal = 0.0 — matches the memory reference adapter exactly.

Scope notes

Did not change Elasticsearch / MongoDB / Pinecone — by design, per the contract. Each already documents in its own adapter docstring that the score is the backend's native combined-ranking score on its own scale.
Did not change the cross-backend protocol shape (no new fields). The contract is documentary + a doctest + a regression test on the canonical implementation.

Draft until reviewed — main thing to confirm is the FAISS dot-metric note: vd's canonical dot score is the raw inner product, and IndexFlatIP for the dot metric returns the raw inner product directly, so no transform is correct. (For cosine the vectors are normalized at write so the same inner product is cosine similarity.)
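The reasoning behind the FAISS note can be checked without FAISS itself. The sketch below uses plain Python lists and hypothetical helper names to show that the inner product of L2-normalized vectors equals cosine similarity, while the inner product of the unnormalized vectors is the raw dot score.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean norm."""
    norm = math.hypot(*vector)
    return [x / norm for x in vector]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return inner_product(a, b) / (math.hypot(*a) * math.hypot(*b))


a = [3.0, 4.0]
b = [4.0, 3.0]

# Cosine path: normalize at write time, then take the inner product.
ip_normalized = inner_product(l2_normalize(a), l2_normalize(b))
assert math.isclose(ip_normalized, cosine_similarity(a, b))

# Dot path: no normalization, so the inner product is returned as-is.
raw_dot = inner_product(a, b)
print(ip_normalized, raw_dot)
```

Under these assumptions neither path needs a further transform: the cosine path is already in [-1, 1] and the dot path is already the raw inner product.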

Issue #9 noted that Chroma's score was metric-blind. The Chroma fix landed earlier via score_from_distance(); this change closes the loop:

- vd/base.py: a new "Score semantics" section pins the cross-backend contract: higher-is-better, per-metric canonical similarity (cosine ∈ [-1, 1], dot raw inner product, l2 squashed to (0, 1]). Collection.search docstring references it.
- vd/backends/_helpers.py: score_from_distance docstring spells out the formula per metric and which adapters route through it vs return a native combined score (ES / MongoDB / Pinecone).
- vd/backends/faiss.py: route the l2 path through score_from_distance (was inline 1/(1+v)); add a comment explaining FAISS IndexFlatIP for cosine/dot is already canonical so passthrough is correct.
- vd/backends/qdrant.py: behaviour unchanged, comment flags the Euclid case for future revisit if Qdrant changes its score sign convention.
- tests/test_contract.py: pin the canonical contract on the in-memory reference adapter (identical-vector → metric-max, cosine-orthogonal → 0) and the helper's documented formula table.

Refs #9

thorwhalen marked this pull request as ready for review May 28, 2026 08:25

thorwhalen merged commit 066ffa0 into master May 28, 2026
12 checks passed

thorwhalen deleted the claude/issue-9-document-score-contract branch May 28, 2026 08:25

thorwhalen mentioned this pull request May 28, 2026

Chroma backend search score is metric-blind and not comparable across backends #9

Closed

2 tasks
