docs(base): canonical SearchResult score contract; align faiss l2 #21

Merged
thorwhalen merged 1 commit into master from claude/issue-9-document-score-contract
May 28, 2026

@thorwhalen
Member

Refs #9.

Context

Issue #9 flagged that ChromaCollection.search returned a metric-blind score (1/(1+distance) for every metric), so the same cosine query landed on different score scales across backends. The headline Chroma fix has already landed via vd.backends._helpers.score_from_distance(distance, metric) (chroma.py:166 routes through it). This PR closes the rest of #9 — the cross-backend contract documentation, the audit of adapters that didn't route through the helper, and a regression test pinning the contract on the reference (memory) backend.
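
To make the scale mismatch concrete, here is a minimal sketch comparing the metric-blind formula quoted above with the cosine mapping this PR documents. The function names are illustrative only and are not part of vd.

```python
def metric_blind_score(distance: float) -> float:
    """Formula flagged in issue #9: the same squash for every metric."""
    return 1.0 / (1.0 + distance)


def canonical_cosine_score(cosine_distance: float) -> float:
    """Cosine mapping from the contract: 1 - cosine_distance."""
    return 1.0 - cosine_distance


# Cosine distance 0 = identical direction, 1 = orthogonal, 2 = opposite.
for d in (0.0, 1.0, 2.0):
    print(d, metric_blind_score(d), canonical_cosine_score(d))
```

Both mappings keep the ranking order within a single backend; what differs is the scale, so an orthogonal pair reads 0.5 under the metric-blind formula and 0.0 under the canonical one.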

The contract (now documented)

Every SearchResult["score"] is a higher-is-better, per-metric canonical similarity:

| metric | canonical score | range |
| --- | --- | --- |
| cosine | 1 - cosine_distance | [-1, 1] |
| dot | raw inner product | (-inf, +inf) |
| l2 | 1 / (1 + euclidean_distance) | (0, 1] |
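
The table can be read as a single distance-to-score mapping. The sketch below is not the body of score_from_distance; it is a stand-in named canonical_score that follows the table, and it assumes the dot "distance" is the negated inner product (inferred from the doctest score_from_distance(-0.7, 'dot') == 0.7 listed under Changes).

```python
def canonical_score(distance: float, metric: str) -> float:
    """Map a backend distance to a higher-is-better score (illustrative sketch)."""
    if metric == "cosine":
        # cosine_distance lies in [0, 2], so the score lies in [-1, 1]
        return 1.0 - distance
    if metric == "dot":
        # Assumption: the distance is the negated inner product,
        # so negating it recovers the raw inner product.
        return -distance
    if metric == "l2":
        # Euclidean distance in [0, +inf) is squashed into (0, 1]
        return 1.0 / (1.0 + distance)
    raise ValueError(f"unsupported metric: {metric!r}")


print(canonical_score(0.25, "cosine"))  # 0.75
print(canonical_score(-0.7, "dot"))     # 0.7
print(canonical_score(3.0, "l2"))       # 0.25
```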

Adapters split cleanly into two camps:

  • Distance-returning (Chroma, DuckDB, LanceDB, Milvus L2, pgvector, Redis, sqlite-vec, Turbopuffer, Weaviate): already route through score_from_distance → canonical.
  • Native-similarity-returning (Elasticsearch kNN, MongoDB Atlas, Pinecone): backend hands back a higher-is-better number on its own per-metric scale; rescaling would change combined-ranking ties, so the adapter passes through and documents the deviation in its own docstring.

The two outliers — FAISS (used the inline formula 1/(1+v) for L2) and Qdrant (1/(1+score) for l2 is correct but undocumented) — are addressed below.

Changes

  • vd/base.py: new "Score semantics" section in the module preamble, with the contract table and rationale. The Yields block of the Collection.search docstring references it.
  • vd/backends/_helpers.py: score_from_distance docstring spells out the per-metric formula and which adapters route through it vs return native scores. New doctest for the dot branch (score_from_distance(-0.7, 'dot') == 0.7).
  • vd/backends/faiss.py: route the L2 path through score_from_distance(v, "l2") (was an inline 1.0 / (1.0 + value)). Add an inline comment explaining that the IndexFlatIP cosine path is canonical-by-construction (vectors are L2-normalized at write, so inner product is cosine similarity in [-1, 1]) and that the dot path returns the raw inner product, which is exactly vd's canonical dot score.
  • vd/backends/qdrant.py: behaviour unchanged; comment clarifies that the existing 1/(1+score) for L2 already matches vd's canonical formula and notes that the branch must be revisited if a future Qdrant client version flips Euclid to higher-is-better.
  • tests/test_contract.py: new "Score semantics" section pins the contract on the in-memory reference adapter:
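    • Identical-vector query → top score of 1.0 for every metric (cosine=1.0, dot=1.0, l2=1.0). The dot value holds because the test vector is unit-norm; dot itself has no upper bound.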
    • Cosine-orthogonal pair → exactly 0.0 (regression against the previous chroma-style metric-blind formula, which would have produced 1/(1+1) = 0.5).
    • The helper's documented formula table per metric.
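
The first two test bullets can be sketched as standalone checks. This is not the code in tests/test_contract.py and it does not use the vd memory adapter; reference_score is a hypothetical brute-force scorer that computes each canonical score directly from raw vectors.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reference_score(query, vector, metric):
    """Hypothetical brute-force scorer following the contract table."""
    if metric == "dot":
        return _dot(query, vector)
    if metric == "cosine":
        # Zero vectors are not handled in this sketch.
        norms = math.sqrt(_dot(query, query)) * math.sqrt(_dot(vector, vector))
        return _dot(query, vector) / norms
    if metric == "l2":
        return 1.0 / (1.0 + math.dist(query, vector))
    raise ValueError(f"unsupported metric: {metric!r}")


def test_identical_unit_vector_scores_one():
    v = [1.0, 0.0]  # unit-norm, so the dot score is also 1.0
    for metric in ("cosine", "dot", "l2"):
        assert reference_score(v, v, metric) == 1.0


def test_cosine_orthogonal_is_zero():
    # The metric-blind formula would have produced 1 / (1 + 1) = 0.5 here.
    assert reference_score([1.0, 0.0], [0.0, 1.0], "cosine") == 0.0


test_identical_unit_vector_scores_one()
test_cosine_orthogonal_is_zero()
```

Axis-aligned unit vectors keep every intermediate value exactly representable in floating point, which is why plain equality is safe in this sketch; arbitrary vectors would need a tolerance.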

Verification

  • pytest tests/test_contract.py tests/test_core.py → 139 passed (+5 new score tests), 140 skipped (unchanged — provider-gated).
  • pytest --doctest-modules vd/base.py vd/backends/_helpers.py → 4 passed (includes the new dot doctest).
  • Local end-to-end check on FAISS with faiss 1.13.2: identical-vector L2 score = 1.0; cosine identical = 1.0; cosine orthogonal = 0.0 — matches the memory reference adapter exactly.

Scope notes

  • Did not change Elasticsearch / MongoDB / Pinecone — by design, per the contract. Each already documents in its own adapter docstring that the score is the backend's native combined-ranking score on its own scale.
  • Did not change the cross-backend protocol shape (no new fields). The contract is documentary + a doctest + a regression test on the canonical implementation.

Draft until reviewed. The main thing to confirm is the FAISS dot-metric note: vd's canonical dot score is the raw inner product, and IndexFlatIP for the dot metric returns the raw inner product directly, so passing the value through untransformed is correct. (For cosine, the vectors are normalized at write, so the same inner product is the cosine similarity.)
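
The normalization argument can be checked without FAISS. The following pure-Python sketch makes no faiss calls and all function names are illustrative; it only shows that the inner product of L2-normalized vectors equals the cosine similarity of the originals, while the inner product of the raw vectors stays on its own unbounded scale.

```python
import math


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    # Zero vectors are not handled in this sketch.
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


a, b = [3.0, 4.0], [4.0, 3.0]

# "cosine" path: normalize at write time, then take the inner product.
ip_normalized = inner_product(l2_normalize(a), l2_normalize(b))

# "dot" path: raw inner product, no normalization and no transform.
ip_raw = inner_product(a, b)

print(math.isclose(ip_normalized, cosine_similarity(a, b)))  # True
print(ip_raw)  # 24.0
```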

Issue #9 noted that Chroma's score was metric-blind. The Chroma fix
landed earlier via score_from_distance(); this change closes the loop:

- vd/base.py: a new "Score semantics" section pins the cross-backend
  contract — higher-is-better, per-metric canonical similarity (cosine
  ∈ [-1, 1], dot raw inner product, l2 squashed to (0, 1]).
  Collection.search docstring references it.
- vd/backends/_helpers.py: score_from_distance docstring spells out the
  formula per metric and which adapters route through it vs return a
  native combined score (ES / MongoDB / Pinecone).
- vd/backends/faiss.py: route the l2 path through score_from_distance
  (was inline 1/(1+v)); add a comment explaining FAISS IndexFlatIP for
  cosine/dot is already canonical so passthrough is correct.
- vd/backends/qdrant.py: behaviour unchanged, comment flags the Euclid
  case for future revisit if Qdrant changes its score sign convention.
- tests/test_contract.py: pin the canonical contract on the in-memory
  reference adapter (identical-vector → metric-max, cosine-orthogonal
  → 0) and the helper's documented formula table.

Refs #9
@thorwhalen thorwhalen marked this pull request as ready for review May 28, 2026 08:25
@thorwhalen thorwhalen merged commit 066ffa0 into master May 28, 2026
12 checks passed
@thorwhalen thorwhalen deleted the claude/issue-9-document-score-contract branch May 28, 2026 08:25