docs(base): canonical SearchResult score contract; align faiss l2 #21
Merged
Issue #9 noted that Chroma's score was metric-blind. The Chroma fix landed earlier via score_from_distance(); this change closes the loop:
- vd/base.py: a new "Score semantics" section pins the cross-backend contract: higher-is-better, per-metric canonical similarity (cosine ∈ [-1, 1], dot raw inner product, l2 squashed to (0, 1]). The Collection.search docstring references it.
- vd/backends/_helpers.py: the score_from_distance docstring spells out the formula per metric and which adapters route through it vs return a native combined score (ES / MongoDB / Pinecone).
- vd/backends/faiss.py: route the l2 path through score_from_distance (was inline 1/(1+v)); add a comment explaining that FAISS IndexFlatIP for cosine/dot is already canonical, so passthrough is correct.
- vd/backends/qdrant.py: behaviour unchanged; a comment flags the Euclid case for future revisit if Qdrant changes its score sign convention.
- tests/test_contract.py: pin the canonical contract on the in-memory reference adapter (identical-vector → metric-max, cosine-orthogonal → 0) and the helper's documented formula table.
Refs #9
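For readers who want the contract in executable form, here is a minimal sketch of the per-metric mapping described above. It is not the actual vd.backends._helpers.score_from_distance: the function name score_from_distance_sketch is hypothetical, and the dot branch assumes the backend reports a negated inner product as its "distance" (inferred from the doctest quoted in this PR, score_from_distance(-0.7, 'dot') == 0.7).

```python
def score_from_distance_sketch(distance: float, metric: str) -> float:
    """Hypothetical stand-in for the helper: map a backend distance to a
    higher-is-better canonical score, per metric."""
    if metric == "cosine":
        # cosine distance lies in [0, 2], so the score lies in [-1, 1]
        return 1.0 - distance
    if metric == "dot":
        # Assumption: the backend reports the negated inner product as a
        # distance, so negating again recovers the raw inner product.
        return -distance
    if metric == "l2":
        # Squash a non-negative Euclidean distance into (0, 1]
        return 1.0 / (1.0 + distance)
    raise ValueError(f"unsupported metric: {metric!r}")


if __name__ == "__main__":
    for metric, distance in [("cosine", 0.0), ("cosine", 1.0), ("dot", -0.7), ("l2", 1.0)]:
        print(metric, distance, score_from_distance_sketch(distance, metric))
```

Under these assumptions an identical vector scores 1.0 for cosine and l2, and an orthogonal pair scores 0.0 for cosine, which is what the contract tests pin.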
Refs #9.
Context
Issue #9 flagged that ChromaCollection.search returned a metric-blind score (1/(1+distance) for every metric), so the same cosine query landed on different score scales across backends. The headline Chroma fix has already landed via vd.backends._helpers.score_from_distance(distance, metric) (chroma.py:166 routes through it). This PR closes the rest of #9: the cross-backend contract documentation, the audit of adapters that didn't route through the helper, and a regression test pinning the contract on the reference (memory) backend.
The contract (now documented)
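Every SearchResult["score"] is higher-is-better, per-metric canonical similarity:

| Metric | Score | Range |
| --- | --- | --- |
| cosine | 1 - cosine_distance | [-1, 1] |
| dot | raw inner product | (-inf, +inf) |
| l2 | 1 / (1 + euclidean_distance) | (0, 1] |

Adapters split cleanly into two camps:
- Adapters that route through score_from_distance → canonical.
- Adapters that return a native combined score (ES / MongoDB / Pinecone).

The two outliers, FAISS (which used the inline formula 1/(1+v) for L2) and Qdrant (whose 1/(1+score) for l2 is correct but undocumented), are addressed below.
Changes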
- vd/base.py: new "Score semantics" section in the module preamble, with the contract table and rationale. The Yields block of the Collection.search docstring references it.
- vd/backends/_helpers.py: the score_from_distance docstring spells out the per-metric formula and which adapters route through it vs return native scores. New doctest for the dot branch (score_from_distance(-0.7, 'dot') == 0.7).
- vd/backends/faiss.py: route the L2 path through score_from_distance(v, "l2") (was an inline 1.0 / (1.0 + value)). Add an inline comment explaining that the IndexFlatIP cosine path is canonical by construction (vectors are L2-normalized at write, so inner product is cosine similarity in [-1, 1]) and that the dot path returns the raw inner product, which is exactly vd's canonical dot score.
- vd/backends/qdrant.py: behaviour unchanged; a comment clarifies that the existing 1/(1+score) for L2 already matches vd's canonical formula and notes that the branch must be revisited if a future Qdrant client version flips Euclid to higher-is-better.
- tests/test_contract.py: new "Score semantics" section pins the contract on the in-memory reference adapter (identical-vector → metric-max, cosine-orthogonal → 0) and the helper's documented formula table (e.g. l2 at distance 1: 1/(1+1) = 0.5).
Verification
- pytest tests/test_contract.py tests/test_core.py → 139 passed (+5 new score tests), 140 skipped (unchanged, provider-gated).
- pytest --doctest-modules vd/base.py vd/backends/_helpers.py → 4 passed (includes the new dot doctest).
- faiss 1.13.2: identical-vector L2 score = 1.0; cosine identical = 1.0; cosine orthogonal = 0.0. This matches the memory reference adapter exactly.
Scope notes
Draft until reviewed. The main thing to confirm is the FAISS dot-metric note: vd's canonical dot score is the raw inner product, and IndexFlatIP for the dot metric returns the raw inner product directly, so passing the value through without a transform is correct. (For cosine the vectors are normalized at write, so the same inner product is cosine similarity.)
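The passthrough argument can be sanity-checked without FAISS. The sketch below uses plain Python lists and hypothetical helper names (l2_normalize, inner_product, cosine_similarity) that are not part of vd or FAISS. It only illustrates the identity the note relies on: the inner product of L2-normalized vectors equals their cosine similarity, while the inner product of un-normalized vectors is the raw dot score.

```python
import math


def l2_normalize(v):
    """Scale v to unit Euclidean length (what the cosine path does at write)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def inner_product(a, b):
    """Raw inner product, the quantity an inner-product index ranks by."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Textbook cosine similarity, computed without pre-normalizing."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


if __name__ == "__main__":
    a, b = [3.0, 4.0], [-4.0, 3.0]  # orthogonal pair
    # Cosine path: normalize first, then the inner product is the cosine score.
    print(inner_product(l2_normalize(a), l2_normalize(a)))  # close to 1.0
    print(inner_product(l2_normalize(a), l2_normalize(b)))  # close to 0.0
    # Dot path: no normalization, so the score is the raw inner product.
    print(inner_product(a, a))
```

If this identity holds for the stored vectors, no extra transform is needed on either path; only the L2 path needs the 1/(1+d) squash.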