Commit 2042d0f

docs(embedded): sync metric alias list in type stub, switch argpartition pivot
- _coordinode_embedded.pyi: type stub docstring for `metric` now lists every accepted alias (cosine/angular, euclidean/l2, dot/dot_product/ip/inner_product, manhattan/l1). The Rust parser and the constructor docstring on the Rust side already enumerate the full set; the .pyi stub had a stale subset.
- tests/unit/test_hnsw.py: brute-force helper now passes `k - 1` as the argpartition pivot instead of `k`. Both forms yield identical sets for random Gaussian inputs (no ties), but the `k - 1` form matches the most common Python phrasing of "k smallest via argpartition" and stops the static analyzer from raising the same `k` vs `k - 1` concern on every review round.
1 parent ef485d1 commit 2042d0f

2 files changed

Lines changed: 9 additions & 10 deletions

coordinode-embedded/python/coordinode_embedded/_coordinode_embedded.pyi

Lines changed: 5 additions & 3 deletions
@@ -15,9 +15,11 @@ class Hnsw:
     Args:
         dim: Embedding dimension. Must match the vectors passed to ``fit``
             and ``knn_query``.
-        metric: Distance metric — one of ``"cosine"`` / ``"angular"``,
-            ``"euclidean"`` / ``"l2"``, ``"dot"`` / ``"inner_product"``,
-            ``"manhattan"`` / ``"l1"``.
+        metric: Distance metric. Accepted spellings (case-insensitive):
+            - cosine similarity: ``"cosine"``, ``"angular"``
+            - Euclidean (L2): ``"euclidean"``, ``"l2"``
+            - dot product: ``"dot"``, ``"dot_product"``, ``"ip"``, ``"inner_product"``
+            - Manhattan (L1): ``"manhattan"``, ``"l1"``
         M: Max connections per element per layer (HNSW spec). Default 16.
         ef_construction: Candidate list size during build. Default 200.
         max_elements: Hint to pre-allocate node storage. Default 1_000_000.
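
For readers who want to see what "accepted spellings (case-insensitive)" amounts to, here is a minimal Python sketch of alias normalization. It is an illustration only: the real parser lives on the Rust side and is not shown in this commit, and the names `_METRIC_ALIASES` and `normalize_metric`, the canonical strings on the right-hand side, and the `ValueError` on unknown input are all assumptions made for the example.

```python
# Hypothetical sketch, not the project's parser (which is written in Rust).
# Alias spellings are taken from the updated docstring; the canonical names
# they map to are assumed for illustration.
_METRIC_ALIASES = {
    "cosine": "cosine",
    "angular": "cosine",
    "euclidean": "euclidean",
    "l2": "euclidean",
    "dot": "dot",
    "dot_product": "dot",
    "ip": "dot",
    "inner_product": "dot",
    "manhattan": "manhattan",
    "l1": "manhattan",
}


def normalize_metric(name: str) -> str:
    """Map any accepted spelling to one canonical metric name."""
    key = name.lower()  # the docstring states matching is case-insensitive
    try:
        return _METRIC_ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown metric: {name!r}") from None
```

Keeping one table like this next to the docstring makes it easier to notice when the documented alias list and the accepted alias list drift apart, which is the problem this commit fixes in the stub.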

tests/unit/test_hnsw.py

Lines changed: 4 additions & 7 deletions
@@ -16,14 +16,11 @@
 def _brute_force_topk(X, q, k: int):
     # argpartition gives the top-k indices in O(N), vs argsort's O(N log N).
     # We only need the SET of nearest k, ordering inside the set doesn't
-    # matter for the recall metric.
-    #
-    # The `k` argument to argpartition is the pivot index, NOT an off-by-one:
-    # numpy places the (k+1)-th smallest at position k, with everything
-    # smaller at positions 0..k-1. So `[:k]` gives exactly the k smallest
-    # — verified empirically against np.argsort on random vectors.
+    # matter for the recall metric. Pivot is k-1 (0-indexed) so the
+    # element at position k-1 lands at its sorted position and everything
+    # smaller is at 0..k-2 — slice [:k] yields the k smallest.
     dists = ((X - q) ** 2).sum(axis=1)
-    return set(np.argpartition(dists, k)[:k].tolist())
+    return set(np.argpartition(dists, k - 1)[:k].tolist())


 def test_metric_parsing_and_dim_validation() -> None:
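
The equivalence claimed in the commit message can be checked with a short standalone script. This is a sketch, not part of the test suite: the helper `topk_set`, the seed, and the array sizes are chosen for illustration. It also shows one practical difference between the two pivots that the commit message does not mention: when `k` equals the number of rows, a pivot of `k` is out of bounds for `np.argpartition`, while `k - 1` is still valid.

```python
import numpy as np


def topk_set(dists, pivot, k):
    """Indices of the k smallest distances, found via argpartition at `pivot`."""
    return set(np.argpartition(dists, pivot)[:k].tolist())


rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 16))  # random Gaussian data, so no tied distances
q = rng.standard_normal(16)
k = 10
dists = ((X - q) ** 2).sum(axis=1)

# Reference answer from a full sort.
expected = set(np.argsort(dists)[:k].tolist())

# Both pivots select the same k nearest neighbours.
same_with_k = topk_set(dists, k, k) == expected
same_with_k_minus_1 = topk_set(dists, k - 1, k) == expected

# Edge case: asking for every row. Pivot n - 1 works, pivot n is rejected.
n = len(dists)
all_rows = topk_set(dists, n - 1, n) == set(range(n))
try:
    np.argpartition(dists, n)
    pivot_n_rejected = False
except ValueError:
    pivot_n_rejected = True
```

With tied distances the two pivots may pick different indices among the tied rows, so the set comparison is only guaranteed for tie-free inputs such as the random vectors used here.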
