docs(embedded): drop numpy version pin in init comment, document argpartition pivot

polaz · polaz · commit 07d7dfd778cd · 2026-05-23T10:30:41.000+03:00
- src/lib.rs: the comment explaining why the pymodule fn does not call a
  NumPy initializer no longer ties that claim to a single numpy crate
  line. The numpy crate has kept the same lazy-import policy across 0.23
  and 0.24, and the old "numpy 0.23" wording was already stale once the
  crate dependency moved to 0.24 in the previous round.

- tests/unit/test_hnsw.py: `_brute_force_topk` gains an inline note
  documenting numpy's argpartition pivot semantics: the `k` passed to
  argpartition is a pivot index, so passing `k` (rather than `k-1`) is
  not an off-by-one. Verified empirically against np.argsort on random
  vectors. This heads off the recurring "should be k-1" review suggestion.
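A minimal sketch of that empirical check, assuming random float vectors (so ties are practically impossible) and the same squared-L2 distances `_brute_force_topk` computes. The helper name `check_argpartition_pivot` and its parameters are hypothetical and not part of the test suite:

```python
import numpy as np


def check_argpartition_pivot(n: int = 1000, dim: int = 16, k: int = 10, seed: int = 0) -> bool:
    """Return True if argpartition(dists, k)[:k] and argsort(dists)[:k] pick the same set.

    Hypothetical helper: mirrors the squared-L2 distances used by
    _brute_force_topk on random float vectors (ties are practically impossible).
    Requires 0 <= k < n, since argpartition rejects an out-of-bounds kth.
    """
    rng = np.random.default_rng(seed)
    X = rng.random((n, dim))
    q = rng.random(dim)
    dists = ((X - q) ** 2).sum(axis=1)

    part = np.argpartition(dists, k)
    pivot = dists[part[k]]
    # Position k holds the value a full sort would put at index k,
    # i.e. the (k+1)-th smallest distance.
    assert pivot == np.sort(dists)[k]
    # Positions 0..k-1 hold nothing larger than the pivot.
    assert np.all(dists[part[:k]] <= pivot)

    top_partition = set(part[:k].tolist())
    top_sort = set(np.argsort(dists)[:k].tolist())
    return top_partition == top_sort
```

With `k == n` the argpartition call raises `ValueError`, so this sketch (like `_brute_force_topk` itself) assumes `k` is strictly less than the number of vectors.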
diff --git a/coordinode-embedded/src/lib.rs b/coordinode-embedded/src/lib.rs
@@ -317,8 +317,10 @@ fn _coordinode_embedded(m: &Bound<'_, PyModule>) -> PyResult<()> {
     // No explicit NumPy C-API initialization here.  The `numpy` crate
     // (see crate-level docs: "Loading NumPy is done automatically and on
     // demand") triggers `import numpy.core` lazily the first time a
-    // PyArray operation runs.  An explicit init step would be a no-op
-    // and there is no public initializer in numpy 0.23.
+    // PyArray operation runs.  An explicit init step would be a no-op —
+    // the crate exposes no public initializer (verified against the
+    // 0.23/0.24 lines, and that policy is unlikely to change while the
+    // crate's lazy-import design holds).
     m.add_class::<LocalClient>()?;
     m.add_class::<hnsw::Hnsw>()?;
     Ok(())
diff --git a/tests/unit/test_hnsw.py b/tests/unit/test_hnsw.py
@@ -17,6 +17,11 @@ def _brute_force_topk(X, q, k: int):
     # argpartition gives the top-k indices in O(N), vs argsort's O(N log N).
     # We only need the SET of nearest k, ordering inside the set doesn't
     # matter for the recall metric.
+    #
+    # The `k` argument to argpartition is the pivot index, NOT an off-by-one:
+    # numpy places the (k+1)-th smallest at position k, with everything
+    # smaller at positions 0..k-1.  So `[:k]` gives exactly the k smallest
+    # — verified empirically against np.argsort on random vectors.
     dists = ((X - q) ** 2).sum(axis=1)
     return set(np.argpartition(dists, k)[:k].tolist())