
Add native Hnsw class to coordinode-embedded for ann-benchmarks integration #69

@polaz


Why

CoordiNode currently has no path onto the canonical ann-benchmarks.com leaderboard, because the ann-benchmarks Docker harness expects each algorithm to expose an in-process Python API (`fit` / `set_query_arguments` / `query`). The existing `coordinode-embedded.LocalClient` provides Cypher access, so every query pays parser and planner overhead, and its latency is not comparable with in-process ANN libraries (hnswlib, FAISS, ScaNN, Annoy).

What

Add a new PyO3 class `Hnsw` in `coordinode-embedded` that wraps `coordinode_vector::hnsw::HnswIndex` directly, providing a fast path that bypasses Cypher.

Surface:

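```python
import numpy as np
from coordinode_embedded import Hnsw
data = np.random.rand(10_000, 128).astype(np.float32)  # (N, dim), float32
query_vec = data[0]                                     # (dim,), float32

idx = Hnsw(dim=128, metric="euclidean", M=16, ef_construction=200)
idx.fit(data)                            # batch insert of all N vectors
idx.set_ef(80)                           # search-time ef
labels = idx.knn_query(query_vec, k=10)  # numpy array of int64 IDs
```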
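Because the class takes numpy arrays directly, callers will likely need to hand it C-contiguous float32 data of the expected shape. The helpers below are a hypothetical sketch (`as_batch` and `as_query` are not part of the proposed API) that coerce and validate inputs before they reach `fit` / `knn_query`. Whether the PyO3 binding itself rejects or silently converts float64 input is an assumption to confirm against the implementation.

```python
import numpy as np


def as_batch(x, dim: int) -> np.ndarray:
    """Hypothetical helper: coerce input to a C-contiguous (N, dim) float32 array."""
    arr = np.ascontiguousarray(x, dtype=np.float32)
    if arr.ndim != 2 or arr.shape[1] != dim:
        raise ValueError(f"expected shape (N, {dim}), got {arr.shape}")
    return arr


def as_query(q, dim: int) -> np.ndarray:
    """Hypothetical helper: coerce one query to a C-contiguous (dim,) float32 array."""
    # A (1, dim) row is flattened to (dim,); any other size fails the check below.
    arr = np.ascontiguousarray(q, dtype=np.float32).reshape(-1)
    if arr.shape != (dim,):
        raise ValueError(f"expected shape ({dim},), got {arr.shape}")
    return arr
```

Doing the conversion once on the Python side keeps the Rust side free to borrow the buffer without copying, at the cost of one upfront copy when the caller passes float64 or non-contiguous data.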

Bench integration flow

  1. CoordiNode CI (private monorepo) runs on every push to `main`, on a self-hosted runner.
  2. CI checks out `coordinode-python` at `main`, points the `coordinode-rs/` submodule at the coordinode SHA being benchmarked, builds a wheel with maturin, and runs ann-benchmarks through the Docker adapter (which lives in the coordinode repo under `benches/ann-benchmarks-adapter/`).
  3. The result JSON is pushed to the `bench-data` branch for the docs site.
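The Docker adapter's job is to translate the harness calls (`fit` / `set_query_arguments` / `query`) onto the proposed `Hnsw` methods (`fit` / `set_ef` / `knn_query`). Below is a minimal sketch modeled on how ann-benchmarks' hnswlib adapter is structured; the class name `CoordiNodeHnsw`, the `method_param` keys `M` / `efConstruction`, and the `name` strings are assumptions. A real adapter would subclass ann-benchmarks' `BaseANN`; this stand-alone version only mirrors its methods, and the `index_cls` parameter is a hypothetical hook so it can be exercised before `Hnsw` exists.

```python
import numpy as np


class CoordiNodeHnsw:
    """Sketch of an ann-benchmarks-style wrapper around the proposed Hnsw class."""

    def __init__(self, metric, method_param, index_cls=None):
        self.metric = metric              # harness passes e.g. "angular" or "euclidean"
        self.method_param = method_param  # assumed shape: {"M": 16, "efConstruction": 200}
        self._index_cls = index_cls       # hypothetical injection point for testing
        self._index = None
        self.name = f"coordinode ({method_param})"

    def fit(self, X):
        if self._index_cls is None:
            # Proposed class from this issue; only importable once it lands.
            from coordinode_embedded import Hnsw
            self._index_cls = Hnsw
        X = np.ascontiguousarray(X, dtype=np.float32)
        self._index = self._index_cls(
            dim=X.shape[1],
            metric=self.metric,  # "angular" / "euclidean" are in the accepted alias list
            M=self.method_param["M"],
            ef_construction=self.method_param["efConstruction"],
        )
        self._index.fit(X)

    def set_query_arguments(self, ef):
        # Called by the harness once per query-time parameter setting.
        self._index.set_ef(ef)
        self.name = f"coordinode ({self.method_param}, 'efQuery': {ef})"

    def query(self, v, n):
        # Returns the IDs of the n nearest neighbours of a single vector v.
        q = np.ascontiguousarray(v, dtype=np.float32)
        return self._index.knn_query(q, k=n)
```

Keeping the wrapper this thin means the benchmark measures the Rust index almost directly, which is the point of bypassing `LocalClient`.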

For the final upstream PR to `erikbern/ann-benchmarks`, the adapter Dockerfile will simply `pip install coordinode-embedded==` from PyPI.

Acceptance criteria

  • `Hnsw` class registered in `_coordinode_embedded` module
  • Supports metrics: `cosine` / `angular`, `euclidean` / `l2`, `dot`, `manhattan` / `l1`
  • `fit(np.ndarray[float32, 2D])` — batch insert via `insert_batch`
  • `set_ef(int)` — runtime ef tuning via `set_ef_search`
  • `knn_query(np.ndarray[float32, 1D], k)` returns numpy `int64` array of length k
  • Unit test on a 1,000-vector SIFT1M slice: build, query, and a recall@10 sanity check
  • `maturin build --release` produces wheel
  • No regression in `LocalClient` tests
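The recall@10 criterion can be checked against exact brute-force neighbours. This is a minimal sketch, assuming the index exposes `knn_query(q, k)` as proposed above. The helper names and the `threshold=0.9` default are placeholders, not values from this issue, and loading the SIFT1M slice is left out.

```python
import numpy as np


def brute_force_knn(data, queries, k):
    """Exact k-NN under Euclidean distance; returns a (Q, k) int64 ID array."""
    data = np.asarray(data, dtype=np.float64)  # float64 limits cancellation error
    queries = np.asarray(queries, dtype=np.float64)
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, evaluated for all pairs at once.
    d2 = (
        (queries ** 2).sum(axis=1)[:, None]
        - 2.0 * queries @ data.T
        + (data ** 2).sum(axis=1)[None, :]
    )
    return np.argsort(d2, axis=1)[:, :k].astype(np.int64)


def recall_at_k(found, truth):
    """Fraction of true neighbours recovered, averaged over all queries."""
    k = truth.shape[1]
    hits = sum(len(set(f[:k].tolist()) & set(t.tolist())) for f, t in zip(found, truth))
    return hits / (k * len(truth))


def check_recall(index, data, queries, k=10, threshold=0.9):
    """Query `index` (anything with knn_query(q, k)) and assert recall@k >= threshold."""
    truth = brute_force_knn(data, queries, k)
    found = np.stack([np.asarray(index.knn_query(q, k=k)) for q in queries])
    r = recall_at_k(found, truth)
    assert r >= threshold, f"recall@{k} = {r:.3f} < {threshold}"
    return r
```

With the real class this would run as `check_recall(idx, data, queries)` after `idx.fit(data)` and `idx.set_ef(...)`. Only Euclidean ground truth is computed here, so the `cosine`, `dot`, and `manhattan` metrics would each need their own reference distance.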
