## Motivation
Issues #100 and #147 request GPU-accelerated indexing. zvec already has standalone GPU backends (FAISS GPU, cuVS CAGRA/IVF-PQ, Apple MPS), but they're not integrated with the Collection API — users can't run `collection.query()` with GPU acceleration.
## Proposed approach
### 1. `UnifiedGpuIndex` ABC (`python/zvec/backends/unified.py`)

Abstract base with `train()` + `add()` + `search()` + `size()` + `backend_name`, plus 6 adapters:
| Adapter | Wraps | Priority |
|---|---|---|
| `CppCuvsAdapter` | Native `_zvec` pybind11 (zero-copy) | 1 (highest) |
| `CuvsCAGRAAdapter` | `cuvs.neighbors.cagra` | 2 |
| `CuvsIvfPqAdapter` | `cuvs.neighbors.ivf_pq` | 3 |
| `FaissGpuAdapter` | `backends/gpu.py::GPUIndex` | 4 |
| `AppleMpsAdapter` | `backends/apple_silicon.py` | 5 |
| `FaissCpuAdapter` | FAISS CPU fallback | 6 |
The C++ native path is preferred because it avoids Python→GPU data copies.
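To make the contract concrete, here is a minimal sketch of what the `UnifiedGpuIndex` ABC and a lowest-priority CPU fallback could look like. The method names follow the proposal; the signatures and the NumPy brute-force fallback are illustrative assumptions, not the draft PR's actual code.

```python
from abc import ABC, abstractmethod

import numpy as np


class UnifiedGpuIndex(ABC):
    """Common contract every adapter implements (sketch)."""

    @property
    @abstractmethod
    def backend_name(self) -> str: ...

    @abstractmethod
    def train(self, vectors: np.ndarray) -> None: ...

    @abstractmethod
    def add(self, vectors: np.ndarray, ids: np.ndarray) -> None: ...

    @abstractmethod
    def search(self, queries: np.ndarray, topk: int):
        """Return (distances, indices), each shaped (n_queries, topk)."""

    @abstractmethod
    def size(self) -> int: ...


class BruteForceAdapter(UnifiedGpuIndex):
    """Illustrative stand-in for the lowest-priority CPU fallback."""

    def __init__(self) -> None:
        self._vecs = None
        self._ids = None

    @property
    def backend_name(self) -> str:
        return "brute-force"

    def train(self, vectors: np.ndarray) -> None:
        pass  # exact search needs no training step

    def add(self, vectors: np.ndarray, ids: np.ndarray) -> None:
        self._vecs = np.asarray(vectors, dtype=np.float32)
        self._ids = np.asarray(ids, dtype=np.int64)

    def search(self, queries: np.ndarray, topk: int):
        # Squared L2 distance between every query and every stored vector.
        q = np.asarray(queries, dtype=np.float32)
        d = ((q[:, None, :] - self._vecs[None, :, :]) ** 2).sum(axis=-1)
        idx = np.argsort(d, axis=1)[:, :topk]
        return np.take_along_axis(d, idx, axis=1), idx

    def size(self) -> int:
        return 0 if self._vecs is None else len(self._vecs)
```

Each real adapter would implement the same five members, so the detection layer can hand back any of the six interchangeably.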
### 2. `GpuIndex` bridge (`python/zvec/gpu_index.py`)

Connects GPU search to Collection:
```python
gpu = collection.gpu_index("embedding")
gpu.build(vectors, ids)
docs = gpu.query(query_vector, topk=10, output_fields=["title"])
# Returns list[Doc] — same format as collection.query()
```
Flow: GPU search → map indices to doc IDs → `collection.fetch()` → attach scores → return `list[Doc]`.
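The four-step flow could be sketched roughly as follows. The `search_fn`/`fetch_fn` callables and dict-shaped docs stand in for the real adapter and `collection.fetch()`; these names are assumptions for illustration, not zvec's API.

```python
import numpy as np


class GpuQueryBridge:
    """Sketch: GPU search -> doc IDs -> fetch -> attach scores."""

    def __init__(self, search_fn, fetch_fn, ids):
        self._search = search_fn  # (queries, topk) -> (distances, indices)
        self._fetch = fetch_fn    # (doc_ids, output_fields) -> list of docs
        self._ids = ids           # GPU row position -> document ID

    def query(self, query_vector, topk=10, output_fields=None):
        # 1. GPU search over a single query vector.
        distances, indices = self._search(np.asarray(query_vector)[None, :], topk)
        # 2. Map raw row indices back to document IDs (skip -1 = no hit).
        doc_ids = [self._ids[i] for i in indices[0] if i >= 0]
        # 3. Fetch the full documents from the collection.
        docs = self._fetch(doc_ids, output_fields)
        # 4. Attach scores and return in collection.query() shape.
        for doc, score in zip(docs, distances[0]):
            doc["score"] = float(score)
        return docs
```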
### 3. Backend detection (`python/zvec/backends/detect.py`)

Adds C++ cuVS and Python cuVS detection with a proper priority chain.
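One way such a priority chain can be structured is a first-available walk over ordered probes, mirroring the 1–6 adapter ordering. The probe names and fallback label below are illustrative assumptions, not zvec's actual identifiers.

```python
def pick_backend(probes, fallback="faiss-cpu"):
    """Return the first available backend name, walking probes in
    priority order. `probes` is a list of (name, is_available) pairs."""
    for name, is_available in probes:
        try:
            if is_available():
                return name
        except Exception:
            # A crashing probe (missing library, no GPU) must not
            # abort detection; fall through to the next candidate.
            continue
    return fallback
```

Swallowing probe exceptions matters here: importing `cuvs` on a machine without CUDA can raise, and that should demote the backend rather than break `import zvec`.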
## Tested on RTX 4090
| Backend | QPS (50K vectors, dim=128) |
|---|---|
| FAISS GPU (flat) | 529,316 |
| cuVS IVF-PQ | 45,771 |
| cuVS CAGRA | 43,711 |
## Questions for maintainers
- Is `collection.gpu_index(field_name)` the right API surface, or would you prefer a different entry point?
- The current design requires an explicit `gpu.build(vectors, ids)` since `_Collection` has no scan/iterate API. Would you consider exposing a bulk vector extraction API from C++?
- Any preference on how the C++ cuVS priority should interact with the existing backend detection?
Draft implementation: #176 (addresses #100 and #147)