What problem does this solve?
LEANN keeps its on-device index tiny by pruning stored embeddings and recomputing them at query time, which makes queries compute-heavy. This proposes a hybrid design that keeps that tiny on-device footprint while removing the heavy online re-embedding step.
Proposed solution
Local: store the pruned HNSW/graph skeleton + lightweight PQ/OPQ sketches for coarse search.
Remote: store the full-precision vectors (FP16 preferred, halving transfer size versus FP32) behind a low-latency batch API.
Query: run the local coarse traversal, batch-fetch a small candidate set of full-precision vectors from the remote store, then re-rank locally. Layer in LRU caching and look-ahead prefetching to keep tail latency low.
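The query path above can be sketched end to end. Everything here is a toy stand-in: a dict plays the remote batch API, 1-decimal rounding plays the PQ/OPQ sketches, and a flat scan replaces the HNSW traversal; squared-L2 distance is assumed.

```python
import random

random.seed(0)
DIM, N = 8, 100

# Stand-in for the remote store: full-precision vectors behind a batch API.
REMOTE_STORE = {i: [random.random() for _ in range(DIM)] for i in range(N)}
fetch_calls = 0  # counts round trips, to show the cache working

def remote_batch_fetch(ids):
    """Pretend network call: return full vectors for a batch of ids."""
    global fetch_calls
    fetch_calls += 1
    return {i: REMOTE_STORE[i] for i in ids}

# Local side: coarse sketches (1-decimal rounding stands in for PQ/OPQ),
# scanned flat here in place of a real HNSW traversal.
SKETCHES = {i: [round(x, 1) for x in v] for i, v in REMOTE_STORE.items()}
cache = {}  # unbounded here; a real system would bound it LRU-style

def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hybrid_search(query, k=5, n_candidates=20):
    # 1. Coarse pass over local sketches to pick a small candidate set.
    coarse = sorted(SKETCHES, key=lambda i: sq_l2(query, SKETCHES[i]))[:n_candidates]
    # 2. Batch-fetch only the candidates not already cached.
    missing = [i for i in coarse if i not in cache]
    if missing:
        cache.update(remote_batch_fetch(missing))
    # 3. Exact re-rank locally on the full-precision vectors.
    return sorted(coarse, key=lambda i: sq_l2(query, cache[i]))[:k]
```

Repeating a query costs zero extra round trips once its candidates are cached, which is the tail-latency win the caching layer buys.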
Example usage
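One possible calling surface is sketched below, with a small class wrapping the local/remote split, a bounded LRU cache, and a prefetch hook for the look-ahead step. All names here (`HybridIndex`, `StubRemote`, `prefetch`) are illustrative assumptions, not an existing LEANN API.

```python
import random
from collections import OrderedDict

class StubRemote:
    """Stand-in for the remote full-precision vector service."""
    def __init__(self, vectors):
        self.vectors = vectors
        self.batch_calls = 0
    def fetch(self, ids):
        self.batch_calls += 1
        return {i: self.vectors[i] for i in ids}

class HybridIndex:
    """Sketch of the proposed local-coarse / remote-exact index."""
    def __init__(self, sketches, remote, cache_size=256):
        self.sketches = sketches        # local PQ-style sketches
        self.remote = remote
        self.cache = OrderedDict()      # LRU: oldest entries evicted first
        self.cache_size = cache_size

    def _sq_l2(self, a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def _fetch(self, ids):
        missing = [i for i in ids if i not in self.cache]
        if missing:
            self.cache.update(self.remote.fetch(missing))
        for i in ids:                   # mark as recently used
            self.cache.move_to_end(i)
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)

    def prefetch(self, ids):
        """Look-ahead: warm the cache for ids expected to be re-ranked soon."""
        self._fetch(list(ids))

    def search(self, query, k=5, n_candidates=20):
        coarse = sorted(self.sketches,
                        key=lambda i: self._sq_l2(query, self.sketches[i]))[:n_candidates]
        self._fetch(coarse)
        return sorted(coarse, key=lambda i: self._sq_l2(query, self.cache[i]))[:k]

# Example: 200 vectors of dim 8; the cache is large enough to hold them all.
random.seed(1)
vecs = {i: [random.random() for _ in range(8)] for i in range(200)}
idx = HybridIndex(
    sketches={i: [round(x, 1) for x in v] for i, v in vecs.items()},
    remote=StubRemote(vecs),
)
top5 = idx.search([0.5] * 8, k=5)
```

After a `prefetch` over the working set, subsequent searches are served entirely from the local cache, so the remote store sees no per-query traffic.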