feat(logosdb): add LogosDB vector database integration #782

Open
jose-compu wants to merge 1 commit into zilliztech:main from jose-compu:feat/logosdb-integration

@jose-compu
Summary

  • Adds LogosDB as a supported vector database backend — a fast, embedded HNSW vector store written in C/C++ with Python bindings, backed by memory-mapped binary storage and hnswlib.
  • Implements the full VectorDB interface: __init__, init context manager, insert_embeddings (via put_batch), search_embedding, and optimize.
  • Registers DB.LogosDB in the enum and wires up init_cls, config_cls, and case_config_cls.
  • Adds the logosdb CLI subcommand with a --uri flag (local directory path).
  • Adds logosdb as an optional extra in pyproject.toml.

Design notes

  • LogosDB is an embedded (single-process, file-based) database — no server required. The DB directory is passed via --uri.
  • Distance metric is derived from the case MetricType at runtime (COSINE / L2 / IP). COSINE is the default and auto-normalizes vectors.
  • Benchmark metadata IDs are stored as the text field (str(id)) and parsed back on search, since LogosDB's internal row IDs are independent of the benchmark ID space.
  • The HNSW index is built incrementally on insert, so optimize() is a no-op that only logs a message.
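
To illustrate the ID round-trip and the COSINE normalization mentioned in the notes above, here is a minimal sketch. The helper names (`encode_id`, `decode_id`, `l2_normalize`, `dot`) are hypothetical and are not taken from the PR or from the LogosDB bindings. The sketch only shows the idea: the benchmark ID is serialized with `str(id)` and parsed back with `int(...)`, and unit-normalized vectors make the inner product equal to cosine similarity.

```python
import math


def encode_id(benchmark_id: int) -> str:
    """Serialize a benchmark ID for storage in a text field (hypothetical helper)."""
    return str(benchmark_id)


def decode_id(text: str) -> int:
    """Parse the benchmark ID back from the stored text (hypothetical helper)."""
    return int(text)


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))
```

After normalization, `dot(l2_normalize(a), l2_normalize(b))` is the cosine similarity of `a` and `b`, which is why a COSINE index can normalize once at insert time and then rank by inner product.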

Benchmark result

Tested on Performance1536D50K (OpenAI embeddings, 50K vectors, 1536 dim, COSINE) on Apple M-series:

| Metric | Value |
| --- | --- |
| Load duration | 340 s |
| Serial latency p99 | 4.6 ms |
| Serial latency p95 | 4.0 ms |
| Recall@100 | 0.9347 |
| NDCG | 0.9464 |
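
For readers unfamiliar with the Recall@100 figure, the sketch below shows one common way to compute recall@k: the fraction of the true top-k neighbors that appear among the k returned IDs. The function name `recall_at_k` is hypothetical, and the benchmark's own implementation may differ in detail. The sketch assumes the ground-truth list holds at least k IDs.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k IDs found in the first k retrieved IDs."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth[:k])
    found = set(retrieved[:k]) & truth  # set intersection ignores duplicate hits
    return len(found) / k
```

Under this definition, a Recall@100 of 0.9347 means that, averaged over queries, about 93 of the 100 true nearest neighbors are returned.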

Test plan

  • pip install logosdb (binary wheels for Linux x86_64/aarch64 and macOS x86_64/arm64, CPython 3.9-3.13)
  • vectordbbench logosdb --uri /tmp/vdbbench_logosdb --case-type Performance1536D50K --skip-search-concurrent
  • Verify recall, latency, and result JSON written to vectordb_bench/results/LogosDB/
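
The last step can be checked with a small script along these lines. It is a sketch only: `load_result_files` is a hypothetical helper, and because the layout of the result JSON is not described here, the code just loads every `*.json` file in the directory without assuming any particular keys.

```python
import json
from pathlib import Path


def load_result_files(results_dir):
    """Load every *.json file in results_dir, keyed by file name."""
    results_dir = Path(results_dir)
    if not results_dir.is_dir():
        raise FileNotFoundError(f"results directory not found: {results_dir}")
    loaded = {}
    for path in sorted(results_dir.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            loaded[path.name] = json.load(fh)
    return loaded
```

Calling `load_result_files("vectordb_bench/results/LogosDB")` after the run should return a non-empty dictionary if the result file was written.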

- Add LogosDB embedded HNSW client (local file-based, mmap, hnswlib)
- Config: LogosDBConfig (uri path) + LogosDBIndexConfig (metric type)
- Supports COSINE, L2, and IP distance metrics
- Uses put_batch for efficient bulk insert; metadata IDs stored as text
- Register DB.LogosDB enum, init_cls, config_cls, case_config_cls
- Register 'logosdb' CLI command in vectordbbench
- Add logosdb optional extra in pyproject.toml

Benchmark result (50K OpenAI 1536-dim, COSINE):
  recall@100=0.9347  ndcg=0.9464  p99=4.6ms  p95=4.0ms
@sre-ci-robot
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jose-compu
To complete the pull request process, please assign xuanyang-cn after the PR has been reviewed.
You can assign the PR to them by writing /assign @xuanyang-cn in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jose-compu
Author

Can you please review, @sre-ci-robot @jkatz @javiervegas @claude?
