Skip to content

bf_tree migration away from diskann-providers#1020

Draft
JordanMaples wants to merge 4 commits intomainfrom
jordanmaples/bf_tree_migration
Draft

bf_tree migration away from diskann-providers#1020
JordanMaples wants to merge 4 commits intomainfrom
jordanmaples/bf_tree_migration

Conversation

@JordanMaples
Copy link
Copy Markdown
Contributor

@JordanMaples JordanMaples commented May 5, 2026

So far, this is a mostly mechanical movement of the bf_tree prorovider from inside of diskann-providers crate into its own crate. This is largely a copilot authored PR, guided by me.

The general flow of the work was:

  1. create benchmarks, similar to the inmem ones, for bf_tree. (search and insert)
  2. extract the bf_tree provider.
  3. move tests and perform cleanup.
  4. validate that there wasn't a perf degradation, using the benchmarks, post move.

There are a couple remaining things to do here before we take this PR:

  1. Look at removing the tableprovider functionality for deletes from the bf_tree and instead directly dropping the keys
  2. dropping PQ functionality from the bf_tree
  3. add SQ functionality

Some concerns that need to be addressed before this is ready for review.
As part of the move, there are some concepts that were moved into glue.rs and had their visibility expanded. This will need to be cleaned up and moved back to their previous location (and visibility level) once the tableprovider for the delete dealt with.

JordanMaples and others added 4 commits May 5, 2026 13:58
Add bf_tree insert benchmarks (Criterion + IAI)

Add pre-migration bf_tree provider benchmarks mirroring the existing
inmem insert benchmark. Both Criterion (wall-clock) and IAI-Callgrind
(instruction-count) benchmarks use the same sift-256 dataset and PQ
training as the inmem benchmarks for apples-to-apples comparison.

Gated behind #[cfg(feature = "bf_tree")] so existing benchmarks
continue to work without the feature flag.

Run with: cargo bench --bench bench_main --features virtual_storage,bf_tree

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add bf_tree search benchmarks for migration parity

Add search benchmarks (Criterion + IAI-Callgrind) alongside existing
insert benchmarks. Search exercises the read path through
BfTreeProvider (vector lookups, neighbor access, greedy search
traversal) which is critical for detecting regressions post-migration.

The search benchmark builds the index once during setup, then measures
10 KNN queries (k=10, l=20) against the populated index.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Extract bf_tree provider into diskann-bf_tree-provider crate

Move the bf_tree provider and caching modules from diskann-providers
into the dedicated diskann-bf_tree-provider crate. This makes the
bf-tree dependency explicit at the crate level rather than hidden
behind a feature flag.

Changes:
- Move bf_tree/ and caching/ modules to diskann-bf_tree-provider/src/
- Set up Cargo.toml with proper dependencies (bf-tree, diskann,
  diskann-providers, diskann-utils, diskann-vector, etc.)
- Convert all crate:: imports to diskann_providers:: imports
- Bump visibility of postprocess module, AsDeletionCheck/DeletionCheck
  traits, and TableDeleteProviderAsync methods to pub
- Remove bf_tree feature flag and optional deps from diskann-providers
- Remove cfg(feature = "bf_tree") gates from table_delete_provider.rs
- Rename provider/provider.rs to provider/bf_tree_provider.rs (clippy)
- Disable caching example.rs tests (cross-crate cfg(test) limitation)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Move bf_tree benchmarks to diskann-bf_tree-provider

Migrate Criterion and IAI-Callgrind benchmark targets from
diskann-providers to the new diskann-bf_tree-provider crate.
Update imports to use the new crate's module paths. Remove
bf_tree benchmark remnants (cfg gates, modules) from
diskann-providers benches.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

cargo.lock got missed by last commit

Move delete bitmap serialization to diskann-bf_tree-provider

Move to_bytes/from_bytes logic out of TableDeleteProviderAsync (where
it was bf_tree-scoped) into a dedicated delete_bitmap_serde module in
the new crate. The reimplementation works through the public API and
produces byte-identical output (LE u32 words). Revert the methods in
diskann-providers to pub(crate) + #[cfg(test)].

Adds format-level compatibility tests including known fixtures and
padding bit rejection.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Move DeletionCheck traits and RemoveDeletedIdsAndCopy to diskann::graph::glue

These are core to the inplace delete algorithm's search post-processing
and belong in the diskann crate alongside SearchPostProcess, CopyIds,
FilterStartPoints, and Pipeline. This removes the need to widen
diskann-providers' postprocess module to pub for external crates.

Also removes the now-dead to_bytes/from_bytes methods and their tests
from TableDeleteProviderAsync in diskann-providers, since that
serialization logic has been moved to diskann-bf_tree-provider.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Use workspace_root() for benchmark test data paths

Replace env!("CARGO_MANIFEST_DIR")-relative paths with
diskann_utils::workspace_root() so benchmarks find test_data/
without requiring symlinks in each crate directory.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The bf_tree feature was removed from diskann-providers when the
provider was extracted into its own crate (diskann-bf_tree-provider).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JordanMaples JordanMaples force-pushed the jordanmaples/bf_tree_migration branch from b94f4df to 44be934 Compare May 5, 2026 20:58
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.74214% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.97%. Comparing base (95b6593) to head (44be934).

Files with missing lines Patch % Lines
..._tree-provider/src/provider/delete_bitmap_serde.rs 98.23% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1020      +/-   ##
==========================================
- Coverage   89.51%   88.97%   -0.55%     
==========================================
  Files         460      469       +9     
  Lines       85424    88878    +3454     
==========================================
+ Hits        76469    79077    +2608     
- Misses       8955     9801     +846     
Flag Coverage Δ
miri 88.97% <98.74%> (-0.55%) ⬇️
unittests 88.82% <98.74%> (-0.54%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-bf_tree-provider/src/caching/bf_cache.rs 93.39% <100.00%> (ø)
diskann-bf_tree-provider/src/caching/error.rs 98.86% <ø> (ø)
diskann-bf_tree-provider/src/caching/provider.rs 28.57% <ø> (ø)
diskann-bf_tree-provider/src/caching/utils.rs 92.39% <100.00%> (ø)
...-bf_tree-provider/src/provider/bf_tree_provider.rs 71.25% <100.00%> (ø)
diskann-bf_tree-provider/src/provider/mod.rs 0.00% <ø> (ø)
...bf_tree-provider/src/provider/neighbor_provider.rs 92.21% <ø> (ø)
...ree-provider/src/provider/quant_vector_provider.rs 89.24% <ø> (ø)
...n-bf_tree-provider/src/provider/vector_provider.rs 86.03% <100.00%> (ø)
...roviders/src/model/graph/provider/async_/common.rs 90.05% <ø> (+1.75%) ⬆️
... and 7 more

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants