bf_tree migration away from diskann-providers#1020
Draft
JordanMaples wants to merge 4 commits intomainfrom
Draft
bf_tree migration away from diskann-providers#1020JordanMaples wants to merge 4 commits intomainfrom
JordanMaples wants to merge 4 commits intomainfrom
Conversation
Add bf_tree insert benchmarks (Criterion + IAI)
Add pre-migration bf_tree provider benchmarks mirroring the existing
inmem insert benchmark. Both Criterion (wall-clock) and IAI-Callgrind
(instruction-count) benchmarks use the same sift-256 dataset and PQ
training as the inmem benchmarks for apples-to-apples comparison.
Gated behind #[cfg(feature = "bf_tree")] so existing benchmarks
continue to work without the feature flag.
Run with: cargo bench --bench bench_main --features virtual_storage,bf_tree
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add bf_tree search benchmarks for migration parity
Add search benchmarks (Criterion + IAI-Callgrind) alongside existing
insert benchmarks. Search exercises the read path through
BfTreeProvider (vector lookups, neighbor access, greedy search
traversal) which is critical for detecting regressions post-migration.
The search benchmark builds the index once during setup, then measures
10 KNN queries (k=10, l=20) against the populated index.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract bf_tree provider into diskann-bf_tree-provider crate
Move the bf_tree provider and caching modules from diskann-providers
into the dedicated diskann-bf_tree-provider crate. This makes the
bf-tree dependency explicit at the crate level rather than hidden
behind a feature flag.
Changes:
- Move bf_tree/ and caching/ modules to diskann-bf_tree-provider/src/
- Set up Cargo.toml with proper dependencies (bf-tree, diskann,
diskann-providers, diskann-utils, diskann-vector, etc.)
- Convert all crate:: imports to diskann_providers:: imports
- Bump visibility of postprocess module, AsDeletionCheck/DeletionCheck
traits, and TableDeleteProviderAsync methods to pub
- Remove bf_tree feature flag and optional deps from diskann-providers
- Remove cfg(feature = "bf_tree") gates from table_delete_provider.rs
- Rename provider/provider.rs to provider/bf_tree_provider.rs (clippy)
- Disable caching example.rs tests (cross-crate cfg(test) limitation)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move bf_tree benchmarks to diskann-bf_tree-provider
Migrate Criterion and IAI-Callgrind benchmark targets from
diskann-providers to the new diskann-bf_tree-provider crate.
Update imports to use the new crate's module paths. Remove
bf_tree benchmark remnants (cfg gates, modules) from
diskann-providers benches.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
cargo.lock got missed by last commit
Move delete bitmap serialization to diskann-bf_tree-provider
Move to_bytes/from_bytes logic out of TableDeleteProviderAsync (where
it was bf_tree-scoped) into a dedicated delete_bitmap_serde module in
the new crate. The reimplementation works through the public API and
produces byte-identical output (LE u32 words). Revert the methods in
diskann-providers to pub(crate) + #[cfg(test)].
Adds format-level compatibility tests including known fixtures and
padding bit rejection.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move DeletionCheck traits and RemoveDeletedIdsAndCopy to diskann::graph::glue
These are core to the inplace delete algorithm's search post-processing
and belong in the diskann crate alongside SearchPostProcess, CopyIds,
FilterStartPoints, and Pipeline. This removes the need to widen
diskann-providers' postprocess module to pub for external crates.
Also removes the now-dead to_bytes/from_bytes methods and their tests
from TableDeleteProviderAsync in diskann-providers, since that
serialization logic has been moved to diskann-bf_tree-provider.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use workspace_root() for benchmark test data paths
Replace env!("CARGO_MANIFEST_DIR")-relative paths with
diskann_utils::workspace_root() so benchmarks find test_data/
without requiring symlinks in each crate directory.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The bf_tree feature was removed from diskann-providers when the provider was extracted into its own crate (diskann-bf_tree-provider). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
b94f4df to
44be934
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1020 +/- ##
==========================================
- Coverage 89.51% 88.97% -0.55%
==========================================
Files 460 469 +9
Lines 85424 88878 +3454
==========================================
+ Hits 76469 79077 +2608
- Misses 8955 9801 +846
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
So far, this is a mostly mechanical movement of the bf_tree prorovider from inside of diskann-providers crate into its own crate. This is largely a copilot authored PR, guided by me.
The general flow of the work was:
There are a couple remaining things to do here before we take this PR:
Some concerns that need to be addressed before this is ready for review.
As part of the move, there are some concepts that were moved into glue.rs and had their visibility expanded. This will need to be cleaned up and moved back to their previous location (and visibility level) once the tableprovider for the delete dealt with.