
add bf_tree benchmark infrastructure #1106

Open
JordanMaples wants to merge 3 commits into main from jordanmaples/bftree-benchmark

Contributor

@JordanMaples JordanMaples commented May 26, 2026

Adds benchmark support for the bf_tree storage backend — both full-precision and spherical-quantized, in static (build-once) and streaming (insert/delete/search) modes.

What's included

Benchmark variants (4 tags):

  • graph-index-bftree-full-precision-f32 — static build + search
  • graph-index-bftree-stream-full-precision-f32 — streaming with runbook
  • graph-index-build-bftree-spherical-quantization — static spherical (1/2/4-bit)
  • graph-index-stream-bftree-spherical-quantization — streaming spherical (1/2/4-bit)

Bug fixes in diskann-bftree:

  1. Delete crash (use-after-free in pruning): inplace_delete removes a vector from storage via Delete::delete, then immediately prunes the deleted node's neighbors, which requires reading the deleted vector back through the accessor. The accessor would hit a Transient error (vector gone) and panic/skip. Fix: cache both full-precision and quantized vectors before deletion, and fall back to the cache in the accessor's get_element and on_elements_unordered when storage returns Transient.
  2. Max-degree mismatch in append_vector: the old append_vector would resize the neighbor list to self.dim and write the count at dim - 1, but set_neighbors wrote a variable-length buffer. This inconsistency caused bf-tree page fragmentation and potential read errors when the two code paths produced different value sizes for the same key. Fix: both paths now write a fixed-size dim-element buffer using the same format (|neighbors|padding|count|).
  3. Heap allocation in hot path: set_neighbors and append_vector used Vec::concat() / Vec::resize() on every call. Replaced with a stack-allocated [u8; 2048] buffer (supports up to 512 u32 neighbors, well beyond any practical max_degree).
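
The fixed-size layout from fix 2 and the stack buffer from fix 3 can be illustrated with a minimal sketch. It assumes little-endian u32 IDs and a count stored in the last slot; `encode_neighbors`, `decode_neighbors`, and the `slots` parameter are hypothetical names for illustration, not the actual diskann-bftree code.

```rust
/// Stack buffer capacity in bytes (2048 bytes = 512 u32 slots), per the description above.
const MAX_NEIGHBOR_BUF_BYTES: usize = 2048;

/// Hypothetical helper: writes a fixed-size record of `slots` u32 values laid
/// out as |neighbors|padding|count| and returns the record length in bytes.
/// Assumes little-endian u32 IDs; the real on-page format may differ.
fn encode_neighbors(
    neighbors: &[u32],
    slots: usize,
    buf: &mut [u8; MAX_NEIGHBOR_BUF_BYTES],
) -> Option<usize> {
    let bytes = slots.checked_mul(4)?;
    // The last slot holds the count, so at most `slots - 1` neighbors fit.
    if slots == 0 || bytes > MAX_NEIGHBOR_BUF_BYTES || neighbors.len() > slots - 1 {
        return None;
    }
    // Zero the record first so the padding is deterministic.
    buf[..bytes].fill(0);
    for (i, id) in neighbors.iter().enumerate() {
        buf[i * 4..i * 4 + 4].copy_from_slice(&id.to_le_bytes());
    }
    let count = neighbors.len() as u32;
    buf[bytes - 4..bytes].copy_from_slice(&count.to_le_bytes());
    Some(bytes)
}

/// Hypothetical inverse: reads the count from the last slot, then the IDs.
fn decode_neighbors(record: &[u8]) -> Option<Vec<u32>> {
    let len = record.len();
    if len < 4 || len % 4 != 0 {
        return None;
    }
    let slots = len / 4;
    let count = u32::from_le_bytes([
        record[len - 4],
        record[len - 3],
        record[len - 2],
        record[len - 1],
    ]) as usize;
    if count > slots - 1 {
        return None;
    }
    Some(
        record[..count * 4]
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Because every record for a given `slots` value has the same byte length regardless of how many neighbors it holds, two writers using this layout cannot produce differently sized values for the same key.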

Benchmark infrastructure:

  • Expose bf_tree internal config (buffer sizes, fanout, promotion rates) as JSON input fields
  • Shared helpers to reduce duplication: bftree_parameters_from(), run_streaming()
  • clear_delete_caches() to bound memory in long-running streaming workloads
  • Unbounded quant_delete_cache growth identified and fixed — caches are cleared after each maintenance pass
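
As a rough illustration of the delete-cache behavior described in this PR (cache before delete, fall back on a miss, clear after maintenance), here is a toy sketch. `ToyProvider` and `GetError` are hypothetical stand-ins built on `HashMap`; they are not the real BfTreeProvider or its error type.

```rust
use std::collections::HashMap;

/// Hypothetical error standing in for the Transient storage error.
#[derive(Debug, PartialEq)]
enum GetError {
    Transient,
}

/// Toy provider: `storage` plays the role of the bf-tree, `delete_cache`
/// holds vectors that were removed but may still be read during pruning.
struct ToyProvider {
    storage: HashMap<u32, Vec<f32>>,
    delete_cache: HashMap<u32, Vec<f32>>,
}

impl ToyProvider {
    fn new() -> Self {
        ToyProvider {
            storage: HashMap::new(),
            delete_cache: HashMap::new(),
        }
    }

    fn insert(&mut self, id: u32, vector: Vec<f32>) {
        self.storage.insert(id, vector);
    }

    /// Remove the vector from storage, keeping a copy in the cache.
    fn delete(&mut self, id: u32) {
        if let Some(vector) = self.storage.remove(&id) {
            self.delete_cache.insert(id, vector);
        }
    }

    /// Read from storage first; on a miss, fall back to the delete cache.
    fn get_element(&self, id: u32) -> Result<&[f32], GetError> {
        self.storage
            .get(&id)
            .or_else(|| self.delete_cache.get(&id))
            .map(|v| v.as_slice())
            .ok_or(GetError::Transient)
    }

    /// Drop cached vectors, e.g. after a maintenance pass, to bound memory.
    fn clear_delete_caches(&mut self) {
        self.delete_cache.clear();
    }
}
```

In this toy model the cache grows by one entry per delete, which is why clearing it after each maintenance pass keeps memory bounded in long streaming runs.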

Example configs

See diskann-benchmark/example/graph-index-bftree*.json for ready-to-run configurations targeting MSTuring-1M.

Tests were run on a CosmosDBDevBox VM (16 vCPU, 64 GB RAM, 2,048 GB SSD).
Results from MSTuring-1M

Full Precision streaming:

 max_degree=64, L_build=128, alpha=1.2, 4GB bf_tree buffers, 4 build threads, 2 search threads

┌───────┬────────┬─────────┬───────────────────┬──────────────────────────┬────────────────────┐
│ Stage │ Op     │ Vectors │ Avg / p99 Latency │ Recall@10 (L=32 / L=200) │ QPS (L=32 / L=200) │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 1     │ Insert │ 1M      │ 11.9ms / 17.9ms   │ —                        │ —                  │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 2     │ Search │ —       │ —                 │ 82.8% / 97.0%            │ 460 / 99           │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 3     │ Delete │ 500K    │ 16.3ms / 20.9ms   │ —                        │ —                  │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 4     │ Search │ —       │ —                 │ 84.2% / 97.9%            │ 474 / 97           │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 5     │ Insert │ 500K    │ 15.2ms / 19.6ms   │ —                        │ —                  │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 6     │ Search │ —       │ —                 │ 83.9% / 97.4%            │ 430 / 92           │
└───────┴────────┴─────────┴───────────────────┴──────────────────────────┴────────────────────┘

Spherical 2-bit streaming

 max_degree=64, L_build=128, alpha=1.2, 4GB bf_tree buffers, 4 build threads, 2 search threads

┌───────┬────────┬─────────┬───────────────────┬──────────────────────────┬────────────────────┐
│ Stage │ Op     │ Vectors │ Avg / p99 Latency │ Recall@10 (L=32 / L=200) │ QPS (L=32 / L=200) │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 1     │ Insert │ 1M      │ 4.8ms / 7.6ms     │ —                        │ —                  │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 2     │ Search │ —       │ —                 │ 74.7% / 94.0%            │ 1167 / 255         │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 3     │ Delete │ 500K    │ 6.9ms / 9.3ms     │ —                        │ —                  │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 4     │ Search │ —       │ —                 │ 73.4% / 94.1%            │ 1103 / 233         │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 5     │ Insert │ 500K    │ 6.0ms / 8.6ms     │ —                        │ —                  │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 6     │ Search │ —       │ —                 │ 75.1% / 94.3%            │ 1053 / 229         │
└───────┴────────┴─────────┴───────────────────┴──────────────────────────┴────────────────────┘

Spherical 4-bit streaming


 max_degree=64, L_build=128, alpha=1.2, 4GB bf_tree buffers, 4 build threads, 2 search threads

┌───────┬────────┬─────────┬───────────────────┬──────────────────────────┬────────────────────┐
│ Stage │ Op     │ Vectors │ Avg / p99 Latency │ Recall@10 (L=32 / L=200) │ QPS (L=32 / L=200) │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 1     │ Insert │ 1M      │ 4.8ms / 7.7ms     │ —                        │ —                  │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 2     │ Search │ —       │ —                 │ 82.3% / 96.7%            │ 1127 / 253         │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 3     │ Delete │ 500K    │ 6.9ms / 9.5ms     │ —                        │ —                  │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 4     │ Search │ —       │ —                 │ 82.4% / 97.3%            │ 1095 / 231         │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 5     │ Insert │ 500K    │ 6.3ms / 9.1ms     │ —                        │ —                  │
├───────┼────────┼─────────┼───────────────────┼──────────────────────────┼────────────────────┤
│ 6     │ Search │ —       │ —                 │ 83.1% / 97.1%            │ 1019 / 217         │
└───────┴────────┴─────────┴───────────────────┴──────────────────────────┴────────────────────┘

Comparison across variants at L=200 (stage 2 search results; insert latency is the stage 1 average):

┌─────────────────┬────────┬─────┬────────────────┐
│ Variant         │ Recall │ QPS │ Insert Latency │
├─────────────────┼────────┼─────┼────────────────┤
│ Full Precision  │ 97.0%  │ 99  │ 11.9ms         │
├─────────────────┼────────┼─────┼────────────────┤
│ Spherical 4-bit │ 96.7%  │ 253 │ 4.8ms          │
├─────────────────┼────────┼─────┼────────────────┤
│ Spherical 2-bit │ 94.0%  │ 255 │ 4.8ms          │
└─────────────────┴────────┴─────┴────────────────┘

The configuration used in the above tests was large enough to keep all of the data in memory, so nothing overflowed to disk. I also tested full-precision with a 32 MB cb size and saw a 2x performance degradation due to disk lookups.

CI reports an increase in IR size of 11%, which is over the 5% allowed in the CI file. This is largely unavoidable given the amount of code added here, so that CI gate should be ignored for this PR.

@JordanMaples JordanMaples force-pushed the jordanmaples/bftree-benchmark branch from 4201a33 to 1661985 Compare May 26, 2026 21:21

codecov-commenter commented May 26, 2026

Codecov Report

❌ Patch coverage is 61.43791% with 59 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.41%. Comparing base (f8bbf3e) to head (cfddda1).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| diskann-bftree/src/provider.rs | 50.00% | 34 Missing ⚠️ |
| diskann-bftree/src/neighbors.rs | 77.77% | 10 Missing ⚠️ |
| diskann-benchmark/src/inputs/graph_index.rs | 65.38% | 9 Missing ⚠️ |
| diskann-benchmark/src/backend/index/benchmarks.rs | 33.33% | 6 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (61.43%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #1106      +/-   ##
==========================================
- Coverage   89.46%   89.41%   -0.06%     
==========================================
  Files         482      482              
  Lines       91092    91192     +100     
==========================================
+ Hits        81497    81540      +43     
- Misses       9595     9652      +57     
| Flag | Coverage Δ |
|---|---|
| miri | 89.41% <61.43%> (-0.06%) ⬇️ |
| unittests | 89.06% <61.43%> (-0.06%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| diskann-benchmark/src/backend/index/build.rs | 85.92% <100.00%> (ø) |
| diskann-benchmark/src/backend/index/mod.rs | 100.00% <100.00%> (ø) |
| diskann-benchmark/src/backend/index/product.rs | 100.00% <ø> (ø) |
| diskann-benchmark/src/backend/index/scalar.rs | 100.00% <ø> (ø) |
| diskann-benchmark/src/backend/index/spherical.rs | 100.00% <ø> (ø) |
| ...ann-benchmark/src/backend/index/streaming/stats.rs | 0.00% <ø> (ø) |
| diskann-benchmark/src/inputs/mod.rs | 81.25% <ø> (ø) |
| diskann-bftree/src/quant.rs | 89.83% <ø> (-1.70%) ⬇️ |
| diskann-benchmark/src/backend/index/benchmarks.rs | 47.31% <33.33%> (ø) |
| diskann-benchmark/src/inputs/graph_index.rs | 37.29% <65.38%> (+1.01%) ⬆️ |
... and 2 more

@JordanMaples JordanMaples force-pushed the jordanmaples/bftree-benchmark branch 2 times, most recently from 2981ffc to c36e2a5 Compare May 26, 2026 22:44
@JordanMaples JordanMaples marked this pull request as ready for review May 27, 2026 15:15
@JordanMaples JordanMaples requested review from a team and Copilot May 27, 2026 15:15
Contributor

Copilot AI left a comment


Pull request overview

Adds bf_tree-backed benchmark support to diskann-benchmark (full-precision + spherical quantization; static + streaming variants) and updates diskann-bftree to address inplace-delete behavior and reduce per-call allocations in neighbor serialization.

Changes:

  • Add a new bftree input schema + backend benchmarks (static build/search and streaming runbook modes, including spherical quantization bit-width dispatch).
  • Add delete-time vector caching in diskann-bftree to support inplace-delete pruning after the underlying storage entries are removed.
  • Rework neighbor list serialization to avoid per-call heap allocations (stack buffer fast path).

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 10 comments.

| File | Description |
|---|---|
| diskann-bftree/src/quant.rs | Makes get_vector_sync available outside tests for delete-time caching. |
| diskann-bftree/src/provider.rs | Adds delete caches + accessor fallbacks for inplace-delete pruning; exposes clear_delete_caches(). |
| diskann-bftree/src/neighbors.rs | Switches neighbor serialization to a stack buffer fast path. |
| diskann-benchmark/src/inputs/mod.rs | Centralizes PRINT_WIDTH + write_field! for consistent summaries. |
| diskann-benchmark/src/inputs/graph_index.rs | Exposes IndexBuild helpers for reuse by bf_tree inputs. |
| diskann-benchmark/src/inputs/bftree.rs | New bf_tree input schemas (FP + spherical; static + streaming) and display/validation. |
| diskann-benchmark/src/backend/index/streaming/stats.rs | Adds a bf_tree-only GenericStats::noop() helper for skipped maintenance steps. |
| diskann-benchmark/src/backend/index/mod.rs | Registers bf_tree benchmarks behind the bftree feature. |
| diskann-benchmark/src/backend/index/bftree/mod.rs | New bf_tree benchmark module + shared streaming runner. |
| diskann-benchmark/src/backend/index/bftree/full_precision.rs | Implements bf_tree full-precision static + streaming benchmarks. |
| diskann-benchmark/src/backend/index/bftree/spherical.rs | Implements bf_tree spherical quantization static + streaming benchmarks. |
| diskann-benchmark/example/graph-index-bftree.json | Example config for bf_tree full-precision static build/search. |
| diskann-benchmark/example/graph-index-bftree-stream.json | Example config for bf_tree full-precision streaming runbook. |
| diskann-benchmark/example/graph-index-bftree-spherical.json | Example config for bf_tree spherical quantization static build/search. |
| diskann-benchmark/example/graph-index-bftree-spherical-stream.json | Example config for bf_tree spherical quantization streaming runbook. |
| diskann-benchmark/Cargo.toml | Adds bf_tree dependency + feature flag. |
| Cargo.toml | Adds diskann-bftree to workspace dependencies. |
| Cargo.lock | Locks in the new dependency edge. |


Comment thread diskann-bftree/src/provider.rs
Comment thread diskann-bftree/src/provider.rs
Comment thread diskann-bftree/src/neighbors.rs
Comment thread diskann-bftree/src/neighbors.rs
Comment thread diskann-benchmark/Cargo.toml
Comment thread diskann-benchmark/example/graph-index-bftree-stream.json
Comment thread diskann-benchmark/example/graph-index-bftree-spherical-stream.json
Comment thread diskann-benchmark/src/inputs/bftree.rs
Comment thread diskann-benchmark/src/inputs/bftree.rs
Comment thread diskann-benchmark/src/inputs/bftree.rs
@JordanMaples JordanMaples force-pushed the jordanmaples/bftree-benchmark branch from b8cdc3a to 8b0469c Compare May 27, 2026 16:39
@JordanMaples
Contributor Author

@copilot re-evaluate please

Contributor

Copilot AI commented May 27, 2026

@copilot re-evaluate please

Re-evaluated: you’re right, those match result arms compile as written. I verified locally with cargo check -p diskann-bftree and did not find a moved-value compile error there, so no code change is needed for that feedback.

JordanMaples and others added 2 commits May 28, 2026 12:52
Add benchmark support for the BfTreeProvider backend, including:
- Full-precision static and streaming benchmarks
- Spherical quantization (1/2/4-bit) static and streaming benchmarks
- Delete cache in BfTreeProvider for inplace_delete graph repair
- Stack buffer optimization for neighbor serialization
- Early validation of max_degree vs buffer capacity
- Runtime num_bits validation (1/2/4) for spherical inputs
- ADDING_PROVIDERS.md guide for wiring new backends

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JordanMaples JordanMaples force-pushed the jordanmaples/bftree-benchmark branch from 96b5580 to dc64d96 Compare May 28, 2026 21:51
@JordanMaples JordanMaples force-pushed the jordanmaples/bftree-benchmark branch from 0461814 to cfddda1 Compare May 29, 2026 16:53
Contributor

@hildebrandmw hildebrandmw left a comment


Thanks Jordan, this is looking good!

}

/// Construct a type-erased `Poly<dyn Quantizer>` for the given NBITS.
macro_rules! make_quantizer_poly {

This need not be a macro:

fn new_quantizer<const NBITS: usize>(
    quantizer: SphericalQuantizer,
) -> Result<Poly<dyn Quantizer>, AllocatorError>
where
    spherical_iface::Impl<NBITS>: spherical_iface::Constructible + Quantizer,
{
    let imp = spherical_iface::Impl::<NBITS>::new(quantizer)?;
    diskann_quantization::poly!(Quantizer, imp, GlobalAllocator)
}

1 => make_quantizer_poly!(1, quantizer),
2 => make_quantizer_poly!(2, quantizer),
4 => make_quantizer_poly!(4, quantizer),
n => anyhow::bail!("Unsupported num_bits: {n}"),

We can add this check to try_match? The goal of try_match is to catch input issues like this as early as possible, so a user doesn't queue up a list of benchmarks just to have them fail mid-run.

I guess this is added as part of input validation, so maybe not entirely needed, but is still a nice safety mechanism?
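
A minimal sketch of the two ideas in this thread: reject unsupported bit widths up front, then dispatch a runtime `num_bits` to a const-generic function. The names `check_num_bits`, `packed_bytes`, and `dispatch` are hypothetical, and `packed_bytes` is only a toy stand-in for a per-width implementation; none of this reflects the actual try_match signature.

```rust
/// Hypothetical early check: fails fast for widths other than 1, 2, or 4.
fn check_num_bits(num_bits: usize) -> Result<(), String> {
    match num_bits {
        1 | 2 | 4 => Ok(()),
        other => Err(format!("Unsupported num_bits: {}", other)),
    }
}

/// Toy per-width implementation: bytes needed to pack `dim` codes of NBITS
/// bits each. A real quantizer would likely store extra metadata as well.
fn packed_bytes<const NBITS: usize>(dim: usize) -> usize {
    (dim * NBITS + 7) / 8
}

/// Validate first, then map the runtime value onto a const generic.
fn dispatch(num_bits: usize, dim: usize) -> Result<usize, String> {
    check_num_bits(num_bits)?;
    Ok(match num_bits {
        1 => packed_bytes::<1>(dim),
        2 => packed_bytes::<2>(dim),
        4 => packed_bytes::<4>(dim),
        // Unreachable because check_num_bits accepted the value above.
        _ => unreachable!("num_bits validated by check_num_bits"),
    })
}
```

Running the same check at match time and at dispatch time is redundant but cheap, and it means a bad value is reported before any benchmark in the queue starts.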

// During `inplace_delete`, the algorithm first removes vectors from storage via
// `Delete::delete`, then prunes the neighbors of the deleted node which requires
// reading the deleted vector back through the accessor. We cache vectors here
// before deletion so the accessor can fall back to the cache on a `Transient` error.

I don't think this is the right way to handle this. The issue is multi-insert wanting to access the adjacency list, right? I think there are several better options:

  1. In bf-tree, we can delete the data, but not the adjacency lists. That way inplace delete will continue to work. This is a local fix.
  2. Adjust the semantics of inplace delete or how it's called in benchmarks to actually delete the vector only after inplace delete runs so the algorithm still has adjacency list information?

In general, if we see a mismatch like this, we should focus on not working around the problem.

/// Stack buffer size for neighbor serialization. Supports up to 512 neighbors
/// with u32 IDs (512 * 4 = 2048 bytes). Any practical ANN workload uses
/// max_degree well below this limit.
const MAX_NEIGHBOR_BUF_BYTES: usize = 2048;

This is a huge stack allocation. Is this showing up in profiles?

We already create auxiliary accessors for neighbor accesses - can't we use a Vec inside those as the scratch space? The accessors are already reused across calls and so we can amortize the Vec allocation.
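
A sketch of the suggestion above, assuming an accessor that lives across calls: the scratch `Vec` is cleared rather than dropped, so its capacity is reused and allocation is amortized. `NeighborAccessor` and `serialize` are hypothetical names, not the existing accessor types.

```rust
/// Hypothetical accessor that owns reusable scratch space.
struct NeighborAccessor {
    scratch: Vec<u8>,
}

impl NeighborAccessor {
    fn new() -> Self {
        NeighborAccessor { scratch: Vec::new() }
    }

    /// Serialize neighbor IDs (little-endian, an assumption) into the reused
    /// buffer. `clear` keeps the existing capacity, so the heap is touched
    /// only when a call needs more room than any previous call.
    fn serialize(&mut self, neighbors: &[u32]) -> &[u8] {
        self.scratch.clear();
        for id in neighbors {
            self.scratch.extend_from_slice(&id.to_le_bytes());
        }
        &self.scratch
    }
}
```

Compared with a fixed `[u8; 2048]` on the stack, this has no hard upper bound on max_degree and keeps stack frames small, at the cost of one allocation when the accessor first grows.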

| `src/inputs/<provider>.rs` | Input structs (JSON schema), `Display`, `Checker`, `Example` impls |
| `src/backend/index/<provider>/mod.rs` | Module root, `register_benchmarks()`, shared helpers |
| `src/backend/index/<provider>/full_precision.rs` | Static + streaming FP benchmarks |
| `src/backend/index/<provider>/spherical.rs` | Quantized benchmarks (if applicable) |

Some of these seem pretty prescriptive for BF-tree in particular? Can this be trimmed down to more general instructions? Or better yet, folded into the README?

write_field!(f, "cb_size_byte", self.cb_size_byte)?;
write_field!(f, "leaf_page_size", self.leaf_page_size)?;
if let Some(v) = self.cb_max_record_size {
write_field!(f, "cb_max_record_size", v)?;

Along with my other comment, we should at least record what the default values are, even if they aren't explicitly set by the user (though they probably should be).

build: IndexBuild,
search_phase: SearchPhase,
#[serde(default)]
vector_store_config: Option<BfTreeStoreConfig>,

I really think it's a good idea to not use defaults here. It's more verbose, but also means we have more information.


/// Construct an empty stats entry (e.g., when maintenance is skipped).
#[cfg(feature = "bftree")]
pub(crate) fn empty(kind: Cow<'static, str>) -> Self {

Is there a way to avoid this kind of empty representation? The generic stats was written in a way that was intentionally pretty hard to construct without data, and Rust already has a way of representing optional data with Option. I get that this is to reuse StreamStats, but maybe there's a better way?
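
One possible shape for the Option-based alternative, as a sketch only: the skipped step is recorded as `None` instead of a stats value with no data. `StageStats`, `StageReport`, and `describe` are hypothetical types for illustration and do not mirror the real GenericStats or StreamStats definitions.

```rust
/// Hypothetical stats record that always carries measured data.
#[derive(Debug, Clone, PartialEq)]
struct StageStats {
    kind: &'static str,
    latencies_us: Vec<u64>,
}

/// Hypothetical per-stage report: maintenance is optional by construction.
struct StageReport {
    maintenance: Option<StageStats>,
}

/// Consumers must handle the skipped case explicitly.
fn describe(report: &StageReport) -> String {
    match &report.maintenance {
        Some(stats) => format!("{}: {} samples", stats.kind, stats.latencies_us.len()),
        None => String::from("maintenance skipped"),
    }
}
```

With this shape there is no way to build a stats value that silently reports zero samples, which preserves the "hard to construct without data" property.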

))?);

let max_points =
((max_points_arg as f32) * (1.0 + 2.0 * consolidate_threshold)).ceil() as usize;

I believe this exists to work around limitations in delete tracking for the inmem index. Since bf-tree is using "hard deletes" instead, I think we can potentially avoid needing this extra overhead. Happy to talk about it offline.
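
To make the overhead concrete, the capacity expression quoted above can be evaluated on its own. `padded_capacity` is a hypothetical wrapper around that same expression; the threshold values used in the note below are illustrative and not taken from the example configs.

```rust
/// Hypothetical wrapper around the quoted expression: pads the requested
/// point count by twice the consolidate threshold, rounding up.
fn padded_capacity(max_points_arg: usize, consolidate_threshold: f32) -> usize {
    ((max_points_arg as f32) * (1.0 + 2.0 * consolidate_threshold)).ceil() as usize
}
```

For 1,000,000 points and a threshold of 0.25 this reserves 1,500,000 slots, a 50% overhead; a threshold of 0.0 reserves exactly the requested count. Because the arithmetic goes through f32, which has a 24-bit significand, point counts above 16,777,216 may not be represented exactly before the multiplication.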

use diskann::graph::InplaceDeleteMethod;
use diskann::utils::ONE;
use diskann_benchmark_core::{recall::Rows, streaming::executors::bigann};
use diskann_utils::views::MatrixView;

Does it make sense to either group all imports together at the top or split this into two files?

Development

Successfully merging this pull request may close these issues.

Diskann-benchmark should support Bf-tree and other providers

5 participants