1 change: 1 addition & 0 deletions Cargo.lock

2 changes: 2 additions & 0 deletions Cargo.toml
@@ -62,9 +62,11 @@ diskann-providers = { path = "diskann-providers", default-features = false, vers
diskann-disk = { path = "diskann-disk", version = "0.53.0" }
diskann-label-filter = { path = "diskann-label-filter", version = "0.53.0" }
# Infra
# Infra
diskann-benchmark-runner = { path = "diskann-benchmark-runner", version = "0.53.0" }
diskann-benchmark-core = { path = "diskann-benchmark-core", version = "0.53.0" }
diskann-tools = { path = "diskann-tools", version = "0.53.0" }
diskann-bftree = {path = "diskann-bftree", version = "0.53.0" }

# External dependencies (shared versions)
anyhow = "1.0.98"
4 changes: 4 additions & 0 deletions diskann-benchmark/Cargo.toml
@@ -31,6 +31,7 @@ diskann-wide.workspace = true
diskann-label-filter.workspace = true
diskann-tools = { workspace = true }
diskann-disk = { workspace = true, optional = true }
diskann-bftree = { workspace = true, optional = true }
cfg-if.workspace = true
diskann-benchmark-runner = { workspace = true }
opentelemetry = { workspace = true, optional = true }
@@ -63,6 +64,9 @@ scalar-quantization = []
# Enable minmax-quantization based algorithms
minmax-quantization = []

# Enable bftree backend
bftree = ["dep:diskann-bftree"]

# Enable Disk Index benchmarks
disk-index = [
"diskann-disk/perf_test",
58 changes: 58 additions & 0 deletions diskann-benchmark/README.md
@@ -505,3 +505,61 @@ error reporting in the event of a dispatch fail much easier for the user to unde

Refer to implementations within the benchmarking framework for what some of this may look like.

### Adding a Storage Provider

When adding an entirely new storage provider (e.g., a new `DiskANNIndex<DP>` backend), use the
bf_tree implementation (`src/backend/index/bftree/`) as a reference.

#### Files to Create

| File | Purpose |
|------|---------|
| `src/inputs/<provider>.rs` | Input structs (JSON schema), `Display`, `Checker`, `Example` impls |
| `src/backend/index/<provider>/mod.rs` | Module root, `register_benchmarks()`, shared helpers |
| `src/backend/index/<provider>/*.rs` | One file per benchmark variant |
| `example/graph-index-<provider>*.json` | Example configs for each variant |

#### Files to Modify

| File | Change |
|------|--------|
| `Cargo.toml` | Add optional dependency on your provider crate |
| `src/inputs/mod.rs` | Feature-gated `pub(crate) mod <provider>` |
| `src/backend/index/mod.rs` | Feature-gated `mod <provider>` + call `register_benchmarks()` |

#### Checklist

**Input structs:**
- Define input structs with all fields your provider needs
- Consider reusing shared types from `graph_index` where they fit — but only include fields your provider actually uses
- Create separate structs for the static and streaming variants
- The streaming struct additionally includes `DynamicRunbookParams`
- Implement `validate()` for path resolution and sanity checks
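
The input-struct items above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the crate's actual `Checker` API: `MyProviderBuild`, `MyProviderStream`, `resolve`, and the specific sanity checks are hypothetical, and the field names are borrowed from the example configs.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical input for a static build; field names mirror the example configs.
#[derive(Debug, Clone)]
pub struct MyProviderBuild {
    pub data: PathBuf,
    pub max_degree: usize,
    pub l_build: usize,
    pub alpha: f32,
}

/// Hypothetical streaming input: the static fields plus a runbook path.
#[derive(Debug, Clone)]
pub struct MyProviderStream {
    pub build: MyProviderBuild,
    pub runbook_path: PathBuf,
}

/// Return the first `dir/file` that exists as a regular file.
fn resolve(file: &Path, search_directories: &[PathBuf]) -> Result<PathBuf, String> {
    search_directories
        .iter()
        .map(|dir| dir.join(file))
        .find(|candidate| candidate.is_file())
        .ok_or_else(|| format!("could not find {} in any search directory", file.display()))
}

impl MyProviderBuild {
    /// Illustrative sanity checks followed by path resolution.
    pub fn validate(&mut self, search_directories: &[PathBuf]) -> Result<(), String> {
        if self.max_degree == 0 || self.l_build == 0 {
            return Err("max_degree and l_build must be nonzero".to_string());
        }
        if !self.alpha.is_finite() || self.alpha <= 0.0 {
            return Err(format!("alpha must be positive and finite, got {}", self.alpha));
        }
        self.data = resolve(&self.data, search_directories)?;
        Ok(())
    }
}

impl MyProviderStream {
    /// Validate the shared build fields first, then the streaming-only path.
    pub fn validate(&mut self, search_directories: &[PathBuf]) -> Result<(), String> {
        self.build.validate(search_directories)?;
        self.runbook_path = resolve(&self.runbook_path, search_directories)?;
        Ok(())
    }
}
```

Keeping the streaming struct as a wrapper around the static one lets both variants share the same checks while the static variant carries no runbook fields.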

**Static benchmark:**
- Implement `Benchmark` trait (see above for the full trait walkthrough)
- `try_match` should reject unsupported configurations early
- Implement `QueryType` for your provider type (associates the vector element type)

**Streaming benchmark:**
- Implement `ManagedStream<T>` on a stream struct:
  - `search`: run KNN with ground truth comparison
  - `insert` / `replace`: insert vectors at given slots
  - `delete`: delete vectors at given slots
  - `maintain`: provider-specific maintenance (cache clearing, consolidation, etc.)
- Wrap in `Managed<T, StreamStats>` (handles slot management, GT translation, maintenance scheduling)
- Implement the `Benchmark` trait for the streaming entry point
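
To make the division of responsibilities concrete, here is a toy sketch of the four operations on a brute-force in-memory store. It does not use the real `ManagedStream<T>` signatures: `StreamOps`, `BruteForceStream`, `pending_maintenance`, and `recall` are hypothetical stand-ins, and a real provider would replace the linear scan with its index.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the operations listed above; the real
/// `ManagedStream<T>` trait has its own signatures.
pub trait StreamOps {
    fn insert(&mut self, slots: &[usize], vectors: &[Vec<f32>]);
    fn delete(&mut self, slots: &[usize]);
    fn search(&self, query: &[f32], k: usize) -> Vec<usize>;
    fn maintain(&mut self);
}

/// Toy provider: exact search over a map from slot to vector.
#[derive(Default)]
pub struct BruteForceStream {
    live: HashMap<usize, Vec<f32>>,
    deleted_since_maintain: Vec<usize>,
}

impl BruteForceStream {
    /// Number of deleted slots not yet cleaned up by `maintain`.
    pub fn pending_maintenance(&self) -> usize {
        self.deleted_since_maintain.len()
    }
}

fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

impl StreamOps for BruteForceStream {
    fn insert(&mut self, slots: &[usize], vectors: &[Vec<f32>]) {
        // Inserting into an occupied slot overwrites it, which doubles as `replace`.
        for (slot, vector) in slots.iter().zip(vectors) {
            self.live.insert(*slot, vector.clone());
        }
    }

    fn delete(&mut self, slots: &[usize]) {
        for slot in slots {
            if self.live.remove(slot).is_some() {
                self.deleted_since_maintain.push(*slot);
            }
        }
    }

    fn search(&self, query: &[f32], k: usize) -> Vec<usize> {
        let mut scored: Vec<(f32, usize)> = self
            .live
            .iter()
            .map(|(slot, vector)| (squared_l2(query, vector), *slot))
            .collect();
        // Sort by distance, breaking ties by slot so results are deterministic.
        scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
        scored.into_iter().take(k).map(|(_, slot)| slot).collect()
    }

    fn maintain(&mut self) {
        // A real provider might consolidate the graph or clear caches here.
        self.deleted_since_maintain.clear();
    }
}

/// Fraction of ground-truth ids present in `found`; 0.0 for empty ground truth.
pub fn recall(found: &[usize], groundtruth: &[usize]) -> f64 {
    if groundtruth.is_empty() {
        return 0.0;
    }
    let hits = groundtruth.iter().filter(|id| found.contains(id)).count();
    hits as f64 / groundtruth.len() as f64
}
```

In the actual framework, slot management and ground-truth translation are handled by `Managed`, so a provider's stream struct only needs to supply the operations themselves.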

**Registration:**
- Choose descriptive tag strings (e.g., `graph-index-<provider>-full-precision-f32`)
- Feature-gate with `#[cfg(feature = "...")]`
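
A rough sketch of feature-gated registration, under stated assumptions: the feature name `my-provider`, the module `my_provider`, the `Vec<&'static str>` registry, and the `graph-index-builtin` tag are all hypothetical, and the real `register_benchmarks()` takes the framework's own registry type rather than a vector of tags.

```rust
// Hypothetical provider module, compiled only when the feature is enabled.
#[cfg(feature = "my-provider")]
mod my_provider {
    pub fn register_benchmarks(registry: &mut Vec<&'static str>) {
        registry.push("graph-index-my-provider-full-precision-f32");
    }
}

/// Register every benchmark that was compiled in.
pub fn register_all(registry: &mut Vec<&'static str>) {
    // Stand-in for benchmarks that are always available.
    registry.push("graph-index-builtin");

    // The call is gated with the same feature as the module, so the crate
    // still builds when the optional dependency is absent.
    #[cfg(feature = "my-provider")]
    my_provider::register_benchmarks(registry);
}
```

Gating both the module declaration and the call site with the same feature keeps default builds free of the provider's dependency.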

#### Notes

- `Managed` triggers `maintain()` based on the BigANN runbook's explicit consolidate operations
- `StreamStats` has variants for each operation type
- Matching on literal values to select const-generic parameters (e.g., `num_bits`) is fine: each
  match arm picks one const-generic instantiation while the `Benchmark` impl itself stays monomorphic
- Check IR growth with `cargo llvm-lines --package diskann-benchmark --all-features --release`;
  each new `DiskANNIndex<DP>` instantiation adds roughly 150K-300K IR lines
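
The literal-matching pattern from the notes can be illustrated with a small self-contained sketch. The function names and the set of supported bit widths here are assumptions chosen for illustration, not the widths the bf_tree backend actually supports.

```rust
/// Bytes needed to pack `dim` codes of `NBITS` bits each, rounded up.
/// `NBITS` is a compile-time constant, so each width gets its own instantiation.
fn packed_bytes<const NBITS: usize>(dim: usize) -> usize {
    (dim * NBITS + 7) / 8
}

/// Non-generic entry point: a runtime `num_bits` (e.g., from the JSON input)
/// is matched against literals, and each arm calls one const-generic instantiation.
pub fn dispatch(num_bits: usize, dim: usize) -> Result<usize, String> {
    match num_bits {
        1 => Ok(packed_bytes::<1>(dim)),
        2 => Ok(packed_bytes::<2>(dim)),
        4 => Ok(packed_bytes::<4>(dim)),
        8 => Ok(packed_bytes::<8>(dim)),
        other => Err(format!("unsupported num_bits: {other}")),
    }
}
```

Because `dispatch` itself has no generic parameters, the set of instantiations is fixed by the match arms, which also makes the IR cost of each added arm easy to measure.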

94 changes: 94 additions & 0 deletions diskann-benchmark/example/graph-index-bftree-spherical-stream.json
@@ -0,0 +1,94 @@
{
"search_directories": [
"../big-ann-benchmarks/data/MSTuringANNS",
"../big-ann-benchmarks/neurips23/runbooks"
],
"jobs": [
{
"type": "graph-index-stream-bftree-spherical-quantization",
"content": {
"build": {
"data_type": "float32",
"data": "base1b.fbin.crop_nb_1000000",
"distance": "squared_l2",
"max_degree": 64,
"l_build": 128,
"start_point_strategy": "medoid",
"alpha": 1.2,
"backedge_ratio": 1.0,
"num_threads": 4
},
"search_phase": {
"search-type": "topk",
"queries": "query100K.fbin",
"groundtruth": "msturing-gt-1M",
"reps": 1,
"num_threads": [
2
],
"runs": [
{
"search_n": 10,
"search_l": [
32,
200
],
"recall_k": 10
}
]
},
"runbook_params": {
"runbook_path": "simple_runbook.yaml",
"dataset_name": "msturing-1M",
"gt_directory": "1000000/simple_runbook.yaml",
"ip_delete_method": {
"method": "visited_and_top_k",
"params": {
"k_value": 50,
"l_value": 128
}
},
"ip_delete_num_to_replace": 3,
"consolidate_threshold": 0.2
},
"seed": 42,
"transform_kind": "null",
"num_bits": 2,
"pre_scale": "reciprocal_mean_norm",
"vector_store_config": {
"cb_size_byte": 4294967296,
"leaf_page_size": 4096,
"cb_max_record_size": null,
"cb_min_record_size": null,
"read_promotion_rate": null,
"scan_promotion_rate": null,
"cb_copy_on_access_ratio": null,
"read_record_cache": null,
"cache_only": null
},
"neighbor_store_config": {
"cb_size_byte": 4294967296,
"leaf_page_size": 4096,
"cb_max_record_size": null,
"cb_min_record_size": null,
"read_promotion_rate": null,
"scan_promotion_rate": null,
"cb_copy_on_access_ratio": null,
"read_record_cache": null,
"cache_only": null
},
"quant_store_config": {
"cb_size_byte": 4294967296,
"leaf_page_size": 4096,
"cb_max_record_size": null,
"cb_min_record_size": null,
"read_promotion_rate": null,
"scan_promotion_rate": null,
"cb_copy_on_access_ratio": null,
"read_record_cache": null,
"cache_only": null
}
}
}
]
}
83 changes: 83 additions & 0 deletions diskann-benchmark/example/graph-index-bftree-spherical.json
@@ -0,0 +1,83 @@
{
"search_directories": [
"test_data/disk_index_search"
],
"jobs": [
{
"type": "graph-index-build-bftree-spherical-quantization",
"content": {
"build": {
"data_type": "float32",
"data": "disk_index_siftsmall_learn_256pts_data.fbin",
"distance": "squared_l2",
"max_degree": 32,
"l_build": 50,
"insert_retry": null,
"start_point_strategy": "medoid",
"alpha": 1.2,
"backedge_ratio": 1.0,
"num_threads": 1,
"multi_insert": null,
"save_path": null
},
"search_phase": {
"search-type": "topk",
"queries": "disk_index_sample_query_10pts.fbin",
"groundtruth": "disk_index_10pts_idx_uint32_truth_search_res.bin",
"reps": 5,
"num_threads": [
1
],
"runs": [
{
"search_n": 20,
"search_l": [
20,
30,
40
],
"recall_k": 10
}
]
},
"seed": 42,
"transform_kind": "null",
"num_bits": 2,
"pre_scale": "reciprocal_mean_norm",
"vector_store_config": {
"cb_size_byte": 67108864,
"leaf_page_size": 4096,
"cb_max_record_size": null,
"cb_min_record_size": null,
"read_promotion_rate": null,
"scan_promotion_rate": null,
"cb_copy_on_access_ratio": null,
"read_record_cache": null,
"cache_only": null
},
"neighbor_store_config": {
"cb_size_byte": 67108864,
"leaf_page_size": 4096,
"cb_max_record_size": null,
"cb_min_record_size": null,
"read_promotion_rate": null,
"scan_promotion_rate": null,
"cb_copy_on_access_ratio": null,
"read_record_cache": null,
"cache_only": null
},
"quant_store_config": {
"cb_size_byte": 67108864,
"leaf_page_size": 4096,
"cb_max_record_size": null,
"cb_min_record_size": null,
"read_promotion_rate": null,
"scan_promotion_rate": null,
"cb_copy_on_access_ratio": null,
"read_record_cache": null,
"cache_only": null
}
}
}
]
}
79 changes: 79 additions & 0 deletions diskann-benchmark/example/graph-index-bftree-stream.json
@@ -0,0 +1,79 @@
{
"search_directories": [
"../big-ann-benchmarks/data/MSTuringANNS",
"../big-ann-benchmarks/neurips23/runbooks"
],
"jobs": [
{
"type": "graph-index-stream-bftree-full-precision",
"content": {
"build": {
"data_type": "float32",
"data": "base1b.fbin.crop_nb_1000000",
"distance": "squared_l2",
"max_degree": 64,
"l_build": 128,
"start_point_strategy": "medoid",
"alpha": 1.2,
"backedge_ratio": 1.0,
"num_threads": 4
},
"search_phase": {
"search-type": "topk",
"queries": "query100K.fbin",
"groundtruth": "msturing-gt-1M",
"reps": 1,
"num_threads": [
2
],
"runs": [
{
"search_n": 10,
"search_l": [
32,
200
],
"recall_k": 10
}
]
},
"runbook_params": {
"runbook_path": "simple_runbook.yaml",
"dataset_name": "msturing-1M",
"gt_directory": "1000000/simple_runbook.yaml",
"ip_delete_method": {
"method": "visited_and_top_k",
"params": {
"k_value": 50,
"l_value": 128
}
},
"ip_delete_num_to_replace": 3,
"consolidate_threshold": 0.2
},
"vector_store_config": {
"cb_size_byte": 4294967296,
"leaf_page_size": 4096,
"cb_max_record_size": null,
"cb_min_record_size": null,
"read_promotion_rate": null,
"scan_promotion_rate": null,
"cb_copy_on_access_ratio": null,
"read_record_cache": null,
"cache_only": null
},
"neighbor_store_config": {
"cb_size_byte": 4294967296,
"leaf_page_size": 4096,
"cb_max_record_size": null,
"cb_min_record_size": null,
"read_promotion_rate": null,
"scan_promotion_rate": null,
"cb_copy_on_access_ratio": null,
"read_record_cache": null,
"cache_only": null
}
}
}
]
}