
Refactor DiskIndexSearcher::flat_search to use batching #1097

Merged
hildebrandmw merged 3 commits into main from mhildebr/exhaustive-2 on May 21, 2026

@hildebrandmw (Contributor)

Refactor DiskIndexSearcher::flat_search to use the bulk pq_distances method instead of computing PQ distances one at a time. This does two things:

  1. The bulk method is presumably a little more efficient.
  2. It moves the implementation away from Accessor/BuildQueryComputer, which helps with #1067 (Simplify the DataProvider contract for graph search).
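As a rough illustration of the one-at-a-time versus bulk distinction, the sketch below scans every id in fixed-size chunks and makes one bulk distance call per chunk. It is a minimal sketch under stated assumptions: flat_search_batched and the bulk_distances closure are hypothetical stand-ins, not the diskann-disk API (the PR itself calls DiskAccessor::pq_distances), and top-k selection uses a plain sort for clarity.

```rust
use std::cmp::Ordering;

/// Hypothetical sketch (not the diskann-disk API): exhaustively scan `ids`
/// in chunks of at most `batch` ids, calling `bulk_distances` once per chunk,
/// and return the `k` nearest (id, distance) pairs in ascending distance.
fn flat_search_batched<F>(ids: &[u32], batch: usize, k: usize, mut bulk_distances: F) -> Vec<(u32, f32)>
where
    F: FnMut(&[u32], &mut [f32]),
{
    assert!(batch > 0, "batch size must be non-zero");
    // A single output buffer, allocated once and reused for every chunk.
    let mut dists = vec![0.0f32; batch];
    let mut best: Vec<(u32, f32)> = Vec::with_capacity(k + batch);

    for chunk in ids.chunks(batch) {
        let n = chunk.len();
        // One bulk call per chunk rather than one call per id.
        bulk_distances(chunk, &mut dists[..n]);
        best.extend(chunk.iter().copied().zip(dists[..n].iter().copied()));
        // Keep only the current top-k (stable sort keeps ties in scan order).
        best.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(Ordering::Equal));
        best.truncate(k);
    }
    best
}

fn main() {
    let ids: Vec<u32> = (0..25).collect();
    let mut calls = 0;
    // Toy bulk "distance": |id - 10|, filled for a whole chunk at once.
    let top = flat_search_batched(&ids, 8, 3, |chunk, out| {
        calls += 1;
        for (o, &id) in out.iter_mut().zip(chunk) {
            *o = (id as f32 - 10.0).abs();
        }
    });
    // prints: bulk calls = 4, top-3 = [(10, 0.0), (9, 1.0), (11, 1.0)]
    println!("bulk calls = {calls}, top-3 = {top:?}");
}
```

The distance buffer is allocated once and reused for every chunk; a production version would more likely keep the top-k in a bounded heap than re-sort after each chunk.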

The one tricky part is deriving the maximum batch size from the size of the scratch space. To that end, I added a private accessor, PQScratch::max_vectors(), that returns this bound.
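One plausible way to derive such a bound is to take the smallest number of vectors that fits in each scratch buffer a bulk call writes to. The Scratch type, its fields, and the buffer sizes below are hypothetical; the actual PQScratch layout and the body of PQScratch::max_vectors() may differ.

```rust
/// Hypothetical stand-in for a PQ scratch space. Field names are assumptions,
/// not the real `PQScratch` layout in diskann-disk.
struct Scratch {
    /// One f32 output slot per vector in a batch.
    distances: Vec<f32>,
    /// Staging area for the PQ codes of every vector in a batch.
    codes: Vec<u8>,
    /// Bytes of PQ code per vector (e.g. the number of PQ chunks).
    bytes_per_vector: usize,
}

impl Scratch {
    fn new(num_distances: usize, code_bytes: usize, bytes_per_vector: usize) -> Self {
        assert!(bytes_per_vector > 0, "bytes_per_vector must be non-zero");
        Scratch {
            distances: vec![0.0; num_distances],
            codes: vec![0; code_bytes],
            bytes_per_vector,
        }
    }

    /// Largest batch that fits in *every* buffer, so a bulk distance call
    /// can run without growing (reallocating) any scratch buffer.
    fn max_vectors(&self) -> usize {
        let by_codes = self.codes.len() / self.bytes_per_vector;
        self.distances.len().min(by_codes)
    }
}

fn main() {
    // 100 distance slots, but only 1024 / 16 = 64 vectors' worth of code bytes.
    let scratch = Scratch::new(100, 1024, 16);
    println!("max batch = {}", scratch.max_vectors()); // prints "max batch = 64"
}
```

Taking the minimum over all buffers is the conservative choice. A result of zero would mean the scratch cannot hold even one vector, which a caller would need to reject before chunking.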

hildebrandmw requested review from a team and Copilot on May 21, 2026 18:04
hildebrandmw changed the title from "Refactor DiskIndexSearcher::flat_search." to "Refactor DiskIndexSearcher::flat_search to use batching"

Copilot AI left a comment


Pull request overview

This PR refactors DiskIndexSearcher::flat_search (disk-based search) to compute PQ distances in batches using the bulk pq_distances path, reducing per-vector overhead and moving the implementation away from Accessor::get_element/BuildQueryComputer to support the direction in #1067.

Changes:

  • Refactor flat_search to scan IDs in chunks and call DiskAccessor::pq_distances per chunk rather than computing PQ distances one-by-one.
  • Add PQScratch::max_vectors() to expose the maximum safe batch size based on scratch capacity.
  • Extend existing PQ scratch tests to validate max_vectors().

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| diskann-disk/src/search/provider/disk_provider.rs | Batch flat_search PQ distance computation via pq_distances, deriving the batch size from scratch capacity. |
| diskann-disk/src/search/pq/pq_scratch.rs | Add PQScratch::max_vectors() and test coverage for the new accessor. |


Comment thread diskann-disk/src/search/provider/disk_provider.rs
@codecov-commenter

Codecov Report

❌ Patch coverage is 78.94737% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.48%. Comparing base (5443ca0) to head (19a4a3c).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| diskann-disk/src/search/provider/disk_provider.rs | 75.00% | 4 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (78.94%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
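Working back from the per-file figures shows why the patch check failed: disk_provider.rs at 75.00% with 4 lines missing implies 16 changed lines (12 covered), and since the whole patch also has 4 missing lines at 78.94737%, pq_scratch.rs must contribute 3 fully covered lines. These line counts are inferred from the reported percentages, not stated by Codecov.

$$
\text{patch coverage} = \frac{\text{covered}}{\text{covered} + \text{missed}} = \frac{12 + 3}{16 + 3} = \frac{15}{19} \approx 78.95\% < 90.00\%
$$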

Additional details and impacted files


```diff
@@            Coverage Diff             @@
##             main    #1097      +/-   ##
==========================================
+ Coverage   89.46%   89.48%   +0.01%     
==========================================
  Files         473      474       +1     
  Lines       89653    89751      +98     
==========================================
+ Hits        80212    80311      +99     
+ Misses       9441     9440       -1     
```

| Flag | Coverage Δ | |
| --- | --- | --- |
| miri | 89.48% <78.94%> (+0.01%) | ⬆️ |
| unittests | 89.12% <78.94%> (+0.01%) | ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| diskann-disk/src/search/pq/pq_scratch.rs | 90.00% <100.00%> (+0.81%) | ⬆️ |
| diskann-disk/src/search/provider/disk_provider.rs | 90.86% <75.00%> (-0.33%) | ⬇️ |

... and 12 files with indirect coverage changes


arkrishn94 left a comment

Thanks Mark, just had one question about whether there is any change to allocation in the search path.

Comment thread diskann-disk/src/search/provider/disk_provider.rs
@hildebrandmw (Contributor, Author)

> Thanks Mark, just had one question about whether there is any change to allocation in the search path.

There shouldn't be. If there is, then the normal beam expansion also has an allocation. I believe the PQ scratch is sized such that all the necessary buffers are already allocated. Plus, the scratch is stored in an ObjectPool to further amortize the allocation. Basically, there should be no more allocations than there already are.
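The amortization argument can be illustrated with a minimal object pool: a scratch buffer is allocated only when no idle one is available, and it returns to the pool when the search that borrowed it finishes. Pool, Pooled, and make_scratch are hypothetical names for this sketch and do not reflect the real ObjectPool API.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Minimal object pool (hypothetical sketch, not diskann's `ObjectPool` API):
/// idle objects sit on a stack and are handed out again instead of
/// allocating a fresh one for every search.
struct Pool<T> {
    idle: Mutex<Vec<T>>,
    make: fn() -> T,
}

/// RAII handle: derefs to the pooled object and returns it to the pool on drop.
struct Pooled<'a, T> {
    pool: &'a Pool<T>,
    item: Option<T>,
}

impl<T> Pool<T> {
    fn new(make: fn() -> T) -> Self {
        Pool { idle: Mutex::new(Vec::new()), make }
    }

    /// Reuse an idle object if there is one; only call `make` when none is idle.
    fn get(&self) -> Pooled<'_, T> {
        let reused = self.idle.lock().unwrap().pop();
        let item = reused.unwrap_or_else(self.make);
        Pooled { pool: self, item: Some(item) }
    }

    fn idle_count(&self) -> usize {
        self.idle.lock().unwrap().len()
    }
}

impl<T> Drop for Pooled<'_, T> {
    fn drop(&mut self) {
        if let Some(item) = self.item.take() {
            self.pool.idle.lock().unwrap().push(item);
        }
    }
}

impl<T> Deref for Pooled<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.item.as_ref().expect("item is present until drop")
    }
}

impl<T> DerefMut for Pooled<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        self.item.as_mut().expect("item is present until drop")
    }
}

/// Counts how many scratch buffers were actually allocated.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

fn make_scratch() -> Vec<f32> {
    ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
    vec![0.0; 256]
}

fn main() {
    let pool = Pool::new(make_scratch);
    for _ in 0..10 {
        let mut scratch = pool.get(); // reused after the first iteration
        scratch.fill(1.0); // stand-in for per-query work
    } // handle dropped here: the buffer goes back to the pool
    println!("allocations = {}", ALLOCATIONS.load(Ordering::Relaxed)); // prints "allocations = 1"
}
```

With ten sequential searches, only the first call to get() allocates. Concurrent searches each need their own buffer, so the pool grows to the peak concurrency and then stops allocating.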

hildebrandmw merged commit 4f70a82 into main on May 21, 2026 (24 checks passed)
hildebrandmw deleted the mhildebr/exhaustive-2 branch on May 21, 2026 22:49

5 participants