Skip to content

Ensure range results are sorted after the second round.#1114

Open
hildebrandmw wants to merge 2 commits into
mainfrom
mhildebr/sorted
Open

Ensure range results are sorted after the second round.#1114
hildebrandmw wants to merge 2 commits into
mainfrom
mhildebr/sorted

Conversation

@hildebrandmw
Copy link
Copy Markdown
Contributor

When performing range expansion, nodes in the range are not added in order. This can lead to the final result of range search not being sorted by distance, which I think is a little surprising.

This PR fixes that by sorting the in_range buffer after the second search round. The out-of-order-ness can be seen in the old baseline, which this PR updates to the new sorted order.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Range search results from the two-round path were returned in discovery order rather than sorted by distance. This PR sorts scratch.in_range after range_search_internal so range search output is consistently ordered by nondecreasing distance, and refreshes the baseline JSON for the only test that exercises the second-round path.

Changes:

  • Sort scratch.in_range after the second range-search round in Range::search.
  • Update the two_round_search baseline to reflect the new sorted order.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated no comments.

File Description
diskann/src/graph/search/range_search.rs Adds sort_unstable() on scratch.in_range after range_search_internal to guarantee sorted output.
diskann/test/generated/graph/test/cases/range_search/two_round_search.json Regenerated baseline reflecting sorted-by-distance results.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.46%. Comparing base (f8bbf3e) to head (2c3dec8).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1114   +/-   ##
=======================================
  Coverage   89.46%   89.46%           
=======================================
  Files         482      482           
  Lines       91092    91093    +1     
=======================================
+ Hits        81497    81500    +3     
+ Misses       9595     9593    -2     
Flag Coverage Δ
miri 89.46% <100.00%> (+<0.01%) ⬆️
unittests 89.11% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann/src/graph/search/range_search.rs 97.83% <100.00%> (+<0.01%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@magdalendobson
Copy link
Copy Markdown
Contributor

When this code was originally written, leaving the results unsorted was intentional, for a couple of reasons. The first is that it just doesn't matter for correctness, and unlike top-k search we don't get the results in sorted order for free. Furthermore, the number of results within the range is not bounded, so the sort could potentially be very expensive.

I am open to the argument that callers might expect the results to be in sorted order and be confused when they are not, but is that better to fix with documentation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants