Ensure range results are sorted after the second round.#1114
Ensure range results are sorted after the second round.#1114hildebrandmw wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Range search results from the two-round path were returned in discovery order rather than sorted by distance. This PR sorts scratch.in_range after range_search_internal so range search output is consistently ordered by nondecreasing distance, and refreshes the baseline JSON for the only test that exercises the second-round path.
Changes:
- Sort
scratch.in_rangeafter the second range-search round inRange::search. - Update the
two_round_searchbaseline to reflect the new sorted order.
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| diskann/src/graph/search/range_search.rs | Adds sort_unstable() on scratch.in_range after range_search_internal to guarantee sorted output. |
| diskann/test/generated/graph/test/cases/range_search/two_round_search.json | Regenerated baseline reflecting sorted-by-distance results. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1114 +/- ##
=======================================
Coverage 89.46% 89.46%
=======================================
Files 482 482
Lines 91092 91093 +1
=======================================
+ Hits 81497 81500 +3
+ Misses 9595 9593 -2
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
When this code was originally written, leaving the results unsorted was intentional, for a couple of reasons. The first is that it just doesn't matter for correctness, and unlike top-k search we don't get the results in sorted order for free. Furthermore, the number of results within the range is not bounded, so the sort could potentially be very expensive. I am open to the argument that callers might expect the results to be in sorted order and be confused when they are not, but is that better to fix with documentation? |
When performing range expansion, nodes in the range are not added in order. This can lead to the final result of range search not being sorted by distance, which I think is a little surprising.
This PR fixes that by sorting the
in_rangebuffer after the second search round. The out-of-order-ness can be seen in the old baseline, which this PR updates to the new sorted order.