Add interface to do prefetch on KnnVectorValues and an example implementation to use prefetch in Scorer #15722

Open

navneet1v wants to merge 1 commit into apache:main from navneet1v:main

Conversation

@navneet1v
Contributor

@navneet1v navneet1v commented Feb 18, 2026

Description

This change adds the ability to prefetch on KnnVectorValues. Along with the change, there is a sample implementation showing how to create a PrefetchableFlatVectorScorer that can wrap any Scorer and prefetch before doing any bulk scoring. Places where this prefetch can be useful:

  1. When there is memory contention.
  2. When the underlying storage is slow (say, a remote store): calling prefetch can start pulling vectors into the cache and then into RAM.
  3. The current HNSW-based search is not impacted, since the prefetch interfaces are not called there.

The changes include:

  1. New prefetch interface on KNNVectorValues.
  2. Only prefetch if numOfOrds > 1
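
For illustration, the new hook could look roughly like the sketch below. This is a self-contained mock: `PrefetchableVectorValues` and `RecordingVectorValues` are hypothetical stand-ins, not the actual Lucene KnnVectorValues API. The point is only that prefetch defaults to a no-op, so existing implementations are unaffected:

```java
// Hypothetical sketch: prefetch() as a no-op default on the vector values,
// so only implementations that care (e.g. backed by slow storage) override it.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

abstract class PrefetchableVectorValues {
  /** Hint that the vectors for the given ordinals will be read soon. */
  public void prefetch(int... ords) throws IOException {
    // no-op by default: existing implementations are unaffected
  }
}

/** Example override that just records which ordinals were requested. */
class RecordingVectorValues extends PrefetchableVectorValues {
  final List<Integer> requested = new ArrayList<>();

  @Override
  public void prefetch(int... ords) {
    for (int ord : ords) {
      requested.add(ord); // a real impl would issue an async read hint here
    }
  }
}
```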

Issue

#15286

@navneet1v
Contributor Author

@mikemccand, @vigyasharma can you please take a look and let me know what you think of the interfaces? Another approach I had in mind was to create a separate type of scorer, a PrefetchableScorer, that different systems can wrap with if they want to do prefetch.

@navneet1v navneet1v force-pushed the main branch 4 times, most recently from 1300834 to f27d3a3 on February 19, 2026 09:03
@navneet1v navneet1v requested a review from mccullocht February 19, 2026 09:03
@benwtrent
Member

I don't mind the interface. But it does seem that the "scorer.bulkScore" should "just do the right thing" and prefetch if necessary.

I think this will harm the "preferred path", which is the one where all vectors are in MMAP'd space. We need to benchmark with quantized vectors, where the vector ops during traversal are magnitudes cheaper, not just floating point.

In the aggressively nasty path, I am not sure prefetching 16-32 vectors at a time will give us much; against modern IO throughput numbers, that is barely scratching the surface (4kb * 16 if using raw vectors, and nobody should be using raw vectors anymore; way lower memory numbers if quantized...). If we are prefetching during graph traversal due to not having enough memory, I think we need to prefetch WAY more aggressively: prefetching multiple candidate neighbors into the future, even ones we won't end up scoring :/.

All in all, I think if we want HNSW to act nice in low memory, we need to:

  • invest more in the Bipartite ordering of vectors so that neighbors are near each other on disk
  • Prefetch larger blocks hoping to get many vectors we care about besides the immediate ordinals.

@navneet1v
Contributor Author

We need to benchmark with quantized vectors, where the vector ops during traversal are magnitudes cheaper, not just floating point.

I am working on the benchmarks.

Prefetch larger blocks hoping to get many vectors we care about besides the immediate ordinals.

The basic implementation I am thinking of combines a few ordinals into a block, say 128KB (configurable), to do the prefetch.

Would it be better if I removed prefetch from the scorer and graph searcher and moved it only to KNNVectorValues? That way, at least for rescoring, or in their own scorers, applications can use prefetch.
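
The block-coalescing idea could be sketched like this; `BlockCoalescer` is a hypothetical helper (not part of this PR), and the 128KB figure is just the configurable default mentioned above:

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: map each ordinal to its enclosing fixed-size block so
 *  that one prefetch call per block replaces one call per ordinal. */
class BlockCoalescer {
  static Set<Long> blocksFor(int[] ords, int bytesPerVector, long blockSize) {
    Set<Long> blocks = new LinkedHashSet<>();
    for (int ord : ords) {
      long offset = (long) ord * bytesPerVector; // byte offset of this vector
      blocks.add(offset / blockSize);            // index of the enclosing block
    }
    return blocks;
  }
}
```

With 4KB vectors and 128KB blocks, nearby ordinals collapse into the same block, so a batch of candidate neighbors triggers far fewer prefetch calls.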

@mccullocht
Contributor

I don't mind the interface. But it does seem that the "scorer.bulkScore" should "just do the right thing" and prefetch if necessary.

Agreed, I think we'd like prefetching to be internal to the scorer.

I think this will harm the "preferred path", which is the one where all vectors are in MMAP'd space. We need to benchmark with quantized vectors, where the vector ops during traversal are magnitudes cheaper, not just floating point.

Is this ultimately a result of the behavior of mmap input prefetch()? The current implementation does a synchronous madvise() syscall, which I would expect to harm performance when everything is in memory, and may be worse than doing a read when the data has spilled to storage.

If we are prefetching during graph traversal due to not having enough memory, I think we need to prefetch WAY more aggressively. Like, prefetching multiple candidate neighbors in the future (even ones we won't end up scoring :/).

Popping multiple candidates in the graph searcher like this is the suggested approach for parallelizing IO in the DiskANN paper. This would be another query parameter, unless we can reliably detect low-memory conditions, which is challenging if you let mmap manage caching :/.

@navneet1v
Contributor Author

navneet1v commented Feb 19, 2026

I don't mind the interface. But it does seem that the "scorer.bulkScore" should "just do the right thing" and prefetch if necessary.

Agreed, I think we'd like prefetching to be internal to the scorer.

Thanks @benwtrent and @mccullocht for your input. I have no concerns with doing prefetch inside the scorer.

How does this plan sound?

  1. Raise a PR where we open up the interfaces on KNNVectorValues for Prefetch.
  2. Another PR: Add a new scorer that can do prefetch in the bulkScore function.

Can we consider this approach?

The current implementation does a synchronous madvise() syscall which I would expect to harm performance when everything is in memory and may be worse than doing a read when the data has spilled to storage.

@mccullocht it is async. We make the call and then immediately return. Is your concern mainly that the call happens on the synchronous path rather than on a separate thread?

This would be a another query parameter unless we can reliably detect low memory conditions which is challenging if you let mmap manage caching :/.

I think we can leave that to the application: if we provide an implementation like a PrefetchableScorer wrapping a RandomScorer, then during search an application can choose a scorer as per its needs. Is that something we can consider?

@mccullocht
Contributor

The current implementation does a synchronous madvise() syscall which I would expect to harm performance when everything is in memory and may be worse than doing a read when the data has spilled to storage.

@mccullocht it is async. We make the call and then immediately return. Is your comment mainly that the call is in sync path rather than in a separate thread?

Syscalls are relatively expensive. If the data is already in memory, adding a syscall for every vector is going to be noticeable. I believe this is what Ben means when he says it will harm the "preferred path". I'm not sure if it would make a difference to run the syscall in the background. In any case I'm guessing you'll see this in benchmarks when the data set is small enough to fit in RAM.

I think we can leave that to the application: if we provide an implementation like a PrefetchableScorer wrapping a RandomScorer, then during search an application can choose a scorer as per its needs. Is that something we can consider?

Indicating this preference has to fit through the interface in KnnVectorsReader.search() to override or wrap the vector scorer. I don't have any ideas that are really clean or obvious. Maybe the collector could contain a function that wraps the scorer?
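
The "collector carries a wrapping function" idea might look like the sketch below; `KnnCollectorSketch` and `VectorScorerSketch` are illustrative stand-ins, not the real Lucene KnnCollector API:

```java
import java.util.function.UnaryOperator;

/** Illustrative stand-in, not the real Lucene scorer type. */
interface VectorScorerSketch {
  float score(int ord);
}

/** The collector carries a function that gets to wrap whatever scorer the
 *  reader builds, e.g. with a prefetching decorator chosen by the app. */
class KnnCollectorSketch {
  private final UnaryOperator<VectorScorerSketch> scorerWrapper;

  KnnCollectorSketch(UnaryOperator<VectorScorerSketch> scorerWrapper) {
    this.scorerWrapper = scorerWrapper;
  }

  /** Called by the reader on the scorer it constructed. */
  VectorScorerSketch wrap(VectorScorerSketch in) {
    return scorerWrapper.apply(in);
  }
}
```

The default wrapper would be the identity function, so readers that ignore prefetch see no behavior change.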

@navneet1v
Contributor Author

Hi @mccullocht and @benwtrent

I took another pass at the code: I cleaned the prefetch logic out of HNSWGraphSearcher and kept only the prefetch-related interfaces on KNNVectorValues.

I have also added an example implementation of how to do prefetch (PrefetchableFlatVectorsScorer). This should keep things clean and separated. Please let me know your thoughts on this.

@navneet1v navneet1v changed the title from "Add the ability to do prefetch and do prefetch during vector search" to "Add interface to do prefetch on KnnVectorValues and an example implementation to use prefetch in Scorer" on Feb 20, 2026
@navneet1v
Contributor Author

@mccullocht, @benwtrent can I get a review on this PR? Since I have moved the prefetching logic out, there should be no impact on the current HNSW algorithm.

Comment on lines +109 to +112
static class PrefetchableRandomVectorScorer
extends RandomVectorScorer.AbstractRandomVectorScorer {

private final RandomVectorScorer.AbstractRandomVectorScorer randomVectorScorer;
Contributor

There were some earlier issues discussing the correct behavior for wrapper classes when the underlying abstract class might change. I think there might be some value in adding a unit test for this class similar to TestFilterWeight that uses reflection to make sure that all methods are delegated.
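
In the spirit of that TestFilterWeight approach, such a reflection check could be sketched as below; the Base/Wrapping classes are toy stand-ins for the real scorer classes:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Toy stand-ins for the scorer classes under discussion. */
abstract class BaseVectorScorer {
  public abstract float score(int ord);
  public void prefetch(int ord) {} // base class might gain methods later
}

class WrappingVectorScorer extends BaseVectorScorer {
  private final BaseVectorScorer in;
  WrappingVectorScorer(BaseVectorScorer in) { this.in = in; }
  @Override public float score(int ord) { return in.score(ord); }
  @Override public void prefetch(int ord) { in.prefetch(ord); }
}

/** Reflection check: every public instance method of the base class must be
 *  re-declared by the wrapper, so new base methods can't silently bypass it. */
class DelegationCheck {
  static void assertAllMethodsOverridden(Class<?> base, Class<?> wrapper) {
    for (Method m : base.getDeclaredMethods()) {
      int mods = m.getModifiers();
      if (!Modifier.isPublic(mods) || Modifier.isStatic(mods) || Modifier.isFinal(mods)) {
        continue;
      }
      try {
        wrapper.getDeclaredMethod(m.getName(), m.getParameterTypes());
      } catch (NoSuchMethodException e) {
        throw new AssertionError(
            wrapper.getSimpleName() + " does not delegate " + m.getName());
      }
    }
  }
}
```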

Contributor Author

This is a good point; let me try to add a test for this.

Comment on lines +92 to +100
@Override
public UpdateableRandomVectorScorer scorer() throws IOException {
return this.randomVectorScorerSupplier.scorer();
}

@Override
public RandomVectorScorerSupplier copy() throws IOException {
return new PrefetchableRandomVectorScorerSupplier(randomVectorScorerSupplier.copy());
}
Contributor

Does this PrefetchableRandomVectorScorerSupplier actually do anything? I don't see it producing a PrefetchableRandomVectorScorer.

Was the scorer() method supposed to do that? (But that means that PrefetchableRandomVectorScorer needs to implement UpdateableRandomVectorScorer -- maybe that's not a problem?)

Contributor Author

@msfroh I think you pointed out the right thing; I missed it. When I looked at it in a bit more detail, I found that my intention while making the change was to prefetch for search, not specifically for indexing. As written,

public UpdateableRandomVectorScorer scorer() throws IOException {
  return this.randomVectorScorerSupplier.scorer();
}

does not return a Prefetchable implementation of UpdateableRandomVectorScorer, since I didn't want prefetching during indexing as of now. But let me add it; I don't see any harm here. :) Thank you for this. :)
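
A sketch of that fix, with simplified stand-in interfaces (not the real UpdateableRandomVectorScorer types): the supplier wraps the delegate's scorer instead of returning it unwrapped:

```java
import java.io.IOException;

/** Simplified stand-ins for the real scorer/supplier interfaces. */
interface ScorerSketch {
  float score(int ord) throws IOException;
}

interface ScorerSupplierSketch {
  ScorerSketch scorer() throws IOException;
}

/** Decorator that issues a prefetch hint before delegating the score call. */
class PrefetchingScorer implements ScorerSketch {
  private final ScorerSketch in;
  PrefetchingScorer(ScorerSketch in) { this.in = in; }

  void prefetch(int ord) {
    // a real implementation would issue the async read hint here
  }

  @Override
  public float score(int ord) throws IOException {
    prefetch(ord);
    return in.score(ord);
  }
}

/** The fix under discussion: scorer() wraps the delegate's scorer rather than
 *  returning it unwrapped, so indexing-time scoring also gets prefetch. */
class PrefetchingSupplier implements ScorerSupplierSketch {
  private final ScorerSupplierSketch in;
  PrefetchingSupplier(ScorerSupplierSketch in) { this.in = in; }

  @Override
  public ScorerSketch scorer() throws IOException {
    return new PrefetchingScorer(in.scorer());
  }
}
```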
