Prevent nested parallelism in HNSW bench#1895

Open
julianmi wants to merge 4 commits into rapidsai:main from julianmi:hnswlib-bench-threading

Conversation

@julianmi
Contributor

@julianmi julianmi commented Mar 9, 2026

Setting both the gbench number of threads and the HNSWlib config number of threads can lead to nested parallelism. This patch forces one of two configurations: throughput mode, which uses multiple gbench threads, or latency mode, which uses batch parallelism. Additionally, there is significant overhead in going through the thread pool, so the `search` method now skips it in order to handle a single-query batch efficiently.

- Setting both the gbench number of threads and the HNSWlib config number of threads can lead to nested parallelism. Force either throughput mode using multiple gbench threads or latency mode using batch parallelism.
- Added a check in the `search` method to handle a single-query batch size efficiently. There is significant overhead in going through the thread pool.
@julianmi julianmi requested a review from a team as a code owner March 9, 2026 14:18
@aamijar aamijar added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Mar 9, 2026
Member

@aamijar aamijar left a comment


Hi @julianmi, what is the UX for using multiple threads in the HNSW bench? Does the user set the gbench threads parameter, or the num_threads_ parameter?

@achirkin
Contributor

To answer @aamijar

In latency mode, gbench measures how long it takes to execute a single search call for the given algorithm and batch size. In this mode, gbench is always single-threaded. To make use of the whole CPU, HNSW has its own threading logic. This makes the HNSW measurements more realistic and a fairer comparison against GPU algorithms.

In throughput mode, gbench measures how many requests the given algorithm can serve per second, so gbench provides independent threads to make the search calls. This clashes with the internal HNSW threading. Because gbench creates its threads and manages batching outside the measured benchmark loop, HNSW performance generally looks better with gbench threads than with the internal threads. Hence we simply disable internal batching completely in throughput mode.

