Prevent nested parallelism in HNSW bench #1895
julianmi wants to merge 4 commits into rapidsai:main from
- Setting both the gbench number of threads and the HNSWlib config number of threads can lead to nested parallelism. Force either throughput mode using multiple gbench threads or latency mode using batch parallelism.
- Added a check in the `search` method to handle a batch size of one query efficiently. There is a significant overhead in going through the thread pool.
To answer @aamijar:

In the latency mode, gbench measures how long it takes to execute a single search call for the given algorithm and batch size. In this mode, gbench is always single-threaded. To make use of the whole CPU, HNSW has its own threading logic. This makes the HNSW measurements more realistic and fair against GPU algorithms.

In the throughput mode, gbench measures how many requests the given algorithm can serve per second. Thus, gbench provides independent threads to do the search calls. This clashes with the internal HNSW threading. Because gbench creates its threads and manages batching outside the measured benchmark loop, the performance of HNSW generally looks better with gbench threads than with the internal threads. Hence we just disable internal batching completely in the throughput mode.
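A minimal sketch of the mode split described in this comment: all parallelism goes to exactly one level, so gbench threads and internal search threads never nest. The names `BenchMode`, `ThreadPlan`, and `plan_threads` are hypothetical and chosen for illustration; they are not the identifiers used in the patch or in HNSWlib.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical names for illustration only; not taken from the patch.
enum class BenchMode { kLatency, kThroughput };

struct ThreadPlan {
  int bench_threads;     // threads created by the benchmark harness
  int internal_threads;  // threads used inside a single search call
};

// Assign the requested thread count to exactly one level of parallelism.
inline ThreadPlan plan_threads(BenchMode mode, int requested_threads)
{
  if (requested_threads < 1) {
    throw std::invalid_argument("requested_threads must be >= 1");
  }
  // Throughput mode: the harness runs independent search calls in parallel,
  // so each call stays single-threaded.
  // Latency mode: the harness is single-threaded and one search call
  // parallelizes over the queries of its batch.
  ThreadPlan plan = (mode == BenchMode::kThroughput)
                      ? ThreadPlan{requested_threads, 1}
                      : ThreadPlan{1, requested_threads};
  // Invariant: at most one level is multi-threaded.
  assert(plan.bench_threads == 1 || plan.internal_threads == 1);
  return plan;
}
```

With this shape, a configuration that asks for 8 threads yields 8 harness threads and 1 internal thread in throughput mode, and the reverse in latency mode.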
Setting both the gbench number of threads and the HNSWlib config number of threads can lead to nested parallelism. This patch forces either throughput mode using multiple gbench threads or latency mode using batch parallelism. Additionally, going through the thread pool adds significant overhead, so the `search` method skips it when the batch contains a single query.
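The single-query shortcut could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this patch: `search_batch` and `search_one` are hypothetical names, plain `std::thread` workers stand in for the benchmark's thread pool, and the extra `num_threads <= 1` condition is an assumption added here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: calls search_one(i) for every query index i in
// [0, batch_size). Plain std::thread workers stand in for a thread pool.
inline void search_batch(std::size_t batch_size,
                         std::size_t num_threads,
                         const std::function<void(std::size_t)>& search_one)
{
  assert(static_cast<bool>(search_one) && "search_one must be callable");
  if (batch_size == 0) { return; }

  // Fast path: a single query (or a single-threaded configuration, an
  // assumption of this sketch) runs on the calling thread, so no worker
  // start-up or synchronization cost is paid.
  if (batch_size == 1 || num_threads <= 1) {
    for (std::size_t i = 0; i < batch_size; ++i) { search_one(i); }
    return;
  }

  // Slow path: distribute the queries over the workers with a fixed stride.
  const std::size_t workers = std::min(batch_size, num_threads);
  std::vector<std::thread> pool;
  pool.reserve(workers);
  for (std::size_t w = 0; w < workers; ++w) {
    pool.emplace_back([&search_one, w, workers, batch_size] {
      for (std::size_t i = w; i < batch_size; i += workers) { search_one(i); }
    });
  }
  for (auto& t : pool) { t.join(); }
}
```

The callback must be safe to call concurrently for different indices, for example by writing each result into its own slot of a pre-sized output buffer.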