Reduce the Capacity template for IVF-Flat search #1681

lowener · 2026-01-07T18:13:06Z

Currently, the Capacity template parameter takes every power of 2 from 1 to 256.
Restricting it to powers of 4 over the same range (1, 4, 16, 64, 256) reduces the size of libcuvs from 157 MB to 146 MB (11 MB or 7% reduction).
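As a rough illustration of the rounding involved, here is a minimal sketch. The name `bound_by_power_of_four` appears in this PR's diff, but its body is not shown in this conversation, so the implementation below is an assumption. `count_capacities` is a hypothetical helper added only to count how many Capacity values each scheme needs.

```cpp
#include <cassert>

// Hypothetical sketch: smallest power of 4 that is >= k (returns 1 for k <= 1).
// The helper in the PR may handle edge cases (k <= 0, very large k) differently.
constexpr int bound_by_power_of_four(int k)
{
  int capacity = 1;
  while (capacity < k) {
    capacity *= 4;
  }
  return capacity;
}

// Hypothetical helper: how many distinct capacities (powers of `base`)
// are needed to cover every k in [1, max_k].
constexpr int count_capacities(int max_k, int base)
{
  int count    = 1;  // capacity 1 is always present
  int capacity = 1;
  while (capacity < max_k) {
    capacity *= base;
    ++count;
  }
  return count;
}

// Compile-time checks of the mapping.
static_assert(bound_by_power_of_four(1) == 1, "k=1 fits in capacity 1");
static_assert(bound_by_power_of_four(2) == 4, "k=2 rounds up to 4");
static_assert(bound_by_power_of_four(5) == 16, "k=5 rounds up to 16");
static_assert(bound_by_power_of_four(100) == 256, "k=100 rounds up to 256");

// Powers of 2 need 9 Capacity values for k up to 256; powers of 4 need 5.
static_assert(count_capacities(256, 2) == 9, "1, 2, 4, ..., 256");
static_assert(count_capacities(256, 4) == 5, "1, 4, 16, 64, 256");
```

The cost of the coarser steps is that some values of k get a larger capacity than before: k=5 used capacity 8 with powers of 2 and would use 16 under this sketch.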

In tests on mnist-784-euclidean, across several top-k values and with n_probes set to 1 or 5, throughput drops by about 4% on average. The measurements are noisy: the power-of-4 version is sometimes faster than the base version. The benchmarks can be reproduced by running the script included in the first commit of the PR.

| Top-k | N-probes | QPS (base) | QPS (power of 4) | Power of 4 / base |
|---|---|---|---|---|
| 1 | 1 | 341,646 | 300,844 | 88% |
| 1 | 5 | 269,131 | 257,179 | 96% |
| 2 | 1 | 328,880 | 293,591 | 89% |
| 2 | 5 | 224,674 | 264,695 | 118% |
| 4 | 1 | 308,350 | 296,900 | 96% |
| 4 | 5 | 227,393 | 220,282 | 97% |
| 5 | 1 | 340,225 | 296,276 | 87% |
| 5 | 5 | 296,486 | 278,676 | 94% |
| 10 | 1 | 301,967 | 308,025 | 102% |
| 10 | 5 | 234,487 | 286,652 | 122% |
| 20 | 1 | 335,355 | 311,835 | 93% |
| 20 | 5 | 231,498 | 256,806 | 111% |
| 50 | 1 | 336,700 | 310,101 | 92% |
| 50 | 5 | 293,545 | 241,445 | 82% |
| 100 | 1 | 337,883 | 277,521 | 82% |
| 100 | 5 | 227,633 | 223,234 | 98% |
| Average | | | | 96% |

Signed-off-by: Mickael Ide <mide@nvidia.com>

seunghwak · 2026-01-07T22:40:07Z

cpp/src/neighbors/ivf_flat/ivf_flat_interleaved_scan.cuh

                              rmm::cuda_stream_view stream)
 {
-  const int capacity = raft::bound_by_power_of_two(k);
+  const int capacity = bound_by_power_of_four(k);


Have you compared the binary size reduction of this approach vs converting capacity to a run-time parameter?

As a first step, we could just estimate the potential size reduction without worrying too much about performance or even correctness.

In the kernel function (https://github.com/rapidsai/cuvs/blob/main/cpp/src/neighbors/ivf_flat/ivf_flat_interleaved_scan.cuh#L810), Capacity is mainly used in two places.

https://github.com/rapidsai/cuvs/blob/main/cpp/src/neighbors/ivf_flat/ivf_flat_interleaved_scan.cuh#L828
=> Just replacing constexpr with const should be sufficient for an initial best-case estimate.

https://github.com/rapidsai/cuvs/blob/main/cpp/src/neighbors/ivf_flat/ivf_flat_interleaved_scan.cuh#L852
=> This looks more involved (we would need to dig into the internals of block_sort_t), but for the initial estimate we could just set Capacity here to an arbitrary value (e.g. 4) to quickly get an idea of the upper limit on the binary size reduction.

If the size you get with this approach is significantly smaller, it might be worth further investigation. If the size reduction is comparable or even smaller, then it's better not to bother.
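To make the trade-off concrete, here is a minimal host-side sketch of why each supported Capacity value adds code, and what a run-time parameter would change. `scan_stub`, `dispatch_capacity`, and `scan_runtime_stub` are hypothetical stand-ins, not cuVS functions. The actual kernel is a CUDA kernel with more template parameters, so this only illustrates the instantiation pattern under that simplification.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a kernel templated on Capacity. Every distinct
// Capacity value instantiated below becomes a separate function in the binary.
template <int Capacity>
int scan_stub(int k)
{
  static_assert(Capacity >= 1, "Capacity must be positive");
  // Capacity is a compile-time constant, so it can size a local array
  // and give the compiler a fixed loop bound to unroll.
  int queue[Capacity] = {};
  for (int i = 0; i < Capacity && i < k; ++i) {
    queue[i] = 1;  // mark slot i as filled
  }
  int filled = 0;
  for (int i = 0; i < Capacity; ++i) {
    filled += queue[i];
  }
  return filled;
}

// Dispatcher: maps a run-time capacity to one of the compiled instantiations.
// With powers of 4 there are 5 cases; with powers of 2 there would be 9.
inline int dispatch_capacity(int capacity, int k)
{
  switch (capacity) {
    case 1: return scan_stub<1>(k);
    case 4: return scan_stub<4>(k);
    case 16: return scan_stub<16>(k);
    case 64: return scan_stub<64>(k);
    case 256: return scan_stub<256>(k);
    default: throw std::invalid_argument("unsupported capacity");
  }
}

// Run-time alternative: a single function serves every capacity, but the
// compiler can no longer specialize on the value (no fixed-size array,
// no unrolling by Capacity).
inline int scan_runtime_stub(int capacity, int k)
{
  int filled = 0;
  for (int i = 0; i < capacity && i < k; ++i) {
    ++filled;
  }
  return filled;
}
```

In this sketch both paths return the same result. The difference is how many copies of the function body are compiled, which is the quantity the suggested size estimate is trying to bound.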

cjnolet · 2026-01-14T21:38:49Z

> we can reduce the size of libcuvs from 157 MB to 146 MB (11 MB or 7% reduction).

@lowener is this per architecture? Any idea what the savings are for the binary when all architectures are compiled?

lowener added 3 commits December 5, 2025 08:17

Add benchmark script

1f14ae5

Signed-off-by: Mickael Ide <mide@nvidia.com>

Remove bench script Update copyright

4088059

Use pow-of-4

71a6508

lowener requested a review from a team as a code owner January 7, 2026 18:13

divyegala requested review from cjnolet and divyegala January 7, 2026 18:38

Merge branch 'main' into 26.02-flat-kernel

bdc1efb
