66 changes: 49 additions & 17 deletions docs/source/cuvs_bench/index.rst
@@ -106,24 +106,36 @@ Running the benchmarks
End-to-end: smaller-scale benchmarks (<1M to 10M)
-------------------------------------------------

The steps below demonstrate how to download, install, and run benchmarks on a subset of 10M vectors from the Yandex Deep-1B dataset By default the datasets will be stored and used from the folder indicated by the `RAPIDS_DATASET_ROOT_DIR` environment variable if defined, otherwise a datasets sub-folder from where the script is being called:
The steps below demonstrate how to download, install, and run benchmarks on a subset of 10M vectors from the Yandex Deep-1B dataset. By default the datasets will be stored and used from the folder indicated by the `RAPIDS_DATASET_ROOT_DIR` environment variable if defined, otherwise a datasets sub-folder from where the script is being called.

.. code-block:: bash


# (1) prepare dataset.
# (1) Prepare dataset.
python -m cuvs_bench.get_dataset --dataset deep-image-96-angular --normalize

# (2) build and search index
python -m cuvs_bench.run --dataset deep-image-96-inner --algorithms cuvs_cagra --batch-size 10 -k 10
.. code-block:: python

# (2) Build and search index.
from cuvs_bench.orchestrator import BenchmarkOrchestrator

orchestrator = BenchmarkOrchestrator(backend_type="cpp_gbench")
results = orchestrator.run_benchmark(
    dataset="deep-image-96-inner",
    algorithms="cuvs_cagra",
    count=10,
    batch_size=10,
    build=True,
    search=True,
)

# (3) export data
.. code-block:: bash

# (3) Export data.
python -m cuvs_bench.run --data-export --dataset deep-image-96-inner

# (4) plot results
# (4) Plot results.
python -m cuvs_bench.plot --dataset deep-image-96-inner
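
The dataset-folder rule described at the top of this section can be sketched as follows. This is a minimal illustration of the stated behavior, not the code used by `cuvs_bench`; the helper name `resolve_dataset_root` is hypothetical, and treating an empty variable as undefined is an assumption.

.. code-block:: python

    import os
    from pathlib import Path


    def resolve_dataset_root(env=None, cwd=None):
        """Return the dataset folder (hypothetical helper, for illustration only)."""
        env = os.environ if env is None else env
        cwd = Path.cwd() if cwd is None else Path(cwd)
        root = env.get("RAPIDS_DATASET_ROOT_DIR")
        # Assumption: an empty value is treated the same as an unset variable.
        if root:
            return Path(root)
        # Fallback: a "datasets" sub-folder of the directory the script is called from.
        return cwd / "datasets"


    print(resolve_dataset_root())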


.. list-table::

* - Dataset name
@@ -192,19 +204,33 @@ The steps below demonstrate how to download, install, and run benchmarks on a su
.. code-block:: bash

mkdir -p datasets/deep-1B
# (1) prepare dataset
# (1) Prepare dataset.
# Manually download the "Ground Truth" file for "Yandex DEEP".
# Suppose the file is named deep_new_groundtruth.public.10K.bin.
python -m cuvs_bench.split_groundtruth --groundtruth datasets/deep-1B/deep_new_groundtruth.public.10K.bin
# two files 'groundtruth.neighbors.ibin' and 'groundtruth.distances.fbin' should be produced

# (2) build and search index
python -m cuvs_bench.run --dataset deep-1B --algorithms cuvs_cagra --batch-size 10 -k 10
.. code-block:: python

# (3) export data
# (2) Build and search index.
from cuvs_bench.orchestrator import BenchmarkOrchestrator

orchestrator = BenchmarkOrchestrator(backend_type="cpp_gbench")
results = orchestrator.run_benchmark(
    dataset="deep-1B",
    algorithms="cuvs_cagra",
    count=10,
    batch_size=10,
    build=True,
    search=True,
)

.. code-block:: bash

# (3) Export data.
python -m cuvs_bench.run --data-export --dataset deep-1B

# (4) plot results
# (4) Plot results.
python -m cuvs_bench.plot --dataset deep-1B

The usage of `python -m cuvs_bench.split_groundtruth` is:
@@ -414,7 +440,7 @@ Creating and customizing dataset configurations

A single configuration will often define a set of algorithms, with associated index and search parameters, that can be generalized across datasets. We use YAML to define dataset-specific and algorithm-specific configurations.

A default `datasets.yaml` is provided by CUVS in `${CUVS_HOME}/python/cuvs_bench/src/cuvs_bench/run/conf` with configurations available for several datasets. Here's a simple example entry for the `sift-128-euclidean` dataset:
A default `datasets.yaml` is provided by CUVS in `${CUVS_HOME}/python/cuvs_bench/cuvs_bench/config/datasets/datasets.yaml` with configurations available for several datasets. Here's a simple example entry for the `sift-128-euclidean` dataset:

.. code-block:: yaml

@@ -430,6 +456,9 @@ Configuration files for ANN algorithms supported by `cuvs-bench` are provided in
.. code-block:: yaml

name: cuvs_cagra
constraints:
build: cuvs_bench.config.algos.constraints.cuvs_cagra_build
search: cuvs_bench.config.algos.constraints.cuvs_cagra_search
groups:
base:
build:
@@ -447,9 +476,11 @@ Configuration files for ANN algorithms supported by `cuvs-bench` are provided in

The default parameters for which the benchmarks are run can be overridden by creating a custom YAML file for algorithms with a `base` group.

There config above has 2 fields:
1. `name` - define the name of the algorithm for which the parameters are being specified.
2. `groups` - define a run group which has a particular set of parameters. Each group helps create a cross-product of all hyper-parameter fields for `build` and `search`.
The config above has 3 fields:

1. `name` - The name of the algorithm for which the parameters are being specified.
2. `constraints` - Optional. Python import paths to functions that validate build and search parameter combinations (e.g. ``cuvs_bench.config.algos.constraints.cuvs_cagra_build``). Each function returns ``True`` if the parameters are valid, ``False`` otherwise; invalid combinations are skipped and not benchmarked.
3. `groups` - Run groups, each with a set of parameters. Each group defines a cross-product of all hyper-parameter fields for `build` and `search`.
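
A constraint function of the kind referenced by `constraints` might look like the sketch below. The function name, the `(params, dims)` signature, and the rule itself are assumptions made for illustration; they are not the definitions shipped in `cuvs_bench.config.algos.constraints`.

.. code-block:: python

    def example_cagra_build_constraint(params, dims):
        """Return True if a build parameter combination should be benchmarked.

        Hypothetical rule: the final graph degree must not exceed the
        intermediate graph degree. ``dims`` is accepted but unused here.
        """
        if "graph_degree" in params and "intermediate_graph_degree" in params:
            return params["graph_degree"] <= params["intermediate_graph_degree"]
        # Nothing to check, so the combination is considered valid.
        return True


    candidates = [
        {"graph_degree": 32, "intermediate_graph_degree": 64},
        {"graph_degree": 128, "intermediate_graph_degree": 64},
    ]
    # Combinations for which the function returns False are skipped.
    kept = [c for c in candidates if example_cagra_build_constraint(c, dims=96)]
    print(kept)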

The table below contains all algorithms supported by cuVS. Each unique algorithm will have its own set of `build` and `search` settings. The :doc:`ANN Algorithm Parameter Tuning Guide <param_tuning>` contains detailed instructions on choosing build and search parameters for each supported algorithm.

@@ -626,4 +657,5 @@ Add a new entry to `algos.yaml` to map the name of the algorithm to its binary e
build.rst
datasets.rst
param_tuning.rst
pluggable_backend.rst
wiki_all_dataset.rst
32 changes: 32 additions & 0 deletions docs/source/cuvs_bench/param_tuning.rst
@@ -4,6 +4,38 @@ cuVS Bench Parameter Tuning Guide

This guide outlines the parameter settings that can be specified in :doc:`cuVS Benchmarks <index>` YAML configuration files and explains their impact on the corresponding algorithms, to help you choose settings when benchmarking across desired levels of recall.

Benchmark modes
===============

When you run benchmarks with ``BenchmarkOrchestrator.run_benchmark()``, you can choose how parameters are explored:

**Sweep mode (default)**

Pass ``mode="sweep"`` or omit ``mode``. The orchestrator builds the full Cartesian product of all build and search parameter lists defined in the algorithm YAML (see :doc:`Creating and customizing dataset configurations <index>`). Every valid combination (after constraint filtering) is run. Use this for exhaustive comparison across the configured parameter grid.
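
The expansion that sweep mode performs can be pictured with the sketch below. It is an illustration of a Cartesian product followed by filtering, not the orchestrator's implementation; the helper ``expand_grid``, the parameter names and values, and the ``is_valid`` rule are all assumptions.

.. code-block:: python

    from itertools import product


    def expand_grid(param_lists):
        """Return one dict per combination of the listed values."""
        keys = list(param_lists)
        value_lists = [param_lists[key] for key in keys]
        return [dict(zip(keys, values)) for values in product(*value_lists)]


    # Illustrative parameter lists, standing in for an algorithm YAML group.
    build_lists = {"graph_degree": [32, 64], "intermediate_graph_degree": [64, 128]}
    search_lists = {"itopk": [32, 64, 128]}


    def is_valid(build, search):
        """Hypothetical stand-in for constraint filtering."""
        return search["itopk"] >= build["graph_degree"]


    all_runs = [
        (build, search)
        for build in expand_grid(build_lists)
        for search in expand_grid(search_lists)
    ]
    valid_runs = [(b, s) for b, s in all_runs if is_valid(b, s)]
    print(len(all_runs), len(valid_runs))

Because the grid grows multiplicatively with every added list, even short lists can produce a large number of runs.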

**Tune mode**

Pass ``mode="tune"`` to perform hyperparameter optimization using Optuna instead of running every combination. You must pass:

- **constraints** (dict): The optimization target and optional bounds. One metric must be ``"maximize"`` or ``"minimize"`` (the goal). Others can set hard limits with ``{"min": X}`` or ``{"max": X}``. Examples: ``{"recall": "maximize", "latency": {"max": 10}}`` or ``{"latency": "minimize", "recall": {"min": 0.95}}``.
Review comment (Member): This is really nice!

- **n_trials** (int, optional): Maximum number of Optuna trials (default 100). Ignored in sweep mode.

Example:

.. code-block:: python

results = orchestrator.run_benchmark(
    mode="tune",
    dataset="deep-image-96-inner",
    algorithms="cuvs_cagra",
    constraints={"recall": "maximize", "latency": {"max": 5.0}},
    n_trials=50,
    count=10,
    batch_size=10,
)
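
The shape of the ``constraints`` dict can be checked with a small helper like the one sketched here. This is not part of ``cuvs_bench``; the function names are hypothetical, and reading "one metric must be the goal" as exactly one is an assumption.

.. code-block:: python

    def split_constraints(constraints):
        """Split a constraints dict into (goal metric, direction, hard bounds)."""
        goals = {
            metric: spec
            for metric, spec in constraints.items()
            if spec in ("maximize", "minimize")
        }
        # Assumption: exactly one metric carries the optimization goal.
        if len(goals) != 1:
            raise ValueError("exactly one metric must be 'maximize' or 'minimize'")
        bounds = {}
        for metric, spec in constraints.items():
            if metric in goals:
                continue
            if not isinstance(spec, dict) or not spec or not set(spec) <= {"min", "max"}:
                raise ValueError(f"bound for {metric!r} must use 'min' and/or 'max'")
            bounds[metric] = spec
        ((goal_metric, direction),) = goals.items()
        return goal_metric, direction, bounds


    def satisfies(bounds, measured):
        """Return True if the measured metrics respect every hard bound."""
        for metric, spec in bounds.items():
            value = measured[metric]
            if "min" in spec and value < spec["min"]:
                return False
            if "max" in spec and value > spec["max"]:
                return False
        return True


    goal, direction, bounds = split_constraints(
        {"recall": "maximize", "latency": {"max": 5.0}}
    )
    print(goal, direction, satisfies(bounds, {"recall": 0.97, "latency": 3.2}))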

The parameter tables below describe the build and search knobs that sweep mode varies and that tune mode can optimize.

cuVS Indexes
============
