Centralize shared utilities across benchmark backends#2040
Conversation
…ing across Python-native backends
…nfiguration, gather_algorithm_configs) from CppGBenchConfigLoader to base ConfigLoader class.
📝 Walkthrough

This PR refactors cuvs-bench's benchmark orchestration layer to support lazy vector loading, template-method config loading, customizable executable discovery, and enhanced CLI modes. It introduces shared backend utilities for dtype inference, vector loading, parameter grid expansion, and recall computation; updates test backends to accept indexes lists; and removes the deprecated …

Changes
- Dataset Lazy Loading and Utilities
- Config Loading Refactoring
- Orchestration and Recall Computation
- CLI Enhancement
- Test Updates
- Documentation
- Package Exports

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Suggested reviewers
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/cuvs_bench/cuvs_bench/backends/utils.py`:
- Around line 57-58: The code currently assigns dtype = _DTYPE_FOR_EXT.get(ext,
np.float32) which silently defaults unknown file extensions to float32; change
this to fail fast by looking up ext in _DTYPE_FOR_EXT and raising a clear
ValueError (including ext and path) when not found instead of defaulting. Update
the logic around ext and dtype (the ext variable and _DTYPE_FOR_EXT lookup) so
unknown suffixes raise an exception with a descriptive message to prevent
corrupted benchmark inputs.
- Around line 60-66: The code that reads binary matrices (reading n_rows/n_cols
and payload) must validate header and payload lengths and ensure subset_size is
non-negative and within bounds: check that f.read(4) calls returned 4 bytes
before converting to uint32 and raise a ValueError if not; validate subset_size
>= 0 and if provided clamp it to n_rows but error if it's > n_rows; compute
expected_bytes = n_rows * n_cols * np.dtype(dtype).itemsize and verify f.read
returns exactly that many bytes (or at least enough for the requested subset)
before calling np.frombuffer, raising a clear ValueError if sizes mismatch;
update the logic around n_rows, n_cols, subset_size, dtype, np.frombuffer and
f.read to use these checks so malformed files produce actionable errors rather
than cryptic crashes.
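A sketch of the header/payload validation the comment describes, assuming the same 8-byte `uint32` row/column header the quoted code uses; the function and parameter names are hypothetical:

```python
import numpy as np


def load_matrix(path, subset_size=None, dtype=np.float32):
    """Read an (n_rows, n_cols) binary matrix with a uint32 header,
    validating header and payload sizes up front (sketch)."""
    with open(path, "rb") as f:
        header = f.read(8)
        if len(header) != 8:
            raise ValueError(
                f"Truncated header in {path}: expected 8 bytes, got {len(header)}"
            )
        n_rows, n_cols = (int(v) for v in np.frombuffer(header, dtype=np.uint32))
        if subset_size is not None:
            if subset_size < 0:
                raise ValueError(f"subset_size must be >= 0, got {subset_size}")
            if subset_size > n_rows:
                raise ValueError(
                    f"subset_size {subset_size} exceeds n_rows {n_rows} in {path}"
                )
            n_rows = subset_size
        expected = n_rows * n_cols * np.dtype(dtype).itemsize
        payload = f.read(expected)
        if len(payload) != expected:
            raise ValueError(
                f"Truncated payload in {path}: expected {expected} bytes, "
                f"got {len(payload)}"
            )
        return np.frombuffer(payload, dtype=dtype).reshape(n_rows, n_cols)
```

Every `f.read` result is checked against the size implied by the header before `np.frombuffer` runs, so a truncated or mis-sized file produces an actionable error instead of a cryptic reshape failure.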
In `@python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py`:
- Around line 234-243: The code silently ignores invalid algorithm_configuration
values when they are neither files nor directories; update the block handling
algorithm_configuration (the variable algorithm_configuration that populates
algos_conf_fs) to detect the invalid path and raise a clear exception (e.g.,
raise ValueError) or log-and-raise with the provided path and its type so
callers know the override is invalid instead of silently falling back to
defaults; implement this in the same conditional after the
os.path.isdir/os.path.isfile checks where algos_conf_fs is modified.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4d1822e3-7adb-4f35-a054-0962f282eec1
📒 Files selected for processing (2)
- python/cuvs_bench/cuvs_bench/backends/utils.py
- python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py
```python
if algorithm_configuration:
    if os.path.isdir(algorithm_configuration):
        algos_conf_fs += [
            os.path.join(algorithm_configuration, f)
            for f in os.listdir(algorithm_configuration)
            if ".json" not in f
        ]
    elif os.path.isfile(algorithm_configuration):
        algos_conf_fs.append(algorithm_configuration)
return algos_conf_fs
```
Do not silently ignore invalid algorithm_configuration paths.
Line 234 accepts an override input, but when it is neither a file nor a directory, the code currently falls back silently to defaults.
Proposed fix

```diff
 if algorithm_configuration:
     if os.path.isdir(algorithm_configuration):
         algos_conf_fs += [
             os.path.join(algorithm_configuration, f)
             for f in os.listdir(algorithm_configuration)
             if ".json" not in f
         ]
     elif os.path.isfile(algorithm_configuration):
         algos_conf_fs.append(algorithm_configuration)
+    else:
+        raise FileNotFoundError(
+            f"algorithm_configuration path does not exist: "
+            f"{algorithm_configuration}"
+        )
 return algos_conf_fs
```
…idate header/payload size, warn on invalid algorithm config paths.
…tor loading from file paths on first access
…or load_vectors to use it.
…stry.py to match current BenchmarkBackend abstract interface
…ansparent vector loading, and base ConfigLoader inherited methods
…taset lookup, DatasetConfig construction) and delegate backend-specific logic to abstract _build_benchmark_configs().
…sts for Python-native backends.
… user-specified executable path capability.
…-n-trials, and --backend-config flags.
jrbourbeau
left a comment
Thanks @jnke2016 for moving these to a shared place. Overall LGTM -- just left a couple minor (nonblocking) comments
```python
# it was passed in or loaded from disk.
self._base_vectors = base_vectors if base_vectors is not None else np.empty((0, 0))
self._query_vectors = query_vectors if query_vectors is not None else np.empty((0, 0))
self._groundtruth_neighbors = groundtruth_neighbors
```
Why is _groundtruth_neighbors treated differently here than _base_vectors and _query_vectors?
base_vectors and query_vectors default to None in the constructor but are converted to np.empty((0, 0)) internally so that the dims, n_base, and n_queries properties can safely call .shape and .size without None guards. groundtruth_neighbors stays as None because it's optional (not every dataset has ground truth), and those properties don't depend on it.
However, I can refactor both to match groundtruth_neighbors by defaulting to None instead and introducing a None guard in the dims, n_base, and n_queries properties. This would make all three fields consistent
| if ".json" not in f | ||
| and "constraint" not in f | ||
| and ".py" not in f | ||
| and "__pycache__" not in f |
Are these supposed to be yaml files? If so, checking for that extension (.yaml + .yml) might be better. Maybe the situation is more complex though
Yes those are supposed to be yaml file. The exclusion filter was part of the legacy benchmark scripts (run.py) in gather_algorithm_configs(), which lists all files in the config/algos/ directory and filters out non-config files (__init__.py, __pycache__/, constraints/, .json files) to keep only the .yaml algorithm configuration files. It was ported as-is to the base ConfigLoader in the new API. Your suggestion to check for .yaml/.yml inclusion instead is better since it's more explicit and won't accidentally pick up new non-YAML file types that may be added to the directory in the future. I'll update it.
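The inclusion filter agreed on above could be as simple as this sketch; it is a stand-in for the real `gather_algorithm_configs`, which also resolves the `config/algos/` directory and handles overrides:

```python
import os


def gather_algorithm_configs(algos_dir):
    """Keep only .yaml/.yml algorithm configs, instead of excluding a
    growing list of known non-config files (sketch of the suggestion)."""
    return [
        os.path.join(algos_dir, f)
        for f in sorted(os.listdir(algos_dir))
        if os.path.splitext(f)[1] in (".yaml", ".yml")
    ]
```

With an inclusion filter, files like `__init__.py`, `__pycache__/`, or any future non-YAML additions are ignored automatically rather than needing a new exclusion clause each time.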
…parently after backend.search() returns
jrbourbeau
left a comment
@jnke2016 just checking in here. I see this is marked as a draft -- are there other things you'd still like to include in this PR?
@jrbourbeau Yes, there is one remaining change (minor): making the parameter expansion transparent so that the orchestrator handles it and no backend needs to implement it. This should be pushed later today.
…all backends use the same expansion logic transparently.
Actionable comments posted: 8
🧹 Nitpick comments (1)
python/cuvs_bench/cuvs_bench/tests/test_registry.py (1)
32-33: ⚡ Quick win: Fail fast when `indexes` is empty in dummy backend methods.

Both test backends currently return successful results even when `indexes` is empty, which can hide regressions in caller logic. Raise a clear `ValueError` in `build()` and `search()` instead.

Proposed patch

```diff
 class DummyBackend(BenchmarkBackend):
@@
     def build(self, dataset, indexes, force=False, dry_run=False):
-        first = indexes[0] if indexes else None
+        if not indexes:
+            raise ValueError("Expected at least one IndexConfig in 'indexes', got 0")
+        first = indexes[0]
         return BuildResult(
-            index_path=first.file if first else "",
+            index_path=first.file,
@@
-            build_params=first.build_param if first else {},
+            build_params=first.build_param,
             success=True,
         )
@@
         distances = np.random.rand(n_queries, k)
-        first = indexes[0] if indexes else None
+        if not indexes:
+            raise ValueError("Expected at least one IndexConfig in 'indexes', got 0")
+        first = indexes[0]
@@
-            search_params=first.search_params if first else [],
+            search_params=first.search_params,
             success=True,
         )
@@
 class AnotherDummyBackend(BenchmarkBackend):
@@
     def build(self, dataset, indexes, force=False, dry_run=False):
-        first = indexes[0] if indexes else None
+        if not indexes:
+            raise ValueError("Expected at least one IndexConfig in 'indexes', got 0")
+        first = indexes[0]
         return BuildResult(
-            index_path=first.file if first else "",
+            index_path=first.file,
@@
-            build_params=first.build_param if first else {},
+            build_params=first.build_param,
             success=True,
         )
@@
         distances = np.random.rand(n_queries, k)
-        first = indexes[0] if indexes else None
+        if not indexes:
+            raise ValueError("Expected at least one IndexConfig in 'indexes', got 0")
+        first = indexes[0]
@@
-            search_params=first.search_params if first else [],
+            search_params=first.search_params,
             success=True,
         )
```

As per coding guidelines, "Ensure missing validation does not cause crashes on invalid input through proper size/type checks."

Also applies to: 57-57, 79-80, 103-103
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@python/cuvs_bench/cuvs_bench/backends/base.py`:
- Around line 91-120: The change made base_vectors, query_vectors, and
groundtruth_neighbors read-only, breaking callers that assign to these
attributes; add property setters for base_vectors, query_vectors, and
groundtruth_neighbors that accept a numpy array (or None where appropriate) and
assign to the underlying backing fields (_base_vectors, _query_vectors,
_groundtruth_neighbors) so the lazy-loading logic in the getters stays intact
while preserving assignability; ensure setters validate/convert input types
minimally (e.g., accept np.ndarray or None) and keep existing lazy-load behavior
in the getter methods (base_vectors, query_vectors, groundtruth_neighbors).
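A minimal sketch of the setter pattern the comment asks for, reduced to a single field; the real class in `backends/base.py` has more fields and loads vectors through its own helpers rather than `np.load`:

```python
import numpy as np


class BenchDataset:
    """Stripped-down stand-in showing a lazy getter paired with a
    validating setter over the same backing field."""

    def __init__(self, base_path=None, base_vectors=None):
        self._base_path = base_path
        self._base_vectors = base_vectors

    @property
    def base_vectors(self):
        # Lazy-load from disk on first access if nothing was assigned.
        if self._base_vectors is None and self._base_path is not None:
            self._base_vectors = np.load(self._base_path)
        return self._base_vectors

    @base_vectors.setter
    def base_vectors(self, value):
        # Minimal type validation; accept an ndarray or None.
        if value is not None and not isinstance(value, np.ndarray):
            raise TypeError(
                f"base_vectors must be an ndarray or None, got {type(value).__name__}"
            )
        self._base_vectors = value
```

Because the setter writes straight to `_base_vectors`, an explicit assignment simply short-circuits the lazy load in the getter without disturbing it.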
In `@python/cuvs_bench/cuvs_bench/backends/utils.py`:
- Around line 149-180: The compute_recall function assumes neighbors and
groundtruth are 2-D arrays with the same number of rows but does not validate
that; add upfront checks in compute_recall to (1) verify neighbors and
groundtruth are numpy ndarrays, (2) verify both have ndim == 2, and (3) verify
groundtruth.shape[0] == n_queries (where n_queries = neighbors.shape[0]); if any
check fails raise a clear ValueError describing the mismatch (mentioning
neighbors.shape and groundtruth.shape) so the benchmark loop fails fast with an
explanatory error rather than raising during indexing.
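A sketch of `compute_recall` with the upfront checks described above, reusing the same set-intersection recall logic quoted later in this review; treat it as an illustration, not the file's actual code:

```python
import numpy as np


def compute_recall(neighbors, groundtruth, k):
    """Recall@k with upfront type/shape validation (sketch)."""
    for name, arr in (("neighbors", neighbors), ("groundtruth", groundtruth)):
        if not isinstance(arr, np.ndarray):
            raise ValueError(f"{name} must be an ndarray, got {type(arr).__name__}")
        if arr.ndim != 2:
            raise ValueError(f"{name} must be 2-D, got shape {arr.shape}")
    n_queries = neighbors.shape[0]
    if groundtruth.shape[0] != n_queries:
        raise ValueError(
            f"Row count mismatch: neighbors {neighbors.shape} vs "
            f"groundtruth {groundtruth.shape}"
        )
    gt_k = min(k, groundtruth.shape[1])
    if gt_k == 0 or n_queries == 0:
        return 0.0
    n_correct = sum(
        len(set(neighbors[i, :k].tolist()) & set(groundtruth[i, :gt_k].tolist()))
        for i in range(n_queries)
    )
    return n_correct / (n_queries * gt_k)
```

The checks run before any indexing, so a backend that returns a 1-D or mismatched array fails with a message naming both shapes instead of an opaque `IndexError` mid-loop.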
In `@python/cuvs_bench/cuvs_bench/orchestrator/__init__.py`:
- Line 6: Add a deprecated shim for the old run_benchmark export: import
warnings and import the original run_benchmark from .orchestrator under a
private name (e.g., _run_benchmark), then define a public run_benchmark function
that emits a DeprecationWarning (with a clear message to use
BenchmarkOrchestrator) and forwards all args/kwargs to _run_benchmark; keep
BenchmarkOrchestrator re-export unchanged so both BenchmarkOrchestrator and the
deprecated run_benchmark are available for one release.
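The shim could look like the following sketch; here `_run_benchmark` is a local stand-in for the real `from .orchestrator import run_benchmark as _run_benchmark` import:

```python
import warnings


def _run_benchmark(*args, **kwargs):
    # Stand-in for the original orchestrator.run_benchmark; in the real
    # __init__.py this would be imported from .orchestrator under a
    # private name.
    return ("ran", args, kwargs)


def run_benchmark(*args, **kwargs):
    """Deprecated shim: warn callers, then forward to the original."""
    warnings.warn(
        "run_benchmark is deprecated; use BenchmarkOrchestrator instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return _run_benchmark(*args, **kwargs)
```

Existing `from cuvs_bench.orchestrator import run_benchmark` imports keep working for one release, with a `DeprecationWarning` pointing callers at `BenchmarkOrchestrator`.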
In `@python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py`:
- Around line 142-198: The new _build_benchmark_configs() is currently abstract
which breaks existing ConfigLoader subclasses that only override load(); remove
the `@abstractmethod` decorator and provide a non-abstract default implementation
named _build_benchmark_configs(self, dataset_config, dataset_conf, dataset,
dataset_path, **kwargs) that emits a DeprecationWarning (or uses
warnings.warn(..., DeprecationWarning)) indicating subclasses should implement
the hook and returns an empty list (or reasonable default) so existing
subclasses continue to work until they are migrated; keep the method signature
exactly as in the diff so subclasses that do override it will still be used.
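A sketch of the non-abstract default hook, shown on a stripped-down stand-in for the real `ConfigLoader` (which carries the full `load()` template method):

```python
import warnings


class ConfigLoader:
    """Stand-in base class illustrating the backward-compatible hook."""

    def _build_benchmark_configs(
        self, dataset_config, dataset_conf, dataset, dataset_path, **kwargs
    ):
        # Default implementation: warn instead of failing instantiation,
        # so legacy subclasses that only override load() keep working.
        warnings.warn(
            f"{type(self).__name__} should implement "
            "_build_benchmark_configs(); returning an empty list for "
            "backward compatibility",
            DeprecationWarning,
            stacklevel=2,
        )
        return []
```

Subclasses that do override the hook are picked up via normal method resolution; only legacy subclasses hit the warning path.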
In `@python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py`:
- Around line 258-268: The recall computation runs even for failed searches and
can raise if a backend returned partial/malformed neighbors; adjust the logic to
check search_result.success before computing or assigning search_result.recall.
Specifically, in the block that inspects search_result.neighbors and
bench_dataset.groundtruth_neighbors, first verify search_result.success is
truthy, then ensure neighbors is non-empty and gt is not None before calling
compute_recall; apply the same change to the analogous block that currently
recomputes recall later in the file (the other
search_result/neighbors/compute_recall block). Ensure you only set
search_result.recall when success is true and compute_recall completes.
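The guard described above could be factored as a small helper like this sketch; the helper name and the attribute-style result object are illustrative, not the orchestrator's actual API:

```python
def maybe_attach_recall(search_result, groundtruth, k, compute_recall):
    """Compute and attach recall only for successful searches that
    actually produced neighbors and have ground truth available."""
    if not getattr(search_result, "success", False):
        return None
    neighbors = getattr(search_result, "neighbors", None)
    if neighbors is None or getattr(neighbors, "size", 0) == 0:
        return None
    if groundtruth is None:
        return None
    search_result.recall = compute_recall(neighbors, groundtruth, k)
    return search_result.recall
```

Failed or empty results simply skip the recall step, so a backend that errored out can no longer crash the benchmark loop with partial neighbor arrays.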
In `@python/cuvs_bench/cuvs_bench/run/__main__.py`:
- Around line 264-268: The code currently assumes yaml.safe_load(backend_config)
returns a mapping and silently falls back to "cpp_gbench"; instead validate the
parsed cfg before reading "backend": after yaml.safe_load(f) ensure cfg is a
dict (isinstance(cfg, dict)) and explicitly check that "backend" is present and
of expected type (str); if the file is not a mapping or the "backend" key is
missing/invalid, raise a clear ValueError that states the file must contain a
mapping with a "backend" key and include the actual type/value found; then set
backend_type = cfg.pop("backend") and backend_kwargs = cfg only after validation
(do not silently default to "cpp_gbench").
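A sketch of the validation step, written against the already-parsed result of `yaml.safe_load` so it stays independent of file I/O; the function name is hypothetical:

```python
def validate_backend_config(cfg, source="backend config"):
    """Validate a parsed backend-config mapping: it must be a dict with
    a string 'backend' key; no silent fallback to a default backend."""
    if not isinstance(cfg, dict):
        raise ValueError(
            f"{source} must be a YAML mapping, got {type(cfg).__name__}"
        )
    # pop() so the remainder can be forwarded as backend kwargs; note
    # this mutates the caller's dict, which is fine for a one-shot parse.
    backend = cfg.pop("backend", None)
    if not isinstance(backend, str):
        raise ValueError(
            f"{source} must define a string 'backend' key, got {backend!r}"
        )
    return backend, cfg
```

In `__main__.py` this would run right after `yaml.safe_load(f)`, replacing the silent `"cpp_gbench"` fallback with an explicit error.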
- Around line 271-278: The call to orchestrator.run_benchmark uses ternaries
(build if build else True, search if search else True) which override explicit
False flags and break build-only/search-only modes; change the arguments to pass
the build and search variables directly (build=build, search=search) so
run_benchmark receives the actual flags passed by the CLI (ensure any earlier
CLI parsing sets these to proper booleans or None as intended) and update any
related defaults where run_benchmark is defined if needed.
In `@python/cuvs_bench/cuvs_bench/tests/test_registry.py`:
- Around line 342-349: The tests use a hardcoded Unix-specific path
"/tmp/test_index" in the IndexConfig instances (see
IndexConfig(name="test_index", ... file="/tmp/test_index")), which triggers Ruff
S108; replace those hardcoded strings with the pytest tmp_path fixture by
constructing the path via tmp_path / "test_index" and passing it to
IndexConfig.file (cast to str if IndexConfig.file expects a string). Update both
occurrences (the block around IndexConfig at the first diff and the similar
block at the later occurrence) to use tmp_path so the tests are portable and
secure.
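A sketch of the suggested change, with a simplified stand-in for `IndexConfig`; in the actual test the path would come from pytest's `tmp_path` fixture rather than a hardcoded `/tmp`:

```python
import pathlib
from dataclasses import dataclass


@dataclass
class IndexConfig:
    # Simplified stand-in; the real IndexConfig has more fields.
    name: str
    file: str


def make_index_config(tmp_path: pathlib.Path) -> IndexConfig:
    """Derive the index file from a per-test temporary directory
    (pytest's tmp_path fixture) instead of hardcoding /tmp."""
    return IndexConfig(name="test_index", file=str(tmp_path / "test_index"))
```

Since `tmp_path` is unique per test and portable across platforms, this also clears the Ruff S108 warning about predictable `/tmp` paths.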
---
Nitpick comments:
In `@python/cuvs_bench/cuvs_bench/tests/test_registry.py`:
- Around line 32-33: The dummy backend methods must fail fast when given an
empty indexes list: update the build() and search() methods in the dummy backend
class so they validate indexes is non-empty and raise a clear ValueError (e.g.,
"indexes must not be empty") instead of returning success; do the same
validation in the other dummy methods that accept indexes (the other
build()/search()-like stubs referenced in the diff) so that calls to build(),
search(), and any related dummy backend helpers immediately raise ValueError
when indexes is empty.
📒 Files selected for processing (10)
- docs/source/cuvs_bench/index.rst
- python/cuvs_bench/cuvs_bench/backends/base.py
- python/cuvs_bench/cuvs_bench/backends/utils.py
- python/cuvs_bench/cuvs_bench/orchestrator/__init__.py
- python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py
- python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py
- python/cuvs_bench/cuvs_bench/run/__main__.py
- python/cuvs_bench/cuvs_bench/tests/test_cli.py
- python/cuvs_bench/cuvs_bench/tests/test_registry.py
- python/cuvs_bench/cuvs_bench/tests/test_utils.py
```diff
-from .orchestrator import BenchmarkOrchestrator, run_benchmark
+from .orchestrator import BenchmarkOrchestrator
```
Keep a deprecated run_benchmark export for one release.
Dropping this re-export breaks existing from cuvs_bench.orchestrator import run_benchmark imports immediately. Please leave a compatibility shim in place and warn before removing it.
As per coding guidelines, Prevent API breaking changes to public Python interfaces, including removing or renaming public methods/attributes without proper deprecation (minimum one release cycle).
```python
    def load(
        self,
        dataset: str,
        dataset_path: str,
        **kwargs,
    ) -> Tuple[DatasetConfig, List[BenchmarkConfig]]:
        """
        Load and prepare benchmark configurations.

        Handles shared config-loading steps (dataset YAML, dataset lookup,
        DatasetConfig construction) then delegates to the backend-specific
        _build_benchmark_configs() for producing BenchmarkConfig objects.

        Parameters
        ----------
        dataset : str
            Dataset name
        dataset_path : str
            Path to dataset directory
        **kwargs
            Backend-specific arguments passed through to
            _build_benchmark_configs()

        Returns
        -------
        Tuple[DatasetConfig, List[BenchmarkConfig]]
            Dataset configuration and list of benchmark configurations to run
        """
        # Shared step 1: Load dataset YAML
        ds_yaml_path = kwargs.get("dataset_configuration") or os.path.join(
            self.config_path, "datasets", "datasets.yaml"
        )
        ds_conf_all = self.load_yaml_file(ds_yaml_path)

        # Shared step 2: Find dataset by name
        ds_conf = self.get_dataset_configuration(dataset, ds_conf_all)

        # Shared step 3: Construct DatasetConfig with resolved paths
        dataset_config = self.build_dataset_config(
            ds_conf, dataset_path, kwargs.get("subset_size")
        )

        # Backend-specific step: produce BenchmarkConfig list
        benchmark_configs = self._build_benchmark_configs(
            dataset_config, ds_conf, dataset, dataset_path, **kwargs
        )
        return dataset_config, benchmark_configs

    @abstractmethod
    def _build_benchmark_configs(
        self,
        dataset_config: DatasetConfig,
        dataset_conf: dict,
        dataset: str,
        dataset_path: str,
        **kwargs,
    ) -> List[BenchmarkConfig]:
```
Don't make _build_benchmark_configs() a required override immediately.
This turns the refactor into a breaking change for existing out-of-tree ConfigLoader subclasses that only override load(): they can no longer be instantiated after this release. Keep the new hook non-abstract for at least one deprecation cycle.
Proposed compatibility-preserving fix

```diff
-    @abstractmethod
     def _build_benchmark_configs(
         self,
         dataset_config: DatasetConfig,
         dataset_conf: dict,
         dataset: str,
@@
         Returns
         -------
         List[BenchmarkConfig]
             List of benchmark configurations to run
         """
-        pass
+        raise NotImplementedError
```

As per coding guidelines, Prevent API breaking changes to public Python interfaces, including removing or renaming public methods/attributes without proper deprecation (minimum one release cycle).
Nothing in the codebase imports it. The CLI and BenchmarkOrchestrator class cover all use cases.
@jnke2016, understood — since ConfigLoader is an internal class with no external consumers, and all entry points go through the CLI or BenchmarkOrchestrator, the breaking-change concern doesn't apply here. Thanks for the clarification!
( ˶ᵔ ᵕ ᵔ˶ ) 🐇
…build/search flag passthrough
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@python/cuvs_bench/cuvs_bench/backends/utils.py`:
- Around line 96-110: The code currently only checks subset_size < 1, which
allows non-integer values to slip through and error later; add an explicit type
validation before the numeric comparison: ensure subset_size is an integer
(e.g., isinstance(subset_size, (int, np.integer))) and reject bools (e.g.,
isinstance(subset_size, bool) -> disallow) and raise a clear
TypeError/ValueError if not an integer, then continue to check subset_size >= 1
and use that validated value when computing n_rows; reference the subset_size
variable in utils.py to locate the change.
- Around line 185-192: The compute_recall function currently allows negative k
(it computes gt_k = min(k, groundtruth.shape[1])) which can produce invalid
recall values; add an input validation at the start of compute_recall to reject
non-positive or negative k (e.g., if k < 0) by raising a ValueError with a clear
message that includes the expected range and the actual k, e.g. "k must be
non-negative integer, got {k}"; ensure you reference the compute_recall function
and variables k and groundtruth in the check so callers immediately see the
failure instead of getting a negative gt_k or wrong denominator.
📒 Files selected for processing (5)
- python/cuvs_bench/cuvs_bench/backends/base.py
- python/cuvs_bench/cuvs_bench/backends/utils.py
- python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py
- python/cuvs_bench/cuvs_bench/run/__main__.py
- python/cuvs_bench/cuvs_bench/tests/test_registry.py
🚧 Files skipped from review as they are similar to previous changes (3)
- python/cuvs_bench/cuvs_bench/backends/base.py
- python/cuvs_bench/cuvs_bench/run/__main__.py
- python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py
```python
    if subset_size is not None and subset_size < 1:
        raise ValueError(
            f"subset_size must be a positive integer, got {subset_size}"
        )
    with open(path, "rb") as f:
        header = f.read(8)
        if len(header) < 8:
            raise ValueError(
                f"File too small to contain a valid header (expected 8 bytes, "
                f"got {len(header)}): {path}"
            )
        n_rows = int(np.frombuffer(header[:4], dtype=np.uint32)[0])
        n_cols = int(np.frombuffer(header[4:], dtype=np.uint32)[0])
        if subset_size is not None:
            n_rows = min(n_rows, subset_size)
```
Validate subset_size type explicitly before arithmetic/comparison.
Lines 96-110 only enforce a lower bound. Non-integer values (e.g., "1000" or 1.5) can fail later with non-actionable errors instead of a clear config validation error.
Proposed fix

```diff
 def load_vectors(path: str, subset_size: Optional[int] = None) -> np.ndarray:
@@
-    if subset_size is not None and subset_size < 1:
-        raise ValueError(
-            f"subset_size must be a positive integer, got {subset_size}"
-        )
+    if subset_size is not None:
+        if not isinstance(subset_size, (int, np.integer)):
+            raise ValueError(
+                f"subset_size must be an integer, got {type(subset_size).__name__}"
+            )
+        if subset_size < 1:
+            raise ValueError(
+                f"subset_size must be a positive integer, got {subset_size}"
+            )
```

As per coding guidelines, "Ensure missing validation does not cause crashes on invalid input through proper size/type checks."
```python
    gt_k = min(k, groundtruth.shape[1])
    if gt_k == 0 or n_queries == 0:
        return 0.0
    n_correct = sum(
        len(set(neighbors[i, :k].tolist()) & set(groundtruth[i, :gt_k].tolist()))
        for i in range(n_queries)
    )
    return n_correct / (n_queries * gt_k)
```
Reject negative k in compute_recall to avoid invalid metrics.
Lines 185-192 compute gt_k = min(k, groundtruth.shape[1]) without validating k. A negative k can yield a negative denominator and invalid recall output.
Proposed fix

```diff
 def compute_recall(
     neighbors: np.ndarray, groundtruth: np.ndarray, k: int
 ) -> float:
@@
+    if not isinstance(k, (int, np.integer)):
+        raise ValueError(f"k must be an integer, got {type(k).__name__}")
+    if k < 0:
+        raise ValueError(f"k must be >= 0, got {k}")
+
     n_queries = neighbors.shape[0]
@@
     gt_k = min(k, groundtruth.shape[1])
```

As per coding guidelines, "Provide clear and actionable error messages that include expected vs actual values where helpful."
♻️ Duplicate comments (1)

python/cuvs_bench/cuvs_bench/backends/utils.py (1)

186-194: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win: Add validation for non-positive `k` parameter.

The function validates array shapes but doesn't validate `k`. A negative `k` would produce a negative `gt_k`, leading to invalid array slicing (`neighbors[i, :k]` returns empty for negative k) and a negative denominator in the recall calculation, resulting in meaningless output rather than a clear error.

🛡️ Proposed fix

```diff
     if groundtruth.shape[0] != n_queries:
         raise ValueError(
             f"Row count mismatch: neighbors has {n_queries} rows, "
             f"groundtruth has {groundtruth.shape[0]} rows"
         )
+    if not isinstance(k, (int, np.integer)) or k < 1:
+        raise ValueError(f"k must be a positive integer, got {k}")
     gt_k = min(k, groundtruth.shape[1])
```

As per coding guidelines, "Provide clear and actionable error messages that include expected vs actual values where helpful."
🧹 Nitpick comments (2)
python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py (1)
355-359: 💤 Low value

Add stacklevel to warnings.warn for accurate caller attribution.

Without stacklevel=2, the warning will point to this line in gather_algorithm_configs rather than the caller's location, making it harder to identify where the invalid path was passed.

♻️ Proposed fix

         else:
             warnings.warn(
                 f"algorithm_configuration path does not exist: "
-                f"{algorithm_configuration}"
+                f"{algorithm_configuration}",
+                stacklevel=2,
             )

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py` around lines 355 - 359, The warning in gather_algorithm_configs currently calls warnings.warn without a stacklevel, so the emitted warning points to this function instead of the caller; update the call that warns about a non-existent algorithm_configuration path to include stacklevel=2 (i.e., warnings.warn(..., stacklevel=2)) so the warning attributes to the caller that passed the invalid path.

python/cuvs_bench/cuvs_bench/backends/utils.py (1)
96-99: 💤 Low value

Consider adding explicit type validation for subset_size.

The current validation only checks subset_size < 1 but doesn't validate the type. While string inputs would fail on comparison, float inputs like 1.5 would be silently accepted and used in integer arithmetic (min(n_rows, 1.5)), which could lead to unexpected behavior.

♻️ Proposed fix

-    if subset_size is not None and subset_size < 1:
+    if subset_size is not None:
+        if not isinstance(subset_size, (int, np.integer)) or isinstance(subset_size, bool):
+            raise TypeError(
+                f"subset_size must be an integer, got {type(subset_size).__name__}"
+            )
+        if subset_size < 1:
+            raise ValueError(
+                f"subset_size must be a positive integer, got {subset_size}"
+            )
-        raise ValueError(
-            f"subset_size must be a positive integer, got {subset_size}"
-        )

As per coding guidelines, "Ensure missing validation does not cause crashes on invalid input through proper size/type checks."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuvs_bench/cuvs_bench/backends/utils.py` around lines 96 - 99, The current guard for subset_size only checks value but not type; update the validation around subset_size to explicitly require an integer (use isinstance(..., int) or numbers.Integral and exclude bool) and raise a TypeError if not an integer, then keep the existing ValueError if subset_size < 1; reference the subset_size variable and any uses like min(n_rows, subset_size) so callers won't pass floats/strings that cause surprising behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@python/cuvs_bench/cuvs_bench/backends/utils.py`:
- Around line 186-194: The code currently computes gt_k = min(k,
groundtruth.shape[1]) but never validates k; add an explicit validation at the
start of this block to raise a ValueError when k <= 0 with a clear message
including the provided k and the expected positive integer; this prevents
negative gt_k and invalid slicing such as neighbors[i, :k] and a negative
denominator in the recall calculation. Ensure the check covers non-integer or
non-scalar k if applicable in the calling context (or cast/validate type
beforehand) and mention k in the error text so developers see expected vs
actual.
---
Nitpick comments:
In `@python/cuvs_bench/cuvs_bench/backends/utils.py`:
- Around line 96-99: The current guard for subset_size only checks value but not
type; update the validation around subset_size to explicitly require an integer
(use isinstance(..., int) or numbers.Integral and exclude bool) and raise a
TypeError if not an integer, then keep the existing ValueError if subset_size <
1; reference the subset_size variable and any uses like min(n_rows, subset_size)
so callers won't pass floats/strings that cause surprising behavior.
In `@python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py`:
- Around line 355-359: The warning in gather_algorithm_configs currently calls
warnings.warn without a stacklevel, so the emitted warning points to this
function instead of the caller; update the call that warns about a non-existent
algorithm_configuration path to include stacklevel=2 (i.e., warnings.warn(...,
stacklevel=2)) so the warning attributes to the caller that passed the invalid
path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d721b528-59bf-4064-80b1-df9cc43d01b2
📒 Files selected for processing (7)
python/cuvs_bench/cuvs_bench/backends/base.py
python/cuvs_bench/cuvs_bench/backends/utils.py
python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py
python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py
python/cuvs_bench/cuvs_bench/tests/test_cli.py
python/cuvs_bench/cuvs_bench/tests/test_registry.py
python/cuvs_bench/cuvs_bench/tests/test_utils.py
✅ Files skipped from review due to trivial changes (1)
- python/cuvs_bench/cuvs_bench/backends/base.py
🚧 Files skipped from review as they are similar to previous changes (3)
- python/cuvs_bench/cuvs_bench/tests/test_utils.py
- python/cuvs_bench/cuvs_bench/tests/test_registry.py
- python/cuvs_bench/cuvs_bench/tests/test_cli.py
        self.metadata = metadata or {}

    @property
    def base_vectors(self) -> np.ndarray:
Maybe instead of base vectors, can we call this training_vectors for clarity? It's a good contrast to query_vectors.
Got it. I addressed this and it will be included in my next commit
        self._query_vectors = value if value is not None else np.empty((0, 0))

    @property
    def groundtruth_neighbors(self) -> Optional[np.ndarray]:
We should definitely store the neighbors AND the distances. At the very least for debugging purposes, we should have access to the distances so that users can print them out if needed.
    return data.reshape(n_rows, n_cols)

def expand_param_grid(
These functions seem like they should be core to the orchestrator / some other abstraction and shouldn't be merely utilities. Are these intended to be used directly by the backends? I think some sort of abstraction is needed to help force the backends to use them in some standard way. Otherwise, we're going to end up with users creating custom backends that don't follow our methodology and claiming they are getting some crazy different perf profile as a result (when in reality it's because they aren't using our tool in the intended way).
For example, I think there should be a class that centralizes all of the dataset stuff (liek what you have above) but also provides a standardized way to get these params (maybe this new abstraction uses the utilities here, but I think these utilities should be private (_utils.py) and we should favor that standard abstraction layer. Maybe in addition to Dateset class, we have a "BenchmarkRunner" class? Or a "ParamRunner" class that generates the params and evaluates the recall? What we should aim for is for the documentation for adding new backends to have a clean, cohesive, concise, and consistent few lines of code that each backend can execute in order to properly run benchmarks. This will keep everyone doing the same thing as much as possible so we're always comparing apples to apples.
These functions seem like they should be core to the orchestrator / some other abstraction and shouldn't be merely utilities. Are these intended to be used directly by the backends? I think some sort of abstraction is needed to help force the backends to use them in some standard way. Otherwise, we're going to end up with users creating custom backends that don't follow our methodology and claiming they are getting some crazy different perf profile as a result (when in reality it's because they aren't using our tool in the intended way).
These functions are not intended to be called directly by backends. They are called transparently by the framework: expand_param_grid is called by the config loader, compute_recall is called by the orchestrator, and load_vectors is called by the Dataset class. Backends only implement build() and search(). The utils.py file is just a centralized place where these functions live, but the framework is what calls them, not the backends.
For example, I think there should be a class that centralizes all of the dataset stuff (liek what you have above) but also provides a standardized way to get these params (maybe this new abstraction uses the utilities here ... Maybe in addition to Dateset class, we have a "BenchmarkRunner" class? Or a "ParamRunner" class that generates the params and evaluates the recall?
This is largely what the current architecture provides. The Dataset class centralizes all dataset handling with transparent vector loading. The base ConfigLoader standardizes parameter expansion via expand_param_grid internally. The BenchmarkOrchestrator covers the role of both a BenchmarkRunner and ParamRunner:
- Generates params: calls config_loader.load(), which internally uses expand_param_grid
- Evaluates recall: calls compute_recall after backend.search() returns
Backend authors only implement build() and search() for a single configuration. Agreed on making utils.py private (_utils.py) to signal these are internal to the framework and not meant to be called by backends directly.
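For reference, the dict-of-lists to list-of-dicts expansion described above can be sketched as follows. This is illustrative only; the real expand_param_grid in this PR may differ in scalar handling and validation, and the scalar-as-single-value convention is an assumption about the YAML spec format:

```python
from itertools import product


def expand_param_grid(param_grid):
    # Cartesian product of a dict-of-lists; scalars are treated as
    # single-element lists (an assumption about the YAML spec format).
    if not param_grid:
        return [{}]
    keys = list(param_grid)
    value_lists = [
        v if isinstance(v, list) else [v] for v in param_grid.values()
    ]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


grid = {"nlist": [1024, 2048], "nprobe": [10, 20], "metric": "euclidean"}
combos = expand_param_grid(grid)  # 2 * 2 * 1 = 4 combinations
```

A backend-specific loader (like the C++ one) can then filter the expanded list with its own constraint checks, which matches the split of responsibilities described in this thread.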
# empty neighbors, so this is skipped for it.
# Note: The recall computation relies on empty neighbors to
# distinguish backends that already computed recall.
if search_result.success and search_result.neighbors.size > 0:
This is a good centralization. Maybe this satisfies my concern above, at least about having a standardized way to compute recall without having to rely on the user to invoke a utility function in each backend.
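As a toy check of this centralization, here is a minimal recall@k based on the set-intersection definition the orchestrator uses; a sketch for illustration, not the actual cuvs_bench implementation:

```python
import numpy as np


def compute_recall(neighbors, groundtruth, k):
    # recall@k = |returned top-k intersect true top-k| / (n_queries * gt_k)
    n_queries = neighbors.shape[0]
    gt_k = min(k, groundtruth.shape[1])
    if gt_k == 0 or n_queries == 0:
        return 0.0
    n_correct = sum(
        len(set(neighbors[i, :k].tolist()) & set(groundtruth[i, :gt_k].tolist()))
        for i in range(n_queries)
    )
    return n_correct / (n_queries * gt_k)


gt = np.array([[0, 1, 2], [3, 4, 5]])
nn = np.array([[0, 2, 9], [3, 4, 5]])  # query 0 misses one true neighbor
recall = compute_recall(nn, gt, k=3)  # (2 + 3) / 6
```

An empty neighbors array short-circuits to 0.0 here, which is consistent with the orchestrator skipping recall for backends (like the C++ one) that return empty neighbors because recall was already computed in the subprocess.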
from cuvs_bench.orchestrator.config_loaders import CppGBenchConfigLoader

def _write_test_bin(path, data):
Great to see all of these things being tested!
… as groundtruth_neighbors
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@python/cuvs_bench/cuvs_bench/backends/_utils.py`:
- Around line 186-202: The code silently slices neighbors[:, :k] even when
neighbors has fewer than k columns; add an explicit validation in the recall
computation to check neighbors.shape[1] >= k and raise a clear ValueError (or
similar) if not, referencing the variables neighbors, k, and groundtruth (and
existing gt_k) so the function fails fast on contract violations rather than
silently depressing recall; ensure the error message mentions the expected k and
the actual neighbors.shape[1] to aid debugging.
In `@python/cuvs_bench/cuvs_bench/backends/base.py`:
- Around line 62-183: Add a deprecated alias "base_vectors" that maps to the
existing training storage so callers using Dataset(base_vectors=...) or
dataset.base_vectors keep working: accept an optional base_vectors parameter in
__init__ and when provided set self._training_vectors (same as training_vectors
param handling), and add a property and setter named base_vectors that simply
forward to the existing training_vectors property and its setter (referencing
__init__, self._training_vectors, training_vectors property, and
training_vectors.setter) so lazy loading via base_file remains unchanged; mark
it as deprecated in the docstring/comments for removal in a future release.
In `@python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py`:
- Around line 774-778: The current branch returns build_path for any existing
path but doesn't ensure it's an executable file; update the check around
executable_dir/executable (the build_path variable) to only accept regular files
that are executable (e.g., use os.path.isfile(build_path) and
os.access(build_path, os.X_OK)) and only print/return when both checks pass; if
the candidate exists but is not an executable file, do not return it and
optionally log a clear message (including build_path) indicating it was skipped
because it is not an executable.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9b01f674-ac90-4875-9520-aba69f858f22
📒 Files selected for processing (7)
python/cuvs_bench/cuvs_bench/backends/_utils.py
python/cuvs_bench/cuvs_bench/backends/base.py
python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py
python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py
python/cuvs_bench/cuvs_bench/tests/test_cpp_gbench.py
python/cuvs_bench/cuvs_bench/tests/test_registry.py
python/cuvs_bench/cuvs_bench/tests/test_utils.py
🚧 Files skipped from review as they are similar to previous changes (3)
- python/cuvs_bench/cuvs_bench/tests/test_registry.py
- python/cuvs_bench/cuvs_bench/tests/test_utils.py
- python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py
    n_queries = neighbors.shape[0]
    if groundtruth.shape[0] != n_queries:
        raise ValueError(
            f"Row count mismatch: neighbors has {n_queries} rows, "
            f"groundtruth has {groundtruth.shape[0]} rows"
        )
    gt_k = min(k, groundtruth.shape[1])
    if gt_k == 0 or n_queries == 0:
        return 0.0
    n_correct = sum(
        len(
            set(neighbors[i, :k].tolist())
            & set(groundtruth[i, :gt_k].tolist())
        )
        for i in range(n_queries)
    )
    return n_correct / (n_queries * gt_k)
Fail fast when neighbors has fewer columns than the evaluated k.
gt_k is derived from k and groundtruth.shape[1], but the numerator uses whatever width neighbors[:, :k] happens to have. If a backend returns too few neighbors, this silently depresses recall instead of surfacing the contract violation.
Suggested fix
n_queries = neighbors.shape[0]
if groundtruth.shape[0] != n_queries:
raise ValueError(
f"Row count mismatch: neighbors has {n_queries} rows, "
f"groundtruth has {groundtruth.shape[0]} rows"
)
- gt_k = min(k, groundtruth.shape[1])
+ gt_k = min(k, groundtruth.shape[1])
+ if neighbors.shape[1] < gt_k:
+ raise ValueError(
+ f"neighbors must have at least {gt_k} columns for recall@{k}, "
+ f"got {neighbors.shape[1]}"
+ )
if gt_k == 0 or n_queries == 0:
         return 0.0

As per coding guidelines, Ensure missing validation does not cause crashes on invalid input through proper size/type checks.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/cuvs_bench/cuvs_bench/backends/_utils.py` around lines 186 - 202, The
code silently slices neighbors[:, :k] even when neighbors has fewer than k
columns; add an explicit validation in the recall computation to check
neighbors.shape[1] >= k and raise a clear ValueError (or similar) if not,
referencing the variables neighbors, k, and groundtruth (and existing gt_k) so
the function fails fast on contract violations rather than silently depressing
recall; ensure the error message mentions the expected k and the actual
neighbors.shape[1] to aid debugging.
    def __init__(
        self,
        name: str,
        training_vectors: Optional[np.ndarray] = None,
        query_vectors: Optional[np.ndarray] = None,
        groundtruth_neighbors: Optional[np.ndarray] = None,
        groundtruth_distances: Optional[np.ndarray] = None,
        distance_metric: str = "euclidean",
        base_file: Optional[str] = None,
        query_file: Optional[str] = None,
        groundtruth_neighbors_file: Optional[str] = None,
        groundtruth_distances_file: Optional[str] = None,
        metadata: Optional[Dict[str, Any]] = None,
    ):
        self.name = name
        # Vectors are stored privately to support lazy loading.
        # If the caller provides vectors directly, they are used as-is.
        # If empty and a file path is provided, the corresponding property
        # loads vectors from the file on first access. This keeps the
        # loading logic invisible to backends: they just access
        # dataset.training_vectors and get a numpy array regardless of
        # whether it was passed in or loaded from disk.
        self._training_vectors = (
            training_vectors
            if training_vectors is not None
            else np.empty((0, 0))
        )
        self._query_vectors = (
            query_vectors if query_vectors is not None else np.empty((0, 0))
        )
        self._groundtruth_neighbors = groundtruth_neighbors
        self._groundtruth_distances = groundtruth_distances
        self.distance_metric = distance_metric
        self.base_file = base_file
        self.query_file = query_file
        self.groundtruth_neighbors_file = groundtruth_neighbors_file
        self.groundtruth_distances_file = groundtruth_distances_file
        self.metadata = metadata or {}

    @property
    def training_vectors(self) -> np.ndarray:
        """Training vectors for index building.

        Loaded from base_file on first access if not provided directly.
        """
        if self._training_vectors.size == 0 and self.base_file:
            from ._utils import load_vectors

            self._training_vectors = load_vectors(
                self.base_file, self.metadata.get("subset_size")
            )
        return self._training_vectors

    @training_vectors.setter
    def training_vectors(self, value: Optional[np.ndarray]) -> None:
        """Set training vectors directly."""
        self._training_vectors = (
            value if value is not None else np.empty((0, 0))
        )

    @property
    def query_vectors(self) -> np.ndarray:
        """Query vectors for search.

        Loaded from query_file on first access if not provided directly.
        """
        if self._query_vectors.size == 0 and self.query_file:
            from ._utils import load_vectors

            self._query_vectors = load_vectors(self.query_file)
        return self._query_vectors

    @query_vectors.setter
    def query_vectors(self, value: Optional[np.ndarray]) -> None:
        """Set query vectors directly."""
        self._query_vectors = value if value is not None else np.empty((0, 0))

    @property
    def groundtruth_neighbors(self) -> Optional[np.ndarray]:
        """Ground truth neighbor IDs.

        Loaded from groundtruth_neighbors_file on first access if not
        provided directly.
        """
        if (
            self._groundtruth_neighbors is None
            and self.groundtruth_neighbors_file
        ):
            from ._utils import load_vectors

            self._groundtruth_neighbors = load_vectors(
                self.groundtruth_neighbors_file
            )
        return self._groundtruth_neighbors

    @groundtruth_neighbors.setter
    def groundtruth_neighbors(self, value: Optional[np.ndarray]) -> None:
        """Set ground truth neighbors directly."""
        self._groundtruth_neighbors = value

    @property
    def groundtruth_distances(self) -> Optional[np.ndarray]:
        """Ground truth distances.

        Loaded from groundtruth_distances_file on first access if not
        provided directly.
        """
        if (
            self._groundtruth_distances is None
            and self.groundtruth_distances_file
        ):
            from ._utils import load_vectors

            self._groundtruth_distances = load_vectors(
                self.groundtruth_distances_file
            )
        return self._groundtruth_distances

    @groundtruth_distances.setter
    def groundtruth_distances(self, value: Optional[np.ndarray]) -> None:
        """Set ground truth distances directly."""
        self._groundtruth_distances = value
Keep base_vectors as a deprecated alias for at least one release.
This rename breaks existing callers that instantiate Dataset(base_vectors=...) or read/write dataset.base_vectors. Please keep a compatibility alias in __init__ and a forwarding property/setter while the ecosystem migrates.
Compatibility-preserving sketch
+import warnings
+
def __init__(
self,
name: str,
+ base_vectors: Optional[np.ndarray] = None,
training_vectors: Optional[np.ndarray] = None,
query_vectors: Optional[np.ndarray] = None,
groundtruth_neighbors: Optional[np.ndarray] = None,
groundtruth_distances: Optional[np.ndarray] = None,
distance_metric: str = "euclidean",
@@
+ if training_vectors is None and base_vectors is not None:
+ warnings.warn(
+ "`base_vectors` is deprecated; use `training_vectors`.",
+ DeprecationWarning,
+ stacklevel=2,
+ )
+ training_vectors = base_vectors
self._training_vectors = (
training_vectors
if training_vectors is not None
else np.empty((0, 0))
)
@@
+    @property
+ def base_vectors(self) -> np.ndarray:
+ warnings.warn(
+ "`base_vectors` is deprecated; use `training_vectors`.",
+ DeprecationWarning,
+ stacklevel=2,
+ )
+ return self.training_vectors
+
+    @base_vectors.setter
+ def base_vectors(self, value: Optional[np.ndarray]) -> None:
+ warnings.warn(
+ "`base_vectors` is deprecated; use `training_vectors`.",
+ DeprecationWarning,
+ stacklevel=2,
+ )
+        self.training_vectors = value

As per coding guidelines, Prevent API breaking changes to public Python interfaces, including removing or renaming public methods/attributes without proper deprecation (minimum one release cycle).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/cuvs_bench/cuvs_bench/backends/base.py` around lines 62 - 183, Add a
deprecated alias "base_vectors" that maps to the existing training storage so
callers using Dataset(base_vectors=...) or dataset.base_vectors keep working:
accept an optional base_vectors parameter in __init__ and when provided set
self._training_vectors (same as training_vectors param handling), and add a
property and setter named base_vectors that simply forward to the existing
training_vectors property and its setter (referencing __init__,
self._training_vectors, training_vectors property, and training_vectors.setter)
so lazy loading via base_file remains unchanged; mark it as deprecated in the
docstring/comments for removal in a future release.
if executable_dir is not None:
    build_path = os.path.join(executable_dir, executable)
    if os.path.exists(build_path):
        print(f"-- Using cuVS bench from {build_path}.")
        return build_path
Validate that the executable_dir candidate is actually runnable.
This new branch accepts any existing path. If <executable_dir>/<executable> is a directory or a non-executable file, the loader returns it and the failure only shows up later during process launch.
Suggested fix
if executable_dir is not None:
build_path = os.path.join(executable_dir, executable)
- if os.path.exists(build_path):
+ if os.path.isfile(build_path) and os.access(build_path, os.X_OK):
print(f"-- Using cuVS bench from {build_path}.")
return build_path
+ if os.path.exists(build_path):
+ raise ValueError(
+ f"Configured executable is not runnable: {build_path}"
+            )

As per coding guidelines, Ensure missing validation does not cause crashes on invalid input through proper size/type checks.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/cuvs_bench/cuvs_bench/orchestrator/config_loaders.py` around lines 774
- 778, The current branch returns build_path for any existing path but doesn't
ensure it's an executable file; update the check around
executable_dir/executable (the build_path variable) to only accept regular files
that are executable (e.g., use os.path.isfile(build_path) and
os.access(build_path, os.X_OK)) and only print/return when both checks pass; if
the candidate exists but is not an executable file, do not return it and
optionally log a clear message (including build_path) indicating it was skipped
because it is not an executable.
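The isfile + X_OK check from the suggested fix can be exercised standalone; find_executable below is a hypothetical wrapper used only to illustrate the behavior, not the loader's actual API:

```python
import os
import tempfile


def find_executable(executable_dir, executable):
    # Accept only a regular file with the execute bit set; fail fast on an
    # existing but non-runnable path instead of erroring at launch time.
    build_path = os.path.join(executable_dir, executable)
    if os.path.isfile(build_path) and os.access(build_path, os.X_OK):
        return build_path
    if os.path.exists(build_path):
        raise ValueError(f"Configured executable is not runnable: {build_path}")
    return None


# demo: an executable file is accepted, a missing one returns None
demo_dir = tempfile.mkdtemp()
bench = os.path.join(demo_dir, "bench")
with open(bench, "w") as f:
    f.write("#!/bin/sh\n")
os.chmod(bench, 0o755)
found = find_executable(demo_dir, "bench")
missing = find_executable(demo_dir, "nope")
```

Raising on an existing-but-unrunnable candidate (rather than silently falling through to the next search location) surfaces misconfiguration at load time, which is the point of the review comment.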
Summary
This PR centralizes duplicated logic identified during review of the OpenSearch and Elasticsearch backend PRs. Both backends independently reimplemented common operations (vector file loading, YAML parsing, dataset lookup, DatasetConfig construction, parameter expansion, recall computation) that should be shared across all backends. This PR moves that shared logic into the framework so that backend authors focus only on backend-specific concerns. It also routes the CLI through the new pluggable benchmark API and restores a fully CLI-based workflow.
Motivation
With the pluggable backend architecture, each new Python-native backend needs to load binary vector files, parse dataset YAML configs, look up datasets by name, expand parameter combinations, and compute recall. Without centralized utilities, each backend reimplements these independently, leading to duplicated code, divergent behavior across backends, and a heavier maintenance burden.
Changes
1. Shared vector loading utility (backends/utils.py)

Added load_vectors() and dtype_from_filename() (originating from generate_groundtruth/utils.py) for reading binary vector files. Supports all formats (.fbin, .f16bin, .u8bin, .i8bin, .ibin) with subset_size support and input validation (unsupported extensions, truncated headers/payloads). The C++ backend is unaffected since it passes file paths to the subprocess.

2. Promoted config-loading methods to base ConfigLoader class

Moved load_yaml_file, get_dataset_configuration, and gather_algorithm_configs from CppGBenchConfigLoader to the base ConfigLoader class. These were being reimplemented by both the OpenSearch and Elasticsearch config loaders. They are now inherited automatically. Also added a warning for invalid algorithm_configuration paths (previously silently ignored). Switched gather_algorithm_configs from exclusion-based filtering to .yaml/.yml inclusion.

3. Dataset refactored from dataclass to regular class with transparent vector loading

The Dataset class now transparently loads vectors from file paths on first access via properties. Python-native backends just access dataset.base_vectors and get a numpy array regardless of whether vectors were provided directly or loaded from disk. The C++ backend continues to use dataset.base_file only, never triggering any loading.

4. ConfigLoader template method

ConfigLoader.load() is now a concrete method that handles the shared steps (loading dataset YAML, finding the dataset by name, constructing DatasetConfig with path resolution via build_dataset_config()) then delegates to the abstract _build_benchmark_configs() for backend-specific logic. Backend authors implement _build_benchmark_configs() and receive the DatasetConfig already built. They cannot skip or reimplement the shared steps. The orchestrator is unchanged.

5. Centralized parameter expansion (backends/utils.py)

Added expand_param_grid() for expanding YAML parameter specs (dict-of-lists to list-of-dicts via Cartesian product). All backends use it transparently. The C++ backend uses it in prepare_indexes() (replacing inline itertools.product) and then applies its own constraint validation on the expanded combinations. Python-native backends use it directly without filtering.

6. Recall computation moved to orchestrator

Added compute_recall() to backends/utils.py. The orchestrator now computes recall transparently after backend.search() returns, for any backend that returns actual neighbor arrays. The C++ backend computes recall in the subprocess and returns empty neighbors, so this is naturally skipped for it. Backends no longer need to compute recall themselves.

7. CLI routed through orchestrator

Updated run/__main__.py to call BenchmarkOrchestrator directly instead of the legacy run.py. Added new CLI flags: --mode (sweep/tune), --constraints (tune mode targets), --n-trials (Optuna trials), and --backend-config (YAML config for non-C++ backends). The deprecation warning has been removed. The legacy run.py and runners.py are kept as deprecated for now.

8. Restored CLI-based docs

Updated docs/source/cuvs_bench/index.rst to use CLI commands for step 2 (build and search) instead of Python BenchmarkOrchestrator code blocks. All four steps (prepare, build/search, export, plot) are now fully CLI-based.

9. New and updated tests

- tests/test_utils.py (45 tests): covers dtype_from_filename, load_vectors, Dataset transparent loading, base ConfigLoader methods, expand_param_grid, and compute_recall
- tests/test_cli.py (5 new tests): covers --mode, --backend-config, --n-trials, --constraints flags and invalid backend config handling

Remaining work
- Consolidate generate_groundtruth/utils.py, get_dataset/fbin_to_f16bin.py, and get_dataset/hdf5_to_fbin.py into a top-level cuvs_bench/io.py module
- Remove run.py and runners.py once the deprecation period is over
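As a quick reference for change 1, the extension-based dtype inference can be sketched like this. The mapping is assumed from the formats listed above; the actual helper in backends/utils.py may differ in naming and error text:

```python
import numpy as np

# Assumed extension -> dtype mapping for the listed binary formats.
_EXT_DTYPES = {
    ".fbin": np.float32,
    ".f16bin": np.float16,
    ".u8bin": np.uint8,
    ".i8bin": np.int8,
    ".ibin": np.int32,
}


def dtype_from_filename(path):
    # Infer the element dtype from the file extension; unsupported
    # extensions fail with the expected set in the message.
    for ext, dtype in _EXT_DTYPES.items():
        if path.endswith(ext):
            return np.dtype(dtype)
    raise ValueError(
        f"Unsupported extension in {path!r}; expected one of {sorted(_EXT_DTYPES)}"
    )
```

A loader built on this would read the header (row/column counts), then reinterpret the payload with the inferred dtype, which is the shape of the load_vectors() behavior described in change 1.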