Refactor dataset loading, use of named tuple, and updating the available datasets everywhere #2001

sahel-sh wants to merge 7 commits into NVIDIA:main from cleanup_dataset_utils
Conversation
Signed-off-by: Sahel Sharifymoghaddam <ssharifymogh@nvidia.com>
Greptile Summary

This PR renames `load_vidore_dataset` to `load_benchmark_dataset`, adds a `BenchmarkDatasetBundle` NamedTuple for the loader's return value, and deduplicates the CLI's dataset-filtering logic.
| Filename | Overview |
|---|---|
| retrieval-bench/src/retrieval_bench/pipeline_evaluation/dataset_loader.py | Adds `BenchmarkDatasetBundle` NamedTuple; renames `load_vidore_dataset` to `load_benchmark_dataset`; fixes `Optional[str]` type annotation. Logic unchanged. |
| retrieval-bench/src/retrieval_bench/__init__.py | Updates top-level `__all__` export from `load_vidore_dataset` to `load_benchmark_dataset`; no deprecation alias provided, so any external caller using the old name breaks. |
| retrieval-bench/src/retrieval_bench/cli/pipeline_evaluation.py | Extracts `_resolve_selected_datasets` helper to eliminate 3x duplicated dataset-filter logic; updates help text for BRIGHT; no logic changes. |
| retrieval-bench/src/retrieval_bench/cli/evaluate.py | Switches from tuple unpacking to NamedTuple attribute access after the `load_benchmark_dataset` call; clean mechanical update. |
| retrieval-bench/src/retrieval_bench/pipeline_evaluation/__init__.py | Updates internal `__all__` to export `load_benchmark_dataset`; same breaking-rename concern applies at this sub-package level. |
| retrieval-bench/src/retrieval_bench/pipelines/__init__.py | Docstring update only; changes 'vidore-benchmark evaluation' to 'benchmark evaluation'. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["load_benchmark_dataset(dataset_name, split, language)"] --> B{starts with 'bright/'?}
    B -- Yes --> C["_load_bright_split(task, split, language)"]
    B -- No --> D["_upstream_load_vidore_dataset(...)"]
    C --> E["BenchmarkDatasetBundle(...)"]
    D --> F["add excluded_ids_by_query placeholder"]
    F --> E
    E --> G["Callers: evaluate.py via dataset.query_ids, dataset.queries, ..."]
    H["_resolve_selected_datasets(all_datasets, datasets)"] --> I{all_datasets or no filter?}
    I -- Yes --> J["return all ds_full"]
    I -- No --> K["tokenize --datasets CSV"]
    K --> L{tok in short_by_full or full_by_short?}
    L -- Yes --> M["append canonical name"]
    L -- No --> N["raise typer.BadParameter"]
    M --> O["return selected, short_by_full"]
    J --> O
```
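Based on the flowchart above, the `_resolve_selected_datasets` helper can be sketched roughly as follows. The exact signature, the shape of `ds_full` and `short_by_full`, and the use of `ValueError` in place of `typer.BadParameter` (to keep the sketch dependency-free) are assumptions for illustration:

```python
from typing import Dict, List, Optional, Tuple


def resolve_selected_datasets(
    all_datasets: bool,
    datasets: Optional[str],
    ds_full: List[str],
    short_by_full: Dict[str, str],
) -> Tuple[List[str], Dict[str, str]]:
    """Map a --datasets CSV of short or full names to canonical full names."""
    if all_datasets or not datasets:
        # no filter requested: return every known dataset
        return list(ds_full), short_by_full

    # invert the full -> short map so short aliases can be resolved too
    full_by_short = {short: full for full, short in short_by_full.items()}
    selected: List[str] = []
    for tok in (t.strip() for t in datasets.split(",")):
        if tok in short_by_full:
            # token is already a canonical full name
            selected.append(tok)
        elif tok in full_by_short:
            # token is a short alias; map it to the canonical name
            selected.append(full_by_short[tok])
        else:
            # the real CLI raises typer.BadParameter here
            raise ValueError(f"Unknown dataset: {tok!r}")
    return selected, short_by_full
```

This mirrors the two branches in the flowchart: the all-datasets fast path and the per-token canonicalization with an error on unknown names.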
```python
class BenchmarkDatasetBundle(NamedTuple):
    query_ids: List[str]
    queries: List[str]
    corpus_ids: List[str]
    corpus_images: List[Any]
    corpus_texts: List[str]
    qrels: Dict[str, Dict[str, int]]
    query_languages: Dict[str, str]
    excluded_ids_by_query: Dict[str, List[str]]
```
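Since `BenchmarkDatasetBundle` is a `NamedTuple`, callers such as `evaluate.py` can switch to attribute access while positional indexing keeps working; a small self-contained sketch (the field values are invented for illustration):

```python
from typing import Any, Dict, List, NamedTuple


class BenchmarkDatasetBundle(NamedTuple):
    query_ids: List[str]
    queries: List[str]
    corpus_ids: List[str]
    corpus_images: List[Any]
    corpus_texts: List[str]
    qrels: Dict[str, Dict[str, int]]
    query_languages: Dict[str, str]
    excluded_ids_by_query: Dict[str, List[str]]


bundle = BenchmarkDatasetBundle(
    query_ids=["q1"],
    queries=["what is X?"],
    corpus_ids=["d1"],
    corpus_images=[None],
    corpus_texts=["X is ..."],
    qrels={"q1": {"d1": 1}},
    query_languages={"q1": "en"},
    excluded_ids_by_query={},
)

# attribute access, as used by the updated evaluate.py
assert bundle.queries[0] == "what is X?"
# it is still a tuple, so pre-rename positional code keeps working
assert bundle[0] == bundle.query_ids
```

Keeping tuple compatibility is what makes the `evaluate.py` change a "clean mechanical update": existing unpacking sites do not break while they are migrated.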
**Missing tests for new public API** (`retrieval-bench/src/retrieval_bench/pipeline_evaluation/dataset_loader.py:39-47`)

The PR checklist acknowledges that tests are not added. `BenchmarkDatasetBundle` and the renamed `load_benchmark_dataset` (including its BRIGHT routing path) introduce new logic; the `_load_bright_split` data-validation path in particular has several error branches (`No queries found`, `No relevance judgments`, `excluded_ids` overlaps `gold_ids`) that are untested. Per the `test-coverage-new-code` rule, unit tests covering the happy path and these error branches should accompany this change.

**Rule Used:** New functionality must include corresponding unit ...
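A minimal sketch of the kind of error-branch unit tests the comment asks for. The stub below is a hypothetical stand-in mirroring only the validation branches named in the review; the real `_load_bright_split` has a different signature and loads actual data:

```python
import unittest
from typing import Dict, List, Optional


def load_bright_split_stub(
    queries: List[str],
    qrels: Dict[str, Dict[str, int]],
    excluded_ids_by_query: Optional[Dict[str, List[str]]] = None,
):
    """Hypothetical stand-in exposing the error branches the review mentions."""
    if not queries:
        raise ValueError("No queries found")
    if not qrels:
        raise ValueError("No relevance judgments")
    excluded = excluded_ids_by_query or {}
    for qid, gold in qrels.items():
        overlap = set(excluded.get(qid, [])) & set(gold)
        if overlap:
            raise ValueError(f"excluded_ids overlaps gold_ids for {qid}: {sorted(overlap)}")
    return queries, qrels, excluded


class TestBrightErrorBranches(unittest.TestCase):
    def test_no_queries(self):
        with self.assertRaisesRegex(ValueError, "No queries found"):
            load_bright_split_stub([], {"q1": {"d1": 1}})

    def test_no_qrels(self):
        with self.assertRaisesRegex(ValueError, "No relevance judgments"):
            load_bright_split_stub(["q1"], {})

    def test_excluded_overlaps_gold(self):
        with self.assertRaisesRegex(ValueError, "overlaps gold_ids"):
            load_bright_split_stub(["q1"], {"q1": {"d1": 1}}, {"q1": ["d1"]})

    def test_happy_path(self):
        queries, qrels, _ = load_bright_split_stub(["q1"], {"q1": {"d1": 1}})
        self.assertEqual(queries, ["q1"])
        self.assertEqual(qrels, {"q1": {"d1": 1}})
```

The same four cases, pointed at the real function instead of the stub, would satisfy the rule without requiring network access if the loader's data source is monkeypatched.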
This is not new logic, just a rename from `load_vidore_dataset` to `load_benchmark_dataset`; `_load_bright_split` is not a new function.
The `_load_bright_split` call inside this function is not new code; the change only renames `load_vidore_dataset` to `load_benchmark_dataset`.
```diff
 __all__ = [
     "BasePipeline",
     "evaluate_retrieval",
     "aggregate_results",
-    "load_vidore_dataset",
+    "load_benchmark_dataset",
     "get_available_datasets",
     "print_dataset_info",
 ]
```
**Breaking public API rename without deprecation alias** (`retrieval-bench/src/retrieval_bench/__init__.py:22-29`)

`load_vidore_dataset` was the exported public symbol in `__all__` and is now fully removed with no backward-compatible alias. Any downstream code (notebooks, scripts, other packages) doing `from retrieval_bench import load_vidore_dataset` will get an `ImportError` immediately on upgrade. The codebase's backward-compatibility rule requires a deprecation cycle for renames rather than a hard removal. Adding `load_vidore_dataset = load_benchmark_dataset` alongside a deprecation warning in `dataset_loader.py` would preserve compatibility without blocking the rename.
@oliverholworthy should we have a proper deprecation cycle for this name? I expect most of the usage to be through the CLI commands rather than direct imports.
Description
This PR renames the dataset-loading function so that it covers BRIGHT in addition to ViDoRe, and adds a named tuple for the returned loaded objects. It also factors the repeated dataset-filtering code in the CLI out into a helper function and updates the outdated example for `list-dataset`.
Checklist