Refactor dataset loading, use of named tuple, and updating the available datasets everywhere#2001

Open
sahel-sh wants to merge 7 commits into NVIDIA:main from sahel-sh:cleanup_dataset_utils

Conversation

@sahel-sh

@sahel-sh sahel-sh commented May 8, 2026

Description

This PR renames the dataset-loading function so that its name covers BRIGHT in addition to ViDoRe, and adds a named tuple for the loaded objects it returns. It also factors the repeated dataset-filtering code in the CLI out into a helper function, and updates the outdated example for list-dataset.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

sahel-sh added 3 commits May 8, 2026 15:56
Signed-off-by: Sahel Sharifymoghaddam <ssharifymogh@nvidia.com>
@sahel-sh sahel-sh requested review from a team as code owners May 8, 2026 16:11
@sahel-sh sahel-sh requested a review from edknv May 8, 2026 16:11
@copy-pr-bot

copy-pr-bot Bot commented May 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Contributor

greptile-apps Bot commented May 8, 2026

Greptile Summary

This PR renames load_vidore_dataset to load_benchmark_dataset, wraps its return value in a BenchmarkDatasetBundle NamedTuple, and extracts a _resolve_selected_datasets helper to eliminate three duplicated dataset-filter blocks. Help text and docstrings are updated throughout to reflect BRIGHT dataset support alongside ViDoRe.

  • NamedTuple return type: BenchmarkDatasetBundle replaces the bare 8-tuple returned by both _load_bright_split and the renamed loader, improving call-site readability and IDE support.
  • Dataset filter refactor: _resolve_selected_datasets deduplicates ~25 lines that were copy-pasted into compare_results, report_llm_usage, and purge_llm_error_traces.
  • Breaking rename: load_vidore_dataset is removed from __all__ in both retrieval_bench/__init__.py and pipeline_evaluation/__init__.py without a backward-compatible alias, which will break any existing caller that imports the old symbol.
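
The call-site improvement from the NamedTuple can be illustrated with a minimal sketch (field names are taken from the BenchmarkDatasetBundle definition quoted later in this thread; the constructor values here are made-up sample data):

```python
from typing import Any, Dict, List, NamedTuple

class BenchmarkDatasetBundle(NamedTuple):
    query_ids: List[str]
    queries: List[str]
    corpus_ids: List[str]
    corpus_images: List[Any]
    corpus_texts: List[str]
    qrels: Dict[str, Dict[str, int]]
    query_languages: Dict[str, str]
    excluded_ids_by_query: Dict[str, List[str]]

# Before: callers had to unpack a bare 8-tuple positionally.
# After: named attribute access is self-documenting and order-safe.
bundle = BenchmarkDatasetBundle(
    query_ids=["q1"], queries=["what is X?"], corpus_ids=["d1"],
    corpus_images=[None], corpus_texts=["doc text"],
    qrels={"q1": {"d1": 1}}, query_languages={"q1": "en"},
    excluded_ids_by_query={"q1": []},
)
print(bundle.queries[0])  # attribute access instead of positional bundle[1]
```

Because NamedTuple is still a tuple, existing positional unpacking keeps working, which makes the mechanical update in evaluate.py low-risk.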

Confidence Score: 4/5

Safe to merge after confirming no external consumers rely on the old load_vidore_dataset symbol, or after adding a deprecation alias.

The rename of load_vidore_dataset to load_benchmark_dataset is a hard removal from __all__ in both the top-level and sub-package __init__.py files with no backward-compatible alias. Any script or notebook doing from retrieval_bench import load_vidore_dataset will fail with an ImportError after this lands. All other changes — the NamedTuple wrapper, the extracted helper, and the docstring updates — are clean and correct.

retrieval-bench/src/retrieval_bench/__init__.py and retrieval-bench/src/retrieval_bench/pipeline_evaluation/__init__.py — both drop load_vidore_dataset from their exported interface without an alias.

Important Files Changed

| Filename | Overview |
| --- | --- |
| retrieval-bench/src/retrieval_bench/pipeline_evaluation/dataset_loader.py | Adds BenchmarkDatasetBundle NamedTuple; renames load_vidore_dataset to load_benchmark_dataset; fixes Optional[str] type annotation. Logic unchanged. |
| retrieval-bench/src/retrieval_bench/__init__.py | Updates top-level __all__ export from load_vidore_dataset to load_benchmark_dataset; no deprecation alias provided — breaks any external caller using the old name. |
| retrieval-bench/src/retrieval_bench/cli/pipeline_evaluation.py | Extracts _resolve_selected_datasets helper to eliminate 3x duplicated dataset-filter logic; updates help text for BRIGHT; no logic changes. |
| retrieval-bench/src/retrieval_bench/cli/evaluate.py | Switches from tuple unpacking to NamedTuple attribute access after the load_benchmark_dataset call; clean mechanical update. |
| retrieval-bench/src/retrieval_bench/pipeline_evaluation/__init__.py | Updates internal __all__ to export load_benchmark_dataset; same breaking-rename concern applies at this sub-package level. |
| retrieval-bench/src/retrieval_bench/pipelines/__init__.py | Docstring update only — changes "vidore-benchmark evaluation" to "benchmark evaluation". |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["load_benchmark_dataset(dataset_name, split, language)"] --> B{starts with 'bright/'?}
    B -- Yes --> C["_load_bright_split(task, split, language)"]
    B -- No --> D["_upstream_load_vidore_dataset(...)"]
    C --> E["BenchmarkDatasetBundle(...)"]
    D --> F["add excluded_ids_by_query placeholder"]
    F --> E
    E --> G["Callers: evaluate.py via dataset.query_ids, dataset.queries, ..."]

    H["_resolve_selected_datasets(all_datasets, datasets)"] --> I{all_datasets or no filter?}
    I -- Yes --> J["return all ds_full"]
    I -- No --> K["tokenize --datasets CSV"]
    K --> L{tok in short_by_full or full_by_short?}
    L -- Yes --> M["append canonical name"]
    L -- No --> N["raise typer.BadParameter"]
    M --> O["return selected, short_by_full"]
    J --> O
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
retrieval-bench/src/retrieval_bench/__init__.py:22-29
**Breaking public API rename without deprecation alias**

`load_vidore_dataset` was the exported public symbol in `__all__` and is now fully removed with no backward-compatible alias. Any downstream code (notebooks, scripts, other packages) doing `from retrieval_bench import load_vidore_dataset` will get an `ImportError` immediately on upgrade. The codebase's backward-compatibility rule requires a deprecation cycle for renames rather than a hard removal. Adding `load_vidore_dataset = load_benchmark_dataset` alongside a deprecation warning in `dataset_loader.py` would preserve compatibility without blocking the rename.
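
The suggested fix could look like the following sketch. The loader body here is only a placeholder (the real function returns a BenchmarkDatasetBundle); the point is the alias-plus-warning pattern:

```python
import warnings

def load_benchmark_dataset(dataset_name: str, split: str = "test"):
    # Placeholder body; the real loader returns a BenchmarkDatasetBundle.
    return (dataset_name, split)

def load_vidore_dataset(*args, **kwargs):
    """Deprecated alias kept for a release cycle; forwards to the new name."""
    warnings.warn(
        "load_vidore_dataset is deprecated; use load_benchmark_dataset instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this shim
    )
    return load_benchmark_dataset(*args, **kwargs)
```

Keeping `load_vidore_dataset` in `__all__` alongside the new name during the deprecation window lets existing imports keep working while signaling the migration path.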

Reviews (3): Last reviewed commit: "Merge branch 'main' into cleanup_dataset..." | Re-trigger Greptile

Comment thread retrieval-bench/src/retrieval_bench/pipeline_evaluation/dataset_loader.py
Comment on lines +39 to +47
class BenchmarkDatasetBundle(NamedTuple):
    query_ids: List[str]
    queries: List[str]
    corpus_ids: List[str]
    corpus_images: List[Any]
    corpus_texts: List[str]
    qrels: Dict[str, Dict[str, int]]
    query_languages: Dict[str, str]
    excluded_ids_by_query: Dict[str, List[str]]
Contributor


P2 Missing tests for new public API
The PR checklist acknowledges that tests are not added. BenchmarkDatasetBundle and the renamed load_benchmark_dataset (including its BRIGHT routing path) introduce new logic—the _load_bright_split data-validation path in particular has several error branches (No queries found, No relevance judgments, excluded_ids overlaps gold_ids) that are untested. Per the test-coverage-new-code rule, unit tests covering the happy path and these error branches should accompany this change.

Rule Used: New functionality must include corresponding unit ... (source)


Author


This is not new logic; it is just a name change from load_vidore_dataset to load_benchmark_dataset. _load_bright_split is not a new function.

Author


The _load_bright_split call inside this function is not new code; the change only renames load_vidore_dataset to load_benchmark_dataset.

Signed-off-by: Sahel Sharifymoghaddam <ssharifymogh@nvidia.com>
@boliu61 boliu61 requested a review from oliverholworthy May 8, 2026 16:31
@sahel-sh sahel-sh self-assigned this May 8, 2026
sahel-sh added 3 commits May 8, 2026 18:15
Signed-off-by: Sahel Sharifymoghaddam <ssharifymogh@nvidia.com>
Signed-off-by: Sahel Sharifymoghaddam <ssharifymogh@nvidia.com>
Comment on lines 22 to 29
__all__ = [
    "BasePipeline",
    "evaluate_retrieval",
    "aggregate_results",
-   "load_vidore_dataset",
+   "load_benchmark_dataset",
    "get_available_datasets",
    "print_dataset_info",
]
Contributor


P1 Breaking public API rename without deprecation alias

load_vidore_dataset was the exported public symbol in __all__ and is now fully removed with no backward-compatible alias. Any downstream code (notebooks, scripts, other packages) doing from retrieval_bench import load_vidore_dataset will get an ImportError immediately on upgrade. The codebase's backward-compatibility rule requires a deprecation cycle for renames rather than a hard removal. Adding load_vidore_dataset = load_benchmark_dataset alongside a deprecation warning in dataset_loader.py would preserve compatibility without blocking the rename.


Author


@oliverholworthy should we have a proper deprecation for this name? I expect most of the usage to be through the CLI commands rather than direct imports.
