feat(text-metrics): split oneig_alignment #646
davidberenstein1957 wants to merge 1 commit into `feat/vlm-pr-3a-qa-accuracy` from …
Conversation
Adds the `oneig_alignment` metric implementation, its focused tests, and benchmark subset wiring, while keeping reasoning and text-rendering metrics for later stacked PRs.

Made-with: Cursor
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Reviewed by Cursor Bugbot for commit 2627d78.
```python
Benchmark(
    name="OneIG Portrait",
    description="OneIG subset: people and portraits.",
    metrics=["oneig_alignment"],
```
OneIG subset benchmarks fail to register at import
High Severity
The new OneIG benchmarks create lookup keys (e.g., OneIGAnimeStylization) that are missing from base_datasets. This causes BenchmarkRegistry._register to raise a ValueError, failing module import for pruna.evaluation.benchmarks and its dependents. The original OneIG entry was also removed.
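The failure mode can be sketched as follows. This is a minimal illustration of a registry that validates dataset keys at registration time, assuming names (`base_datasets`, `BenchmarkRegistry._register`) from the review comment; it is not the actual pruna implementation.

```python
# Hypothetical base_datasets mapping: the new benchmark keys are absent.
base_datasets = {"OneIGPortrait": "oneig/portrait"}

class BenchmarkRegistry:
    def __init__(self):
        self._benchmarks = {}

    def _register(self, key, benchmark):
        # Raising here runs at module import time, so one unknown key
        # breaks every module that imports the registry.
        if key not in base_datasets:
            raise ValueError(f"unknown dataset key: {key}")
        self._benchmarks[key] = benchmark

registry = BenchmarkRegistry()
registry._register("OneIGPortrait", {"metrics": ["oneig_alignment"]})  # ok

# A key missing from base_datasets, like OneIGAnimeStylization, raises.
try:
    registry._register("OneIGAnimeStylization", {"metrics": ["oneig_alignment"]})
    import_failed = False
except ValueError:
    import_failed = True
```

Since registration happens at import, the fix is either to add the missing keys to `base_datasets` (including restoring the removed original OneIG entry) or to defer validation until the benchmark is actually run.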
```python
assert "21" not in record["questions"]
assert "21" not in record["dependencies"]
assert record["questions"] == {"1": "Is there a cat?"}
assert record["dependencies"] == {"1": [0]}
```
OneIG record test calls function with wrong signature
Medium Severity
The test_to_oneig_record_strips_null_questions_and_dependencies test calls _to_oneig_record with an incorrect number of arguments, causing a TypeError. The test also asserts that null-valued questions and dependencies are stripped, a behavior not implemented in _to_oneig_record.
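For the test to pass, the function would need to implement the stripping behavior it asserts. Here is a minimal sketch of such a `_to_oneig_record`; the signature and record layout are assumptions for illustration, not the actual pruna code.

```python
def _to_oneig_record(prompt, questions, dependencies):
    """Hypothetical record builder that drops null-valued questions
    and any dependency entries for the dropped keys."""
    kept = {key: text for key, text in questions.items() if text is not None}
    deps = {key: dep for key, dep in dependencies.items() if key in kept}
    return {"prompt": prompt, "questions": kept, "dependencies": deps}

record = _to_oneig_record(
    "a cat on a sofa",
    {"1": "Is there a cat?", "21": None},
    {"1": [0], "21": [1]},
)
```

With this behavior in place, the assertions from the test would hold: key `"21"` is stripped from both `questions` and `dependencies`.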
```python
    metrics=["oneig_alignment"],
    task_type="text_to_image",
    reference="https://arxiv.org/abs/2506.07977",
),
```
OneIG Multilingualism subset has no alignment questions
Medium Severity
The OneIG Multilingualism benchmark wires the oneig_alignment metric, but _CATEGORY_TO_QD in prompt.py only maps Anime_Stylization, Portrait, and General_Object to Q_D files. Multilingualism rows therefore receive an empty questions dict in _to_oneig_record, and OneIGAlignmentMetric.update raises ValueError whenever questions is empty. Running this benchmark would error out on every sample rather than produce alignment scores.
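The mismatch can be sketched in a few lines. All names below are assumptions based on the review comment, not the real prompt.py or metric code: only three categories map to Q_D files, so any other category produces an empty questions dict, and a metric update that rejects empty questions fails on every sample.

```python
# Hypothetical category-to-Q_D mapping: Multilingualism is absent.
_CATEGORY_TO_QD = {
    "Anime_Stylization": "anime_qd.json",
    "Portrait": "portrait_qd.json",
    "General_Object": "object_qd.json",
}

def questions_for(category):
    # Unmapped categories silently yield an empty questions dict.
    return {"1": "Is the subject visible?"} if category in _CATEGORY_TO_QD else {}

def metric_update(questions):
    # Mirrors the described OneIGAlignmentMetric.update guard.
    if not questions:
        raise ValueError("questions must not be empty")
    return "scored"

assert metric_update(questions_for("Portrait")) == "scored"

try:
    metric_update(questions_for("Multilingualism"))
    errored = False
except ValueError:
    errored = True  # every Multilingualism sample errors out this way
```

The fix is either to add a Multilingualism entry to `_CATEGORY_TO_QD` or to drop `oneig_alignment` from that subset's metric list.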
Additional Locations (1)


Summary
- Split `oneig_alignment` into its own stacked branch/PR
- `OneIGAlignmentMetric` implementation

Test plan
- `uv run pytest tests/evaluation/test_text_metrics.py -k oneig_alignment`

Made with Cursor