feat(vision-metrics): split img_edit_score by davidberenstein1957 · Pull Request #651 · PrunaAI/pruna

davidberenstein1957 · 2026-04-28T13:04:13Z

Summary

Splits img_edit_score into its own stacked PR, adds ImageEditScoreMetric, and wires up the ImgEdit benchmark entry, with regression coverage for score clamping.

This PR also carries benchmark-paper alignment cleanup from the umbrella work while preserving compatibility:

keeps text_to_image task_type literal behavior
introduces TASK_TYPE_* constants for readability
removes private .mine reference notes
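
One way to add readable constants without changing the literal is to bind each constant to the existing string, so call sites that still compare against "text_to_image" keep working. The sketch below is illustrative only: the names TASK_TYPE_TEXT_TO_IMAGE, TASK_TYPE_IMAGE_EDIT, the "image_edit" value, and is_text_to_image are assumptions, not taken from this PR's diff.

```python
# Hypothetical sketch: every name here except the "text_to_image" literal is an assumption.
TASK_TYPE_TEXT_TO_IMAGE = "text_to_image"  # legacy literal kept unchanged
TASK_TYPE_IMAGE_EDIT = "image_edit"  # assumed value, for illustration only


def is_text_to_image(task_type: str) -> bool:
    """Return True when task_type names the text-to-image task."""
    return task_type == TASK_TYPE_TEXT_TO_IMAGE
```

Because the constant is a plain string equal to the old literal, existing callers that pass "text_to_image" directly behave exactly as before.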

Stack Position

Base: PR feat(vision-metrics): split vie_score #650 (feat/vlm-pr-4b-vie-score)
Next: PR feat(e2e-tests): stacked e2e after split metrics #641 (feat/vlm-pr-5-e2e-tests)
Final integration: PR feat(e2e-tests): stacked e2e after split metrics #641 (feat/vlm-pr-5-e2e-tests)
Canonical umbrella reference: PR feat(evaluation): add VLMMetrics #545 (feat/metrics-vlm-support)

Files

src/pruna/evaluation/metrics/metric_img_edit_score.py
src/pruna/evaluation/benchmarks.py
tests/evaluation/test_vision_metrics.py

Test Plan

uv run pytest tests/evaluation/test_vision_metrics.py -k img_edit_score

Review Focus

ImgEdit score clamping behavior
Benchmark metadata/docs alignment without breaking changes to task_type
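
For reviewers, the clamping behavior can be pictured with a minimal sketch. This is not the code in metric_img_edit_score.py: the helper name clamp_score, the [0, 1] target range, the NaN handling, and the test name are all assumptions made for illustration.

```python
import math


def clamp_score(raw: float, low: float = 0.0, high: float = 1.0) -> float:
    """Clamp raw into [low, high]. Hypothetical helper; the range is an assumption."""
    if math.isnan(raw):
        # Without this check, max(low, min(high, raw)) would silently return `high` for NaN.
        return low
    return max(low, min(high, raw))


def test_negative_score_is_clamped() -> None:
    # Regression-style check: a negative judge score must not leak through.
    assert clamp_score(-0.25) == 0.0
    assert clamp_score(0.4) == 0.4
    assert clamp_score(1.7) == 1.0
```

A test of this shape would be selected by the `-k img_edit_score` filter in the Test Plan only if its name contained that substring; the name used here is a placeholder.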

Review Flow (Order)

Review the stack in this exact order:

1. feat(vendor): add LLM2Vec embedding model #637 (vendor)
2. feat(infrastructure): add VLM base classes and utilities #638 (infrastructure)
3. feat(text-metrics): split qa_accuracy #645 (qa_accuracy)
4. feat(text-metrics): split oneig_alignment #646 (oneig_alignment)
5. feat(text-metrics): split text_score pair #647 (text_score pair)
6. feat(text-metrics): split oneig_reasoning #648 (oneig_reasoning)
7. feat(vision-metrics): split vqa #649 (vqa)
8. feat(vision-metrics): split vie_score #650 (vie_score)
9. feat(vision-metrics): split img_edit_score #651 (img_edit_score)
10. feat(e2e-tests): stacked e2e after split metrics #641 (e2e tests)

This PR in the flow (9/10)

Review after PR feat(vision-metrics): split vie_score #650.
Next PR to review: feat(e2e-tests): stacked e2e after split metrics #641.
Confirm this PR's tests and scope before continuing.

Adds ImageEditScoreMetric with ImgEdit benchmark wiring and regression coverage for negative score clamping. Made-with: Cursor

Adopt the benchmark documentation and task-type constant cleanup from the umbrella VLM branch while keeping the legacy text_to_image literal for backward compatibility and removing private .mine references. Co-authored-by: Cursor <cursoragent@cursor.com>

feat(vision-metrics): split img_edit_score into dedicated branch

8b33b58

This was referenced Apr 28, 2026

feat(text-metrics): add text-based VLM judge metrics #639

Closed

feat(vision-metrics): add vision-based VLM judge metrics #640

Closed

davidberenstein1957 wants to merge 2 commits into feat/vlm-pr-4b-vie-score from feat/vlm-pr-4c-img-edit-score
