
feat(vision-metrics): split img_edit_score #651

Open
davidberenstein1957 wants to merge 2 commits into feat/vlm-pr-4b-vie-score from feat/vlm-pr-4c-img-edit-score


davidberenstein1957 commented Apr 28, 2026

Summary

Splits img_edit_score into its own stacked PR, adds ImageEditScoreMetric, and wires up the ImgEdit benchmark entry with regression coverage for score clamping.

This PR also carries benchmark-paper alignment cleanup from the umbrella work while preserving compatibility:

  • keeps text_to_image task_type literal behavior
  • introduces TASK_TYPE_* constants for readability
  • removes private-reference style notes (the private .mine references)
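
As a rough illustration of the constants cleanup listed above, the sketch below shows how named constants can sit alongside an unchanged string literal. Only the TASK_TYPE_ prefix and the text_to_image literal come from this PR; the names TaskType, TASK_TYPE_IMAGE_EDIT, the "image_edit" value, and is_known_task_type are hypothetical and may not match benchmarks.py.

```python
from typing import Literal, get_args

# Hypothetical type alias; "image_edit" is an assumed value for illustration.
TaskType = Literal["text_to_image", "image_edit"]

# Constants add readability while the underlying literal stays the same,
# so existing callers that pass "text_to_image" keep working.
TASK_TYPE_TEXT_TO_IMAGE: TaskType = "text_to_image"
TASK_TYPE_IMAGE_EDIT: TaskType = "image_edit"  # hypothetical


def is_known_task_type(value: str) -> bool:
    """Return True if value is one of the literals accepted by TaskType."""
    return value in get_args(TaskType)
```

Because the constant's value is the legacy string itself, comparisons such as `task_type == "text_to_image"` behave the same before and after the cleanup.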

Stack Position

Files

  • src/pruna/evaluation/metrics/metric_img_edit_score.py
  • src/pruna/evaluation/benchmarks.py
  • tests/evaluation/test_vision_metrics.py

Test Plan

uv run pytest tests/evaluation/test_vision_metrics.py -k img_edit_score

Review Focus

  • ImgEdit score clamping behavior
  • Benchmark metadata/docs alignment without task_type breaking changes
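
For reviewers looking at the clamping behavior, the following is a minimal sketch of what a clamp plus a negative-score regression check could look like. It is not the code in metric_img_edit_score.py: the function name clamp_score and the [0.0, 1.0] target range are assumptions made for illustration, and the actual metric may use a different range.

```python
def clamp_score(raw: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Clamp a raw score into [lo, hi].

    The default bounds are an assumption for this sketch, not values
    taken from the PR.
    """
    return max(lo, min(hi, raw))


def test_negative_score_is_clamped() -> None:
    # Regression-style check: a negative raw score must not leak through.
    assert clamp_score(-0.3) == 0.0
    # In-range values pass through unchanged.
    assert clamp_score(0.42) == 0.42
    # Values above the upper bound are capped as well.
    assert clamp_score(1.7) == 1.0
```

Keeping the clamp in a small pure function makes the regression test independent of any model call, which is one reasonable way to cover this behavior cheaply.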

Review Flow (Order)

Review the stack in this exact order:

  1. feat(vendor): add LLM2Vec embedding model #637 vendor
  2. feat(infrastructure): add VLM base classes and utilities #638 infrastructure
  3. feat(text-metrics): split qa_accuracy #645 qa_accuracy
  4. feat(text-metrics): split oneig_alignment #646 oneig_alignment
  5. feat(text-metrics): split text_score pair #647 text_score pair
  6. feat(text-metrics): split oneig_reasoning #648 oneig_reasoning
  7. feat(vision-metrics): split vqa #649 vqa
  8. feat(vision-metrics): split vie_score #650 vie_score
  9. feat(vision-metrics): split img_edit_score #651 img_edit_score
  10. feat(e2e-tests): stacked e2e after split metrics #641 e2e tests

This PR in the flow (9/10)

Adds ImageEditScoreMetric with ImgEdit benchmark wiring and regression coverage for negative score clamping.

Made-with: Cursor
Adopt the benchmark documentation and task-type constant cleanup from the
umbrella VLM branch while keeping the legacy text_to_image literal for
backward compatibility and removing private .mine references.

Co-authored-by: Cursor <cursoragent@cursor.com>