Deduplicate message pieces before batch scoring #1504

Open
biefan wants to merge 1 commit into Azure:main from biefan:dedupe-batch-scorer-response-pieces

biefan commented Mar 17, 2026

Summary

  • apply BatchScorer._remove_duplicates() before grouping filtered message pieces into conversations
  • fail fast when filtering leaves no original message pieces to score
  • add a regression test that confirms duplicate message pieces are not forwarded to batch scoring
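
The three changes above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the PyRIT code: `Piece`, `remove_duplicates`, and `group_for_scoring` are hypothetical names, and the rule that an original row is one whose `id` equals its `original_prompt_id` is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    """Hypothetical stand-in for a MessagePiece row; not the PyRIT class."""
    id: str
    original_prompt_id: str
    conversation_id: str


def remove_duplicates(pieces):
    # Assumption: a row is an original when its id equals its original_prompt_id.
    return [p for p in pieces if p.id == p.original_prompt_id]


def group_for_scoring(pieces):
    # Deduplicate first, so copies never reach the grouping step.
    originals = remove_duplicates(pieces)
    # Fail fast instead of handing an empty batch to the scorer.
    if not originals:
        raise ValueError("No original message pieces left to score after filtering.")
    conversations = defaultdict(list)
    for piece in originals:
        conversations[piece.conversation_id].append(piece)
    return dict(conversations)


rows = [
    Piece("a", "a", "conv-1"),
    Piece("b", "a", "conv-2"),  # duplicate of "a"
    Piece("c", "c", "conv-1"),
]
grouped = group_for_scoring(rows)
```

In this sketch the duplicate row `"b"` is dropped before grouping, so only `conv-1` remains with pieces `"a"` and `"c"`, and an input made up solely of duplicates raises `ValueError`.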

Problem

BatchScorer.score_responses_by_filters_async() fetches message pieces from memory and groups them into conversations immediately, even though the class already has a _remove_duplicates() helper. When both original and duplicate MessagePiece rows are returned, duplicate pieces are forwarded to score_prompts_batch_async() and can be scored multiple times.
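
To make the failure mode concrete, here is a hedged sketch of the kind of check a regression test could perform. It does not use the real `BatchScorer`: `keep_originals` and `score_filtered` are hypothetical helpers, the `AsyncMock` stands in for the batch scoring call, and the `id == original_prompt_id` rule is again an assumption for illustration.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


def keep_originals(pieces):
    # Assumption: originals satisfy id == original_prompt_id.
    return [p for p in pieces if p.id == p.original_prompt_id]


async def score_filtered(pieces, score_batch, dedupe):
    # With dedupe=False this mirrors the behaviour described above:
    # every fetched row is forwarded, duplicates included.
    to_score = keep_originals(pieces) if dedupe else list(pieces)
    await score_batch(to_score)


original = SimpleNamespace(id="a", original_prompt_id="a")
duplicate = SimpleNamespace(id="b", original_prompt_id="a")

# Without deduplication both rows reach the scorer.
unfiltered_scorer = AsyncMock()
asyncio.run(score_filtered([original, duplicate], unfiltered_scorer, dedupe=False))
forwarded_without_dedupe = unfiltered_scorer.await_args.args[0]

# With deduplication only the original row is forwarded.
filtered_scorer = AsyncMock()
asyncio.run(score_filtered([original, duplicate], filtered_scorer, dedupe=True))
forwarded_with_dedupe = filtered_scorer.await_args.args[0]
```

Comparing `forwarded_without_dedupe` with `forwarded_with_dedupe` shows the difference the fix is meant to produce: two pieces forwarded before, one after.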

Validation

  • .venv/bin/pytest tests/unit/score -q
  • .venv/bin/ruff check pyrit/score/batch_scorer.py tests/unit/score/test_batch_scorer.py
