
Combine every eval ever #12

Open

Chibukach wants to merge 6 commits into main from combine_every_eval_ever

Chibukach commented May 13, 2026

This PR adds evaluation-result aggregation that averages results across multiple random seeds. When evaluations are run with several random seeds (for statistical reliability), it automatically combines the per-seed results into a single averaged result for each benchmark.
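A minimal sketch of the averaging step, assuming each per-seed run produces a record with `benchmark`, `seed`, and `accuracy` fields. The record layout, field names, and the `average_over_seeds` helper are hypothetical illustrations, not the code in this PR.

```python
from collections import defaultdict
from statistics import mean


def average_over_seeds(results: list[dict]) -> dict[str, float]:
    """Group per-seed results by benchmark and average their accuracy.

    Hypothetical record layout: {"benchmark": str, "seed": int, "accuracy": float}.
    """
    scores_by_benchmark: dict[str, list[float]] = defaultdict(list)
    for record in results:
        scores_by_benchmark[record["benchmark"]].append(record["accuracy"])
    # Arithmetic mean over every seed recorded for each benchmark.
    return {name: mean(scores) for name, scores in scores_by_benchmark.items()}


# Illustrative input: two seeds for one benchmark, one seed for another.
results = [
    {"benchmark": "task_a", "seed": 0, "accuracy": 0.70},
    {"benchmark": "task_a", "seed": 1, "accuracy": 0.74},
    {"benchmark": "task_b", "seed": 0, "accuracy": 0.50},
]
print(average_over_seeds(results))
```

Grouping first and averaging second keeps the logic independent of how many seeds each benchmark was run with.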

Changes include:

  • Core functionality to read and group evaluation results by benchmark
  • Logic to compute average accuracy scores across different random seeds
  • Support for both simple tasks and tasks with subtasks
  • Updates to the CLI
  • Tests for the seed-merging logic
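The listed pieces (grouping by benchmark, averaging across seeds, and supporting tasks with subtasks) can be sketched together as follows. This is a hypothetical illustration, not the PR's implementation: the record layout, including the optional `subtasks` mapping, and the `merge_seeds` name are assumptions.

```python
from collections import defaultdict
from statistics import mean


def merge_seeds(results: list[dict]) -> dict[str, dict]:
    """Average top-level and per-subtask accuracy across seeds for each benchmark.

    Assumed (hypothetical) record layout:
      {"benchmark": str, "seed": int, "accuracy": float,
       "subtasks": {name: accuracy}}   # "subtasks" is optional
    """
    # Group all per-seed records by benchmark name.
    grouped: dict[str, list[dict]] = defaultdict(list)
    for record in results:
        grouped[record["benchmark"]].append(record)

    merged: dict[str, dict] = {}
    for benchmark, runs in grouped.items():
        entry = {
            "accuracy": mean(r["accuracy"] for r in runs),
            "seeds": sorted(r["seed"] for r in runs),
        }
        # Collect each subtask's scores across the seeds that report it.
        subtask_scores: dict[str, list[float]] = defaultdict(list)
        for r in runs:
            for name, acc in r.get("subtasks", {}).items():
                subtask_scores[name].append(acc)
        # Simple tasks (no subtasks) get no "subtasks" key at all.
        if subtask_scores:
            entry["subtasks"] = {n: mean(s) for n, s in subtask_scores.items()}
        merged[benchmark] = entry
    return merged


sample = [
    {"benchmark": "suite_x", "seed": 0, "accuracy": 0.6,
     "subtasks": {"part_1": 0.5, "part_2": 0.7}},
    {"benchmark": "suite_x", "seed": 1, "accuracy": 0.8,
     "subtasks": {"part_1": 0.7, "part_2": 0.9}},
    {"benchmark": "task_b", "seed": 0, "accuracy": 0.4},
]
print(merge_seeds(sample))
```

In this sketch a subtask missing from some seeds is averaged only over the seeds that report it; the PR may handle that case differently.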
