
evals: Run 3 judges, average their scores, return median analysis #2874

Triggered via push October 31, 2025 21:57
Status Success
Total duration 12s

evals.yml

on: push
run-evals 8s