Add descriptive dataset statistics plots #126
tommasocerruti
left a comment
I have just one question about how the score summaries are grouped, then this looks good :)
'score_summaries': grouped_summaries(
    rows,
    'score',
    ('benchmark', 'evaluation_name'),
Should these grouped score summaries also include a metric identity field (e.g., metric_id)? The code's benchmark field comes from source_data.dataset_name in the datastore (as you defined in line 60 of this file), and some evaluations in the datastore report multiple metrics under the same dataset_name + evaluation_name pair. For example, in this arc-agi file, the pair ARC Prize evaluations leaderboard JSON + v1_Semi_Private is used for both score (accuracy) and cost_per_task (cost). I believe grouping only by benchmark + evaluation_name would average those different quantities together. Do you agree?
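To make the concern concrete, here is a minimal sketch of the effect. It is not the PR's code: mean_by is a hypothetical stand-in for grouped_summaries (whose real signature and output I am only guessing at), the numeric values are made up, and I am assuming each row carries a metric_id field and that both metrics land in the same 'score' column.

```python
from collections import defaultdict
from statistics import mean

BENCHMARK = "ARC Prize evaluations leaderboard JSON"

# Toy rows with made-up values. Assumption: accuracy and cost are both
# stored under the 'score' key and distinguished only by 'metric_id'.
rows = [
    {"benchmark": BENCHMARK, "evaluation_name": "v1_Semi_Private",
     "metric_id": "score", "score": 0.5},
    {"benchmark": BENCHMARK, "evaluation_name": "v1_Semi_Private",
     "metric_id": "cost_per_task", "score": 3.5},
    {"benchmark": BENCHMARK, "evaluation_name": "v1_Semi_Private",
     "metric_id": "score", "score": 0.25},
    {"benchmark": BENCHMARK, "evaluation_name": "v1_Semi_Private",
     "metric_id": "cost_per_task", "score": 1.5},
]


def mean_by(rows, value_key, group_keys):
    """Hypothetical stand-in for grouped_summaries: mean of value_key per group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in group_keys)
        groups[key].append(row[value_key])
    return {key: mean(values) for key, values in groups.items()}


# Current grouping: accuracy and cost fall into one group and are averaged together.
coarse = mean_by(rows, "score", ("benchmark", "evaluation_name"))

# Suggested grouping: adding metric_id keeps the two quantities separate.
fine = mean_by(rows, "score", ("benchmark", "evaluation_name", "metric_id"))

print(coarse)
print(fine)
```

With the coarse key there is a single group whose mean mixes accuracy and cost. With metric_id added, each quantity gets its own summary.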
This sounds reasonable. Let me investigate further.
…/yananlong/every_eval_ever into descriptive-statistics-python-pr
This reverts commit f0e5dcd.
@yananlong are you still working on this, or can I start reviewing it?
Still working. I am making different plots now. More on this later today.
Great, feel free to ping me when you are done.
Is this still WIP, @yananlong?
Summary
Validation