Pull requests: huggingface/lighteval

Pull requests list

Fix TypeError in aa_omniscience_prompt
#1161 opened Jan 22, 2026 by pjavanrood
Fix split loading error in bigbench
#1159 opened Jan 22, 2026 by pjavanrood
Fix RecursionError in imdb_contrastset_prompt
#1155 opened Jan 22, 2026 by pjavanrood
Fix non-existent evaluation splits in lextreme
#1151 opened Jan 22, 2026 by pjavanrood
Fix evaluation split config in lsat_qa
#1149 opened Jan 22, 2026 by pjavanrood
Improve NarrativeQA metrics and prompt structure
#1147 opened Jan 22, 2026 by pjavanrood
Fix key mismatch and context access in PubMedQA
#1143 opened Jan 22, 2026 by pjavanrood
Fix TypeError in real_toxicity_prompts
#1141 opened Jan 22, 2026 by pjavanrood
Fix column mismatch and metric in SimpleQA
#1139 opened Jan 22, 2026 by pjavanrood
Fix subset names in StoryCloze
#1137 opened Jan 22, 2026 by pjavanrood
Fix hardcoded path in tiny_benchmarks
#1133 opened Jan 22, 2026 by pjavanrood
Fix KeyError in truthful_qa_generative_prompt
#1131 opened Jan 22, 2026 by pjavanrood
Fix specific error in truthfulqa
#1127 opened Jan 22, 2026 by ChenZiHong-Gavin
Integrate alyah benchmark
#1117 opened Jan 12, 2026 by amztheorytii
[EVAL] SciCode new-task
#1086 opened Nov 27, 2025 by akshathmangudi
Evals on the hub
#1082 opened Nov 24, 2025 by NathanHB
Feature/tvd mi metric feature
#1080 opened Nov 22, 2025 by zrobertson466920