Write up the key findings as a blog post (e.g. on the Hugging Face blog):
- Rankings change completely by document type (BPL vs Britannica vs UFO)
- Document type matters more than model size (0.9B beats 4B on some collections)
- Judges agree on clusters but disagree on ordering within clusters
- Tool lets anyone create their own per-collection leaderboard
Link to the viewer Space, Hub datasets, and this repo.