leaderboard: BEIR-style matrix UI + per-LLM/method/retriever views + reproduce snippet #26
Merged
…copyable reproduce snippet
Full UI redesign of the leaderboard.
Home page (/) is now a single sortable matrix table: rows are
(method × model × retriever) — 120 combinations — and columns are
dataset × primary metric (nDCG@10 by default, with a chip toggle for
recall). Filter chips for retriever and model collapse the view in one
click. Every cell links to the run-detail page. Best-in-column scores
are bolded in the accent color.
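The best-in-column highlighting described above could be computed roughly as follows. This is a minimal sketch, not the code from this PR: the `MatrixRow` shape, the dataset ids, and the scores are all hypothetical, since the actual `matrix.json` schema is not shown here.

```typescript
// Hypothetical row shape; the real matrix.json schema is not shown in this PR.
interface MatrixRow {
  method: string;
  model: string;
  retriever: string;
  // dataset id -> score for the currently selected metric; missing runs are absent
  scores: Partial<Record<string, number>>;
}

// Returns the highest score seen in each dataset column across all rows.
function bestPerColumn(rows: MatrixRow[]): Map<string, number> {
  const best = new Map<string, number>();
  for (const row of rows) {
    for (const [dataset, score] of Object.entries(row.scores)) {
      if (score === undefined) continue;
      const current = best.get(dataset);
      if (current === undefined || score > current) {
        best.set(dataset, score);
      }
    }
  }
  return best;
}

// A cell is highlighted when it has a score and that score equals the column maximum.
function isBest(row: MatrixRow, dataset: string, best: Map<string, number>): boolean {
  const score = row.scores[dataset];
  return score !== undefined && score === best.get(dataset);
}

// Made-up example data, for illustration only.
const exampleRows: MatrixRow[] = [
  { method: "method-a", model: "model-x", retriever: "bm25", scores: { d1: 0.3, d2: 0.5 } },
  { method: "method-b", model: "model-x", retriever: "bm25", scores: { d1: 0.35 } },
];
const exampleBest = bestPerColumn(exampleRows);
```

Ties are all highlighted under this rule, and a row with no run for a dataset is never marked best for that column.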
Per-dataset table (/datasets/{id}) drops the `params` column that was
pushing the table off-screen; gains a `Retriever` column instead.
New per-axis detail pages:
- /models/{slug} — methods × retrievers for one LLM
- /methods/{id} — models × retrievers for one method (incl. logical
variants like Q2D-ZS/FS/CoT as separate pages)
- /retrievers/ — index of all retrievers
- /retrievers/{id} — methods × models for one retriever
Per-run page (/runs/{id}) gains a "Reproduce this run" section with
three copyable code blocks: the QueryGym Python snippet for the
reformulation, the Pyserini retrieval command for the run's retriever
(BM25/SPLADE++/BGE), and a trec_eval invocation. Each block has a copy
button and wraps inside the panel — no horizontal scroll. Artifact
links only render when present in the run's `artifacts` block.
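As an illustration of two pieces of the per-run page, the sketch below builds a `trec_eval` command string for nDCG@10 and filters artifact links down to those actually present. The function names, the `Artifacts` keys, and the file paths are assumptions made for this example. The QueryGym and Pyserini snippets are left out because their exact form is not given in this PR.

```typescript
// Builds a trec_eval command line for nDCG@10.
// Both paths are caller-supplied placeholders, not paths from this repository.
function trecEvalCommand(qrelsPath: string, runPath: string): string {
  // -c averages over every topic in the qrels; -m selects the measure to report.
  return ["trec_eval", "-c", "-m", "ndcg_cut.10", qrelsPath, runPath].join(" ");
}

// Hypothetical shape of a run's `artifacts` block: every key is optional.
interface Artifacts {
  run?: string;
  queries?: string;
  config?: string;
}

interface ArtifactLink {
  label: string;
  href: string;
}

// Returns links only for artifacts that are present and non-empty,
// so the page renders nothing for a missing artifact.
function artifactLinks(artifacts: Artifacts | undefined): ArtifactLink[] {
  const links: ArtifactLink[] = [];
  for (const [label, href] of Object.entries(artifacts ?? {})) {
    if (typeof href === "string" && href.length > 0) {
      links.push({ label, href });
    }
  }
  return links;
}
```

For example, `trecEvalCommand("qrels.txt", "run.trec")` yields `trec_eval -c -m ndcg_cut.10 qrels.txt run.trec`.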
Plumbing in build-data.ts:
- Reads retriever_registry.yaml to populate display names.
- Emits matrix.json + retrievers.json + per-model/method/retriever
view shards.
- A `logicalMethod` helper folds noisy legacy params and surfaces
query2doc's mode variants (zs/fs/cot) as distinct methods, matching
how the SIGIR paper presents them.
- Within each (logical_method × model × retriever) cell, multiple
legacy variants are deduped by taking the max value per metric.
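The folding and dedup steps listed above might look something like the sketch below. It is an illustration under stated assumptions rather than the helper from `build-data.ts`: the `RunRecord` fields, the `mode` param name, and the `query2doc-<mode>` naming are all hypothetical.

```typescript
// Hypothetical run record; field names are assumptions for illustration.
interface RunRecord {
  method: string;                 // e.g. "query2doc"
  params: Record<string, string>; // legacy params, possibly noisy
  model: string;
  retriever: string;
  dataset: string;
  metrics: Record<string, number>;
}

// Assumed folding rule: query2doc's `mode` param becomes part of the method
// name; every other param is ignored when naming the logical method.
function logicalMethod(run: RunRecord): string {
  const mode = run.params["mode"];
  if (run.method === "query2doc" && mode !== undefined) {
    return `query2doc-${mode}`;
  }
  return run.method;
}

// Groups runs by (logical method, model, retriever, dataset) and keeps,
// for each metric independently, the maximum value seen in the group.
function dedupeByMax(runs: RunRecord[]): Map<string, Map<string, number>> {
  const cells = new Map<string, Map<string, number>>();
  for (const run of runs) {
    const key = [logicalMethod(run), run.model, run.retriever, run.dataset].join("|");
    let cell = cells.get(key);
    if (cell === undefined) {
      cell = new Map<string, number>();
      cells.set(key, cell);
    }
    for (const [metric, value] of Object.entries(run.metrics)) {
      const current = cell.get(metric);
      if (current === undefined || value > current) {
        cell.set(metric, value);
      }
    }
  }
  return cells;
}
```

One consequence of taking the maximum per metric independently is that a single cell's nDCG@10 and recall can come from different legacy variants.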
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Full UI redesign of leaderboard.querygym.com.

Home page is now a single sortable matrix table: rows are `(method × model × retriever)` (120 combinations), columns are `dataset × metric`. Filter chips for retriever and model; chip toggle between nDCG@10 (default) and recall. Best-in-column cells are bolded. Every cell links to the per-run page.

Per-dataset table (`/datasets/{id}`) drops the wide `params` column that was pushing the table off-screen; gains a `Retriever` column.

New per-axis detail pages:
- `/models/{slug}`: methods × retrievers for one LLM
- `/methods/{id}`: models × retrievers for one method (each logical variant such as `Q2D-ZS/FS/CoT` is its own page)
- `/retrievers/`: index of all retrievers
- `/retrievers/{id}`: methods × models for one retriever

Per-run page (`/runs/{id}`) gains a "Reproduce this run" section: three copyable code blocks (QueryGym Python reformulation, Pyserini retrieval per retriever, `trec_eval`). Each block has a copy button and wraps inside its panel, so there is no horizontal scroll. Artifact links render only when present.

Plumbing in `build-data.ts`:
- Reads `retriever_registry.yaml` for display names + paradigms.
- Emits `matrix.json`, `retrievers.json`, and per-model/method/retriever view shards.
- `logicalMethod` helper folds noisy legacy params (e.g. extra `judge_rel_mode`, machine-local paths) and surfaces `query2doc`'s mode variants (zs/fs/cot) as distinct logical methods. Within each axis cell, multiple legacy variants are deduped by max metric value.

Test Plan
- `pnpm -F @qg/leaderboard build`: clean, 1112 pages (+17 from new per-axis pages).
- `build-data` produces 120 matrix rows (10 methods × 4 models × 3 retrievers).

🤖 Generated with Claude Code