leaderboard: BEIR-style matrix UI + per-LLM/method/retriever views + reproduce snippet#26

Merged
radinhamidi merged 1 commit into main from leaderboard/ui-redesign
May 20, 2026
Conversation

radinhamidi (Member) commented May 20, 2026

Summary

Full UI redesign of leaderboard.querygym.com.

Home page is now a single sortable matrix table: rows are (method × model × retriever) combinations (120 in total), and columns are dataset × metric. Filter chips narrow the view by retriever and model; a chip toggle switches between nDCG@10 (default) and recall. Best-in-column cells are bolded. Every cell links to the per-run page.
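
The best-in-column bolding could be computed along these lines. This is a minimal sketch, not the code in this PR: the `MatrixRow` shape, the `"dataset:metric"` column ids, and the function names are all assumptions made for illustration, and the scores are placeholders.

```typescript
// Hypothetical row shape: one row per (method, model, retriever) combination.
interface MatrixRow {
  method: string;
  model: string;
  retriever: string;
  // Keyed by an assumed column id such as "dataset:metric".
  scores: Record<string, number>;
}

// Returns the highest score observed in each column.
function bestPerColumn(rows: MatrixRow[]): Record<string, number> {
  const best: Record<string, number> = {};
  for (const row of rows) {
    for (const column of Object.keys(row.scores)) {
      const score = row.scores[column];
      if (!(column in best) || score > best[column]) {
        best[column] = score;
      }
    }
  }
  return best;
}

// True when this row's cell equals the column maximum and should be bolded.
// A row with no run for that column is never marked as best.
function isBestInColumn(
  row: MatrixRow,
  column: string,
  best: Record<string, number>
): boolean {
  return column in row.scores && row.scores[column] === best[column];
}

// Placeholder rows with made-up scores, purely for illustration.
const exampleRows: MatrixRow[] = [
  {
    method: "method-a",
    model: "model-x",
    retriever: "bm25",
    scores: { "dataset-1:ndcg@10": 0.4, "dataset-2:ndcg@10": 0.3 },
  },
  {
    method: "method-b",
    model: "model-x",
    retriever: "bm25",
    scores: { "dataset-1:ndcg@10": 0.5 },
  },
];
const exampleBest = bestPerColumn(exampleRows);
```

Ties are bolded in every tied row under this sketch; whether the real UI does the same is not stated in the PR.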

Per-dataset table (/datasets/{id}) drops the wide params column that was pushing the table off-screen; gains a Retriever column.

New per-axis detail pages:

| Path | What it shows |
| --- | --- |
| /models/{slug} | Methods × retrievers for one LLM |
| /methods/{id} | Models × retrievers for one method (each Q2D-ZS/FS/CoT is its own page) |
| /retrievers/ | Index of all retrievers |
| /retrievers/{id} | Methods × models for one retriever |

Per-run page (/runs/{id}) gains a "Reproduce this run" section: three copyable code blocks (QueryGym Python reformulation, Pyserini retrieval per retriever, trec_eval). Each block has a copy button and wraps inside its panel — no horizontal scroll. Artifact links render only when present.

Plumbing in build-data.ts:

  • Reads retriever_registry.yaml for display names + paradigms.
  • Emits matrix.json, retrievers.json, and per-model/method/retriever view shards.
  • A logicalMethod helper folds noisy legacy params (e.g. extra judge_rel_mode, machine-local paths) and surfaces query2doc's mode variants (zs/fs/cot) as distinct logical methods. Within each axis cell, multiple legacy variants are deduped by max metric value.
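
The folding and dedup steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `build-data.ts` code: the `RunRecord` shape, the `params.mode` field, the `query2doc-<mode>` naming, and the `dedupeByMax` helper are all hypothetical.

```typescript
// Hypothetical shape of one run as read by the build step.
interface RunRecord {
  method: string; // e.g. "query2doc"
  model: string;
  retriever: string;
  params: Record<string, string>; // may carry noisy legacy keys
  metrics: Record<string, number>; // e.g. { "ndcg@10": ..., "recall": ... }
}

// Assumption: the query2doc variant is stored in params.mode. Every other
// param (judge_rel_mode, machine-local paths, ...) is ignored, which is
// what folds legacy variants into one logical method.
function logicalMethod(run: RunRecord): string {
  if (run.method === "query2doc") {
    const mode = run.params["mode"];
    if (mode === "zs" || mode === "fs" || mode === "cot") {
      return "query2doc-" + mode;
    }
  }
  return run.method;
}

// Groups runs by (logical method, model, retriever) and keeps the maximum
// value of each metric independently within a group.
function dedupeByMax(
  runs: RunRecord[]
): Record<string, Record<string, number>> {
  const cells: Record<string, Record<string, number>> = {};
  for (const run of runs) {
    const key = [logicalMethod(run), run.model, run.retriever].join("|");
    let cell = cells[key];
    if (cell === undefined) {
      cell = {};
      cells[key] = cell;
    }
    for (const metric of Object.keys(run.metrics)) {
      const value = run.metrics[metric];
      if (!(metric in cell) || value > cell[metric]) {
        cell[metric] = value;
      }
    }
  }
  return cells;
}

// Placeholder runs with made-up values: two legacy variants of the same
// logical cell, plus one run of a different query2doc mode.
const exampleRuns: RunRecord[] = [
  { method: "query2doc", model: "m", retriever: "bm25",
    params: { mode: "zs", judge_rel_mode: "x" },
    metrics: { "ndcg@10": 0.3, recall: 0.7 } },
  { method: "query2doc", model: "m", retriever: "bm25",
    params: { mode: "zs" },
    metrics: { "ndcg@10": 0.4, recall: 0.6 } },
  { method: "query2doc", model: "m", retriever: "bm25",
    params: { mode: "cot" },
    metrics: { "ndcg@10": 0.2 } },
];
const exampleCells = dedupeByMax(exampleRuns);
```

Because the maximum is taken per metric, a deduped cell can combine the nDCG@10 of one legacy variant with the recall of another, as it does for the `zs` cell in this example.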

Test Plan

  • pnpm -F @qg/leaderboard build — clean, 1112 pages (+17 from new per-axis pages).
  • build-data produces 120 matrix rows (10 methods × 4 models × 3 retrievers).
  • After merge: CF Pages auto-deploys; verify matrix renders, filters work, per-axis pages exist, run-detail Reproduce block shows the right Pyserini command per retriever.
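
The 120-row count in the test plan is the size of the full cross product of the three axes. A quick sketch of that check, using placeholder ids (`method-1`, `model-1`, and so on are invented labels, not the real method, model, or retriever names):

```typescript
// Builds placeholder ids such as "method-1" ... "method-10".
function placeholderIds(prefix: string, count: number): string[] {
  const ids: string[] = [];
  for (let i = 1; i <= count; i++) {
    ids.push(prefix + "-" + i);
  }
  return ids;
}

// Enumerates every (method, model, retriever) combination.
function crossProduct(
  methods: string[],
  models: string[],
  retrievers: string[]
): Array<[string, string, string]> {
  const rows: Array<[string, string, string]> = [];
  for (const method of methods) {
    for (const model of models) {
      for (const retriever of retrievers) {
        rows.push([method, model, retriever]);
      }
    }
  }
  return rows;
}

// 10 methods × 4 models × 3 retrievers, as stated in the test plan.
const matrixRowKeys = crossProduct(
  placeholderIds("method", 10),
  placeholderIds("model", 4),
  placeholderIds("retriever", 3)
);
const expectedRowCount = 10 * 4 * 3; // 120
```

This assumes a run exists for every combination; a missing combination would make the real row count fall below 120.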

🤖 Generated with Claude Code

…copyable reproduce snippet

Full UI redesign of the leaderboard.

Home page (/) is now a single sortable matrix table: rows are
(method × model × retriever) — 120 combinations — and columns are
dataset × primary metric (nDCG@10 by default, with a chip toggle for
recall). Filter chips for retriever and model collapse the view in one
click. Every cell links to the run-detail page. Best-in-column scores
are bolded in the accent color.

Per-dataset table (/datasets/{id}) drops the `params` column that was
pushing the table off-screen; gains a `Retriever` column instead.

New per-axis detail pages:
- /models/{slug}   — methods × retrievers for one LLM
- /methods/{id}    — models × retrievers for one method (incl. logical
                      variants like Q2D-ZS/FS/CoT as separate pages)
- /retrievers/     — index of all retrievers
- /retrievers/{id} — methods × models for one retriever

Per-run page (/runs/{id}) gains a "Reproduce this run" section with
three copyable code blocks: the QueryGym Python snippet for the
reformulation, the Pyserini retrieval command for the run's retriever
(BM25/SPLADE++/BGE), and a trec_eval invocation. Each block has a copy
button and wraps inside the panel — no horizontal scroll. Artifact
links only render when present in the run's `artifacts` block.

Plumbing in build-data.ts:
- Reads retriever_registry.yaml to populate display names.
- Emits matrix.json + retrievers.json + per-model/method/retriever
  view shards.
- A `logicalMethod` helper folds noisy legacy params and surfaces
  query2doc's mode variants (zs/fs/cot) as distinct methods, matching
  how the SIGIR paper presents them.
- Within each (logical_method × model × retriever) axis, multiple
  legacy variants are deduped by max value per metric cell.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
radinhamidi force-pushed the leaderboard/ui-redesign branch from d68161e to b298b5c on May 20, 2026 05:44
radinhamidi merged commit 28fdd5e into main on May 20, 2026
2 checks passed