Add objective Overall / Value / Capability score columns to the models table by huncho-tensei · Pull Request #1892 · anomalyco/models.dev

huncho-tensei · 2026-05-28T23:35:37Z

Summary

Adds three sortable, transparently-computed score columns to the models table — Overall, Value, and Capability — plus a dynamic rank (#) column that renumbers with the current sort.

The goal is to let people rank the whole catalog by objective criteria without leaving the table they already use. Scores are derived entirely from existing catalog fields — no benchmarks, no hand-grading, no external data.

New column layout:

#  |  Provider  |  Model  |  Overall  |  Value  |  Capability  |  … (all existing columns unchanged)

The table defaults to Overall, descending. All three score columns sort like every other column, so users can switch lens with one click.
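
For reviewers, the renumbering behaviour can be sketched roughly as below. This is a minimal illustration, not the code in render.tsx: the `Row` shape and the `rankBy` helper are hypothetical names, and only the three score columns are modelled.

```typescript
// Hypothetical row shape; the real table rows carry many more columns.
interface Row {
  model: string;
  overall: number;
  value: number;
  capability: number;
}

type Lens = "overall" | "value" | "capability";

// Sort a copy of the rows by the chosen lens (descending) and attach a
// 1-based rank that always reflects the current order.
function rankBy(rows: readonly Row[], lens: Lens): Array<Row & { rank: number }> {
  return [...rows]
    .sort((a, b) => b[lens] - a[lens])
    .map((row, i) => ({ ...row, rank: i + 1 }));
}
```

Because the rank is derived after sorting, switching from Overall to Value renumbers the # column without storing any rank in the data.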

How the scores are computed

Four normalized 0–100 components are calculated from objective fields, then blended with weights kept in one documented place (packages/web/src/score.ts):

| Component | Source fields | Direction |
| --- | --- | --- |
| capability | `tool_call`, `reasoning`, `structured_output`, `temperature` + input/output modality breadth | higher = better |
| cost | blended `cost.input + cost.output` ($/1M), log-scaled then inverted | cheaper = better; free = top; unknown = neutral 50 |
| context | `limit.context` + `limit.output`, log-scaled | higher = better |
| recency | `release_date` | newer = better |
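
To make the cost row concrete, here is a rough sketch of one way to get a 0-100 cost score that is log-scaled, inverted, and neutral for unknown prices. It is an illustration under assumptions, not the contents of score.ts: the `Cost` shape, the helper names, and the use of `Math.log1p` (chosen so a price of 0 stays finite) are all mine.

```typescript
// Hypothetical shape: only the two cost fields named in the table.
interface Cost {
  input?: number;
  output?: number;
}

const NEUTRAL = 50;

// Blended price in $/1M tokens, or undefined when a field is missing or not finite.
function blendedPrice(c: Cost | undefined): number | undefined {
  if (!c) return undefined;
  const { input, output } = c;
  if (typeof input !== "number" || typeof output !== "number") return undefined;
  if (!Number.isFinite(input) || !Number.isFinite(output)) return undefined;
  return input + output;
}

// Returns one 0-100 cost score per model, in input order.
function costScores(costs: ReadonlyArray<Cost | undefined>): number[] {
  // Assumption: log1p as the log scale, so free models map to exactly 0.
  const logs = costs.map((c) => {
    const p = blendedPrice(c);
    return p === undefined ? undefined : Math.log1p(p);
  });
  const known = logs.filter((v): v is number => v !== undefined);
  const min = Math.min(...known);
  const max = Math.max(...known);
  return logs.map((v) => {
    if (v === undefined) return NEUTRAL; // unknown price: neither wins nor loses
    if (v === 0) return 100; // free models sit at the top
    if (max === min) return NEUTRAL; // every known price identical
    return 100 * (1 - (v - min) / (max - min)); // min-max, inverted: cheaper is higher
  });
}
```

With prices of 0, 4, unknown, and 40 $/1M, this yields 100 for the free model, 50 for the unknown one, 0 for the most expensive, and a value in between for the 4 $/1M model.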

Each component is min-max normalized across the whole dataset (a missing/unparseable field collapses to a neutral 50, so it never silently wins or loses). The three lenses are just different weightings:

| Lens | capability | cost | context | recency |
| --- | --- | --- | --- | --- |
| Overall | 0.40 | 0.30 | 0.20 | 0.10 |
| Value | 0.35 | 0.50 | 0.10 | 0.05 |
| Capability | 0.60 | 0.15 | 0.20 | 0.05 |

The weights are the only opinion in the change and are isolated in a single WEIGHTS object so they're trivial to audit or tune.
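
For anyone auditing the numbers, the blend amounts to a weighted sum per lens. The sketch below uses the weights from the table; the type names and the `blend` function are illustrative and may not match what score.ts actually exports.

```typescript
type Component = "capability" | "cost" | "context" | "recency";
type Lens = "overall" | "value" | "capability";

// Weights copied from the lens table; each row sums to 1.
const WEIGHTS: Record<Lens, Record<Component, number>> = {
  overall: { capability: 0.4, cost: 0.3, context: 0.2, recency: 0.1 },
  value: { capability: 0.35, cost: 0.5, context: 0.1, recency: 0.05 },
  capability: { capability: 0.6, cost: 0.15, context: 0.2, recency: 0.05 },
};

// Weighted sum of the four 0-100 components for one lens.
// `blend` is a hypothetical name used only for this sketch.
function blend(components: Record<Component, number>, lens: Lens): number {
  const w = WEIGHTS[lens];
  return (Object.keys(w) as Component[]).reduce(
    (sum, key) => sum + w[key] * components[key],
    0,
  );
}
```

Since each row of weights sums to 1, a model that is neutral (50) on every component scores 50 under every lens, up to floating-point rounding, and the blended score stays on the same 0-100 scale as its inputs.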

What the score does — and does NOT — measure

This is important and stated up front: the catalog has no quality/benchmark field, so the score cannot and does not measure model "intelligence." "Capability" here means breadth of declared features and modalities, not how good a model's outputs are.

A direct consequence, visible in the live data: broad, cheap, omni-modal models (and meta/auto-routers, which declare every modality at low listed cost) rank at the very top, above expensive frontier models. That is correct given these inputs — it's a spec-breadth-per-dollar ranking, not a smartness ranking.
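
A small worked example shows why this happens under the Overall weights. The two sets of component values below are invented for illustration and do not come from the catalog; only the weights are taken from the lens table.

```typescript
type Components = { capability: number; cost: number; context: number; recency: number };

// Overall weights as listed in the lens table.
const OVERALL: Components = { capability: 0.4, cost: 0.3, context: 0.2, recency: 0.1 };

function overallScore(c: Components): number {
  return (
    OVERALL.capability * c.capability +
    OVERALL.cost * c.cost +
    OVERALL.context * c.context +
    OVERALL.recency * c.recency
  );
}

// Invented component values, for illustration only.
const cheapOmni: Components = { capability: 90, cost: 95, context: 70, recency: 60 };
const frontier: Components = { capability: 80, cost: 20, context: 80, recency: 90 };

console.log(overallScore(cheapOmni).toFixed(1)); // "84.5"
console.log(overallScore(frontier).toFixed(1)); // "63.0"
```

The hypothetical frontier model is newer and has a larger context window, but the 0.30 cost weight costs it more than 20 points, which is the spec-breadth-per-dollar effect described above.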

This is also the reasoning behind shipping it as sortable columns rather than one decreed ranking: the data stays neutral, and the user chooses the lens that fits their use case. If a future schema ever adds an objective quality signal, it drops straight into the existing component blend.

Scope

Web package only. The canonical api.json / core data is unchanged — scoring is a presentation-layer concern and does not pollute the data API.
Files touched: score.ts (new), shared.ts, render.tsx, index.ts, index.css.

Validation

`bun validate` passes.
`cd packages/web && bun run build` succeeds; rendered HTML contains all new columns and computed scores across the dataset.
Sort invariant verified: 28 sortable headers ↔ 28 sortValues entries.
Score distribution sanity-checked (spans ~22–90; image/TTS models correctly rank lowest).
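
The sort invariant could be guarded with a check along these lines. This is a hypothetical helper written for this description, not something the PR ships; `assertSortInvariant` and its parameters are assumed names.

```typescript
// Hypothetical helper: throws if the sort values do not line up one-to-one
// with the sortable headers (for this PR, 28 of each).
function assertSortInvariant(
  sortableHeaders: readonly string[],
  sortValues: readonly unknown[],
): void {
  if (sortableHeaders.length !== sortValues.length) {
    throw new Error(
      `sort invariant broken: ${sortableHeaders.length} sortable headers vs ` +
        `${sortValues.length} sortValues entries`,
    );
  }
}
```

Running a check like this at build time would catch a new column added to the header row without a matching sort value.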

Happy to adjust the weights, drop to two columns (Overall + Value), or gate this behind discussion if a built-in ranking isn't a direction you want — feedback welcome.

Adds three sortable, transparently-computed score columns to the models table, plus a dynamic rank (#) column that renumbers with the current sort.

Scores are derived entirely from existing objective catalog fields (cost, context window, output limit, capability flags, modality breadth, release date), with no benchmarks or hand-grading. Four normalized 0-100 components (capability, cost-efficiency, context, recency) are blended into three lenses with weights documented in one place in score.ts:

- Overall: well-rounded "best overall"
- Value: cost-efficiency weighted (cheap-yet-capable)
- Capability: feature/modality breadth weighted

The table defaults to Overall (descending). All three columns sort like any other column, so users can pick the lens that fits their use case. Scope is web-only; the canonical api.json data is unchanged.

huncho-tensei · 2026-05-28T23:40:14Z

Rationale and design discussion in #1893.

huncho-tensei mentioned this pull request May 28, 2026

Proposal: objective sortable ranking columns (Overall / Value / Capability) #1893

Open

huncho-tensei wants to merge 1 commit into anomalyco:dev from huncho-tensei:feat/objective-model-scores
