Bump prompt-testing reviewer to Opus 4.7 by mattgodbolt-molty · Pull Request #16 · compiler-explorer/explain

mattgodbolt-molty · 2026-05-06T18:34:48Z

Summary

Bump the default reviewer model from claude-opus-4-6 to claude-opus-4-7 in prompt_testing/reviewer.py and both spots in prompt_testing/cli.py (run --review-model and the standalone review --model). Same $5/$25 price tier, stronger reasoning.
Drop the hard-coded temperature=0.0 from the reviewer's messages.create call — Opus 4.7 rejects the parameter (temperature is deprecated for this model).
Replace the hard-coded $15/$75 cost calc in _run_reviews with app.model_costs.get_model_cost(model). The old calc would have over-reported review cost by 3× against the new Opus pricing and would silently drift again on any future model bump.

Test plan

uv run pytest — 92 passing
uv run pre-commit run --all-files
Live smoke test against the Anthropic API (3 cases × Sonnet 4.6 explainer + Opus 4.7 reviewer):
```
Running 3 test cases with prompt: current
  [1/3] ✓ square_cpp_o1 (in=895 out=381)
  [2/3] ✓ basic_inline_001 (in=931 out=469)
  [3/3] ✓ factorial_beginner_assembly (in=1293 out=611)

Reviewing 3 results with claude-opus-4-7...
  [1/3] ✗ square_cpp_o1 (2 issues, $0.0155)
  [2/3] ✓ basic_inline_001 (0 issues, $0.0081)
  [3/3] ✓ factorial_beginner_assembly (1 issues, $0.0150)

3/3 succeeded, total cost: $0.0698
Correctness: 2/3 passed
Review cost: $0.0385 (claude-opus-4-7)
```
Cost arithmetic checks out: (1120+1254+1491) × $5/M + (394+74+300) × $25/M = $0.0385 ✓

Notable side-effect: Opus 4.7 caught a real factual error in the existing Sonnet 4.6 explanation for square_cpp_o1 — the explanation claimed imul eax, edi, edi was a valid three-operand alternative, when in fact the third operand of three-operand imul must be an immediate. Useful signal that the upgraded reviewer is paying off.

🤖 Generated with Claude Code

- Update default reviewer model in `reviewer.py`, `cli.py run --review`, and the standalone `cli.py review` command from `claude-opus-4-6` to `claude-opus-4-7`. Same `$5/$25` price tier with stronger reasoning. - Drop the hard-coded `temperature=0.0` from the reviewer's API call: Opus 4.7 rejects the parameter (`temperature is deprecated for this model`). - Replace the hard-coded `$15/$75` cost calc in `_run_reviews` with a lookup via `app.model_costs.get_model_cost(model)`. The previous calc would have over-reported review cost by 3x against the new Opus pricing and would silently drift again on any future model bump. Smoke-tested locally against three live cases (square_cpp_o1, basic_inline_001, factorial_beginner_assembly) — Opus 4.7 ran cleanly and flagged a real factual error in the Sonnet 4.6 explanation for square_cpp_o1 (incorrect three-operand `imul` form). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mattgodbolt approved these changes May 6, 2026

View reviewed changes

mattgodbolt merged commit f630c1a into main May 6, 2026
2 checks passed

mattgodbolt deleted the bump-reviewer-to-opus-4-7 branch May 6, 2026 19:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bump prompt-testing reviewer to Opus 4.7#16

Bump prompt-testing reviewer to Opus 4.7#16
mattgodbolt merged 1 commit intomainfrom
bump-reviewer-to-opus-4-7

mattgodbolt-molty commented May 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mattgodbolt-molty commented May 6, 2026

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants