Bump prompt-testing reviewer to Opus 4.7 #16

Merged
mattgodbolt merged 1 commit into main from bump-reviewer-to-opus-4-7
May 6, 2026

Conversation

@mattgodbolt-molty
Contributor

Summary

  • Bump the default reviewer model from claude-opus-4-6 to claude-opus-4-7 in prompt_testing/reviewer.py and both spots in prompt_testing/cli.py (run --review-model and the standalone review --model). Same $5/$25 price tier, stronger reasoning.
  • Drop the hard-coded temperature=0.0 from the reviewer's messages.create call — Opus 4.7 rejects the parameter (temperature is deprecated for this model).
  • Replace the hard-coded $15/$75 cost calc in _run_reviews with app.model_costs.get_model_cost(model). The old calc would have over-reported review cost by 3× against the new Opus pricing and would silently drift again on any future model bump.
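A minimal sketch of why a per-model price lookup is safer than hard-coded rates. Only the `$5/$25` and `$15/$75` figures come from this PR; `ModelPrice`, `PRICES`, `cost_usd` and `review_cost` are hypothetical names for illustration and are not the real `app.model_costs` API, whose signature is not shown here. The token counts are illustrative round numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per million tokens (hypothetical structure, for illustration)."""
    input_per_mtok: float
    output_per_mtok: float


# Hypothetical price table; the Opus 4.7 rates are the $5/$25 tier from the PR.
PRICES = {
    "claude-opus-4-7": ModelPrice(5.0, 25.0),
}

# The rates that used to be hard-coded in the review cost calculation.
OLD_HARDCODED = ModelPrice(15.0, 75.0)


def cost_usd(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Token counts times per-million rates, in USD."""
    return (
        input_tokens * price.input_per_mtok
        + output_tokens * price.output_per_mtok
    ) / 1_000_000


def review_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Look the price up by model name.

    An unknown model raises KeyError rather than silently using stale rates.
    """
    return cost_usd(PRICES[model], input_tokens, output_tokens)


# Illustrative usage: one million input and one million output tokens.
looked_up = review_cost("claude-opus-4-7", 1_000_000, 1_000_000)
hard_coded = cost_usd(OLD_HARDCODED, 1_000_000, 1_000_000)
print(f"lookup=${looked_up:.2f} hard-coded=${hard_coded:.2f} "
      f"ratio={hard_coded / looked_up:.1f}x")
```

Because both the input and output rates differ by the same factor, the over-report is 3x regardless of the input/output token mix.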

Test plan

  • uv run pytest — 92 passing

  • uv run pre-commit run --all-files

  • Live smoke test against the Anthropic API (3 cases × Sonnet 4.6 explainer + Opus 4.7 reviewer):

    Running 3 test cases with prompt: current
      [1/3] ✓ square_cpp_o1 (in=895 out=381)
      [2/3] ✓ basic_inline_001 (in=931 out=469)
      [3/3] ✓ factorial_beginner_assembly (in=1293 out=611)
    
    Reviewing 3 results with claude-opus-4-7...
      [1/3] ✗ square_cpp_o1 (2 issues, $0.0155)
      [2/3] ✓ basic_inline_001 (0 issues, $0.0081)
      [3/3] ✓ factorial_beginner_assembly (1 issues, $0.0150)
    
    3/3 succeeded, total cost: $0.0698
    Correctness: 2/3 passed
    Review cost: $0.0385 (claude-opus-4-7)
    

    Cost arithmetic checks out: (1120+1254+1491) × $5/M + (394+74+300) × $25/M = $0.0385

    Notable side-effect: Opus 4.7 caught a real factual error in the existing Sonnet 4.6 explanation for square_cpp_o1 — the explanation claimed imul eax, edi, edi was a valid three-operand alternative, when in fact the third operand of three-operand imul must be an immediate. Useful signal that the upgraded reviewer is paying off.
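The cost arithmetic in the test plan can be re-derived with a short script. This is a sketch under two assumptions: the token counts in the cost check are listed in the same order as the three review lines in the log, and each logged cost is rounded to four decimal places. The function and variable names are illustrative.

```python
# Reviewer token counts from the cost check, assumed to be in case order:
# square_cpp_o1, basic_inline_001, factorial_beginner_assembly.
INPUT_TOKENS = [1120, 1254, 1491]
OUTPUT_TOKENS = [394, 74, 300]

# Per-case review costs as printed in the smoke-test log.
LOGGED_COSTS = [0.0155, 0.0081, 0.0150]

# Opus 4.7 price tier from the PR summary, in USD per million tokens.
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0


def review_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one review call at the $5/$25 tier."""
    return (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


per_case = [review_cost_usd(i, o) for i, o in zip(INPUT_TOKENS, OUTPUT_TOKENS)]
total = review_cost_usd(sum(INPUT_TOKENS), sum(OUTPUT_TOKENS))

for logged, computed in zip(LOGGED_COSTS, per_case):
    print(f"logged=${logged:.4f} computed=${computed:.6f}")
print(f"total computed=${total:.6f}")
```

Under those assumptions each computed per-case cost lands within rounding distance of the logged value, and the total (3,865 input tokens and 768 output tokens) comes to about $0.0385, matching the reported review cost.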

🤖 Generated with Claude Code

- Update default reviewer model in `reviewer.py`, `cli.py run --review`,
  and the standalone `cli.py review` command from `claude-opus-4-6` to
  `claude-opus-4-7`. Same `$5/$25` price tier with stronger reasoning.
- Drop the hard-coded `temperature=0.0` from the reviewer's API call:
  Opus 4.7 rejects the parameter (`temperature is deprecated for this
  model`).
- Replace the hard-coded `$15/$75` cost calc in `_run_reviews` with a
  lookup via `app.model_costs.get_model_cost(model)`. The previous calc
  would have over-reported review cost by 3x against the new Opus pricing
  and would silently drift again on any future model bump.

Smoke-tested locally against three live cases (square_cpp_o1,
basic_inline_001, factorial_beginner_assembly) — Opus 4.7 ran cleanly
and flagged a real factual error in the Sonnet 4.6 explanation for
square_cpp_o1 (incorrect three-operand `imul` form).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mattgodbolt mattgodbolt merged commit f630c1a into main May 6, 2026
2 checks passed
@mattgodbolt mattgodbolt deleted the bump-reviewer-to-opus-4-7 branch May 6, 2026 19:06