BaseInfinity · BaseInfinity · Jun 3, 2026 · Jun 3, 2026
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -13,7 +13,7 @@
       "name": "sdlc-wizard",
       "source": ".",
       "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
-      "version": "1.77.0",
+      "version": "1.78.0",
       "author": {
         "name": "Stefan Ayala"
       },

diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "sdlc-wizard",
-  "version": "1.77.0",
+  "version": "1.78.0",
   "description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
   "author": {
     "name": "Stefan Ayala",

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,16 @@ All notable changes to the SDLC Wizard.
 
 > **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.
 
+## [1.78.0] - 2026-06-02
+
+### Changed
+
+- **Bumped recommended model from Opus 4.7 → Opus 4.8** (closes #365). Opus 4.8 launched 2026-05-28; day 5 in-the-wild sentiment settled positive, proof-of-life via `claude --print --model claude-opus-4-8` confirmed reachability, and the `opus[1m]` alias auto-resolves to latest Opus on CC v2.1.154+ (no settings.json template change needed — alias does the work). Updated prose across `SDLC.md` (Recommended Model row + effort warning), `CLAUDE_CODE_SDLC_WIZARD.md` (~11 mentions in effort table + autocompact + mixed-mode tier + 1M-context section), `skills/sdlc/SKILL.md`, `skills/setup/SKILL.md` (mixed-mode + flagship tier suggestions), `skills/update/SKILL.md`, `hooks/model-effort-check.sh` (warning text), and `cli/lib/repo-complexity.js` (tier comments). Min CC bumped v2.1.111+ → v2.1.154+ (required to resolve `opus[1m]` to 4.8). Effort semantics preserved — strict effort behavior introduced in 4.7 carried forward to 4.8, so `max` remains the recommended default and `xhigh` the floor.
+
+### Notes
+
+- **Un-run gates 2+3 tracked as post-deploy follow-up obligations on #365.** Gate 2 (A/B coder quality vs 4.7 on real PRs) and Gate 3 (dogfood for 24h before bump) were both deferred. If real-world use surfaces the system-card-flagged regressions for 4.8 — prompt-injection +60% on Gray Swan, file-deletion tendency, or eval-awareness affecting wizard output — revert via a single PR that flips the 4.7↔4.8 prose. Settings.json template unchanged means revert is prose-only.
+
 ## [1.77.0] - 2026-05-24
 
 ### Added

diff --git a/CI_CD.md b/CI_CD.md
@@ -250,7 +250,7 @@ The `review` job is **skipped on `BaseInfinity/claude-sdlc-wizard` self-PRs**
 
 ## Local Codex Audit of CI Logs (Cross-Model)
 
-The GH `pr-review.yml` workflow uses Claude Opus 4.7. For adversarial diversity, the local shepherd loop runs a **second pass with Codex xhigh** against the CI logs themselves — not just the code. A second model catches things the first missed (silent test exclusions, degraded E2E scores on a green checkmark, warnings promoted to errors in a later version).
+The GH `pr-review.yml` workflow uses Claude Opus 4.8 (via `claude-code-action@v1` default). For adversarial diversity, the local shepherd loop runs a **second pass with Codex xhigh** against the CI logs themselves — not just the code. A second model catches things the first missed (silent test exclusions, degraded E2E scores on a green checkmark, warnings promoted to errors in a later version).
 
 ```bash
 # After CI reports pass/fail:

diff --git a/CLAUDE_CODE_SDLC_WIZARD.md b/CLAUDE_CODE_SDLC_WIZARD.md
@@ -249,26 +249,26 @@ When Anthropic provides official plugins or tools that handle something:
 
 Claude Code's **effort level** controls how much thinking the model does before responding. Higher effort = deeper reasoning but more tokens.
 
-> ⚠️ **On Opus 4.7, effort below `xhigh` breaks SDLC compliance in practice.** Unlike 4.6, Opus 4.7 respects effort levels *strictly* — at `high` or below it scopes work tighter (shallow reasoning, skipped TDD, no self-review) rather than going above-and-beyond. Treat the table below accordingly: **`max` is the recommended default, `xhigh` is the floor**, `high` or below is for trivial grep/search subagents only.
+> ⚠️ **On Opus 4.8, effort below `xhigh` breaks SDLC compliance in practice.** Inherited from 4.7, Opus 4.8 respects effort levels *strictly* — at `high` or below it scopes work tighter (shallow reasoning, skipped TDD, no self-review) rather than going above-and-beyond. Treat the table below accordingly: **`max` is the recommended default, `xhigh` is the floor**, `high` or below is for trivial grep/search subagents only.
 
 | Level | When to Use | How to Set |
 |-------|-------------|------------|
-| `high` or below | **Not for SDLC work on Opus 4.7.** Only for trivial grep/search subagents or one-shot questions that don't require planning | `effort: high` in a specific subagent frontmatter only |
-| `xhigh` | **Floor for SDLC work on Opus 4.7.** Long-running tasks, repeated tool calls, deep exploration. Claude Code defaults to this on Opus 4.7 | `/effort xhigh` or set in skill frontmatter |
-| `max` | **Recommended default for Opus 4.7 SDLC work.** Multi-file changes, architecture decisions, debugging, cross-model reviews, any task touching wizard/skill/CI code | `/effort max` (session only — resets next session) |
-
-**Effort level changes in Opus 4.7 (April 2026):**
-- **`xhigh` is new** — sits between `high` and `max`, designed for coding and agentic work (30+ minute tasks with token budgets in the millions)
-- **Claude Code now defaults to `xhigh`** on Opus 4.7 for all plans
-- **Opus 4.7 respects effort levels more strictly** than 4.6 — at lower levels it scopes work tighter instead of going above and beyond. If you see shallow reasoning, raise effort rather than prompting around it
-- **`budget_tokens` is deprecated** on Opus 4.7 — use adaptive thinking with effort instead
+| `high` or below | **Not for SDLC work on Opus 4.8.** Only for trivial grep/search subagents or one-shot questions that don't require planning | `effort: high` in a specific subagent frontmatter only |
+| `xhigh` | **Floor for SDLC work on Opus 4.8.** Long-running tasks, repeated tool calls, deep exploration. Claude Code defaults to this on Opus 4.7+ | `/effort xhigh` or set in skill frontmatter |
+| `max` | **Recommended default for Opus 4.8 SDLC work.** Multi-file changes, architecture decisions, debugging, cross-model reviews, any task touching wizard/skill/CI code | `/effort max` (session only — resets next session) |
+
+**Strict effort behavior (Opus 4.7+, carried forward to 4.8):**
+- **`xhigh` was introduced in 4.7** — sits between `high` and `max`, designed for coding and agentic work (30+ minute tasks with token budgets in the millions)
+- **Claude Code defaults to `xhigh`** on Opus 4.7+ for all plans
+- **Opus 4.7+ respects effort levels more strictly** than 4.6 — at lower levels it scopes work tighter instead of going above and beyond. If you see shallow reasoning, raise effort rather than prompting around it
+- **`budget_tokens` is deprecated** on Opus 4.7+ — use adaptive thinking with effort instead
 - When running at `xhigh` or `max`, set a large `max_tokens` (64k+) so the model has room to think across subagents and tool calls
 
 **Why `high` was the previous default:** Claude Code uses **adaptive thinking** to dynamically allocate reasoning budget per turn. On Pro and Max plans, the default effort level was **medium (85)**, which causes the model to under-allocate reasoning on complex multi-step tasks — leading to shallow analysis, missed edge cases, and "lazy" outputs. This was [confirmed by Anthropic engineer Boris Cherny](https://github.com/anthropics/claude-code/issues/42796) and is documented at [code.claude.com](https://code.claude.com/docs/en/model-config). API, Team, and Enterprise plans default to high effort and are not affected.
 
 **Don't rely on the CC default — set effort yourself.** Anthropic's [2026-04-23 post-mortem](https://www.anthropic.com/engineering/april-23-postmortem) is independent third-party evidence that CC has flipped reasoning_effort defaults across versions (high → medium → xhigh/high). The default has changed before and will change again. The wizard's `model-effort-check.sh` hook nudges to `xhigh`/`max` at session start specifically because the in-product default is not load-bearing — it can shift release-to-release without notice. Set `/effort max` explicitly every session you do SDLC work, and treat any "I assumed the default was X" reasoning as a bug.
 
-The `/sdlc` skill sets `effort: high` in its frontmatter as a baseline, overriding the medium default on every SDLC invocation. **On Opus 4.7, run `/effort max` at session start** — the frontmatter is a floor, not a ceiling, and `max` is where SDLC-compliant work actually happens on 4.7.
+The `/sdlc` skill sets `effort: high` in its frontmatter as a baseline, overriding the medium default on every SDLC invocation. **On Opus 4.8, run `/effort max` at session start** — the frontmatter is a floor, not a ceiling, and `max` is where SDLC-compliant work actually happens on 4.8.
 
 **Nuclear option — disable adaptive thinking entirely:** Set `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` in your environment or settings.json `env` block. This forces a fixed reasoning budget per turn instead of letting the model dynamically allocate. Use this if you observe persistent quality issues even with `effort: high`. See [Claude Code model config docs](https://code.claude.com/docs/en/model-config) for details.
 
@@ -966,7 +966,7 @@ Override the default auto-compact threshold with environment variables. These ar
 | `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` | Trigger compaction at this % of context capacity (1-100) | ~95% |
 | `CLAUDE_CODE_AUTO_COMPACT_WINDOW` | Override context capacity in tokens (useful for 1M models) | Model default |
 
-**Opt-in (issue #198):** The SDLC Wizard CLI ships `.claude/settings.json` with **no** `model` or `env` pin so Claude Code's auto-mode stays enabled. The setup skill's Step 9.5 asks whether to opt into `"model": "opus[1m]"` + `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` (tuned for the 1M window — compacts at ~300K). Default answer is **No**. Pinning the model at the top level tells Claude Code you've explicitly chosen a model and turns off per-turn model auto-selection — a real tradeoff, so we ask. Power users who want guaranteed Opus 4.7 + 1M context answer yes.
+**Opt-in (issue #198):** The SDLC Wizard CLI ships `.claude/settings.json` with **no** `model` or `env` pin so Claude Code's auto-mode stays enabled. The setup skill's Step 9.5 asks whether to opt into `"model": "opus[1m]"` + `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` (tuned for the 1M window — compacts at ~300K). Default answer is **No**. Pinning the model at the top level tells Claude Code you've explicitly chosen a model and turns off per-turn model auto-selection — a real tradeoff, so we ask. Power users who want guaranteed Opus 4.8 + 1M context answer yes.
 
 To opt in by hand, edit `.claude/settings.json`:
 
@@ -1021,13 +1021,13 @@ Claude Code supports both 200K and 1M context windows. **`opus[1m]` is an opt-in
 **Why `opus[1m]` is opt-in (issue #198):**
 - **Pinning disables auto-mode.** Max-plan users pay for Claude Code's per-turn model selection (Sonnet for cheap tasks, Opus for hard ones, plus weekly-limit smoothing). A top-level `model` gives that up.
 - **The 1M headroom has to earn it.** If your typical session stays under 150K, you're giving up auto-mode for headroom you're not using.
-- **Power users who want guaranteed Opus 4.7 + 1M** — go ahead, it's a real win for long shepherding sessions. Just make it a conscious choice, not a silent default.
+- **Power users who want guaranteed Opus 4.8 + 1M** — go ahead, it's a real win for long shepherding sessions. Just make it a conscious choice, not a silent default.
 
-**Opt in when:** you routinely cross 100K tokens in a single session (plan → TDD → review → CI shepherd on one feature), you want Opus 4.7 specifically (not Sonnet), and you're OK losing auto-mode.
+**Opt in when:** you routinely cross 100K tokens in a single session (plan → TDD → review → CI shepherd on one feature), you want Opus 4.8 specifically (not Sonnet), and you're OK losing auto-mode.
 
 **Stay on auto-mode (default) when:** you're unsure, your work is mixed short/long, or you want Claude Code to do the model math for you.
 
-**How to opt in:** run `/model opus[1m]` in your session (transient), or set `"model": "opus[1m]"` in `.claude/settings.json` (persistent). Requires Claude Code v2.1.111+ for Opus 4.7. The setup wizard's Step 9.5 also asks once, with default No.
+**How to opt in:** run `/model opus[1m]` in your session (transient), or set `"model": "opus[1m]"` in `.claude/settings.json` (persistent). Requires Claude Code v2.1.154+ for Opus 4.8 (the `opus[1m]` alias auto-resolves to the latest Opus). The setup wizard's Step 9.5 also asks once, with default No.
 
 **How to opt out:** remove the `model` line from `.claude/settings.json`, or run `/model` and pick "Default (recommended)".
 
@@ -1037,14 +1037,14 @@ Claude Code supports both 200K and 1M context windows. **`opus[1m]` is an opt-in
 
 ### Mixed-Mode Tier (Sonnet coder + Opus reviewer, roadmap #233)
 
-For trivial / blank / config-only / CRUD-style repos, full Opus 4.7 on every turn is overkill on the coder leg. The **mixed-mode tier** pins `model: "sonnet[1m]"` for in-session work while keeping the cross-model review layer (Codex / external reviewer) at the flagship — so the reviewer still catches what Sonnet missed.
+For trivial / blank / config-only / CRUD-style repos, full Opus 4.8 on every turn is overkill on the coder leg. The **mixed-mode tier** pins `model: "sonnet[1m]"` for in-session work while keeping the cross-model review layer (Codex / external reviewer) at the flagship — so the reviewer still catches what Sonnet missed.
 
 **The split:**
 
 | Layer | Mixed-mode tier | Flagship tier |
 |-------|----------------|---------------|
 | Coder (in-session CC) | `model: "sonnet[1m]"` | `model: "opus[1m]"` |
-| Cross-model reviewer (Codex etc.) | gpt-5.5 xhigh (or Opus 4.7 max via Bash) | gpt-5.5 xhigh (or Opus 4.7 max via Bash) |
+| Cross-model reviewer (Codex etc.) | gpt-5.5 xhigh (or Opus 4.8 max via Bash) | gpt-5.5 xhigh (or Opus 4.8 max via Bash) |
 | Effort floor (CC session) | xhigh; max preferred | xhigh; max preferred |
 
 The reviewer always stays at flagship — the whole point of mixed-mode is that adversarial review catches Sonnet's blind spots, so weakening the review leg defeats the savings.
@@ -1058,7 +1058,7 @@ The reviewer always stays at flagship — the whole point of mixed-mode is that
 **When to stay flagship:**
 - Stakes-flagged repo: anywhere `.env` / `secrets/` / `credentials/` exists. Force flagship even if LOC is tiny — leaks are catastrophic
 - Architecture work, debugging non-obvious bugs, security review, anything where the *coder's* judgment matters as much as the reviewer's
-- Long shepherd sessions (plan → TDD → review → CI loop) — they cross 100K tokens regularly and Opus 4.7 fits the window better in a single thread
+- Long shepherd sessions (plan → TDD → review → CI loop) — they cross 100K tokens regularly and Opus 4.8 fits the window better in a single thread
 
 **Auto-detection:** the setup wizard runs `cli/lib/repo-complexity.js` against the target repo and suggests the tier. Stakes flag (`.env` / `secrets/`) forces complex regardless of size. The user always picks the final answer — the heuristic is a hint, not a gate.
 
@@ -2983,7 +2983,7 @@ If deployment fails or post-deploy verification catches issues:
 
 **SDLC.md:**
 ```markdown
-<!-- SDLC Wizard Version: 1.77.0 -->
+<!-- SDLC Wizard Version: 1.78.0 -->
 <!-- Setup Date: [DATE] -->
 <!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Git Workflow: [PRs or Solo] -->

diff --git a/SDLC.md b/SDLC.md
@@ -1,4 +1,4 @@
-<!-- SDLC Wizard Version: 1.77.0 -->
+<!-- SDLC Wizard Version: 1.78.0 -->
 <!-- Setup Date: 2026-01-24 -->
 <!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
 <!-- Claude Code Baseline: v2.1.159 -->
@@ -10,14 +10,14 @@
 
 | Property | Value |
 |----------|-------|
-| Wizard Version | 1.77.0 |
-| Last Updated | 2026-06-01 |
-| Claude Code Minimum | v2.1.111+ (required for Opus 4.7 / `opus[1m]`); v2.1.105+ for `PreCompact` hook |
-| Claude Code Recommended | v2.1.159+ (latest at 2026-06-01) — unlocks Opus 4.8 (v2.1.154 — wizard holds on 4.7 pending #365 A/B), `.claude/skills` plugin auto-load (v2.1.157), enriched `tool_decision` telemetry via `OTEL_LOG_TOOL_DETAILS=1` (v2.1.157), native `/goal` (v2.1.139), `/code-review --comment` (v2.1.147), per-category `/usage` (v2.1.149), `$CLAUDE_EFFORT` env var for hooks (v2.1.133) |
-| Recommended Model | `opus[1m]` (Opus 4.7, 1M context) — run `/model opus[1m]` |
+| Wizard Version | 1.78.0 |
+| Last Updated | 2026-06-02 |
+| Claude Code Minimum | v2.1.154+ (required for Opus 4.8 / `opus[1m]`); v2.1.105+ for `PreCompact` hook |
+| Claude Code Recommended | v2.1.159+ (latest at 2026-06-02) — unlocks Opus 4.8 (v2.1.154), `.claude/skills` plugin auto-load (v2.1.157), enriched `tool_decision` telemetry via `OTEL_LOG_TOOL_DETAILS=1` (v2.1.157), native `/goal` (v2.1.139), `/code-review --comment` (v2.1.147), per-category `/usage` (v2.1.149), `$CLAUDE_EFFORT` env var for hooks (v2.1.133) |
+| Recommended Model | `opus[1m]` (Opus 4.8, 1M context) — run `/model opus[1m]` |
 | Recommended Effort | `max` (preferred) / `xhigh` (floor) — run `/effort max` |
 
-> **Effort warning (Opus 4.7):** `max` is the recommended default, `xhigh` is the absolute floor. Anything below `xhigh` (`high`, `medium`, `low`) causes Opus 4.7 to scope work tighter — shallow reasoning, skipped TDD, dropped self-review, SDLC non-compliance in practice. Use `high` or below only for trivial grep/search subagents, never for real SDLC work.
+> **Effort warning (Opus 4.8):** `max` is the recommended default, `xhigh` is the absolute floor. Anything below `xhigh` (`high`, `medium`, `low`) causes Opus 4.8 to scope work tighter — shallow reasoning, skipped TDD, dropped self-review, SDLC non-compliance in practice. Use `high` or below only for trivial grep/search subagents, never for real SDLC work.
 
 See `CLAUDE_CODE_SDLC_WIZARD.md` → "1M vs 200K Context Window" for the rationale and pricing notes.
 

diff --git a/cli/lib/repo-complexity.js b/cli/lib/repo-complexity.js
@@ -1,8 +1,8 @@
 // Roadmap #233: repo complexity heuristic for mixed-mode tier selection.
 //
 // Output: { tier: 'simple' | 'complex', score: <number>, signals: [...] }
-//   - 'simple' → setup wizard suggests mixed-mode (Sonnet 4.6 coder + Opus 4.7 reviewer)
-//   - 'complex' → setup wizard suggests full flagship (Opus 4.7 everywhere)
+//   - 'simple' → setup wizard suggests mixed-mode (Sonnet 4.6 coder + Opus 4.8 reviewer)
+//   - 'complex' → setup wizard suggests full flagship (Opus 4.8 everywhere)
 // Cross-model review (Codex / external) always stays at the flagship tier
 // regardless of coder selection — see CLAUDE_CODE_SDLC_WIZARD.md.
 //

diff --git a/hooks/model-effort-check.sh b/hooks/model-effort-check.sh
@@ -3,9 +3,9 @@
 #
 # Behavior (per ROADMAP #217):
 #   effort=max    -> silent (preferred default, above floor)
-#   effort=xhigh  -> silent (minimum floor on Opus 4.7)
+#   effort=xhigh  -> silent (minimum floor on Opus 4.8)
 #   effort=high|medium|low (or unset) -> LOUD WARNING:
-#     Opus 4.7 needs xhigh floor for SDLC compliance (TDD, self-review, deep reasoning).
+#     Opus 4.8 needs xhigh floor for SDLC compliance (TDD, self-review, deep reasoning).
 #     Recommends `/effort max`. Also reminds about recommended model `opus[1m]`.
 #
 # CC does not expose the current model to hooks, so the model nudge is emitted as
@@ -57,7 +57,7 @@ else
 fi
 
 echo "=============================================================================="
-echo " WARNING: effort '$effort_display' breaks SDLC compliance on Opus 4.7."
+echo " WARNING: effort '$effort_display' breaks SDLC compliance on Opus 4.8."
 echo " Below xhigh = shallow reasoning, skipped TDD, dropped self-review."
 echo ""
 echo " Run: /effort max    (preferred, full SDLC compliance)"

diff --git a/package.json b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "agentic-sdlc-wizard",
-  "version": "1.77.0",
+  "version": "1.78.0",
   "description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
   "bin": {
     "sdlc-wizard": "cli/bin/sdlc-wizard.js"