Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@
"name": "sdlc-wizard",
"source": ".",
"description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
"version": "1.77.0",
"version": "1.78.0",
"author": {
"name": "Stefan Ayala"
},
Expand Down
2 changes: 1 addition & 1 deletion .claude-plugin/plugin.json
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{
"name": "sdlc-wizard",
"version": "1.77.0",
"version": "1.78.0",
"description": "SDLC enforcement for AI agents — TDD, planning, self-review, CI shepherd",
"author": {
"name": "Stefan Ayala",
Expand Down
10 changes: 10 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,16 @@ All notable changes to the SDLC Wizard.

> **Note:** This changelog is for humans to read. Don't manually apply these changes - just run the wizard ("Check for SDLC wizard updates") and it handles everything automatically.

## [1.78.0] - 2026-06-02

### Changed

- **Bumped recommended model from Opus 4.7 → Opus 4.8** (closes #365). Opus 4.8 launched 2026-05-28; day 5 in-the-wild sentiment settled positive, proof-of-life via `claude --print --model claude-opus-4-8` confirmed reachability, and the `opus[1m]` alias auto-resolves to latest Opus on CC v2.1.154+ (no settings.json template change needed — alias does the work). Updated prose across `SDLC.md` (Recommended Model row + effort warning), `CLAUDE_CODE_SDLC_WIZARD.md` (~11 mentions in effort table + autocompact + mixed-mode tier + 1M-context section), `skills/sdlc/SKILL.md`, `skills/setup/SKILL.md` (mixed-mode + flagship tier suggestions), `skills/update/SKILL.md`, `hooks/model-effort-check.sh` (warning text), and `cli/lib/repo-complexity.js` (tier comments). Min CC bumped v2.1.111+ → v2.1.154+ (required to resolve `opus[1m]` to 4.8). Effort semantics preserved — strict effort behavior introduced in 4.7 carried forward to 4.8, so `max` remains the recommended default and `xhigh` the floor.

### Notes

- **Un-run gates 2+3 tracked as post-deploy follow-up obligations on #365.** Gate 2 (A/B coder quality vs 4.7 on real PRs) and Gate 3 (dogfood for 24h before bump) were both deferred. If real-world use surfaces the system-card-flagged regressions for 4.8 — prompt-injection +60% on Gray Swan, file-deletion tendency, or eval-awareness affecting wizard output — revert via a single PR that flips the 4.7↔4.8 prose. Settings.json template unchanged means revert is prose-only.

## [1.77.0] - 2026-05-24

### Added
Expand Down
2 changes: 1 addition & 1 deletion CI_CD.md
Original file line number Diff line number Diff line change
Expand Up @@ -250,7 +250,7 @@ The `review` job is **skipped on `BaseInfinity/claude-sdlc-wizard` self-PRs**

## Local Codex Audit of CI Logs (Cross-Model)

The GH `pr-review.yml` workflow uses Claude Opus 4.7. For adversarial diversity, the local shepherd loop runs a **second pass with Codex xhigh** against the CI logs themselves — not just the code. A second model catches things the first missed (silent test exclusions, degraded E2E scores on a green checkmark, warnings promoted to errors in a later version).
The GH `pr-review.yml` workflow uses Claude Opus 4.8 (via `claude-code-action@v1` default). For adversarial diversity, the local shepherd loop runs a **second pass with Codex xhigh** against the CI logs themselves — not just the code. A second model catches things the first missed (silent test exclusions, degraded E2E scores on a green checkmark, warnings promoted to errors in a later version).

```bash
# After CI reports pass/fail:
Expand Down
38 changes: 19 additions & 19 deletions CLAUDE_CODE_SDLC_WIZARD.md
Original file line number Diff line number Diff line change
Expand Up @@ -249,26 +249,26 @@ When Anthropic provides official plugins or tools that handle something:

Claude Code's **effort level** controls how much thinking the model does before responding. Higher effort = deeper reasoning but more tokens.

> ⚠️ **On Opus 4.7, effort below `xhigh` breaks SDLC compliance in practice.** Unlike 4.6, Opus 4.7 respects effort levels *strictly* — at `high` or below it scopes work tighter (shallow reasoning, skipped TDD, no self-review) rather than going above-and-beyond. Treat the table below accordingly: **`max` is the recommended default, `xhigh` is the floor**, `high` or below is for trivial grep/search subagents only.
> ⚠️ **On Opus 4.8, effort below `xhigh` breaks SDLC compliance in practice.** Inherited from 4.7, Opus 4.8 respects effort levels *strictly* — at `high` or below it scopes work tighter (shallow reasoning, skipped TDD, no self-review) rather than going above-and-beyond. Treat the table below accordingly: **`max` is the recommended default, `xhigh` is the floor**, `high` or below is for trivial grep/search subagents only.

| Level | When to Use | How to Set |
|-------|-------------|------------|
| `high` or below | **Not for SDLC work on Opus 4.7.** Only for trivial grep/search subagents or one-shot questions that don't require planning | `effort: high` in a specific subagent frontmatter only |
| `xhigh` | **Floor for SDLC work on Opus 4.7.** Long-running tasks, repeated tool calls, deep exploration. Claude Code defaults to this on Opus 4.7 | `/effort xhigh` or set in skill frontmatter |
| `max` | **Recommended default for Opus 4.7 SDLC work.** Multi-file changes, architecture decisions, debugging, cross-model reviews, any task touching wizard/skill/CI code | `/effort max` (session only — resets next session) |

**Effort level changes in Opus 4.7 (April 2026):**
- **`xhigh` is new** — sits between `high` and `max`, designed for coding and agentic work (30+ minute tasks with token budgets in the millions)
- **Claude Code now defaults to `xhigh`** on Opus 4.7 for all plans
- **Opus 4.7 respects effort levels more strictly** than 4.6 — at lower levels it scopes work tighter instead of going above and beyond. If you see shallow reasoning, raise effort rather than prompting around it
- **`budget_tokens` is deprecated** on Opus 4.7 — use adaptive thinking with effort instead
| `high` or below | **Not for SDLC work on Opus 4.8.** Only for trivial grep/search subagents or one-shot questions that don't require planning | `effort: high` in a specific subagent frontmatter only |
| `xhigh` | **Floor for SDLC work on Opus 4.8.** Long-running tasks, repeated tool calls, deep exploration. Claude Code defaults to this on Opus 4.7+ | `/effort xhigh` or set in skill frontmatter |
| `max` | **Recommended default for Opus 4.8 SDLC work.** Multi-file changes, architecture decisions, debugging, cross-model reviews, any task touching wizard/skill/CI code | `/effort max` (session only — resets next session) |

**Strict effort behavior (Opus 4.7+, carried forward to 4.8):**
- **`xhigh` was introduced in 4.7** — sits between `high` and `max`, designed for coding and agentic work (30+ minute tasks with token budgets in the millions)
- **Claude Code defaults to `xhigh`** on Opus 4.7+ for all plans
- **Opus 4.7+ respects effort levels more strictly** than 4.6 — at lower levels it scopes work tighter instead of going above and beyond. If you see shallow reasoning, raise effort rather than prompting around it
- **`budget_tokens` is deprecated** on Opus 4.7+ — use adaptive thinking with effort instead
- When running at `xhigh` or `max`, set a large `max_tokens` (64k+) so the model has room to think across subagents and tool calls

**Why `high` was the previous default:** Claude Code uses **adaptive thinking** to dynamically allocate reasoning budget per turn. On Pro and Max plans, the default effort level was **medium (85)**, which causes the model to under-allocate reasoning on complex multi-step tasks — leading to shallow analysis, missed edge cases, and "lazy" outputs. This was [confirmed by Anthropic engineer Boris Cherny](https://github.com/anthropics/claude-code/issues/42796) and is documented at [code.claude.com](https://code.claude.com/docs/en/model-config). API, Team, and Enterprise plans default to high effort and are not affected.

**Don't rely on the CC default — set effort yourself.** Anthropic's [2026-04-23 post-mortem](https://www.anthropic.com/engineering/april-23-postmortem) is independent third-party evidence that CC has flipped reasoning_effort defaults across versions (high → medium → xhigh/high). The default has changed before and will change again. The wizard's `model-effort-check.sh` hook nudges to `xhigh`/`max` at session start specifically because the in-product default is not load-bearing — it can shift release-to-release without notice. Set `/effort max` explicitly every session you do SDLC work, and treat any "I assumed the default was X" reasoning as a bug.

The `/sdlc` skill sets `effort: high` in its frontmatter as a baseline, overriding the medium default on every SDLC invocation. **On Opus 4.7, run `/effort max` at session start** — the frontmatter is a floor, not a ceiling, and `max` is where SDLC-compliant work actually happens on 4.7.
The `/sdlc` skill sets `effort: high` in its frontmatter as a baseline, overriding the medium default on every SDLC invocation. **On Opus 4.8, run `/effort max` at session start** — the frontmatter is a floor, not a ceiling, and `max` is where SDLC-compliant work actually happens on 4.8.

**Nuclear option — disable adaptive thinking entirely:** Set `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` in your environment or settings.json `env` block. This forces a fixed reasoning budget per turn instead of letting the model dynamically allocate. Use this if you observe persistent quality issues even with `effort: high`. See [Claude Code model config docs](https://code.claude.com/docs/en/model-config) for details.

Expand Down Expand Up @@ -966,7 +966,7 @@ Override the default auto-compact threshold with environment variables. These ar
| `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` | Trigger compaction at this % of context capacity (1-100) | ~95% |
| `CLAUDE_CODE_AUTO_COMPACT_WINDOW` | Override context capacity in tokens (useful for 1M models) | Model default |

**Opt-in (issue #198):** The SDLC Wizard CLI ships `.claude/settings.json` with **no** `model` or `env` pin so Claude Code's auto-mode stays enabled. The setup skill's Step 9.5 asks whether to opt into `"model": "opus[1m]"` + `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` (tuned for the 1M window — compacts at ~300K). Default answer is **No**. Pinning the model at the top level tells Claude Code you've explicitly chosen a model and turns off per-turn model auto-selection — a real tradeoff, so we ask. Power users who want guaranteed Opus 4.7 + 1M context answer yes.
**Opt-in (issue #198):** The SDLC Wizard CLI ships `.claude/settings.json` with **no** `model` or `env` pin so Claude Code's auto-mode stays enabled. The setup skill's Step 9.5 asks whether to opt into `"model": "opus[1m]"` + `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=30` (tuned for the 1M window — compacts at ~300K). Default answer is **No**. Pinning the model at the top level tells Claude Code you've explicitly chosen a model and turns off per-turn model auto-selection — a real tradeoff, so we ask. Power users who want guaranteed Opus 4.8 + 1M context answer yes.

To opt in by hand, edit `.claude/settings.json`:

Expand Down Expand Up @@ -1021,13 +1021,13 @@ Claude Code supports both 200K and 1M context windows. **`opus[1m]` is an opt-in
**Why `opus[1m]` is opt-in (issue #198):**
- **Pinning disables auto-mode.** Max-plan users pay for Claude Code's per-turn model selection (Sonnet for cheap tasks, Opus for hard ones, plus weekly-limit smoothing). A top-level `model` gives that up.
- **The 1M headroom has to earn it.** If your typical session stays under 150K, you're giving up auto-mode for headroom you're not using.
- **Power users who want guaranteed Opus 4.7 + 1M** — go ahead, it's a real win for long shepherding sessions. Just make it a conscious choice, not a silent default.
- **Power users who want guaranteed Opus 4.8 + 1M** — go ahead, it's a real win for long shepherding sessions. Just make it a conscious choice, not a silent default.

**Opt in when:** you routinely cross 100K tokens in a single session (plan → TDD → review → CI shepherd on one feature), you want Opus 4.7 specifically (not Sonnet), and you're OK losing auto-mode.
**Opt in when:** you routinely cross 100K tokens in a single session (plan → TDD → review → CI shepherd on one feature), you want Opus 4.8 specifically (not Sonnet), and you're OK losing auto-mode.

**Stay on auto-mode (default) when:** you're unsure, your work is mixed short/long, or you want Claude Code to do the model math for you.

**How to opt in:** run `/model opus[1m]` in your session (transient), or set `"model": "opus[1m]"` in `.claude/settings.json` (persistent). Requires Claude Code v2.1.111+ for Opus 4.7. The setup wizard's Step 9.5 also asks once, with default No.
**How to opt in:** run `/model opus[1m]` in your session (transient), or set `"model": "opus[1m]"` in `.claude/settings.json` (persistent). Requires Claude Code v2.1.154+ for Opus 4.8 (the `opus[1m]` alias auto-resolves to the latest Opus). The setup wizard's Step 9.5 also asks once, with default No.

**How to opt out:** remove the `model` line from `.claude/settings.json`, or run `/model` and pick "Default (recommended)".

Expand All @@ -1037,14 +1037,14 @@ Claude Code supports both 200K and 1M context windows. **`opus[1m]` is an opt-in

### Mixed-Mode Tier (Sonnet coder + Opus reviewer, roadmap #233)

For trivial / blank / config-only / CRUD-style repos, full Opus 4.7 on every turn is overkill on the coder leg. The **mixed-mode tier** pins `model: "sonnet[1m]"` for in-session work while keeping the cross-model review layer (Codex / external reviewer) at the flagship — so the reviewer still catches what Sonnet missed.
For trivial / blank / config-only / CRUD-style repos, full Opus 4.8 on every turn is overkill on the coder leg. The **mixed-mode tier** pins `model: "sonnet[1m]"` for in-session work while keeping the cross-model review layer (Codex / external reviewer) at the flagship — so the reviewer still catches what Sonnet missed.

**The split:**

| Layer | Mixed-mode tier | Flagship tier |
|-------|----------------|---------------|
| Coder (in-session CC) | `model: "sonnet[1m]"` | `model: "opus[1m]"` |
| Cross-model reviewer (Codex etc.) | gpt-5.5 xhigh (or Opus 4.7 max via Bash) | gpt-5.5 xhigh (or Opus 4.7 max via Bash) |
| Cross-model reviewer (Codex etc.) | gpt-5.5 xhigh (or Opus 4.8 max via Bash) | gpt-5.5 xhigh (or Opus 4.8 max via Bash) |
| Effort floor (CC session) | xhigh; max preferred | xhigh; max preferred |

The reviewer always stays at flagship — the whole point of mixed-mode is that adversarial review catches Sonnet's blind spots, so weakening the review leg defeats the savings.
Expand All @@ -1058,7 +1058,7 @@ The reviewer always stays at flagship — the whole point of mixed-mode is that
**When to stay flagship:**
- Stakes-flagged repo: anywhere `.env` / `secrets/` / `credentials/` exists. Force flagship even if LOC is tiny — leaks are catastrophic
- Architecture work, debugging non-obvious bugs, security review, anything where the *coder's* judgment matters as much as the reviewer's
- Long shepherd sessions (plan → TDD → review → CI loop) — they cross 100K tokens regularly and Opus 4.7 fits the window better in a single thread
- Long shepherd sessions (plan → TDD → review → CI loop) — they cross 100K tokens regularly and Opus 4.8 fits the window better in a single thread

**Auto-detection:** the setup wizard runs `cli/lib/repo-complexity.js` against the target repo and suggests the tier. Stakes flag (`.env` / `secrets/`) forces complex regardless of size. The user always picks the final answer — the heuristic is a hint, not a gate.

Expand Down Expand Up @@ -2983,7 +2983,7 @@ If deployment fails or post-deploy verification catches issues:

**SDLC.md:**
```markdown
<!-- SDLC Wizard Version: 1.77.0 -->
<!-- SDLC Wizard Version: 1.78.0 -->
<!-- Setup Date: [DATE] -->
<!-- Completed Steps: step-0.1, step-0.2, step-0.4, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
<!-- Git Workflow: [PRs or Solo] -->
Expand Down
14 changes: 7 additions & 7 deletions SDLC.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
<!-- SDLC Wizard Version: 1.77.0 -->
<!-- SDLC Wizard Version: 1.78.0 -->
<!-- Setup Date: 2026-01-24 -->
<!-- Completed Steps: step-0.1, step-0.2, step-1, step-2, step-3, step-4, step-5, step-6, step-7, step-8, step-9 -->
<!-- Claude Code Baseline: v2.1.159 -->
Expand All @@ -10,14 +10,14 @@

| Property | Value |
|----------|-------|
| Wizard Version | 1.77.0 |
| Last Updated | 2026-06-01 |
| Claude Code Minimum | v2.1.111+ (required for Opus 4.7 / `opus[1m]`); v2.1.105+ for `PreCompact` hook |
| Claude Code Recommended | v2.1.159+ (latest at 2026-06-01) — unlocks Opus 4.8 (v2.1.154 — wizard holds on 4.7 pending #365 A/B), `.claude/skills` plugin auto-load (v2.1.157), enriched `tool_decision` telemetry via `OTEL_LOG_TOOL_DETAILS=1` (v2.1.157), native `/goal` (v2.1.139), `/code-review --comment` (v2.1.147), per-category `/usage` (v2.1.149), `$CLAUDE_EFFORT` env var for hooks (v2.1.133) |
| Recommended Model | `opus[1m]` (Opus 4.7, 1M context) — run `/model opus[1m]` |
| Wizard Version | 1.78.0 |
| Last Updated | 2026-06-02 |
| Claude Code Minimum | v2.1.154+ (required for Opus 4.8 / `opus[1m]`); v2.1.105+ for `PreCompact` hook |
| Claude Code Recommended | v2.1.159+ (latest at 2026-06-02) — unlocks Opus 4.8 (v2.1.154), `.claude/skills` plugin auto-load (v2.1.157), enriched `tool_decision` telemetry via `OTEL_LOG_TOOL_DETAILS=1` (v2.1.157), native `/goal` (v2.1.139), `/code-review --comment` (v2.1.147), per-category `/usage` (v2.1.149), `$CLAUDE_EFFORT` env var for hooks (v2.1.133) |
| Recommended Model | `opus[1m]` (Opus 4.8, 1M context) — run `/model opus[1m]` |
| Recommended Effort | `max` (preferred) / `xhigh` (floor) — run `/effort max` |

> **Effort warning (Opus 4.7):** `max` is the recommended default, `xhigh` is the absolute floor. Anything below `xhigh` (`high`, `medium`, `low`) causes Opus 4.7 to scope work tighter — shallow reasoning, skipped TDD, dropped self-review, SDLC non-compliance in practice. Use `high` or below only for trivial grep/search subagents, never for real SDLC work.
> **Effort warning (Opus 4.8):** `max` is the recommended default, `xhigh` is the absolute floor. Anything below `xhigh` (`high`, `medium`, `low`) causes Opus 4.8 to scope work tighter — shallow reasoning, skipped TDD, dropped self-review, SDLC non-compliance in practice. Use `high` or below only for trivial grep/search subagents, never for real SDLC work.

See `CLAUDE_CODE_SDLC_WIZARD.md` → "1M vs 200K Context Window" for the rationale and pricing notes.

Expand Down
4 changes: 2 additions & 2 deletions cli/lib/repo-complexity.js
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
// Roadmap #233: repo complexity heuristic for mixed-mode tier selection.
//
// Output: { tier: 'simple' | 'complex', score: <number>, signals: [...] }
// - 'simple' → setup wizard suggests mixed-mode (Sonnet 4.6 coder + Opus 4.7 reviewer)
// - 'complex' → setup wizard suggests full flagship (Opus 4.7 everywhere)
// - 'simple' → setup wizard suggests mixed-mode (Sonnet 4.6 coder + Opus 4.8 reviewer)
// - 'complex' → setup wizard suggests full flagship (Opus 4.8 everywhere)
// Cross-model review (Codex / external) always stays at the flagship tier
// regardless of coder selection — see CLAUDE_CODE_SDLC_WIZARD.md.
//
Expand Down
6 changes: 3 additions & 3 deletions hooks/model-effort-check.sh
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,9 @@
#
# Behavior (per ROADMAP #217):
# effort=max -> silent (preferred default, above floor)
# effort=xhigh -> silent (minimum floor on Opus 4.7)
# effort=xhigh -> silent (minimum floor on Opus 4.8)
# effort=high|medium|low (or unset) -> LOUD WARNING:
# Opus 4.7 needs xhigh floor for SDLC compliance (TDD, self-review, deep reasoning).
# Opus 4.8 needs xhigh floor for SDLC compliance (TDD, self-review, deep reasoning).
# Recommends `/effort max`. Also reminds about recommended model `opus[1m]`.
#
# CC does not expose the current model to hooks, so the model nudge is emitted as
Expand Down Expand Up @@ -57,7 +57,7 @@ else
fi

echo "=============================================================================="
echo " WARNING: effort '$effort_display' breaks SDLC compliance on Opus 4.7."
echo " WARNING: effort '$effort_display' breaks SDLC compliance on Opus 4.8."
echo " Below xhigh = shallow reasoning, skipped TDD, dropped self-review."
echo ""
echo " Run: /effort max (preferred, full SDLC compliance)"
Expand Down
2 changes: 1 addition & 1 deletion package.json
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{
"name": "agentic-sdlc-wizard",
"version": "1.77.0",
"version": "1.78.0",
"description": "SDLC enforcement for Claude Code — hooks, skills, and wizard setup in one command",
"bin": {
"sdlc-wizard": "cli/bin/sdlc-wizard.js"
Expand Down
Loading