Skip to content

Commit 1d1c5ec

Browse files
committed
fix: v0.8.1 — codex round-1 cross-model review fixes (5 findings)
v0.7.0+v0.8.0 shipped without external review. Codex xhigh round-1 returned 6/10 NOT_CERTIFIED with 5 actionable findings. v0.8.1 fixes all five. F1 (P1) detect-backends.sh:140 — privacy-first cascade emitted hosted_oss/google_aistudio while configurator only had proprietary/google. Deleted the wrong-tier line; both cascades now consistently put Google in proprietary tier (closed weights). F2 (P1) detect-backends.sh — dropped NIM_API_KEY/GEMINI_API_KEY alternates entirely. Configurator hardcoded {env:NVIDIA_API_KEY} and {env:GOOGLE_API_KEY} so users with only the alt set got configs that silently failed auth at runtime. Canonical names only. F3 (P1) docs/cost-ladder.md — Cerebras llama-3.3-70b not in current catalog; Gemini 2.0 Flash deprecated. Updated to qwen-3-235b / gpt-oss-120b / gemini-2.5-flash. Added 'Verifying current model IDs' callout with live links to provider catalogs. Per-provider stale-id history in calibration notes. F4 (P2) docs/cost-ladder.md — DeepSeek price was \$0.27/M, current \$0.14/M cache-miss. Updated inline + added cache-hit qualifier. F5 (P2) validate-review-artifact.js — only handled additionalProperties=false, ignored schema-valued. test_counts uses {type:integer,minimum:0} addProps but bad values silently passed. Added object-schema branch — extra props now recursively validate. Tests: picker 29→31 (F1+F2 regressions), schemas 28→30 (F5+positive). Total 289 across 11 suites. Dogfood: schemas validated their own .reviews/handoff.json + response.json. All 5 findings have status:FIXED with fix_summary + fix_locations. Conditional validation fires correctly.
1 parent fb1e1ce commit 1d1c5ec

10 files changed

Lines changed: 297 additions & 63 deletions
Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
You are reviewing v0.7.0 + v0.8.0 of opencode-sdlc-wizard (commits 6c5c4a2 and fb1e1ce on main). Read .reviews/handoff.json — it's the canonical handoff and it tells you the mission, success criteria, failure modes to hunt for, and review instructions. Then audit the changes in those two commits.
2+
3+
Concretely:
4+
1. git log --oneline v0.6.0..HEAD to see the scope
5+
2. git diff v0.6.0..HEAD --stat for the file list
6+
3. Read each changed file end to end
7+
4. Per-file: confirm the change matches the handoff's success criteria. If you find anything that violates them OR matches a failure mode in the handoff, raise a finding.
8+
5. Hot spots flagged in the handoff:
9+
- templates/schemas/{handoff,response}.schema.json — draft-07 correctness
10+
- scripts/validate-review-artifact.js — does it actually implement the draft-07 subset correctly? Try the test cases in tests/test-review-schemas.sh
11+
- scripts/configure-backend.sh — five new provider fragments (cerebras, deepseek, nvidia, google, mlx). Verify the emitted opencode.json shape matches OpenCode 1.14.x's @ai-sdk/openai-compatible contract. The F1 lesson from v0.2.0 round-2 was that custom providers MUST include "models: { [model]: {} }". Verify each new fragment does.
12+
- scripts/detect-backends.sh — --free-tier-first cascade order
13+
- docs/cost-ladder.md — sanity-check any pricing/quota numbers
14+
6. Run npm test to confirm all 285 tests pass
15+
16+
Output format: structured per-finding list. Each finding:
17+
- finding_id (F1, F2, ...)
18+
- severity (P0 / P1 / P2)
19+
- title
20+
- file:line
21+
- claim (what is wrong, why it matters)
22+
- recommendation (what should change)
23+
24+
End with:
25+
- summary score (1-10)
26+
- CERTIFIED or NOT_CERTIFIED
27+
- one-paragraph rationale
28+
29+
If you find the work solid, fewer findings is fine — do not pad. P0 = ship-blocker, P1 = should-fix-soon, P2 = nice-to-have.

.reviews/handoff.json

Lines changed: 39 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -1,43 +1,50 @@
11
{
2-
"review_id": "opencode-sdlc-wizard-v0.2.0-002",
2+
"review_id": "opencode-sdlc-wizard-v0.7-v0.8-001",
33
"status": "PENDING_REVIEW",
4-
"round": 2,
5-
"supersedes_round": 1,
6-
"mission": "v0.2.0 expands the v0.1.0 OpenCode port (Phase A) with a privacy-first backend picker — the differentiator that justifies this sibling's existence vs the Claude/Codex siblings. Two scripts (scripts/detect-backends.sh, scripts/configure-backend.sh) plus PRIVACY.md ship as part of the install bundle. Round-2 must (a) re-verify the 6 round-1 fixes are still in place against current main + working tree (per .reviews/response.json), (b) audit the new picker as a fresh review, (c) confirm 107/107 tests across 4 suites still pass, (d) verify docs (README.md, AGENTS.md, PRIVACY.md, HANDOFF.md addendum, CHANGELOG.md) are coherent with the code.",
7-
"success": "Reviewer confirms: (1) all 6 round-1 findings stay FIXED at the file:line pairs cited in response.json; (2) detect-backends.sh emits well-formed JSON with the four-tier shape, recommendation cascades privacy-first, no live network calls; (3) configure-backend.sh writes a valid opencode.json shape per OpenCode's documented config schema (https://opencode.ai/docs/config/, https://opencode.ai/docs/providers/), uses {env:VAR} substitution for keys, is idempotent, refuses to clobber an existing model pin without --force; (4) PRIVACY.md tier model matches the script tier names (private_local / enterprise / hosted_oss / proprietary); (5) install.sh now ships the scripts at .opencode/scripts/ + chmod +x's them, but still leaves opencode.json untouched (configure-backend.sh is opt-in); (6) setup-wizard SKILL.md walks the user through the picker with privacy-first defaults; (7) version bump from 0.1.0 → 0.2.0 in package.json + CHANGELOG.md is documented and accurate; (8) test count is 107 (57 + 11 + 13 + 15 — bundle/plugin/install/picker).",
8-
"failure": "(a) Picker JSON shape doesn't match OpenCode's actual provider plugin contract — e.g., we use `npm: \"@ai-sdk/openai-compatible\"` for local providers but the field name might actually be different (e.g., `module`); (b) {env:VAR} substitution syntax we emit doesn't match what OpenCode actually parses; (c) configure-backend's deep-merge of provider config silently drops parts of an existing custom provider block; (d) detect-backends recommendation cascade has a privacy hole — e.g., it could prefer enterprise over private_local in some edge case; (e) the picker scripts are fine standalone but the setup-wizard SKILL.md prompts for them in a way that contradicts the wizard's `Does not pin a model in opencode.json without explicit user choice` guarantee; (f) PRIVACY.md verification command (`opencode --print-config`) doesn't actually exist; (g) round-1 fixes silently regress in the file:line pairs cited.",
9-
"files_changed_v020": [
10-
"AGENTS.md (privacy-tier section + PRIVACY.md link)",
11-
"CHANGELOG.md (v0.2.0 entry)",
12-
"HANDOFF.md (v0.2.0 addendum)",
13-
"PRIVACY.md (NEW — four-tier model + Ollama walkthrough + verification)",
14-
"README.md (banner bumped, privacy-first picker section, tier table)",
15-
"install.sh (REQUIRED_SOURCES + declare_target add scripts; chmod +x picker scripts; Next Steps mention picker)",
16-
"package.json (0.1.0 → 0.2.0; files[] adds scripts/ + PRIVACY.md; test script adds picker tests)",
17-
"scripts/configure-backend.sh (NEW — writes/merges opencode.json with --tier --provider --model + --force/--print-only)",
18-
"scripts/detect-backends.sh (NEW — env+PATH probe → JSON, privacy-first recommendation cascade)",
19-
"skills/setup-wizard/SKILL.md (rewrote Step 2/Step 3 with concrete picker invocation)",
20-
"tests/test-backend-picker.sh (NEW — 15 tests for picker)",
21-
"tests/test-bundle-integrity.sh (added scripts/+PRIVACY.md presence + tier-name guards; 57 → 68)",
22-
"tests/test-install.sh (added scripts install + executable check; bumped re-install MATCH count to 15+; 12 → 13)"
23-
],
24-
"files_unchanged_v020_round1_fixes_held": [
25-
".opencode/plugins/sdlc-wizard.js (round-1 P0 #1 + P0 #2 fixes — verify still intact)",
26-
".opencode/hooks/* (verbatim from parent — not in review scope)",
27-
"skills/sdlc/SKILL.md (verbatim workflow; not in review scope)"
4+
"round": 1,
5+
"mission": "Cross-model review of v0.7.0 + v0.8.0 (commits 6c5c4a2 + fb1e1ce). v0.7.0 added JSON Schemas (draft-07) for .reviews/{handoff,response}.json + a zero-dep node validator. v0.8.0 added a --free-tier-first cascade flag, five new providers (Cerebras, DeepSeek-direct, NVIDIA NIM, Google AI Studio, MLX), and docs/cost-ladder.md. Both shipped without external review. This round confirms shape correctness, security posture, and SDLC-protocol adherence.",
6+
"success": "Reviewer confirms: (1) handoff/response schemas correctly model the artifact shapes — required fields right, conditional FIXED→fix_summary+fix_locations and REJECTED→rejection_reason actually fire, severity pattern matches both bare 'P0' and 'P0 (parenthetical)' forms; (2) validate-review-artifact.js handles the draft-07 subset correctly including $ref + allOf if/then + const + patternProperties — no false positives or silent passes on malformed artifacts; (3) the five new providers in detect-backends.sh + configure-backend.sh emit OpenCode-resolvable opencode.json (custom providers include `models: { [model]: {} }` per the F1 lesson from v0.2.0); (4) PROVIDER_ALIASES correctly maps friendly→canonical IDs (nvidia_nim→nvidia, google_aistudio→google, gemini→google); (5) baseURLs are correct (Cerebras https://api.cerebras.ai/v1, DeepSeek https://api.deepseek.com/v1, NVIDIA https://integrate.api.nvidia.com/v1, Google https://generativelanguage.googleapis.com/v1beta/openai); (6) --free-tier-first cascade order is sane (NVIDIA NIM → Cerebras → Groq → Google → OpenRouter → DeepSeek → Together → enterprise → proprietary); (7) docs/cost-ladder.md numbers are plausible and the capability-floor section is honest about the 30B+ requirement; (8) all 285 tests across 11 suites pass and meaningfully exercise the new code (no smoke-test-only coverage); (9) drift-test extension (T7 *.js + T12 schemas) catches the new file classes.",
7+
"failure": "Possible regressions: (a) Schema permissiveness — additionalProperties: true is forward-compat but means a typo in a required-ish field like 'status' could silently land as an extra prop instead of failing; verify the FIXED-conditional fires correctly in the wild; (b) Validator $ref handling could miss circular refs or refs into definitions{} that don't exist; (c) NVIDIA NIM model id format — we emit `nvidia/<vendor>/<model>` (e.g., nvidia/meta/llama-3.3-70b) — verify OpenCode actually accepts triple-segment ids; (d) Google AI Studio's OpenAI-compatible endpoint URL — verify generativelanguage.googleapis.com/v1beta/openai is current and not a deprecated v1alpha path; (e) Cerebras + DeepSeek-direct rate-limits not surfaced anywhere — users may hit them and not know why; (f) MLX baseURL defaults to 127.0.0.1:8080 but mlx_lm.server's actual default is 8080 only with --port flag; verify the doc; (g) The hybrid coder/reviewer pattern in cost-ladder.md says 'edit opencode.json or call configure-backend again' to swap — but v0.8.0 has no mixed-mode skill, so swap is manual; surface this clearly; (h) docs/cost-ladder.md cited prices may be stale within months — already noted in the doc but reviewer should flag any that look wildly off; (i) Custom-provider models block (`models: { [model]: {} }`) — verify the empty-object value still resolves correctly for Cerebras/DeepSeek/NVIDIA on opencode 1.14.x.",
8+
"files_changed": [
9+
"scripts/detect-backends.sh (mlx + 4 hosted_oss/proprietary providers + --free-tier-first flag + dual cascade)",
10+
"scripts/configure-backend.sh (PROVIDER_ALIASES + 6 new fragment cases: mlx, cerebras, deepseek, nvidia, google)",
11+
"scripts/validate-review-artifact.js (NEW — zero-dep draft-07 subset validator)",
12+
"scripts/validate-review-artifact.sh (NEW — bash wrapper)",
13+
"templates/schemas/handoff.schema.json (NEW)",
14+
"templates/schemas/response.schema.json (NEW — with allOf if/then conditionals)",
15+
"docs/cost-ladder.md (NEW)",
16+
"install.sh (REQUIRED_SOURCES + declare_target add schemas + validator)",
17+
"skills/cross-model-review/SKILL.md (Step 1.5 — validate before sending)",
18+
"skills/setup-wizard/SKILL.md (Step 4 mentions schemas)",
19+
"tests/test-review-schemas.sh (NEW — 28 tests)",
20+
"tests/test-backend-picker.sh (21 → 29: new provider tests + --free-tier-first)",
21+
"tests/test-doc-templates.sh (17 → 24: cost-ladder.md + load-bearing sections)",
22+
"tests/test-bundle-drift.sh (T7 *.js + T12 schemas)",
23+
"package.json (0.6.0 → 0.8.0; files[] += docs/; test script += test-review-schemas.sh)",
24+
"CHANGELOG.md (v0.7.0 + v0.8.0 entries)",
25+
"ROADMAP.md (marked shipped)",
26+
"README.md (status banner + provider list + cost-ladder cross-link)"
2827
],
2928
"verification_state": {
3029
"tests_green": true,
3130
"test_counts": {
32-
"test-bundle-integrity.sh": 68,
31+
"test-bundle-integrity.sh": 73,
3332
"test-plugin-shim.sh": 11,
3433
"test-install.sh": 13,
35-
"test-backend-picker.sh": 15,
36-
"total": 107
37-
}
34+
"test-backend-picker.sh": 29,
35+
"test-cli.sh": 10,
36+
"test-cross-model-review.sh": 10,
37+
"test-domain-templates.sh": 26,
38+
"test-bundle-drift.sh": 51,
39+
"test-check-cli.sh": 10,
40+
"test-doc-templates.sh": 24,
41+
"test-review-schemas.sh": 28,
42+
"total": 285
43+
},
44+
"live_e2e_verified": "Static review only — no live E2E run for v0.7.0/v0.8.0 yet. v0.2.0 baseline live E2E (opencode-ai@1.14.33) still applicable for plugin/hooks; the new scripts are pure bash/node and don't touch the plugin runtime."
3845
},
39-
"review_instructions": "Two phases: (1) RECHECK round-1 — for each of the 6 findings in .reviews/response.json, verify the FIXED claims still hold against current working tree (file:line pairs are listed). Do NOT re-review the bash hooks. (2) FRESH REVIEW v0.2.0 picker — audit scripts/{detect,configure}-backends.sh and the PRIVACY.md / setup-wizard skill / install.sh changes that wire them in. Cross-check the configurator's opencode.json output against OpenCode's actual config schema (fetch https://opencode.ai/docs/config/ and https://opencode.ai/docs/providers/ if needed). Be strict on (a) JSON shape correctness, (b) privacy guarantee — does the recommendation cascade ever leak data when a less-private tier has higher capability?, (c) idempotency claims, (d) doc/code coherence. End with: score (1-10), CERTIFIED or NOT CERTIFIED. Do NOT raise new findings unless P0 on the unchanged round-1 surface; for the new picker, all findings are in scope.",
40-
"preflight_path": ".reviews/preflight-v0.1.0.md",
46+
"review_instructions": "Two phases. (1) Schema correctness — validate templates/schemas/{handoff,response}.schema.json against the live .reviews/{handoff,response}.json artifacts using bash scripts/validate-review-artifact.sh. Confirm conditionals fire (FIXED requires fix_summary+fix_locations; REJECTED requires rejection_reason). Try a malformed handoff: missing 'failure' should rc=1. (2) Provider audit — for each new provider (cerebras/deepseek/nvidia/google/mlx), verify the emitted opencode.json fragment matches what OpenCode 1.14.x actually expects. Hot spot: NVIDIA NIM `nvidia/meta/llama-3.3-70b-instruct` model string with two slashes. Cross-check baseURLs against current provider docs if you can. (3) docs/cost-ladder.md — flag any specific price/quota numbers that look stale or wrong (DeepSeek ~$0.27/M in, Together $25 initial credits, Groq sub-second, NVIDIA NIM credits, Gemini 1500 req/day on Flash — these were calibrated 2026-05-05). End with: per-finding table (severity P0/P1/P2, status, file:line, recommendation), score (1-10), CERTIFIED or NOT_CERTIFIED. Do NOT raise findings for the bash hooks or plugin shim — those are out of scope for this round (covered by v0.2.0 review).",
47+
"preflight_path": "n/a",
4148
"response_path": ".reviews/response.json",
42-
"artifact_path": "https://github.com/BaseInfinity/opencode-sdlc-wizard"
49+
"artifact_path": "https://github.com/BaseInfinity/opencode-sdlc-wizard/compare/v0.6.0...v0.8.0"
4350
}

CHANGELOG.md

Lines changed: 65 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,71 @@
22

33
All notable changes to opencode-sdlc-wizard.
44

5+
## [0.8.1] - 2026-05-05
6+
7+
### Fixed — codex round-1 cross-model review (5 findings, all addressed)
8+
9+
v0.7.0 + v0.8.0 shipped without external review. Codex xhigh round-1
10+
returned 6/10 NOT_CERTIFIED with 5 actionable findings. v0.8.1 fixes
11+
all five.
12+
13+
**F1 (P1)**`scripts/detect-backends.sh:140`. Privacy-first cascade
14+
emitted `hosted_oss/google_aistudio` while configurator only had
15+
`proprietary/google` — followup `--tier hosted_oss --provider
16+
google_aistudio` would fail "unsupported tier/provider". Fix: deleted
17+
the wrong-tier line; the proprietary fallthrough at line 143 was
18+
already correct. Both cascades now consistently put Google in
19+
proprietary tier.
20+
21+
**F2 (P1)** — Detector accepted `NIM_API_KEY`/`GEMINI_API_KEY` as
22+
alternates for `NVIDIA_API_KEY`/`GOOGLE_API_KEY`, but the configurator
23+
hardcoded the canonical names in `{env:NAME}` references. A user with
24+
only the alternate set got a "successfully configured" backend that
25+
silently failed auth at runtime. Fix: dropped alternate support
26+
entirely. Canonical names only — `NVIDIA_API_KEY` for NVIDIA NIM,
27+
`GOOGLE_API_KEY` for Google AI Studio. JSON output `envs: [..]` array
28+
collapsed to `env: "NAME"` singular for shape consistency.
29+
30+
**F3 (P1)**`docs/cost-ladder.md` recommended Cerebras
31+
`llama-3.3-70b` (not in current Cerebras catalog) and Gemini
32+
`gemini-2.0-flash` (deprecated, June 2026 shutdown). Fix: updated
33+
$0/mo path to current valid IDs (`qwen-3-235b-a22b-instruct-2507` /
34+
`gpt-oss-120b` / `gemini-2.5-flash`). Added a "Verifying current model
35+
IDs" callout block with live links to each provider's catalog. Added
36+
per-provider calibration history to the closing notes section.
37+
38+
**F4 (P2)** — DeepSeek price quoted `~$0.27/M`, current is `~$0.14/M`
39+
cache-miss. Fixed inline + added cache-hit qualifier to the calibration
40+
notes.
41+
42+
**F5 (P2)**`scripts/validate-review-artifact.js:133` only handled
43+
`additionalProperties === false`, ignored schema-valued
44+
`additionalProperties` used in the schemas for
45+
`verification_state.test_counts`. Bad values like
46+
`{"bad":"not-int","negative":-1}` validated successfully. Fix: added
47+
object-schema branch — extra properties now recursively validate
48+
against the addProps schema. Live `.reviews/*.json` artifacts still
49+
pass; bad test_counts now fail with type/minimum errors.
50+
51+
### Tests
52+
53+
- `test-backend-picker.sh` 29 → 31: T29 (Google in proprietary tier
54+
both cascades), T30 (alt env names not honored)
55+
- `test-review-schemas.sh` 28 → 30: T29 (schema-valued addProps
56+
enforced), T30 (positive case)
57+
- T21 in picker updated to use `GOOGLE_API_KEY` instead of
58+
`GEMINI_API_KEY`
59+
60+
**Total: 289 tests across 11 suites** (was 285 in v0.8.0).
61+
62+
### Dogfood
63+
64+
Schemas validated their own review artifacts: `.reviews/handoff.json`
65+
and `.reviews/response.json` for the v0.7-v0.8-001 review both
66+
validate against the v0.7.0 schemas. Conditional validation fired —
67+
all five findings are `status: FIXED` with `fix_summary` +
68+
`fix_locations` populated.
69+
570
## [0.8.0] - 2026-05-05
671

772
### Added — free-tier-first cascade + 5 new providers + cost ladder doc

README.md

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,10 @@
11
# OpenCode SDLC Wizard
22

3-
> **Status: v0.8.0 (free-tier-first picker + cost ladder doc + Cerebras
4-
> / DeepSeek-direct / NVIDIA NIM / Google AI Studio / MLX detection +
5-
> schemas + validator + full template set + check subcommand) —
6-
> 2026-05-05.** Install with `npx opencode-sdlc-wizard init`, check
3+
> **Status: v0.8.1 (codex round-1 fixes — Google tier mismatch,
4+
> canonical env names, cost-ladder freshness, validator addProps —
5+
> on top of v0.8.0's free-tier-first picker + cost ladder + 5 new
6+
> providers + schemas + validator + full template set + check
7+
> subcommand) — 2026-05-05.** Install with `npx opencode-sdlc-wizard init`, check
78
> upstream with `npx opencode-sdlc-wizard check`. Full SDLC loop is
89
> any-backend on both coder AND reviewer (zero Anthropic+OpenAI lock-in
910
> possible); detector now picks up free-tier-friendly providers

0 commit comments

Comments
 (0)