* feat(gen-skill-docs): add design outside voices + hard rules resolvers
Add generateDesignOutsideVoices() — parallel Codex + Claude subagent
dispatch for cross-model design critique with litmus scorecard synthesis.
Branches per skillName (plan-design-review, design-review, design-consultation)
with task-specific reasoning effort (high for analytical, medium for creative).
Add generateDesignHardRules() — OpenAI Frontend Skill hard rules + gstack
AI slop blacklist unified into one shared block with classifier step
(landing page vs app UI vs hybrid).
Extract AI_SLOP_BLACKLIST constant from inline prose in generateDesignMethodology()
for DRY. Extend generateDesignReviewLite() with lightweight Codex block.
Extend generateDesignSketch() with outside voices opt-in after wireframe.
Source: OpenAI "Designing Delightful Frontends with GPT-5.4" (Mar 2026)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(design skills): add outside voices + hard rules to all design templates
Insert {{DESIGN_OUTSIDE_VOICES}} in plan-design-review (between Step 0D
and Pass 1), design-review (between Phase 6 and Phase 7), and
design-consultation (between Phase 2 and Phase 3).
Insert {{DESIGN_HARD_RULES}} in plan-design-review Pass 4 and design-review
Phase 3 checklist.
DESIGN_REVIEW_LITE in /ship and /review now includes a Codex design voice
block with litmus checks.
DESIGN_SKETCH in /office-hours now includes outside voices opt-in after
wireframe approval.
Regenerated all SKILL.md files (both Claude and Codex hosts).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add resolver tests + touchfiles for design outside voices
Add 18 test cases across 4 new describe blocks:
- DESIGN_OUTSIDE_VOICES: host guard, skillName branching, reasoning effort
- DESIGN_HARD_RULES: classifier, 3 rule sets, slop blacklist, OpenAI criteria
- DESIGN_SKETCH extended: outside voices step, original wireframe preserved
- DESIGN_REVIEW_LITE extended: Codex block, codex host exclusion
Update touchfiles: add scripts/gen-skill-docs.ts to design skill E2E
test dependencies for accurate diff-based test selection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.11.3.0)
Design outside voices — parallel Codex + Claude subagent for cross-model
design critique with litmus scorecard synthesis. OpenAI hard rules + gstack
slop blacklist unified. Classifier for landing page vs app UI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: generate .agents/ on demand in tests (not checked in since v0.11.2.0)
.agents/ is gitignored since v0.11.2.0 — tests that read Codex-host
SKILL.md files now generate them on demand via `bun run gen-skill-docs.ts
--host codex` before reading. Fixes test failures on fresh clones.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- **Every design review now gets a second opinion.** `/plan-design-review`, `/design-review`, and `/design-consultation` dispatch both Codex (OpenAI) and a fresh Claude subagent in parallel to independently evaluate your design — then synthesize findings with a litmus scorecard showing where they agree and disagree. Cross-model agreement = high confidence; disagreement = investigate.
- **OpenAI's design hard rules baked in.** 7 hard rejection criteria, 7 litmus checks, and a landing-page vs app-UI classifier from OpenAI's "Designing Delightful Frontends" framework — merged with gstack's existing 10-item AI slop blacklist. Your design gets evaluated against the same rules OpenAI recommends for their own models.
- **Codex design voice in every PR.** The lightweight design review that runs in `/ship` and `/review` now includes a Codex design check when frontend files change — automatic, no opt-in needed.
- **Outside voices in /office-hours brainstorming.** After wireframe sketches, you can now get Codex + Claude subagent design perspectives on your approaches before committing to a direction.
- **AI slop blacklist extracted as shared constant.** The 10 anti-patterns (purple gradients, 3-column icon grids, centered everything, etc.) are now defined once and shared across all design skills. Easier to maintain, impossible to drift.
TODOS.md: 24 additions & 0 deletions
@@ -432,6 +432,30 @@ Shipped: Default model changed to Sonnet for structure tests (~30), Opus retaine
Shipped as v0.5.0 on main. Includes `/plan-design-review` (report-only design audit), `/qa-design-review` (audit + fix loop), and `/design-consultation` (interactive DESIGN.md creation). `{{DESIGN_METHODOLOGY}}` resolver provides shared 80-item design audit checklist.

### Design outside voices in /plan-eng-review

**What:** Extend the parallel dual-voice pattern (Codex + Claude subagent) to /plan-eng-review's architecture review section.

**Why:** The design beachhead (v0.11.3.0) proves cross-model consensus works for subjective reviews. Architecture reviews have similar subjectivity in tradeoff decisions.

**Context:** Depends on learnings from the design beachhead. If the litmus scorecard format proves useful, adapt it for architecture dimensions (coupling, scaling, reversibility).

### Outside voices in /qa visual regression detection

**What:** Add Codex design voice to /qa for detecting visual regressions during bug-fix verification.

**Why:** When fixing bugs, the fix can introduce visual regressions that code-level checks miss. Codex could flag "the fix broke the responsive layout" during re-test.

**Context:** Depends on /qa having design awareness. Currently /qa focuses on functional testing.
design-consultation/SKILL.md: 65 additions & 0 deletions
@@ -423,6 +423,71 @@ If the user said no research, skip entirely and proceed to Phase 3 using your bu
---

## Design Outside Voices (parallel)

Use AskUserQuestion:
> "Want outside design voices? Codex evaluates against OpenAI's design hard rules + litmus checks; Claude subagent does an independent design direction proposal."
>
> A) Yes — run outside design voices
> B) No — proceed without

If user chooses B, skip this step and continue.

**Check Codex availability:**
```bash
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
```

**If Codex is available**, launch both voices simultaneously:
```bash
codex exec "Given this product context, propose a complete design direction:
- Visual thesis: one sentence describing mood, material, and energy
- Typography: specific font names (not defaults — no Inter/Roboto/Arial/system) + hex colors
- Color system: CSS variables for background, surface, primary text, muted text, accent
- Layout: composition-first, not component-first. First viewport as poster, not document
- Differentiation: 2 deliberate departures from category norms
- Anti-slop: no purple gradients, no 3-column icon grids, no centered everything, no decorative blobs

Be opinionated. Be specific. Do not hedge. This is YOUR design direction — own it." -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached 2>"$TMPERR_DESIGN"
```
Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
```bash
cat "$TMPERR_DESIGN" && rm -f "$TMPERR_DESIGN"
```

2. **Claude design subagent** (via Agent tool):
Dispatch a subagent with this prompt:
"Given this product context, propose a design direction that would SURPRISE. What would the cool indie studio do that the enterprise UI team wouldn't?
- Propose an aesthetic direction, typography stack (specific font names), color palette (hex values)
- 2 deliberate departures from category norms
- What emotional reaction should the user have in the first 3 seconds?

Be bold. Be specific. No hedging."

**Error handling (all non-blocking):**
- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key": "Codex authentication failed. Run `codex login` to authenticate."
- **Timeout:** "Codex timed out after 5 minutes."
- **Empty response:** "Codex returned no response."
- On any Codex error: proceed with Claude subagent output only, tagged `[single-model]`.
- If Claude subagent also fails: "Outside voices unavailable — continuing with primary review."
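
The error-handling rules in the list above could be sketched in shell roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `classify_codex_result`, its three arguments, the case-insensitive matching, and the convention that exit status 124 means a timeout (as GNU `timeout` reports) are all assumptions. Only the matched substrings and the user-facing messages come from the list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a finished Codex run to one of the messages listed above.
# Args: $1 = exit status of the codex command
#       $2 = captured stderr text
#       $3 = captured stdout text
# Assumption: a status of 124 means the run was cut off by a timeout wrapper.
classify_codex_result() {
  local status="$1" stderr_text="$2" stdout_text="$3"

  if [ "$status" -eq 124 ]; then
    echo "Codex timed out after 5 minutes."
  elif printf '%s' "$stderr_text" | grep -qiE 'auth|login|unauthorized|API key'; then
    # Case-insensitive match is a choice made for this sketch.
    echo 'Codex authentication failed. Run `codex login` to authenticate.'
  elif [ -z "$stdout_text" ]; then
    echo "Codex returned no response."
  else
    echo "OK"
  fi
}
```

Any result other than `OK` would then mean continuing with the Claude subagent output alone, tagged `[single-model]`.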

Present Codex output under a `CODEX SAYS (design direction):` header.
Present subagent output under a `CLAUDE SUBAGENT (design direction):` header.

**Synthesis:** Claude main references both Codex and subagent proposals in the Phase 3 proposal. Present:
- Areas of agreement between all three voices (Claude main + Codex + subagent)
- Genuine divergences as creative alternatives for the user to choose from
- "Codex and I agree on X. Codex suggested Y where I'm proposing Z — here's why..."
design-review/SKILL.md: 150 additions & 0 deletions
@@ -856,6 +856,75 @@ Tie everything to user goals and product objectives. Always suggest specific imp
10. **Depth over breadth.** 5-10 well-documented findings with screenshots and specific suggestions > 20 vague observations.
11. **Show screenshots to the user.** After every `$B screenshot`, `$B snapshot -a -o`, or `$B responsive` command, use the Read tool on the output file(s) so the user can see them inline. For `responsive` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.

### Design Hard Rules

**Classifier — determine rule set before evaluating:**
```bash
- Motion: 2-3 intentional animations, or zero / ornamental only?
- Cards: used only when card IS the interaction? No decorative card grids?

First classify as MARKETING/LANDING PAGE vs APP UI vs HYBRID, then apply matching rules.

LITMUS CHECKS — answer YES/NO:
1. Brand/product unmistakable in first screen?
2. One strong visual anchor present?
3. Page understandable by scanning headlines only?
4. Each section has one job?
5. Are cards actually necessary?
6. Does motion improve hierarchy or atmosphere?
7. Would design feel premium with all decorative shadows removed?

HARD REJECTION — flag if ANY apply:
1. Generic SaaS card grid as first impression
2. Beautiful image with weak brand
3. Strong headline with no clear action
4. Busy imagery behind text
5. Sections repeating same mood statement
6. Carousel with no narrative purpose
7. App UI made of stacked cards instead of layout

Be specific. Reference file:line for every finding." -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DESIGN"
```
Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
```bash
cat "$TMPERR_DESIGN" && rm -f "$TMPERR_DESIGN"
```

2. **Claude design subagent** (via Agent tool):
Dispatch a subagent with this prompt:
"Review the frontend source code in this repo. You are an independent senior product designer doing a source-code design audit. Focus on CONSISTENCY PATTERNS across files rather than individual violations:
- Are spacing values systematic across the codebase?
- Is there ONE color system or scattered approaches?
- Do responsive breakpoints follow a consistent set?
- Is the accessibility approach consistent or spotty?

For each finding: what's wrong, severity (critical/high/medium), and the file:line."

**Error handling (all non-blocking):**
- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key": "Codex authentication failed. Run `codex login` to authenticate."
- **Timeout:** "Codex timed out after 5 minutes."
- **Empty response:** "Codex returned no response."
- On any Codex error: proceed with Claude subagent output only, tagged `[single-model]`.
- If Claude subagent also fails: "Outside voices unavailable — continuing with primary review."

Present Codex output under a `CODEX SAYS (design source audit):` header.
Present subagent output under a `CLAUDE SUBAGENT (design consistency):` header.

**Synthesis — Litmus scorecard:**

Use the same scorecard format as /plan-design-review (shown above). Fill in from both outputs.
Merge findings into the triage with `[codex]` / `[subagent]` / `[cross-model]` tags.
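
One way to picture the tagging step: treat each voice's findings as a list of `file:line` keys, tag a key `[cross-model]` when both voices report it, and tag it `[codex]` or `[subagent]` otherwise. The sketch below assumes one key per line in two plain-text files. The function name `tag_findings`, that file layout, and the reading of `[cross-model]` as "reported by both voices" are illustrative assumptions, not something the skill text specifies.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: tag findings by which voice reported them.
# Args: $1 = file of Codex finding keys
#       $2 = file of subagent finding keys (one "file:line" key per line)
tag_findings() {
  local codex_file="$1" subagent_file="$2"
  local codex_sorted subagent_sorted
  codex_sorted="$(mktemp)"
  subagent_sorted="$(mktemp)"

  # comm requires sorted input; -u also drops duplicate keys within one voice.
  LC_ALL=C sort -u "$codex_file" > "$codex_sorted"
  LC_ALL=C sort -u "$subagent_file" > "$subagent_sorted"

  # comm -12: keys in both files; -23: only in the first; -13: only in the second.
  LC_ALL=C comm -12 "$codex_sorted" "$subagent_sorted" | sed 's/^/[cross-model] /'
  LC_ALL=C comm -23 "$codex_sorted" "$subagent_sorted" | sed 's/^/[codex] /'
  LC_ALL=C comm -13 "$codex_sorted" "$subagent_sorted" | sed 's/^/[subagent] /'

  rm -f "$codex_sorted" "$subagent_sorted"
}
```

Listing the shared keys first keeps the findings both voices agree on at the top of the triage.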
office-hours/SKILL.md: 29 additions & 0 deletions
@@ -731,6 +731,35 @@ Reference the wireframe screenshot in the design doc's "Recommended Approach" se
The screenshot file at `/tmp/gstack-sketch.png` can be referenced by downstream skills
(`/plan-design-review`, `/design-review`) to see what was originally envisioned.

**Step 6: Outside design voices** (optional)

After the wireframe is approved, offer outside design perspectives:

```bash
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
```

If Codex is available, use AskUserQuestion:
> "Want outside design perspectives on the chosen approach? Codex proposes a visual thesis, content plan, and interaction ideas. A Claude subagent proposes an alternative aesthetic direction."
>
> A) Yes — get outside design voices
> B) No — proceed without

If user chooses A, launch both voices simultaneously:
```bash
codex exec "For this product approach, provide: a visual thesis (one sentence — mood, material, energy), a content plan (hero → support → detail → CTA), and 2 interaction ideas that change page feel. Apply beautiful defaults: composition-first, brand-first, cardless, poster not document. Be opinionated." -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached 2>"$TMPERR_SKETCH"
```
Use a 5-minute timeout (`timeout: 300000`). After completion: `cat "$TMPERR_SKETCH" && rm -f "$TMPERR_SKETCH"`

2. **Claude subagent** (via Agent tool):
"For this product approach, what design direction would you recommend? What aesthetic, typography, and interaction patterns fit? What would make this approach feel inevitable to the user? Be specific — font names, hex colors, spacing values."

Present Codex output under `CODEX SAYS (design sketch):` and subagent output under `CLAUDE SUBAGENT (design direction):`.
Error handling: all non-blocking. On failure, skip and continue.
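
The non-blocking rule can be pictured as a wrapper that never propagates a failure to its caller. This is a minimal sketch, assuming GNU coreutils `timeout` is installed. The skill itself relies on the tool-level `timeout: 300000` setting rather than a shell wrapper, and the name `run_voice_nonblocking` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run an outside-voice command under a time limit
# and always return success so the surrounding workflow keeps going.
# Args: $1 = time limit in seconds, remaining args = the command to run.
run_voice_nonblocking() {
  local limit="$1"
  shift

  local status=0
  timeout "$limit" "$@" || status=$?

  if [ "$status" -ne 0 ]; then
    # Report the failure on stderr, then carry on.
    echo "Outside voice skipped (exit status $status); continuing." >&2
  fi
  return 0
}
```

A call might look like `run_voice_nonblocking 300 codex exec "..." -s read-only`, where 300 seconds mirrors the 5-minute limit above.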