
feat(ce-dogfood-beta): add diff-scoped browser QA dogfood skill #848

Merged
kieranklaassen merged 3 commits into main from skill-ce-dogfood-beta on May 21, 2026

Conversation

@kieranklaassen
Collaborator

Summary

Adds ce-dogfood-beta, a [BETA] (manual-invoke) skill that dogfoods the active branch end-to-end as a QA engineer. Unlike the external dogfood skill (whole-app exploration), this is diff-scoped to what the branch changed versus main, and it self-heals: it tests, fixes the small stuff, and escalates the big stuff.

Workflow:

  • Scope — PR/branch/current; offers a ce-worktree; refuses main; resumes a prior run if one exists.
  • Analyze — full diff vs main, grounded in personas/vision (STRATEGY.md "Who it's for" → VISION.md → persona docs → inferred).
  • Map + Matrix — maps each user flow as a Mermaid flowchart, then derives the test matrix from the flows and loads it as a task list.
  • Serve — port detection + dev server (reuses ce-test-browser).
  • Execute — agent-browser only; judges correctness and walks each flow per persona to flag paper cuts.
  • Fix loop — auto-fixes small/low-risk issues (fix → regression test → ce-commit); escalates large/ambiguous/architectural changes to a "Decisions for a human" section instead of charging ahead.
  • Report — finalizes a durable doc at docs/dogfood-reports/... (diff, personas, flows, matrix+results, fixes, paper cuts, human decisions, learnings, verdict).

Resumable by design: the task list is the live to-do and the report doc on disk is a durable checkpoint, so a run can stop and resume across sessions.

Files

  • skills/ce-dogfood-beta/SKILL.md
  • skills/ce-dogfood-beta/references/test-matrix-taxonomy.md
  • skills/ce-dogfood-beta/references/dogfood-report-template.md
  • README Beta/Experimental table entry

Test plan

  • bun run release:validate — in sync (49 agents, 38 skills)
  • bun test tests/frontmatter.test.ts — 343 pass
  • No broken markdown reference links; references use backtick paths
  • Manual: run /ce-dogfood-beta against a feature branch in a real app and confirm the flow-map → matrix → fix → report loop behaves

🤖 Generated with Claude Code

kieranklaassen and others added 3 commits May 4, 2026 15:48
… dev server

Upload changes:
- Add R2 (Cloudflare R2) as a permanent upload destination using AWS CLI
- In headless/background mode, auto-upload to R2 when env vars are set
  (R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_BUCKET, R2_ENDPOINT, R2_PUBLIC_URL)
  without any user confirmation step
- In interactive mode, offer R2 as the first option in the destination menu
- Fall back to catbox if R2 upload fails
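
A minimal sketch of the destination choice described above, in TypeScript. It assumes R2 counts as configured only when all five variables are set and non-empty, and that catbox is the sole destination otherwise. Both are assumptions, as are the names `missingR2Vars` and `destinationOrder`, which are hypothetical and not taken from the skill.

```typescript
// Hypothetical sketch: only the five R2_* variable names come from the commit message.
const R2_VARS = [
  "R2_ACCESS_KEY_ID",
  "R2_SECRET_ACCESS_KEY",
  "R2_BUCKET",
  "R2_ENDPOINT",
  "R2_PUBLIC_URL",
] as const;

type Env = Record<string, string | undefined>;
type Destination = "r2" | "catbox";

// Names of the R2 variables that are unset or empty (assumption: all five are required).
function missingR2Vars(env: Env): string[] {
  return R2_VARS.filter((name) => !env[name]);
}

// Destinations to try in order: R2 first when configured, catbox as the fallback.
function destinationOrder(env: Env): Destination[] {
  return missingR2Vars(env).length === 0 ? ["r2", "catbox"] : ["catbox"];
}
```

Returning an ordered list keeps the "fall back to catbox" rule in one place: the caller tries each destination in turn and stops at the first upload that succeeds.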

Browser reel tier:
- In headless mode, auto-start the dev server in background and poll for
  readiness (30s timeout) instead of asking the user to start it
- Track server PID for cleanup in Step 4
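
The readiness poll could look roughly like the sketch below. The helper name `waitUntilReady`, the injected `probe` callback, and the 500 ms interval are assumptions for illustration; only the 30s timeout comes from the commit message.

```typescript
// Hypothetical helper: `probe` is any async readiness check supplied by the caller.
async function waitUntilReady(
  probe: () => Promise<boolean>,
  timeoutMs = 30_000, // 30s timeout, as stated in the commit message
  intervalMs = 500, // assumed polling interval
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    // A probe that throws (for example, connection refused) counts as "not ready yet".
    const ready = await probe().catch(() => false);
    if (ready) return true;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // timed out; the caller decides how to clean up the server it started
}
```

A caller might pass `async () => (await fetch(url)).ok` as the probe, where `url` is whatever address port detection produced. Injecting the probe keeps the loop independent of how readiness is actually checked.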

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New [BETA] skill that dogfoods the active branch end-to-end as a QA
engineer: maps user flows as Mermaid flowcharts, derives an exhaustive
browser test matrix, drives the app with agent-browser, then auto-fixes
small issues (with regression tests + commits) and escalates large or
ambiguous changes to a human-decision section.

- Persona-grounded: judges flows against STRATEGY.md/VISION.md personas
  and flags paper cuts, not just functional pass/fail
- Resumable: task list + a live report doc in docs/dogfood-reports/ act
  as a durable checkpoint so a run can stop and resume across sessions
- Orchestrates existing CE skills (ce-test-browser, ce-debug, ce-commit,
  ce-compound, ce-worktree, ce-setup) rather than reinventing them
- Adds references/test-matrix-taxonomy.md and dogfood-report-template.md
- Lists the skill in the README Beta/Experimental table

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@kieranklaassen
Collaborator Author

testing overnight


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 45444f3dc6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- **Blank:** use the current branch.
2. **Refuse to run on `main`/`master`.** If the resolved branch is the trunk, stop and tell the user — there is no diff to dogfood.
3. **Offer isolation.** Ask whether to run in a git worktree so the main checkout stays untouched (use the platform's blocking question tool). If yes, hand off to `ce-worktree`; if no, continue in place.
4. **Resume if a prior run exists.** Look for an existing report at `docs/dogfood-reports/*-<branch-slug>-dogfood.md`. If one is found with unfinished scenarios, ask whether to resume it or start fresh. To resume, re-hydrate the task list from its matrix (Pass/Fixed/Skipped stay done; Pending/Blocked/in-progress become the remaining work) and continue from there.

P1: Keep human-decision blockers closed on resume

The resume rule currently treats all Blocked scenarios as remaining work, which conflicts with Phase 5 where Blocked (human decision) is a terminal state. On a resumed run, this will re-open scenarios that were intentionally escalated to a human and can drive the agent back into re-testing or attempting autonomous changes that were explicitly deferred. Resume should keep Blocked (human decision) closed and only re-queue statuses that are actually actionable.
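
One way to express the suggested rule is sketched below in TypeScript. The `Scenario` shape, the exact status strings, and the function name are assumptions for illustration; the skill itself stores the matrix in a Markdown report, not in this form.

```typescript
// Hypothetical shape for one row of the test matrix.
interface Scenario {
  id: string;
  status: string;
}

// Statuses that stay closed on resume. "Blocked (human decision)" is included
// because Phase 5 treats it as terminal (status spellings are assumed).
const TERMINAL_STATUSES = new Set([
  "Pass",
  "Fixed",
  "Skipped",
  "Blocked (human decision)",
]);

// Everything not terminal (Pending, in-progress, other Blocked) is re-queued.
function remainingOnResume(matrix: Scenario[]): Scenario[] {
  return matrix.filter((scenario) => !TERMINAL_STATUSES.has(scenario.status));
}
```

Listing the terminal statuses explicitly, and treating anything else as actionable, means an unrecognized status is re-queued for a look instead of being silently dropped.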


Comment on lines +33 to +37
1. **Upload to R2 (public URL)** -- upload to Cloudflare R2 for permanent PR embedding (available when R2 env vars are set)
2. **Upload to catbox (public URL)** -- promote to catbox permanent hosting for PR embedding
3. **Save locally** -- save to a stable OS-temp path (/tmp/compound-engineering/ce-demo-reel/)
4. **Recapture** -- provide instructions on what to change
5. **Proceed without evidence** -- set evidence to null and proceed

P2: Avoid 5-option blocking menu in upload destination step

This menu now defines 5 options while still directing use of blocking question tools; in this repo's guidance, plugins/compound-engineering/AGENTS.md documents a 4-option cap for these tools and requires numbered-chat fallback for true overflow cases. Keeping 5 options here can cause tool-call failure or truncated choices in capped harnesses, which can prevent users from selecting the intended destination path (for example Proceed without evidence).
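
As a rough illustration of the cap-and-fallback behaviour the comment refers to, the sketch below picks between a blocking question tool and a numbered chat prompt. The names `planMenu`, `MenuPlan`, and `MAX_BLOCKING_OPTIONS` are hypothetical; only the 4-option cap and the numbered-chat fallback come from the review text.

```typescript
// Cap cited in the review comment; the constant name is hypothetical.
const MAX_BLOCKING_OPTIONS = 4;

type MenuPlan =
  | { mode: "blocking-tool"; options: string[] }
  | { mode: "numbered-chat"; prompt: string };

// Use the blocking tool when the options fit under the cap; otherwise render
// a numbered list for the chat so that no option is truncated.
function planMenu(options: string[], cap = MAX_BLOCKING_OPTIONS): MenuPlan {
  if (options.length <= cap) {
    return { mode: "blocking-tool", options };
  }
  const lines = options.map((label, index) => `${index + 1}. ${label}`);
  return { mode: "numbered-chat", prompt: lines.join("\n") };
}
```

With the five destinations quoted above, this sketch would fall back to the numbered chat prompt, so "Proceed without evidence" remains selectable.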


@kieranklaassen
Collaborator Author

@cursor fix the PR comments please

@kieranklaassen kieranklaassen merged commit 0aa6b55 into main May 21, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request May 21, 2026