feat(ce-dogfood-beta): add diff-scoped browser QA dogfood skill #848
… dev server

Upload changes:
- Add R2 (Cloudflare R2) as a permanent upload destination using AWS CLI
- In headless/background mode, auto-upload to R2 when env vars are set (R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_BUCKET, R2_ENDPOINT, R2_PUBLIC_URL) without any user confirmation step
- In interactive mode, offer R2 as the first option in the destination menu
- Fall back to catbox if R2 upload fails

Browser reel tier:
- In headless mode, auto-start the dev server in background and poll for readiness (30s timeout) instead of asking the user to start it
- Track server PID for cleanup in Step 4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
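The headless behaviour described in this commit (upload to R2 only when all five env vars are set, and poll the dev server for up to 30 seconds) could be sketched roughly as below. This is an illustration only, not the skill's actual implementation: the helper names `r2_configured` and `wait_until_ready`, the 0.5 s polling interval, and the injectable `sleep`/`clock` parameters are all assumptions; only the env var names and the 30 s timeout come from the commit message.

```python
import time
from typing import Callable, Mapping

# Env var names are taken from the commit message; everything else is hypothetical.
R2_ENV_VARS = (
    "R2_ACCESS_KEY_ID",
    "R2_SECRET_ACCESS_KEY",
    "R2_BUCKET",
    "R2_ENDPOINT",
    "R2_PUBLIC_URL",
)


def r2_configured(env: Mapping[str, str]) -> bool:
    """Return True only when every R2 variable is present and non-empty."""
    return all(env.get(name) for name in R2_ENV_VARS)


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 30.0,   # timeout stated in the commit message
    interval_s: float = 0.5,   # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In a real run, `probe` would be whatever readiness check the skill uses (for example an HTTP request to the dev server URL), and `r2_configured(os.environ)` would decide whether the headless path uploads to R2 or goes straight to the catbox fallback.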
New [BETA] skill that dogfoods the active branch end-to-end as a QA engineer: maps user flows as Mermaid flowcharts, derives an exhaustive browser test matrix, drives the app with agent-browser, then auto-fixes small issues (with regression tests + commits) and escalates large or ambiguous changes to a human-decision section.

- Persona-grounded: judges flows against STRATEGY.md/VISION.md personas and flags paper cuts, not just functional pass/fail
- Resumable: task list + a live report doc in docs/dogfood-reports/ act as a durable checkpoint so a run can stop and resume across sessions
- Orchestrates existing CE skills (ce-test-browser, ce-debug, ce-commit, ce-compound, ce-worktree, ce-setup) rather than reinventing them
- Adds references/test-matrix-taxonomy.md and dogfood-report-template.md
- Lists the skill in the README Beta/Experimental table

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
testing overnight
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 45444f3dc6
   - **Blank:** use the current branch.
2. **Refuse to run on `main`/`master`.** If the resolved branch is the trunk, stop and tell the user: there is no diff to dogfood.
3. **Offer isolation.** Ask whether to run in a git worktree so the main checkout stays untouched (use the platform's blocking question tool). If yes, hand off to `ce-worktree`; if no, continue in place.
4. **Resume if a prior run exists.** Look for an existing report at `docs/dogfood-reports/*-<branch-slug>-dogfood.md`. If one is found with unfinished scenarios, ask whether to resume it or start fresh. To resume, re-hydrate the task list from its matrix (Pass/Fixed/Skipped stay done; Pending/Blocked/in-progress become the remaining work) and continue from there.
Keep human-decision blockers closed on resume
The resume rule currently treats all Blocked scenarios as remaining work, which conflicts with Phase 5 where Blocked (human decision) is a terminal state. On a resumed run, this will re-open scenarios that were intentionally escalated to a human and can drive the agent back into re-testing or attempting autonomous changes that were explicitly deferred. Resume should keep Blocked (human decision) closed and only re-queue statuses that are actually actionable.
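One way to express the suggested rule is to treat `Blocked (human decision)` as terminal alongside Pass/Fixed/Skipped and re-queue everything else. The sketch below is hypothetical: the function name `partition_for_resume`, the dict-shaped matrix, and the exact status strings are assumptions for illustration, not what the skill stores in its report.

```python
from typing import Dict, List, Tuple

# Status labels follow the wording in this comment; the exact strings in the
# report matrix are an assumption.
TERMINAL_STATUSES = {"Pass", "Fixed", "Skipped", "Blocked (human decision)"}


def partition_for_resume(matrix: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split scenario IDs into (done, remaining) for a resumed run.

    Anything not in TERMINAL_STATUSES (Pending, in-progress, or a plain
    Blocked with no human-decision marker) is treated as actionable.
    """
    done: List[str] = []
    remaining: List[str] = []
    for scenario, status in matrix.items():
        (done if status in TERMINAL_STATUSES else remaining).append(scenario)
    return done, remaining
```

With this split, a scenario escalated to a human stays closed on resume, and only the statuses the agent can act on return to the task list.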
1. **Upload to R2 (public URL)** -- upload to Cloudflare R2 for permanent PR embedding (available when R2 env vars are set)
2. **Upload to catbox (public URL)** -- promote to catbox permanent hosting for PR embedding
3. **Save locally** -- save to a stable OS-temp path (/tmp/compound-engineering/ce-demo-reel/)
4. **Recapture** -- provide instructions on what to change
5. **Proceed without evidence** -- set evidence to null and proceed
Avoid 5-option blocking menu in upload destination step
This menu now defines 5 options while still directing use of blocking question tools; in this repo's guidance, plugins/compound-engineering/AGENTS.md documents a 4-option cap for these tools and requires numbered-chat fallback for true overflow cases. Keeping 5 options here can cause tool-call failure or truncated choices in capped harnesses, which can prevent users from selecting the intended destination path (for example Proceed without evidence).
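A guard along these lines would route oversized menus to the numbered-chat fallback instead of the blocking tool. It is a sketch under assumptions: `present_menu`, the `"blocking_tool"`/`"numbered_chat"` mode labels, and the tuple return shape are hypothetical; only the 4-option cap comes from the guidance cited above.

```python
from typing import List, Tuple, Union

MAX_BLOCKING_OPTIONS = 4  # cap cited from plugins/compound-engineering/AGENTS.md


def present_menu(options: List[str]) -> Tuple[str, Union[List[str], str]]:
    """Pick a presentation mode for a menu.

    Returns ("blocking_tool", options) when the menu fits under the cap,
    otherwise ("numbered_chat", text) with a numbered list to print in chat.
    """
    if len(options) <= MAX_BLOCKING_OPTIONS:
        return ("blocking_tool", options)
    numbered = "\n".join(f"{i}. {label}" for i, label in enumerate(options, 1))
    return ("numbered_chat", numbered)
```

For the five-option destination menu in this diff, the guard would select the numbered-chat path, so "Proceed without evidence" stays reachable.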
@cursor fix the PR comments please
Summary
Adds `ce-dogfood-beta`, a `[BETA]` (manual-invoke) skill that dogfoods the active branch end-to-end as a QA engineer. Unlike the external `dogfood` skill (whole-app exploration), this is diff-scoped to what the branch changed versus `main`, and it self-heals: it tests, fixes the small stuff, and escalates the big stuff.

Workflow:
- … `ce-worktree`; refuses `main`; resumes a prior run if one exists.
- … `main`, grounded in personas/vision (`STRATEGY.md` "Who it's for" → `VISION.md` → persona docs → inferred).
- … (`ce-test-browser`).
- … `agent-browser` only; judges correctness and walks each flow per persona to flag paper cuts.
- … `ce-commit`); escalates large/ambiguous/architectural changes to a "Decisions for a human" section instead of charging ahead.
- … `docs/dogfood-reports/...` (diff, personas, flows, matrix+results, fixes, paper cuts, human decisions, learnings, verdict).

Resumable by design: the task list is the live to-do and the report doc on disk is a durable checkpoint, so a run can stop and resume across sessions.
Files
- `skills/ce-dogfood-beta/SKILL.md`
- `skills/ce-dogfood-beta/references/test-matrix-taxonomy.md`
- `skills/ce-dogfood-beta/references/dogfood-report-template.md`

Test plan
- `bun run release:validate`: in sync (49 agents, 38 skills)
- `bun test tests/frontmatter.test.ts`: 343 pass
- `/ce-dogfood-beta` against a feature branch in a real app and confirm the flow-map → matrix → fix → report loop behaves

🤖 Generated with Claude Code