feat(ce-dogfood-beta): add diff-scoped browser QA dogfood skill by kieranklaassen · Pull Request #848 · EveryInc/compound-engineering-plugin

kieranklaassen · 2026-05-21T05:46:35Z

Summary

Adds ce-dogfood-beta, a [BETA] (manual-invoke) skill that dogfoods the active branch end-to-end as a QA engineer. Unlike the external dogfood skill (whole-app exploration), this is diff-scoped to what the branch changed versus main, and it self-heals: it tests, fixes the small stuff, and escalates the big stuff.

Workflow:

Scope — PR/branch/current; offers a ce-worktree; refuses main; resumes a prior run if one exists.
Analyze — full diff vs main, grounded in personas/vision (STRATEGY.md "Who it's for" → VISION.md → persona docs → inferred).
Map + Matrix — maps each user flow as a Mermaid flowchart, then derives the test matrix from the flows and loads it as a task list.
Serve — port detection + dev server (reuses ce-test-browser).
Execute — agent-browser only; judges correctness and walks each flow per persona to flag paper cuts.
Fix loop — auto-fixes small/low-risk issues (fix → regression test → ce-commit); escalates large/ambiguous/architectural changes to a "Decisions for a human" section instead of charging ahead.
Report — finalizes a durable doc at docs/dogfood-reports/... (diff, personas, flows, matrix+results, fixes, paper cuts, human decisions, learnings, verdict).

Resumable by design: the task list is the live to-do and the report doc on disk is a durable checkpoint, so a run can stop and resume across sessions.

Files

skills/ce-dogfood-beta/SKILL.md
skills/ce-dogfood-beta/references/test-matrix-taxonomy.md
skills/ce-dogfood-beta/references/dogfood-report-template.md
README Beta/Experimental table entry

Test plan

bun run release:validate — in sync (49 agents, 38 skills)
bun test tests/frontmatter.test.ts — 343 pass
No broken markdown reference links; references use backtick paths
Manual: run /ce-dogfood-beta against a feature branch in a real app and confirm the flow-map → matrix → fix → report loop behaves

🤖 Generated with Claude Code

… dev server

Upload changes:
- Add R2 (Cloudflare R2) as a permanent upload destination using AWS CLI
- In headless/background mode, auto-upload to R2 when env vars are set (R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_BUCKET, R2_ENDPOINT, R2_PUBLIC_URL) without any user confirmation step
- In interactive mode, offer R2 as the first option in the destination menu
- Fall back to catbox if R2 upload fails

Browser reel tier:
- In headless mode, auto-start the dev server in background and poll for readiness (30s timeout) instead of asking the user to start it
- Track server PID for cleanup in Step 4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
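The upload rules in this commit message can be sketched roughly as follows. This is only an illustration, not the skill's actual implementation: the skill is a markdown prompt that shells out to the AWS CLI, and `upload_r2` and `upload_catbox` are hypothetical callables standing in for those steps. Going straight to catbox when R2 is not configured is also an assumption, since the commit message only specifies the fallback on R2 failure.

```python
# The five variable names are taken from the commit message.
R2_ENV_VARS = (
    "R2_ACCESS_KEY_ID",
    "R2_SECRET_ACCESS_KEY",
    "R2_BUCKET",
    "R2_ENDPOINT",
    "R2_PUBLIC_URL",
)


def r2_configured(env):
    """True only when every R2 variable is present and non-empty."""
    return all(env.get(name) for name in R2_ENV_VARS)


def upload_with_fallback(path, env, upload_r2, upload_catbox):
    """Try R2 first when configured, then fall back to catbox.

    upload_r2 and upload_catbox are hypothetical callables that return a
    public URL or raise on failure.
    """
    if r2_configured(env):
        try:
            return "r2", upload_r2(path)
        except Exception:
            pass  # R2 upload failed: fall through to catbox
    # Assumption: catbox is also used when R2 is not configured at all.
    return "catbox", upload_catbox(path)
```

In a real run `env` would typically be `os.environ`; passing it in explicitly keeps the sketch easy to exercise without touching the process environment.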

New [BETA] skill that dogfoods the active branch end-to-end as a QA engineer: maps user flows as Mermaid flowcharts, derives an exhaustive browser test matrix, drives the app with agent-browser, then auto-fixes small issues (with regression tests + commits) and escalates large or ambiguous changes to a human-decision section.

- Persona-grounded: judges flows against STRATEGY.md/VISION.md personas and flags paper cuts, not just functional pass/fail
- Resumable: task list + a live report doc in docs/dogfood-reports/ act as a durable checkpoint so a run can stop and resume across sessions
- Orchestrates existing CE skills (ce-test-browser, ce-debug, ce-commit, ce-compound, ce-worktree, ce-setup) rather than reinventing them
- Adds references/test-matrix-taxonomy.md and dogfood-report-template.md
- Lists the skill in the README Beta/Experimental table

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

kieranklaassen · 2026-05-21T05:50:31Z

testing overnight

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 45444f3dc6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-21T05:54:39Z

+   - **Blank:** use the current branch.
+2. **Refuse to run on `main`/`master`.** If the resolved branch is the trunk, stop and tell the user — there is no diff to dogfood.
+3. **Offer isolation.** Ask whether to run in a git worktree so the main checkout stays untouched (use the platform's blocking question tool). If yes, hand off to `ce-worktree`; if no, continue in place.
+4. **Resume if a prior run exists.** Look for an existing report at `docs/dogfood-reports/*-<branch-slug>-dogfood.md`. If one is found with unfinished scenarios, ask whether to resume it or start fresh. To resume, re-hydrate the task list from its matrix (Pass/Fixed/Skipped stay done; Pending/Blocked/in-progress become the remaining work) and continue from there.


Keep human-decision blockers closed on resume

The resume rule currently treats all Blocked scenarios as remaining work, which conflicts with Phase 5 where Blocked (human decision) is a terminal state. On a resumed run, this will re-open scenarios that were intentionally escalated to a human and can drive the agent back into re-testing or attempting autonomous changes that were explicitly deferred. Resume should keep Blocked (human decision) closed and only re-queue statuses that are actually actionable.
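A minimal sketch of the resume rule this comment asks for, assuming the matrix can be read back from the report as `(scenario_id, status)` pairs. The pair shape and the exact status strings are assumptions for illustration; the skill itself is a markdown prompt, not Python.

```python
# Statuses treated as finished on resume. "Blocked (human decision)" is the
# terminal state named in the review comment; exact label strings are assumed.
TERMINAL = {"Pass", "Fixed", "Skipped", "Blocked (human decision)"}


def remaining_work(matrix):
    """Return scenario ids that should be re-queued when a run resumes.

    matrix is a hypothetical list of (scenario_id, status) pairs read
    back from the report doc. Anything not in TERMINAL stays actionable.
    """
    return [sid for sid, status in matrix if status not in TERMINAL]
```

With this split, a scenario escalated to a human stays closed, while `Pending` and in-progress scenarios return to the task list.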


chatgpt-codex-connector · 2026-05-21T05:54:39Z

+1. **Upload to R2 (public URL)** -- upload to Cloudflare R2 for permanent PR embedding (available when R2 env vars are set)
+2. **Upload to catbox (public URL)** -- promote to catbox permanent hosting for PR embedding
+3. **Save locally** -- save to a stable OS-temp path (/tmp/compound-engineering/ce-demo-reel/)
+4. **Recapture** -- provide instructions on what to change
+5. **Proceed without evidence** -- set evidence to null and proceed


Avoid 5-option blocking menu in upload destination step

This menu now defines 5 options while still directing use of blocking question tools; in this repo's guidance, plugins/compound-engineering/AGENTS.md documents a 4-option cap for these tools and requires numbered-chat fallback for true overflow cases. Keeping 5 options here can cause tool-call failure or truncated choices in capped harnesses, which can prevent users from selecting the intended destination path (for example Proceed without evidence).
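One way to picture the cap-and-fallback behavior described here, as a rough sketch. The 4-option cap is the figure this comment cites from AGENTS.md; the function name and the two mode labels are hypothetical.

```python
MAX_BLOCKING_OPTIONS = 4  # cap cited from AGENTS.md in the review comment


def present_menu(options, cap=MAX_BLOCKING_OPTIONS):
    """Pick a presentation mode for a list of menu options.

    Returns ("blocking", options) when the list fits the cap, otherwise
    ("numbered_chat", lines) with the options rendered as numbered text.
    Both mode names are hypothetical labels for this sketch.
    """
    if len(options) <= cap:
        return "blocking", list(options)
    lines = [f"{i}. {text}" for i, text in enumerate(options, start=1)]
    return "numbered_chat", lines
```

Applied to the five-item destination menu in the diff, this would take the numbered-chat path, so the fifth choice is not truncated.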


kieranklaassen · 2026-05-21T17:10:13Z

@cursor fix the PR comments please

kieranklaassen and others added 3 commits May 4, 2026 15:48

Merge branch 'main' of https://github.com/EveryInc/compound-engineeri…

6fd7036

…ng-plugin

chatgpt-codex-connector Bot reviewed May 21, 2026

kieranklaassen merged commit 0aa6b55 into main May 21, 2026
2 checks passed

github-actions Bot mentioned this pull request May 21, 2026

chore: release main #852

Open

kieranklaassen merged 3 commits into main from skill-ce-dogfood-beta
