DefangLabs · dependabot · Apr 14, 2026 · Apr 20, 2026
diff --git a/.agents/skills/business-strategy/agents/openai.yaml b/.agents/skills/business-strategy/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Business Strategy"
+  short_description: "Market sizing, pricing, business model, GTM"
+  default_prompt: "Use $business-strategy to build market sizing (TAM/SAM/SOM), business model canvas, pricing analysis, revenue models, and go-to-market plans."
diff --git a/.agents/skills/competitive-research/agents/openai.yaml b/.agents/skills/competitive-research/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Competitive Research"
+  short_description: "Competitor profiles, feature matrices, SWOT"
+  default_prompt: "Use $competitive-research to build competitor profiles, feature matrices, SWOT analyses, and positioning maps."
diff --git a/.agents/skills/content-create/agents/openai.yaml b/.agents/skills/content-create/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Content Creation"
+  short_description: "Social posts, blog outlines, changelogs, launch copy"
+  default_prompt: "Use $content-create to draft social media posts, blog outlines, changelog announcements, product launch copy, and developer content from strategy artifacts."
diff --git a/.agents/skills/do/SKILL.md b/.agents/skills/do/SKILL.md
@@ -10,13 +10,14 @@ Read the full workflow from `.codex/prompts/do.md` and execute it.
 ## Quick Summary
 
 1. **Research** — understand the request, search the codebase, read related docs
+   - If the user explicitly asks for local subagent critique before implementation, gather bounded local reviews and reconcile them before editing.
 2. **Task file** — create in `tasks/backlog/`, commit to main
 3. **Worktree** — create feature branch and worktree
 4. **Implement** — follow checklist, push frequently, run quality checks. **For UI changes**: run mandatory Playwright visual audit with mock data on mobile + desktop viewports (see `.claude/rules/17-ui-visual-testing.md`)
 5. **Validate** — full quality suite: lint, typecheck, test, build
 6. **Review** — invoke specialist skills ($go-specialist, $cloudflare-specialist, etc.)
-7. **Staging** — check for existing staging deploys (wait 5min if active), trigger manual deployment via `gh workflow run deploy-staging.yml --ref <branch>`, verify changed behavior end-to-end via Playwright. **For infrastructure changes** (cloud-init, VM agent, DNS, TLS, scripts/deploy): MUST provision a real VM and verify heartbeat arrives. See Phase 6b in `.codex/prompts/do.md`.
-8. **PR** — create with `gh pr create`, wait for CI, merge when green
+7. **Staging** — check for existing staging deploys (wait 5 minutes if one is active), then trigger a manual deployment via `gh workflow run deploy-staging.yml --ref <branch>`. **Use `$CF_TOKEN` to query D1/KV/DNS directly** (see `.claude/rules/32-cf-api-debugging.md`) to verify migrations, data state, and feature flags; this is faster and more precise than UI-based checks. Then verify changed behavior end-to-end via Playwright. **For infrastructure changes** (cloud-init, VM agent, DNS, TLS, scripts/deploy): MUST provision a real VM and verify that its heartbeat arrives. See Phase 6b in `.codex/prompts/do.md`.
+8. **PR** — create with `gh pr create`, wait for CI, merge when green. If the user requested a draft PR or asked you not to merge, create it with `gh pr create --draft` and stop there without merging.
 9. **Cleanup** — remove worktree, pull main
 
 ## ⚠️ Anti-Compaction: State File

diff --git a/.agents/skills/engineering-strategy/agents/openai.yaml b/.agents/skills/engineering-strategy/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Engineering Strategy"
+  short_description: "Roadmap, tech radar, build-vs-buy, tech debt"
+  default_prompt: "Use $engineering-strategy to build roadmaps (Now/Next/Later), technology radar, build-vs-buy analyses, and tech debt registers."
diff --git a/.agents/skills/go-specialist/SKILL.md b/.agents/skills/go-specialist/SKILL.md
@@ -1,8 +1,8 @@
 ---
 name: go-specialist
-description: "Go code review specialist for VM Agent. Reviews PTY management, WebSocket handling, JWT validation, idle detection, and Go idioms. Use when working in packages/vm-agent/ or reviewing Go code changes."
+description: "Go code review specialist for VM Agent and CLI. Reviews PTY/WebSocket/JWT code, CLI command contracts, static-analysis findings, and Go idioms. Use when working in packages/vm-agent/, packages/cli/, or reviewing Go code changes."
 metadata:
-  short-description: "Go code review specialist for VM Agent. Reviews PTY management, "
+  short-description: "Go code review specialist for VM Agent and CLI code."
 ---
 
 # go-specialist
@@ -12,6 +12,7 @@ This is a Codex skill wrapper around the Claude Code subagent definition in:
 
 Use:
 
-1. Read CLAUDE_AGENT.md.
+1. Read `GO_SPECIALIST.md`.
 2. Follow its checklist and constraints.
-3. Report results with concrete file references.
+3. For `packages/cli`, also follow `.claude/rules/36-cli-quality.md`.
+4. Report results with concrete file references.
diff --git a/.agents/skills/marketing-strategy/agents/openai.yaml b/.agents/skills/marketing-strategy/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Marketing Strategy"
+  short_description: "Positioning, messaging, content calendar, gap analysis"
+  default_prompt: "Use $marketing-strategy to build positioning documents, messaging guides, content calendars, channel strategy, and gap analyses."
diff --git a/.agents/skills/task-completion-validator/agents/openai.yaml b/.agents/skills/task-completion-validator/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Task Completion Validator"
+  short_description: "Cross-references planned vs actual work"
+  default_prompt: "Use $task-completion-validator to validate task completion before archiving — checks research findings, checklist items, acceptance criteria, UI-backend paths, and multi-resource selection."
diff --git a/.agents/skills/test-engineer/SKILL.md b/.agents/skills/test-engineer/SKILL.md
@@ -14,4 +14,5 @@ Use:
 
 1. Read CLAUDE_AGENT.md.
 2. Follow its checklist and constraints.
-3. Report results with concrete file references.
+3. For any test that crosses system boundaries, follow the vertical slice testing rule in `.claude/rules/35-vertical-slice-testing.md`.
+4. Report results with concrete file references.
diff --git a/.agents/skills/workflow/SKILL.md b/.agents/skills/workflow/SKILL.md
@@ -10,7 +10,7 @@ Read the full workflow from `.codex/prompts/workflow.md` and execute it.
 ## Quick Summary
 
 1. **Decompose** — break the user's request into discrete subtasks with dependencies
-2. **Dispatch** — send subtasks to other agents via `dispatch_task` (with `/do` instructions)
+2. **Dispatch** — send subtasks to other agents via `dispatch_task` (with `/do` instructions) and verify each task started with the intended title, profile, and constraints
 3. **Poll** — foreground `sleep 300` + `get_task_details` loop keeps the session alive
 4. **React** — dispatch dependent tasks as predecessors complete, retry failures
 5. **Complete** — summarize results when all subtasks finish
@@ -19,6 +19,10 @@ Read the full workflow from `.codex/prompts/workflow.md` and execute it.
 
 When Claude Code dispatches subtasks and waits passively, the ACP session appears idle and the control plane kills it. This skill uses explicit foreground polling (Bash `sleep` + MCP tool calls) to maintain visible session activity throughout the orchestration.
 
+## Staging Debugging Access
+
+All agents have access to `$CF_TOKEN` for direct Cloudflare API queries against staging. When monitoring subtasks that deploy to staging, use the CF API to verify their work landed correctly — query D1 for data state, read KV for feature flags, check DNS for routing. See `.claude/rules/32-cf-api-debugging.md` for the full cheat sheet.
+
 ## State Persistence
 
 Maintain `.workflow-state.md` (gitignored) as external memory. Re-read it before every poll cycle. This survives context compaction. See `.codex/prompts/workflow.md` for the full state file format.
diff --git a/.claude/agents/go-specialist/GO_SPECIALIST.md b/.claude/agents/go-specialist/GO_SPECIALIST.md
@@ -1,12 +1,12 @@
 ---
 name: go-specialist
-description: Go code review specialist for VM Agent. Reviews PTY management, WebSocket handling, JWT validation, idle detection, and Go idioms. Use when working in packages/vm-agent/ or reviewing Go code changes.
+description: Go code review specialist for VM Agent and CLI. Reviews PTY management, WebSocket handling, JWT validation, CLI command contracts, static-analysis findings, and Go idioms. Use when working in packages/vm-agent/, packages/cli/, or reviewing Go code changes.
 tools: Read, Grep, Glob, Bash
 disallowedTools: Write, Edit, NotebookEdit
 model: sonnet
 ---
 
-You are a Go specialist focusing on the VM Agent codebase. Your expertise includes PTY management, WebSocket protocols, JWT validation, and Go concurrency patterns. Your role is to review code, identify issues, and recommend improvements.
+You are a Go specialist focusing on the VM Agent and SAM CLI codebases. Your expertise includes PTY management, WebSocket protocols, JWT validation, CLI command design, static-analysis remediation, and Go concurrency patterns. Your role is to review code, identify issues, and recommend improvements.
 
 ## Operating Constraints
 
@@ -22,6 +22,12 @@ The VM Agent is a single Go binary that runs on user VMs to provide:
 
 **Location**: `packages/vm-agent/`
 
+The SAM CLI is a user-facing Go command that mirrors supported UI navigation workflows. Future runner/harness commands are reserved and fail clearly until a real backend contract exists.
+
+**Location**: `packages/cli/`
+
+For CLI changes, also apply `.claude/rules/36-cli-quality.md`.
+
 **Structure**:
 ```
 packages/vm-agent/
@@ -184,6 +190,19 @@ mu.Unlock()
 conn.Write(dataCopy)
 ```
 
+### 5. CLI Command Quality (`packages/cli/`)
+
+**Checklist**:
+- [ ] Commands and flags have explicit user-facing contracts and tests
+- [ ] Argument parsing remains deterministic and split into focused helpers when nested branching grows
+- [ ] HTTP, env/filesystem, stdin/stdout/stderr, and host command execution are injectable
+- [ ] API paths escape every dynamic segment
+- [ ] JSON and text output modes are both validated where user-visible
+- [ ] Secrets are redacted from stdout, stderr, returned errors, and test failure output
+- [ ] Reserved runner/harness commands fail clearly instead of simulating behavior
+- [ ] SonarCloud findings are fixed or documented with human-approved exceptions
+- [ ] A Go coverage profile (e.g. `go test -coverprofile=coverage.out ./...`) is generated and reviewed for touched production files
+
 ### 6. Error Handling
 
 **Go Error Idioms**:

diff --git a/.claude/agents/task-completion-validator/TASK_COMPLETION_VALIDATOR.md b/.claude/agents/task-completion-validator/TASK_COMPLETION_VALIDATOR.md
@@ -145,6 +145,26 @@ git diff main...HEAD | grep -B5 -A2 "\.limit(1)"
 
 **FAIL condition**: A function selects from a set of resources without the caller specifying which one, and no test exercises the multi-resource case.
 
+#### Check F: Vertical Slice Test Coverage
+
+If the feature crosses 2+ system boundaries (API to D1, Worker to DO, Worker to VM agent, UI to API, cron to D1+DO):
+- Does at least one test exercise the full vertical slice from entry point to final outcome?
+- Do the mocks at each boundary carry realistic state (full entity shapes, valid foreign key relationships, enough variety to exercise branching)?
+- Does the test assert both the final user-visible outcome AND the payloads sent to mocked boundaries?
+
+```bash
+# Find test files in the diff (TS/TSX *.test.* files and Go *_test.go files)
+git diff main...HEAD --name-only | grep -E '(\.test\.(ts|tsx)|_test\.go)$'
+
+# Check for empty mock patterns (red flag)
+git diff main...HEAD -- '*.test.*' | grep -E 'mockResolvedValue\(\s*\{\s*\}\s*\)|as D1Database|as KVNamespace'
+
+# Check for realistic state setup (good sign)
+git diff main...HEAD -- '*.test.*' '*_test.go' | grep -E 'make(Project|Node|Workspace|Task|Credential)|createTest(Db|App|Env)'
+```
+
+**FAIL condition**: A feature crosses 2+ boundaries but every test either (a) mocks internal functions instead of system boundaries, (b) uses empty mock objects or minimal stubs without realistic state, or (c) only tests one layer in isolation. See `.claude/rules/35-vertical-slice-testing.md`.
+
 ### Step 4: Generate Report
 
 ## Output Format

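Check F's grep heuristics can also be run as a single pass over `git diff main...HEAD` output. This Go sketch reuses the same two regular expressions but, as an assumed design choice, only inspects added lines in test files (TS/TSX `*.test.*` and Go `*_test.go`), so removed or unchanged mocks do not trigger findings. The report type and function name are hypothetical.

```go
package main

import (
	"bufio"
	"regexp"
	"strings"
)

var (
	// Same patterns as the grep commands in Check F.
	emptyMock    = regexp.MustCompile(`mockResolvedValue\(\s*\{\s*\}\s*\)|as D1Database|as KVNamespace`)
	realistic    = regexp.MustCompile(`make(Project|Node|Workspace|Task|Credential)|createTest(Db|App|Env)`)
	testFileName = regexp.MustCompile(`(\.test\.(ts|tsx)|_test\.go)$`)
)

// sliceReport summarizes vertical-slice signals in a unified diff.
type sliceReport struct {
	TestFiles []string
	RedFlags  []string // added lines matching empty-mock patterns
	GoodSigns int      // added lines using realistic-state helpers
}

// scanDiff walks a unified diff, tracking the current file from its
// "+++ b/<path>" header and scoring only added lines in test files.
func scanDiff(diff string) sliceReport {
	var r sliceReport
	inTest := false
	sc := bufio.NewScanner(strings.NewReader(diff))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "+++ "):
			path := strings.TrimPrefix(strings.TrimPrefix(line, "+++ "), "b/")
			inTest = testFileName.MatchString(path)
			if inTest {
				r.TestFiles = append(r.TestFiles, path)
			}
		case inTest && strings.HasPrefix(line, "+"):
			added := line[1:]
			if emptyMock.MatchString(added) {
				r.RedFlags = append(r.RedFlags, strings.TrimSpace(added))
			}
			if realistic.MatchString(added) {
				r.GoodSigns++
			}
		}
	}
	return r
}
```

A non-empty `RedFlags` list with `GoodSigns == 0` matches the pattern the FAIL condition describes; the result is still a prompt for manual review, not a verdict.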
diff --git a/.claude/agents/test-engineer/TEST_ENGINEER.md b/.claude/agents/test-engineer/TEST_ENGINEER.md
@@ -260,6 +260,45 @@ cd packages/vm-agent && go test -cover ./...    # With coverage
 cd packages/vm-agent && go test -v ./internal/auth/  # Verbose specific package
 ```
 
+## Vertical Slice Testing (Mandatory for Cross-Boundary Features)
+
+Most features in this codebase cross system boundaries (API to D1, Worker to DO, Worker to VM agent, UI to API). When generating tests for these features, you MUST write vertical slice tests — not isolated unit tests with empty mocks.
+
+### What This Means
+
+1. **Identify all systems the feature touches** before writing any test. Ask: "What boundaries does data cross?"
+2. **Mock at system boundaries only** (D1, HTTP to VM agent, DO stubs) — not at internal function boundaries
+3. **Every mock must carry realistic state**: full entity shapes, valid foreign key relationships, enough variety to exercise branching logic
+4. **Assert the end-to-end outcome AND the boundary payloads**: verify both what the user sees and what was sent to each mocked system
+
+### Required State Setup
+
+For each mocked boundary, set up state that reflects what the real system would contain at that point:
+
+| Boundary | State to carry in mock |
+|----------|----------------------|
+| D1 queries | Rows in all referenced tables with valid foreign keys and realistic field values |
+| Durable Object | Internal DO state (sessions, messages, alarms) reflecting prior operations |
+| VM agent HTTP | Response with full workspace/session metadata (not just `{ id: 'ws-1' }`) |
+| API client (UI tests) | Full API response shape including nested objects, arrays, and status fields |
+
+### Anti-Patterns (BANNED)
+
+- `vi.fn().mockResolvedValue({})` — empty mock objects prove nothing
+- Mocking internal helpers instead of system boundaries — exercise your own code
+- Testing one layer when the feature spans three — if the route, service, and DB are all involved, the test must cover the full path
+- Stubs that return entities with only an `id` field — real entities have `status`, `ip`, `projectId`, etc.; code that reads those fields gets silent `undefined`
+
+### Checklist
+
+- [ ] All system boundaries identified for the feature
+- [ ] At least one test exercises the full vertical slice (entry point to final outcome)
+- [ ] Mocks carry realistic state with valid relationships between entities
+- [ ] Both success and error paths tested at each boundary
+- [ ] State variety: mocks include enough data to exercise branching (e.g., multiple nodes, active + inactive credentials)
+
+See `.claude/rules/35-vertical-slice-testing.md` for the full rule with examples and boundary pair reference.
+
 ## Test Quality Checklist
 
 When generating tests, ensure:
@@ -272,6 +311,7 @@ When generating tests, ensure:
 - [ ] Mocks reset between tests if needed
 - [ ] Test names describe the scenario clearly
 - [ ] No hardcoded secrets (use mock values)
+- [ ] For cross-boundary features: at least one vertical slice test with realistic mock state (see above)
 
 ## Output Format