26 changes: 21 additions & 5 deletions .claude/commands/workflow.md
@@ -35,22 +35,27 @@ The SAM control plane monitors ACP sessions for activity. If your session appear
# Workflow State

## Goal

<one-line summary>

## Subtasks
-| # | Description | Task ID | Status | Branch | Notes |
-|---|------------|---------|--------|--------|-------|
-| 1 | ... | pending | ... | ... | ... |
-| 2 | ... | pending | ... | ... | ... |
+
+| # | Description | Task ID | Status | Branch | Notes |
+| --- | ----------- | ------- | ------ | ------ | ----- |
+| 1 | ... | pending | ... | ... | ... |
+| 2 | ... | pending | ... | ... | ... |

## Dependencies

- Task 2 depends on Task 1
- Tasks 3 and 4 can run in parallel

## Poll Count

0

## Last Poll

(not yet)
```
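
As a rough illustration of how the state file template above could be kept current between polls, here is a minimal Python sketch. The `Subtask` dataclass and `render_state` helper are hypothetical names introduced for this example only; the workflow itself does not prescribe any particular tooling for writing `.workflow-state.md`.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    """One row of the Subtasks table (hypothetical structure for illustration)."""
    number: int
    description: str
    task_id: str = ""
    status: str = "pending"
    branch: str = ""
    notes: str = ""


def render_state(goal, subtasks, dependencies, poll_count=0, last_poll="(not yet)"):
    """Render workflow state as markdown using the template's section headings."""
    lines = ["# Workflow State", "", "## Goal", "", goal, "", "## Subtasks", ""]
    lines.append("| # | Description | Task ID | Status | Branch | Notes |")
    lines.append("| --- | ----------- | ------- | ------ | ------ | ----- |")
    for s in subtasks:
        lines.append(
            f"| {s.number} | {s.description} | {s.task_id} | {s.status} | {s.branch} | {s.notes} |"
        )
    lines += ["", "## Dependencies", ""]
    lines += [f"- {d}" for d in dependencies]
    lines += ["", "## Poll Count", "", str(poll_count), "", "## Last Poll", "", last_poll, ""]
    return "\n".join(lines)
```

Rewriting the whole file from in-memory state on every poll, rather than patching individual cells, keeps the file self-consistent if the session is interrupted between updates.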

@@ -71,13 +76,16 @@ For each subtask that has no unmet dependencies:

3. **Verify dispatch succeeded** — call `get_task_details` on the returned task ID within 10 seconds to confirm it was picked up. If it wasn't, retry once, then report the failure.

Before retrying the same prompt, inspect the failed task/session and check `list_tasks`/`list_project_agents` for active duplicates with the same title, branch, prompt, or PR. If a duplicate is already running, coordinate with it instead of creating another copy. Do not blindly redispatch after no-workspace/startup failures or transient provider failures.

4. **Call `update_task_status`** after each dispatch: "Dispatched subtask N: <description>"

---

## Phase 3: Foreground Polling Loop (CRITICAL)

This is the most important phase. You MUST poll actively to:

- Keep the session alive (prevent timeout kills)
- Detect subtask completion and trigger dependent work
- Report progress to the user
@@ -101,7 +109,8 @@ REPEAT until all subtasks are complete or failed:
- Call get_peer_agent_output(taskId) to review the result
6. If any subtask failed:
- Review the failure via get_task_details
-- Decide: retry_subtask with adjusted description, or mark as failed
+- Check for duplicate active work with the same prompt, branch, title, or PR
+- Decide: retry with adjusted description only after diagnosing the failure, or mark as failed
- Update .workflow-state.md
7. If all subtasks are complete: exit loop
8. If all remaining subtasks are failed and no retries are possible: exit loop
@@ -120,6 +129,7 @@ REPEAT until all subtasks are complete or failed:
### What to Do If Context Feels Fuzzy

If after context compaction you're unsure what's happening:

1. Read `.workflow-state.md` — it has the complete state
2. Call `list_tasks` to see all your subtasks
3. Call `get_task_details` for each active subtask
@@ -147,22 +157,26 @@ When all subtasks are complete (or all remaining ones have permanently failed):
## Handling Common Scenarios

### Subtask produces a PR that needs to merge before the next step

- After the subtask completes, check if it created a PR via `get_task_details`
- If the PR is merged, proceed with dependent subtasks
- If the PR is open, note this in your status update — the dependent subtask should be dispatched to the PR's branch

### Subtask fails

- Read the failure details via `get_task_details` and `get_peer_agent_output`
- If it's a transient failure (timeout, resource issue), retry with `retry_subtask`
- If it's a permanent failure (wrong approach, missing prerequisite), adjust the description and retry, or skip and note in the summary
- Maximum 2 retries per subtask
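
The retry rules in this list can be read as a small decision function. The sketch below is one possible reading, not the workflow's actual implementation: the failure labels in `TRANSIENT` and the `Action` names are hypothetical, and in practice the classification would come from reading `get_task_details` and `get_peer_agent_output`. It does not cover the duplicate check, which still has to happen before any retry.

```python
from enum import Enum

MAX_RETRIES = 2  # "Maximum 2 retries per subtask"


class Action(Enum):
    RETRY_SAME = "retry_same"          # transient failure: retry unchanged
    RETRY_ADJUSTED = "retry_adjusted"  # permanent failure: adjust description first
    GIVE_UP = "give_up"                # skip and note in the summary


# Hypothetical labels for transient failures (timeout, resource issue).
TRANSIENT = {"timeout", "resource"}


def decide(failure_kind: str, retries_so_far: int, can_adjust: bool = True) -> Action:
    """Pick the next step for a failed subtask under the two-retry cap."""
    if retries_so_far >= MAX_RETRIES:
        return Action.GIVE_UP
    if failure_kind in TRANSIENT:
        return Action.RETRY_SAME
    # Anything else is treated as permanent: retry only if the description can be adjusted.
    return Action.RETRY_ADJUSTED if can_adjust else Action.GIVE_UP
```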

### You're running out of time

- Push all branches, update all task files
- Call `update_task_status` with current state: what's done, what's in progress, what's remaining
- Do NOT rush to merge incomplete work

### A subtask needs input from you

- If a subtask calls `request_human_input`, you'll see a notification
- Respond via `send_message_to_subtask` with the needed information
- Resume your polling loop
@@ -174,12 +188,14 @@ When all subtasks are complete (or all remaining ones have permanently failed):
User: "Refactor the auth middleware and update all routes that use it"

Decomposition:

1. Research current auth middleware usage (subtask)
2. Implement new auth middleware (subtask, depends on 1)
3. Update API routes to use new middleware (subtask, depends on 2)
4. Update tests (subtask, depends on 2 and 3)
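
To make the dependency structure of this decomposition concrete, the following Python sketch computes which subtasks are ready to dispatch at any point. The `ready_subtasks` helper and the `DEPS` mapping are illustrative names assumed for this example; they simply encode the "depends on" notes from the list above.

```python
def ready_subtasks(deps: dict[int, set[int]], done: set[int], dispatched: set[int]) -> list[int]:
    """Return subtasks whose dependencies are all complete and that are not yet started."""
    return sorted(
        n
        for n, needs in deps.items()
        if n not in done and n not in dispatched and needs <= done
    )


# Dependencies from the example decomposition: subtask -> subtasks it depends on.
DEPS = {1: set(), 2: {1}, 3: {2}, 4: {2, 3}}
```

With nothing complete, only subtask 1 is ready; subtask 4 becomes ready only once both 2 and 3 are in `done`.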

Dispatch sequence:

- Dispatch subtask 1 immediately
- Poll every 300s until subtask 1 completes
- Dispatch subtask 2 with subtask 1's output as context
37 changes: 28 additions & 9 deletions .claude/rules/09-task-tracking.md
@@ -29,18 +29,18 @@ Findings that exist only in the Research section without a corresponding checkli

Before moving ANY task from `tasks/active/` to `tasks/archive/`, you MUST run the `task-completion-validator` agent (`.claude/agents/task-completion-validator/`). This agent performs six cross-reference checks:

-| Check | What it catches |
-|-------|----------------|
-| **A: Research → Checklist** | Research findings that never became checklist items |
-| **B: Checklist → Diff** | Checklist items checked off but not actually in the code changes |
-| **C: Criteria → Tests** | Acceptance criteria with no test or manual verification |
-| **D: UI → Backend** | UI form fields that collect input but never send it to the API |
-| **E: Multi-Resource** | Selection functions that pick from a set without a discriminator |
-| **F: Vertical Slice** | Cross-boundary features tested only in isolation with empty mocks instead of vertical slice tests with realistic state (see `35-vertical-slice-testing.md`) |
+| Check | What it catches |
+| --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **A: Research → Checklist** | Research findings that never became checklist items |
+| **B: Checklist → Diff** | Checklist items checked off but not actually in the code changes |
+| **C: Criteria → Tests** | Acceptance criteria with no test or manual verification |
+| **D: UI → Backend** | UI form fields that collect input but never send it to the API |
+| **E: Multi-Resource** | Selection functions that pick from a set without a discriminator |
+| **F: Vertical Slice** | Cross-boundary features tested only in isolation with empty mocks instead of vertical slice tests with realistic state (see `35-vertical-slice-testing.md`) |

### Validation Rules

-- **CRITICAL/HIGH findings block merge.** Fix them in the branch before merging. Filing a backlog task is NOT an acceptable alternative — the validator exists to catch gaps *before* they ship, not to generate follow-up work. The only exception is explicit human approval to defer a specific finding.
+- **CRITICAL/HIGH findings block merge.** Fix them in the branch before merging. Filing a backlog task is NOT an acceptable alternative — the validator exists to catch gaps _before_ they ship, not to generate follow-up work. The only exception is explicit human approval to defer a specific finding.
- **A validator FAIL means the task is not complete.** Return to implementation. Do NOT proceed to PR creation or merge.
- **Do NOT rationalize gaps.** "It works when I test it manually" is not an answer to "no test covers this acceptance criterion." Either add the test or document the manual verification with evidence.
- **"Fix or defer" is not a real choice.** If you have time to write a backlog task file, you have time to write the test or fix the gap. The backlog escape hatch has been abused in every case where it was used (PR #568, PR #570) — the follow-up tasks add friction and delay but deliver the same work that should have been done in the original PR.
@@ -54,6 +54,7 @@ Before moving ANY task from `tasks/active/` to `tasks/archive/`, you MUST run th
## Acceptance Criteria Must Be Testable

When writing acceptance criteria, each criterion must be verifiable by at least one of:

- An automated test (unit, integration, or E2E)
- A documented manual verification with evidence (screenshot, API response, log output)

@@ -63,6 +64,12 @@ Criteria like "User with both providers can select which provider to use" requir

When dispatching a task to another agent (via `dispatch_task` or any other mechanism), the task description MUST instruct the receiving agent to execute the work using the `/do` skill. The `/do` skill is the standard end-to-end workflow for implementing tasks — it handles research, planning, implementation, review, staging verification, and PR creation.

### Read-Only Requests Are Not Implementation Tasks

PR status, PR history, task status, and diagnostic/investigation questions are read-only by default. Answer them in the current session using SAM MCP tools, GitHub/`gh`, logs, and local repo evidence.

Do not create a task file, branch, commit, or PR for a read-only status/history request unless the user explicitly asks for code changes, config changes, a durable artifact, or a delegated task. Repeated recent failures came from treating simple status/history questions as full SAM task executions, which created branches and failed sessions without improving the answer.

### How to Write Dispatch Descriptions

Include an explicit instruction to use `/do` in the task description. Example:
@@ -92,6 +99,18 @@ If the requested specialist/profile is not available or cannot be observed from

When a dispatched task returns, treat its output as usable only after checking that it came from the intended task/profile and respected the original constraints. If the result was produced by the wrong profile, ignored `draft PR`/`do not merge`, dropped the requested branch, or skipped `/do` when required, document the mismatch and do not use it as validation evidence.

### Before Retrying a Failed Dispatch

Before retrying or redispatching the same work after a SAM task fails, diagnose the failed start:

- Call `get_task_details` for the failed task and read any output summary, branch, PR URL, and status evidence.
- If there is a session, read enough messages to distinguish no-workspace/startup failure, transient provider error, human-cancel recovery, wrong profile, or real task failure.
- Call `list_tasks`/`list_project_agents` to check for active duplicates with the same prompt, title, branch, or PR.
- If an active duplicate exists, inspect or coordinate with it instead of creating another copy.
- If the failure was a transient provider or platform startup issue, adjust the retry only after confirming the current platform behavior has not already fixed it.

Do not blindly submit the same prompt repeatedly after no-workspace/startup failures, provider overloads, or immediately failed sessions. If the cheapest evidence does not reveal why the task failed, report the failure with the exact task IDs and observed state instead of multiplying duplicate tasks.
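
The duplicate check described in this section could look roughly like the Python sketch below. It assumes the results of `list_tasks`/`list_project_agents` have already been collected into simple records; the `TaskInfo` fields and the status names in `ACTIVE_STATUSES` are hypothetical and may not match what the SAM tools actually return.

```python
from dataclasses import dataclass

# Assumed status names for work that is still in flight.
ACTIVE_STATUSES = {"pending", "running"}


@dataclass
class TaskInfo:
    """Minimal task record (hypothetical shape, for illustration only)."""
    task_id: str
    status: str
    title: str = ""
    branch: str = ""
    prompt: str = ""
    pr_url: str = ""


def find_active_duplicates(failed: TaskInfo, tasks: list[TaskInfo]) -> list[TaskInfo]:
    """Return active tasks sharing a non-empty title, branch, prompt, or PR with the failed task."""

    def overlaps(other: TaskInfo) -> bool:
        pairs = [
            (failed.title, other.title),
            (failed.branch, other.branch),
            (failed.prompt, other.prompt),
            (failed.pr_url, other.pr_url),
        ]
        # Empty fields never count as a match.
        return any(a and a == b for a, b in pairs)

    return [
        t
        for t in tasks
        if t.task_id != failed.task_id and t.status in ACTIVE_STATUSES and overlaps(t)
    ]
```

If the returned list is non-empty, the guidance above applies: inspect or coordinate with the existing task rather than dispatching another copy.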

### Why This Matters

Without the `/do` instruction, a dispatched agent may skip critical phases like staging verification, specialist review, or proper PR creation. The `/do` workflow enforces all quality gates defined in this project's rules.
26 changes: 21 additions & 5 deletions .codex/prompts/workflow.md
@@ -48,22 +48,27 @@ Untested assumptions are not blockers.
# Workflow State

## Goal

<one-line summary>

## Subtasks
-| # | Description | Task ID | Status | Branch | Notes |
-|---|------------|---------|--------|--------|-------|
-| 1 | ... | pending | ... | ... | ... |
-| 2 | ... | pending | ... | ... | ... |
+
+| # | Description | Task ID | Status | Branch | Notes |
+| --- | ----------- | ------- | ------ | ------ | ----- |
+| 1 | ... | pending | ... | ... | ... |
+| 2 | ... | pending | ... | ... | ... |

## Dependencies

- Task 2 depends on Task 1
- Tasks 3 and 4 can run in parallel

## Poll Count

0

## Last Poll

(not yet)
```

@@ -92,13 +97,16 @@ For each subtask that has no unmet dependencies:

If any of these checks fail, do not wait on the subtask. Re-dispatch with corrected instructions or report the failure with exact status evidence.

Before retrying the same prompt, inspect the failed task/session and check `list_tasks`/`list_project_agents` for active duplicates with the same title, branch, prompt, or PR. If a duplicate is already running, coordinate with it instead of creating another copy. Do not blindly redispatch after no-workspace/startup failures or transient provider failures.

4. **Call `update_task_status`** after each dispatch: "Dispatched subtask N: <description>"

---

## Phase 3: Foreground Polling Loop (CRITICAL)

This is the most important phase. You MUST poll actively to:

- Keep the session alive (prevent timeout kills)
- Detect subtask completion and trigger dependent work
- Report progress to the user
@@ -122,7 +130,8 @@ REPEAT until all subtasks are complete or failed:
- Call get_peer_agent_output(taskId) to review the result
6. If any subtask failed:
- Review the failure via get_task_details
-- Decide: retry_subtask with adjusted description, or mark as failed
+- Check for duplicate active work with the same prompt, branch, title, or PR
+- Decide: retry with adjusted description only after diagnosing the failure, or mark as failed
- Update .workflow-state.md
7. If all subtasks are complete: exit loop
8. If all remaining subtasks are failed and no retries are possible: exit loop
@@ -141,6 +150,7 @@ REPEAT until all subtasks are complete or failed:
### What to Do If Context Feels Fuzzy

If after context compaction you're unsure what's happening:

1. Read `.workflow-state.md` — it has the complete state
2. Call `list_tasks` to see all your subtasks
3. Call `get_task_details` for each active subtask
@@ -168,22 +178,26 @@ When all subtasks are complete (or all remaining ones have permanently failed):
## Handling Common Scenarios

### Subtask produces a PR that needs to merge before the next step

- After the subtask completes, check if it created a PR via `get_task_details`
- If the PR is merged, proceed with dependent subtasks
- If the PR is open, note this in your status update — the dependent subtask should be dispatched to the PR's branch

### Subtask fails

- Read the failure details via `get_task_details` and `get_peer_agent_output`
- If it's a transient failure (timeout, resource issue), retry with `retry_subtask`
- If it's a permanent failure (wrong approach, missing prerequisite), adjust the description and retry, or skip and note in the summary
- Maximum 2 retries per subtask

### You're running out of time

- Push all branches, update all task files
- Call `update_task_status` with current state: what's done, what's in progress, what's remaining
- Do NOT rush to merge incomplete work

### A subtask needs input from you

- If a subtask calls `request_human_input`, you'll see a notification
- Respond via `send_message_to_subtask` with the needed information
- Resume your polling loop
@@ -195,12 +209,14 @@ When all subtasks are complete (or all remaining ones have permanently failed):
User: "Refactor the auth middleware and update all routes that use it"

Decomposition:

1. Research current auth middleware usage (subtask)
2. Implement new auth middleware (subtask, depends on 1)
3. Update API routes to use new middleware (subtask, depends on 2)
4. Update tests (subtask, depends on 2 and 3)

Dispatch sequence:

- Dispatch subtask 1 immediately
- Poll every 300s until subtask 1 completes
- Dispatch subtask 2 with subtask 1's output as context