Merged
8 changes: 4 additions & 4 deletions prpm.json
@@ -1,6 +1,6 @@
{
"name": "agent-workforce-skills",
"version": "1.0.1",
"version": "1.0.2",
"description": "Skills for multi-agent coordination - swarm patterns, workflow building, relay usage, and headless orchestration",
"author": "khaliqgant",
"organization": "agent-relay",
@@ -28,7 +28,7 @@
},
{
"name": "writing-agent-relay-workflows",
"version": "1.6.2",
"version": "1.6.3",
"description": "Use when building multi-agent workflows with the relay broker-sdk - covers conversation-shape vs pipeline-shape coordination, WorkflowBuilder API, DAG step dependencies, agent definitions, output chaining via {{steps.X.output}}, verification gates, evidence-based completion, channels, swarm patterns, chat-native coordination recipes (Q/A, broadcast-ack, peer review, standup, hand-off), error handling, event listeners, step sizing, lead+workers team pattern, and parallel wave planning",
"format": "claude",
"subtype": "skill",
@@ -107,7 +107,7 @@
},
{
"name": "relay-80-100-workflow",
"version": "1.0.0",
"version": "1.0.1",
"description": "Use when writing agent-relay workflows that must fully validate features end-to-end before merging - covers the 80-to-100 pattern with PGlite in-memory Postgres testing, mock sandbox patterns, test-fix-rerun loops, verify gates, and full lifecycle from implementation through passing tests to commit",
"format": "claude",
"subtype": "skill",
@@ -166,7 +166,7 @@
"id": "agent-relay-starter",
"name": "Agent Relay Starter",
"description": "Essential skills for building multi-agent systems with Agent Relay - swarm pattern selection, workflow authoring, and trail debugging",
"version": "1.0.3",
"version": "1.0.4",
"category": "development",
"tags": [
"multi-agent",
101 changes: 83 additions & 18 deletions skills/relay-80-100-workflow/SKILL.md
@@ -1,6 +1,6 @@
---
name: relay-80-100-workflow
description: Use when writing agent-relay workflows that must fully validate features end-to-end before merging. Covers the 80-to-100 pattern - going beyond "code compiles" to "feature works, tested E2E locally." Includes PGlite for in-memory Postgres testing, mock sandbox patterns, test-fix-rerun loops, verify gates after every edit, and the full lifecycle from implementation through passing tests to commit.
description: Use when writing agent-relay workflows that must fully validate features end-to-end before merging. Covers the 80-to-100 pattern - going beyond "code compiles" to "feature works, tested E2E locally." Includes repair-before-failure validation gates, PGlite for in-memory Postgres testing, mock sandbox patterns, test-fix-rerun loops, verify gates after every edit, and the full lifecycle from implementation through passing tests to commit.
---

# Writing 80-to-100 Validated Workflows
@@ -26,9 +26,22 @@ implement → write tests → run tests → fix failures → re-run → build ch

This means the commit at the end of the workflow represents code that is **proven working**, not just code that an agent wrote and claimed works.

## Repair Before Failure

An 80-to-100 workflow should not stop merely because a test, typecheck, lint, schema, or E2E gate turns red. That red output is work for the agent team. Capture it, hand it to a repair owner, fix it, and rerun. Reserve workflow failure for cases the team cannot repair in the current run, such as missing credentials, wrong repository, exhausted repair budget, or an unsafe dirty worktree.

Use this shape for every meaningful gate:

1. `run-*`: deterministic command with `captureOutput: true` and `failOnError: false`.
2. `fix-*`: agent step that reads `{{steps.run-*.output}}`, fixes source/tests/config, and reruns the command locally until green.
3. `verify-*`: deterministic rerun, usually still `failOnError: false`, followed by a final repair step if red.
4. `commit-if-green`: deterministic step that reruns the full acceptance command and commits only when every exit code is zero.
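
The first three parts of the gate shape above can be generated rather than hand-written for each gate. The sketch below is one possible way to do that with plain objects. The `repairableGate` helper and the `GateStep` types are hypothetical and are not part of the relay SDK; the field names simply mirror the step options used in this document.

```typescript
// Hypothetical plain-object shapes. They mirror the step options shown in
// this document but are not the SDK's own types.
type DeterministicStep = {
  name: string;
  type: 'deterministic';
  dependsOn: string[];
  command: string;
  captureOutput: boolean;
  failOnError: boolean;
};

type AgentStep = {
  name: string;
  agent: string;
  dependsOn: string[];
  task: string;
  verification: { type: 'exit_code' };
};

type GateStep = DeterministicStep | AgentStep;

// Hypothetical helper: build run-* -> fix-* -> verify-* for a single gate.
function repairableGate(
  gate: string,
  command: string,
  owner: string,
  dependsOn: string[] = [],
): GateStep[] {
  const run = `run-${gate}`;
  const fix = `fix-${gate}`;
  const verify = `verify-${gate}`;
  return [
    // Soft gate: capture the output, never stop the workflow here.
    { name: run, type: 'deterministic', dependsOn, command, captureOutput: true, failOnError: false },
    // Repair owner: receives the captured output through output chaining.
    {
      name: fix,
      agent: owner,
      dependsOn: [run],
      task: `If the output below is red, fix source/tests/config and rerun "${command}" until green:\n{{steps.${run}.output}}`,
      verification: { type: 'exit_code' },
    },
    // Deterministic rerun: evidence for the next repair pass or for the commit gate.
    { name: verify, type: 'deterministic', dependsOn: [fix], command, captureOutput: true, failOnError: false },
  ];
}

const typecheckGate = repairableGate('typecheck', 'npx tsc --noEmit 2>&1', 'impl');
console.log(typecheckGate.map((s) => s.name).join(' -> '));
// prints "run-typecheck -> fix-typecheck -> verify-typecheck"
```

Each returned object would still need to be passed to the workflow builder's `.step(...)` call by name; the `commit-if-green` step is left out on purpose because it spans all gates rather than one.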

AgentWorkforce/relay#827 added repair-aware reliability to the SDK (`.reliable()` / `.repairable()` and repair-aware retry-mode workflows). Prefer those presets when available, but still model explicit repair owners when gate output needs domain-specific fixing.

## The Test-Fix-Rerun Pattern

Every testable feature in a workflow should follow this three-step pattern:
Every testable feature in a workflow should follow this four-step pattern:

```typescript
// Step 1: Run tests (allow failure — we expect issues on first run)
@@ -58,20 +71,31 @@ If there are failures:
verification: { type: 'exit_code' },
})

// Step 3: Deterministic final run — this one MUST pass
// Step 3: Deterministic rerun — capture result for a final repair pass
.step('run-tests-final', {
type: 'deterministic',
dependsOn: ['fix-tests'],
command: 'npx tsx --test tests/my-feature.test.ts 2>&1',
captureOutput: true,
failOnError: true, // <-- Hard fail if tests still broken
failOnError: false,
})

// Step 4: Repair again if the rerun is still red
.step('fix-tests-final', {
agent: 'tester',
dependsOn: ['run-tests-final'],
task: `If the final test rerun passed, record the green evidence.
If it failed, fix the remaining issue and rerun until green:
{{steps.run-tests-final.output}}`,
verification: { type: 'exit_code' },
})
Comment on lines +83 to 91
🟡 "Three-step pattern" text not updated after adding a fourth step

The PR adds a new Step 4 (fix-tests-final) at lines 83-91, making the pattern a 4-step pattern. However, the introductory text at skills/relay-80-100-workflow/SKILL.md:44 still says "follow this three-step pattern" and the summary at line 94 still says "Why three steps instead of one?" This is a documentation inconsistency introduced by this PR — readers will be confused when the code shows 4 clearly-numbered steps but the surrounding text describes 3.

Prompt for agents
The PR adds a Step 4 (fix-tests-final) to the Test-Fix-Rerun Pattern, making it a 4-step pattern. But the text around the code block still says "three-step pattern" (line 44) and "Why three steps instead of one?" (line 94). Both of these text references need to be updated to say "four" instead of "three". Additionally, the summary section after the code block currently lists 3 bullet points explaining the 3 steps — a 4th bullet should be added explaining the role of the new repair step.

```

**Why three steps instead of one?**
**Why four steps instead of one?**
- The first run captures output for the agent to diagnose
- The agent step can iterate (read errors, fix, re-run) multiple times
- The final deterministic run is the gate — no agent judgment, just pass/fail
- The final deterministic run is still evidence-based, but a repair agent sees it before the workflow stops
- The last repair step keeps the workflow aligned with the agent-team model instead of ending on a fixable failure
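
The control flow behind these four steps can be read as a bounded repair loop. The sketch below simulates that loop outside the SDK, assuming a gate is anything that returns an exit code plus captured output. `runWithRepair`, `GateResult`, and `LoopOutcome` are illustrative names, not relay APIs.

```typescript
// Result of one gate execution: exit code plus captured output.
type GateResult = { exitCode: number; output: string };

type LoopOutcome =
  | { state: 'green'; attempts: number }
  | { state: 'budget-exhausted'; attempts: number; lastOutput: string };

// Hypothetical driver: run the gate, hand red output to a repair callback,
// rerun, and give up only once the repair budget is spent.
function runWithRepair(
  runGate: () => GateResult,
  repair: (output: string) => void,
  repairBudget: number,
): LoopOutcome {
  let result = runGate();
  let attempts = 1;
  let repairs = 0;
  while (result.exitCode !== 0 && repairs < repairBudget) {
    repair(result.output); // the fix-* step
    repairs += 1;
    result = runGate(); // the deterministic rerun
    attempts += 1;
  }
  return result.exitCode === 0
    ? { state: 'green', attempts }
    : { state: 'budget-exhausted', attempts, lastOutput: result.output };
}

// Simulated feature with two bugs; each repair pass removes one.
let remainingBugs = 2;
const outcome = runWithRepair(
  () =>
    remainingBugs === 0
      ? { exitCode: 0, output: 'all tests passed' }
      : { exitCode: 1, output: `${remainingBugs} failing` },
  () => {
    remainingBugs -= 1;
  },
  2, // two repair passes, mirroring fix-tests and fix-tests-final
);
console.log(outcome.state, outcome.attempts); // prints "green 3"
```

A budget of two corresponds to the `fix-tests` and `fix-tests-final` steps. When the budget runs out while the gate is still red, the loop reports `budget-exhausted`, which is the kind of case the workflow reserves for real failure.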

## PGlite: In-Memory Postgres for Database Testing

@@ -170,9 +194,15 @@ Never trust that an agent edited a file correctly. Add a deterministic verify ga
dependsOn: ['edit-schema'],
command: `if git diff --quiet packages/web/lib/db/schema.ts; then echo "NOT MODIFIED"; exit 1; fi
grep "my_new_table" packages/web/lib/db/schema.ts >/dev/null && echo "OK" || (echo "MISSING"; exit 1)`,
failOnError: true,
failOnError: false,
captureOutput: true,
})
.step('fix-schema-verification', {
agent: 'impl',
dependsOn: ['verify-schema'],
task: `Fix the schema edit if verification failed. Output:\n{{steps.verify-schema.output}}`,
verification: { type: 'exit_code' },
})
```
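
Verify commands like the one above tend to be rewritten by hand for every edited file. A small builder keeps the shell logic consistent. This is a minimal sketch: `shQuote` and `buildVerifyCommand` are hypothetical helpers, and the output assumes a POSIX `sh` with `git` and `grep` available.

```typescript
// Quote a value for POSIX sh by wrapping it in single quotes.
function shQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: build a verify-gate command that exits 1 when the file
// has no worktree changes or when any expected token is missing.
function buildVerifyCommand(file: string, requiredTokens: string[]): string {
  const quotedFile = shQuote(file);
  const script = [
    `if git diff --quiet -- ${quotedFile}; then echo "NOT MODIFIED"; exit 1; fi`,
    ...requiredTokens.map(
      (token) =>
        `grep -qF -- ${shQuote(token)} ${quotedFile} || { echo "MISSING:" ${shQuote(token)}; exit 1; }`,
    ),
    'echo "OK"',
  ];
  return script.join('\n');
}

console.log(buildVerifyCommand('packages/web/lib/db/schema.ts', ['my_new_table']));
```

The returned string can be used as the `command` of a deterministic step with `captureOutput: true`. Note that `git diff --quiet` compares the worktree to the index, so a brand-new untracked file reads as `NOT MODIFIED`; a `file_exists` verification fits that case better.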

**What to verify:**
@@ -268,6 +298,7 @@ const result = await workflow('my-feature')
.channel('wf-my-feature')
.maxConcurrency(3)
.timeout(3_600_000)
.repairable()

.agent('impl', { cli: 'claude', preset: 'worker', retries: 2 })
.agent('tester', { cli: 'claude', preset: 'worker', retries: 2 })
@@ -293,9 +324,15 @@ Only edit this one file.`,
type: 'deterministic',
dependsOn: ['edit-target'],
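// Use if/then here: `a && (...; exit 1) || echo "OK"` would still print OK and exit 0 when the file is unmodified.
command: 'if git diff --quiet path/to/file.ts; then echo "NOT MODIFIED"; exit 1; fi; echo "OK"',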
failOnError: true,
failOnError: false,
captureOutput: true,
})
.step('fix-target-verification', {
agent: 'impl',
dependsOn: ['verify-target'],
task: `Fix the target edit if verification failed. Output:\n{{steps.verify-target.output}}`,
verification: { type: 'exit_code' },
})

// ── Phase 3: Test infrastructure ─────────────────────────────────
.step('install-pglite', {
@@ -311,7 +348,7 @@ Only edit this one file.`,
})
.step('create-tests', {
agent: 'tester',
dependsOn: ['create-test-helpers', 'verify-target'],
dependsOn: ['create-test-helpers', 'fix-target-verification'],
task: 'Create tests/my-feature.test.ts with <test descriptions>...',
verification: { type: 'file_exists', value: 'tests/my-feature.test.ts' },
})
@@ -335,13 +372,19 @@ Only edit this one file.`,
dependsOn: ['fix-tests'],
command: 'npx tsx --test tests/my-feature.test.ts 2>&1',
captureOutput: true,
failOnError: true,
failOnError: false,
})
.step('fix-tests-final', {
agent: 'tester',
dependsOn: ['run-tests-final'],
task: `If the final test rerun is red, fix and rerun until green. Output:\n{{steps.run-tests-final.output}}`,
verification: { type: 'exit_code' },
})

// ── Phase 5: Build + regression ──────────────────────────────────
.step('build-check', {
type: 'deterministic',
dependsOn: ['run-tests-final'],
dependsOn: ['fix-tests-final'],
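// Capture the tsc exit code before piping: `$?` after a pipeline reports tail's status, not tsc's.
command: 'out=$(npx tsc --noEmit 2>&1); code=$?; echo "$out" | tail -20; echo "EXIT: $code"',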
captureOutput: true,
failOnError: false,
@@ -370,7 +413,28 @@ Only edit this one file.`,
.step('commit', {
type: 'deterministic',
dependsOn: ['fix-regressions'],
command: 'git add <files> && git commit -m "feat: ..."',
command: [
'npx tsx --test tests/my-feature.test.ts',
'npm test',
'git add <files>',
'git commit -m "feat: ..."',
].join(' && '),
captureOutput: true,
failOnError: false,
})
.step('repair-commit', {
agent: 'impl',
dependsOn: ['commit'],
task: `If commit failed, fix the blocker, rerun the feature and regression tests, and create the commit.
If commit passed, confirm the commit subject.
Output:
{{steps.commit.output}}`,
verification: { type: 'exit_code' },
})
.step('verify-commit-created', {
type: 'deterministic',
dependsOn: ['repair-commit'],
command: 'git log -1 --pretty=%s | grep -q "^feat: " && echo "COMMIT_OK" || (echo "COMMIT_MISSING"; exit 1)',
captureOutput: true,
failOnError: true,
})
@@ -386,21 +450,22 @@ Only edit this one file.`,
| Tests exist | `file_exists` verification on test file |
| Tests actually run | Deterministic step executes them |
| Test failures get fixed | Agent step reads output, fixes, re-runs |
| Final test run is hard-gated | `failOnError: true` on last test step |
| Final test run is repairable | Deterministic rerun captures output, then a repair owner gets one more pass |
| Build passes | `npx tsc --noEmit` deterministic step |
| No regressions | Existing test suite runs after changes |
| Every edit is verified | `git diff --quiet` + grep after each agent edit |
| Commit only happens after all gates | `dependsOn` chains to final verification |
| Every edit is verified and repairable | `git diff --quiet` + grep after each agent edit, followed by a fix step |
| Commit only happens after green evidence | Final commit step reruns acceptance checks and commits only on zero exit codes |
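
Part of this checklist can be checked mechanically before a workflow runs. The sketch below flags soft gates that no agent step reads. It assumes steps are available as plain objects; `StepDef` and `findUnownedSoftGates` are hypothetical and are not provided by the SDK.

```typescript
// Hypothetical minimal step description, mirroring the options used above.
type StepDef = {
  name: string;
  dependsOn?: string[];
  type?: 'deterministic';
  agent?: string;
  failOnError?: boolean;
};

// A soft gate (deterministic, failOnError: false) is "unowned" when no agent
// step depends on it, so nobody would ever read its red output.
function findUnownedSoftGates(defs: StepDef[]): string[] {
  const owned = new Set<string>();
  for (const def of defs) {
    if (def.agent !== undefined) {
      for (const dep of def.dependsOn ?? []) owned.add(dep);
    }
  }
  return defs
    .filter((def) => def.type === 'deterministic' && def.failOnError === false && !owned.has(def.name))
    .map((def) => def.name);
}

const draftWorkflow: StepDef[] = [
  { name: 'run-tests', type: 'deterministic', failOnError: false },
  { name: 'fix-tests', agent: 'tester', dependsOn: ['run-tests'] },
  { name: 'run-tests-final', type: 'deterministic', dependsOn: ['fix-tests'], failOnError: false },
  { name: 'build-check', type: 'deterministic', dependsOn: ['run-tests-final'], failOnError: false },
];
console.log(JSON.stringify(findUnownedSoftGates(draftWorkflow)));
// prints ["run-tests-final","build-check"]
```

In this draft, `run-tests-final` and `build-check` are reported because no repair owner depends on them. Adding `fix-tests-final` and `fix-regressions` style agent steps would clear the list.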

## Common Anti-Patterns

| Anti-pattern | Why it fails | Fix |
|-------------|-------------|-----|
| Tests written but never executed | Agent claims they pass, they don't | Add deterministic `run-tests` step |
| Single `failOnError: true` test run | First failure kills workflow, no chance to fix | Use the three-step test-fix-rerun pattern |
| Single `failOnError: true` test run | First failure kills workflow, no chance to fix | Use repairable run-fix-rerun-final-fix loops |
| No regression test | New feature works, old features break | Run `npm test` after build check |
| Agent asked to "write and run tests" in one step | Agent writes tests, runs them, they fail, it edits, output is garbled | Separate write/run/fix into distinct steps |
| PGlite DDL doesn't match Drizzle schema | Tests pass on wrong schema | Derive DDL from schema.ts or test with real migration |
| `failOnError: false` on final test run | Broken tests get committed | Always `failOnError: true` on the gate step |
| Final test output not handed to an agent | Broken tests can stop the run or get ignored | Add a final repair owner before commit |
| Testing only happy path | Edge cases break in prod | Specify edge case tests in the task prompt |
| No verify gate after agent edits | Agent exits 0 without writing anything | Add `git diff --quiet` check after every edit |
| No verify gate after agent edits | Agent exits 0 without writing anything | Add `git diff --quiet` check after every edit, then route failures to a repair step |
| Committing after `failOnError: false` without checking exits | Broken work can be committed because the shell step returned successfully | In `commit-if-green`, record each exit code and skip commit unless all are zero |
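
The last row describes recording each exit code and committing only when all are zero. One way to express that is a generated shell script, sketched below. `buildCommitIfGreen` and `shQuote` are hypothetical helpers, and the generated script assumes a POSIX `sh`.

```typescript
// Quote a value for POSIX sh by wrapping it in single quotes.
function shQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: every acceptance check runs to completion, its exit
// code is recorded in the output, and git commit runs only when all are zero.
function buildCommitIfGreen(checks: string[], files: string[], message: string): string {
  const script = ['fail=0'];
  checks.forEach((check, index) => {
    script.push(
      `${check}; code=$?; echo "CHECK_${index + 1}_EXIT: $code"; [ "$code" -eq 0 ] || fail=1`,
    );
  });
  script.push(
    'if [ "$fail" -ne 0 ]; then echo "COMMIT_SKIPPED"; exit 1; fi',
    `git add -- ${files.map(shQuote).join(' ')}`,
    `git commit -m ${shQuote(message)}`,
  );
  return script.join('\n');
}

console.log(
  buildCommitIfGreen(
    ['npx tsx --test tests/my-feature.test.ts', 'npm test'],
    ['tests/my-feature.test.ts'],
    'feat: add my feature',
  ),
);
```

Unlike a plain `&&` chain, this form runs every check even after one fails, so the repair owner sees all exit codes in a single captured output. The file list and commit message in the usage call are placeholders.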