Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion prpm.json
Original file line number Diff line number Diff line change
Expand Up @@ -89,7 +89,7 @@
},
{
"name": "orchestrating-agent-relay",
"version": "2.0.0",
"version": "2.1.0",
"description": "The canonical way to run agent-relay - self-bootstrap the broker and autonomously spawn, monitor, and coordinate a team of worker agents without human intervention. Covers infrastructure startup, agent spawning, lifecycle monitoring, CLI-first reading, and team coordination.",
"format": "claude",
"subtype": "skill",
Expand Down
133 changes: 131 additions & 2 deletions skills/orchestrating-agent-relay/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -166,12 +166,68 @@ agent-relay down

**`agent-relay replies <agent>` is the canonical command for reading worker
DM replies** — it returns full text, sender-attributed, in chronological
order, with no truncation. Add `--json` for machine-readable output (full
text plus a `direction` field).
order, with no truncation. Add `--json` for machine-readable output.

`inbox --agent <name>` is legacy unread-only behavior; once read, entries
disappear. Prefer `replies` for a persistent, complete view.

#### `replies --json` schema (read this before writing a monitor)

Verified against the agent-relay CLI source (`replies` command). When there
**is** a conversation, `--json` prints a JSON array of message objects:

```json
[
{
"id": "01J...",
"from": "Implementer",
"to": "orchestrator",
"text": "ACK — starting on the auth module",
"createdAt": "2026-05-19T14:02:11.000Z",
"direction": "inbound"
}
]
```

`unread` (boolean) and/or `unread_state: "unknown"` may also be present
depending on read-state availability. Footguns that will silently break a
naive monitor:

- **The timestamp field is `createdAt`, not `ts`/`timestamp`.** It is an
ISO-8601 string.
- **In `replies --json`, `direction` is always the literal `"inbound"`** — it
is hard-coded, because `replies` only ever returns messages _from_ the
named agent. It is never `"incoming"`, `"from"`, `"in"`, nor `"outbound"`.
Filtering on `direction == "inbound"` is harmless but redundant; filtering
on any other literal yields a monitor that runs forever and never sees the
ACK or DONE. (`"outbound"` only appears in `history --to <agent> --json`,
which includes messages you sent — see below.)
- **The empty state is a plain string, not `[]`.** When there is _no
conversation at all_, the command prints the literal line
`No DM conversation with <Name>.` (exit 0) — not JSON. (If a conversation
exists but no messages match the filters, `--json` does emit a valid `[]`.)
Piping the no-conversation case straight into `jq` errors out. Guard for it:

```bash
out=$(agent-relay replies Implementer --json)
case "$out" in
"No DM conversation with"*|"") echo "no replies yet" ;;
*) echo "$out" | jq -r '.[] | "\(.createdAt) \(.direction) \(.text)"' ;;
esac
```

- **Build monitors defensively: emit-all, then eyeball.** Print every entry
with its `direction` and `createdAt` rather than hard-filtering inside
`jq`. A monitor that shows everything beats one that silently drops the
message you were waiting for because an assumption about the schema was
wrong.

`history --to <agent> --json` uses the same object shape (`id`, `from`, `to`,
`text`, `createdAt`, `direction`) but `direction` is computed:
`"outbound"` for messages you (the reader identity) sent, `"inbound"` for the
agent's. Use it when you need both sides of the thread, not just the agent's
replies.

```bash
# WRONG — history (no flags) will not show DM replies from workers
agent-relay history
Expand Down Expand Up @@ -297,13 +353,82 @@ Release when done:
## Protocol
- Workers will ACK when they receive tasks
- Workers will send DONE when complete
- Tell every worker explicitly: do NOT self-remove/release after DONE — stay
alive and idle so you can DM them review findings to fix
- After DONE, run a reviewer; on NO-GO, DM the findings back to the SAME
worker. If the worker is gone, spawn a fresh one and re-inject branch +
commit SHA + the full verdict
- Parse `replies --json` defensively: `direction` is always `"inbound"`,
timestamp is `createdAt` (not `ts`), and the no-conversation state is a
plain string, not `[]`
- Poll `agent-relay who --json` for worker liveness; set a wall-clock fallback
so a silently-dead worker can't hang the loop
- Use `agent-relay agents:logs <name>` to monitor progress
- Use `agent-relay replies <name>` to read a worker's DM replies (full text, chronological, persistent); add `--json` to parse
- Use `agent-relay history --to <name>` for the full DM conversation thread (read + unread)
- Use `agent-relay history --to '#general' --json` to see channel message flow
- Do NOT use `agent-relay history` alone to check worker replies — it only shows channel posts, DM replies are invisible there
```

## Multi-Round Review Loops (DONE → NO-GO → fix → re-review)

Spawning, monitoring, and releasing a worker is the easy path. The hard part
the basic flow does **not** cover: a worker reports DONE, a reviewer comes
back NO-GO, and now the work has to go back. Plan for this topology before you
spawn anything.

### Workers must not self-remove until you tell them

A worker's natural hygiene instinct is to call `agent.remove` on itself right
after reporting DONE. That **kills the review→fix→re-review loop**: when the
reviewer returns NO-GO there is no agent left to send the findings to, so you
are forced to spawn a fresh worker and re-inject the entire context (branch,
commit, full verdict) instead of just DMing the existing one.

**Put this in every implementer/worker task prompt explicitly:**

```text
Do NOT call agent.remove / agent-relay release on yourself. Report DONE and
stay alive and idle. The orchestrator will send you review findings to fix,
or release you when the work is fully accepted. Self-removing before then
breaks the fix loop.
```

The "release when done" guidance elsewhere in this skill applies to the
**orchestrator** releasing workers — never to a worker releasing itself
mid-loop.

### The respawn-with-full-context fallback

If a worker did self-remove (or died), you cannot just DM it. Spawn a fresh
worker and re-inject everything it needs to act with no prior memory:

```bash
agent-relay spawn Implementer2 codex "Continuation of prior work. \
Branch: feature/auth. Last commit: <sha>. \
The reviewer returned NO-GO with these findings: <full verdict text>. \
Check out the branch, address every finding, re-run tests, report DONE. \
Do NOT self-remove — stay alive for re-review."
```

Always pass branch + commit SHA + the **complete** reviewer verdict. A fresh
worker has none of the loop's history; a summarized verdict loses the
specifics it needs to fix.

### Detecting a silently-dead worker

Monitors fire on **DMs only**. A worker that exits or self-removes produces no
DM, so the monitor just goes quiet — indistinguishable from a worker still
thinking. Defenses:

- Poll `agent-relay who --json` for liveness instead of inferring it from DM
silence. A worker that vanishes from `who` is gone.
- `agent-relay agents:logs <name>` will show a self-issued `agent.remove` /
release call — but it is noisy TTY scraping, a last resort, not a signal.
- Always set a wall-clock fallback (e.g. a ScheduleWakeup ~30 min out) so a
silently-dead worker can't hang the loop forever waiting on a DM that will
never arrive.

## Lifecycle Events

The broker emits these events (available via SDK subscriptions):
Expand Down Expand Up @@ -335,6 +460,10 @@ The broker emits these events (available via SDK subscriptions):
| Need to see unread DM content | `inbox_check` / `inbox --agent` only return counts or clear on read, and the MCP `message_dm_list` tool requires a registered identity you don't have. Use `agent-relay replies <name> --json` |
| Re-reading already-read replies | `agent-relay replies <name>` is a persistent view (not unread-only); use `--since <time>` to narrow, or `agent-relay history --to <name>` for the full thread |
| Sent to wrong destination | `agent-relay send Worker1 "..."` = DM; `agent-relay send '#general' "..."` = channel broadcast. The `#` prefix is required for channels |
| Monitor never sees ACK/DONE | In `replies --json`, `direction` is always the literal `"inbound"` (never `"incoming"`/`"from"`/`"outbound"`); timestamp field is `createdAt`, not `ts`. See the `replies --json` schema section |
| `jq` errors on empty `replies --json` | Empty state is the plain string `No DM conversation with <Name>.`, not `[]`. Guard before piping to `jq` |
| Worker self-removed; can't send review fixes | Instruct workers not to self-remove until told. If already gone, spawn a fresh worker and re-inject branch + commit SHA + full verdict (see Multi-Round Review Loops) |
| Worker died silently; loop hangs | DM monitors fire on DMs only. Poll `agent-relay who --json` for liveness and set a wall-clock fallback (~30 min ScheduleWakeup) |

## Prerequisites

Expand Down