-
Notifications
You must be signed in to change notification settings - Fork 0
Add spawn-cloud-swarm skill #57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
kjgbot
wants to merge
1
commit into
main
Choose a base branch
from
feat/spawn-cloud-swarm-skill
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,128 @@ | ||
| --- | ||
| name: spawn-cloud-swarm | ||
| description: Use when spawning a swarm of cloud agents that share the user's local working directory via relayfile. Orchestrates local-mount lifecycle (ensure -> monitor -> teardown) around N cloud.agent.spawn calls. Requires cloud.local-mount.* MCP tools and must be installed before using agent-relay cloud swarms. | ||
| tags: cloud, multi-agent, relayfile, mount, swarm | ||
| --- | ||
|
|
||
| # Spawn Cloud Swarm | ||
|
|
||
| ## When to Use | ||
|
|
||
| The user asks to spawn one or more cloud-hosted agents that need to read or | ||
| write files from the current working directory - the project on their laptop. | ||
| Examples: | ||
|
|
||
| - "spawn 4 cloud workers to fix these typecheck errors" | ||
| - "kick off a swarm in the cloud against this repo" | ||
| - "run claude in the cloud on this project" | ||
|
|
||
| This skill drives the full lifecycle so the user's host edits and the cloud | ||
| workers' edits stay in sync through relayfile. | ||
|
|
||
| ## Install Requirement | ||
|
|
||
| This skill is not bundled with `agent-relay`. Install it before trying to use a | ||
| cloud swarm: | ||
|
|
||
| ```bash | ||
| npx skills add https://github.com/agentworkforce/skills --skill spawn-cloud-swarm | ||
| ``` | ||
|
|
||
| If the skill is unavailable in the current assistant session, stop and ask the | ||
| user to install it with the command above before spawning workers. | ||
|
|
||
| ## Prereqs the Skill Enforces | ||
|
|
||
| Before spawning anything, confirm the host has the artifacts and MCP tools the | ||
| cloud spawn flow depends on. Use the relaycast MCP `cloud.*` tools to probe; do | ||
| not read the underlying files directly from this skill. | ||
|
|
||
| 1. The relaycast MCP surface exposes every tool this skill drives. Confirm | ||
| `cloud.local-mount.ensure`, `cloud.local-mount.status`, and | ||
| `cloud.local-mount.stop` are all registered on the MCP client. If any are | ||
| missing, surface `MCP_LOCAL_MOUNT_TOOLS_MISSING` and the verbatim | ||
| remediation: "Upgrade `@relaycast/mcp` to a build that includes | ||
| `cloud.local-mount.*` (see relaycast PR `feat/cloud-local-mount-tools`)." | ||
| Then stop. | ||
| 2. `cloud.json` exists at `~/.config/agent-relay/cloud.json` (proves | ||
| `agent-relay cloud login` has run). If missing, surface the verbatim | ||
| remediation code `NEEDS_CLOUD_LOGIN` and the exact command the user must | ||
| run: `agent-relay cloud login`. Then stop. | ||
| 3. `connections.json` exists at `~/.config/agent-relay/connections.json` | ||
| (proves `agent-relay cloud connect <provider>` has run for at least one | ||
| CLI). If missing, surface `NEEDS_CLI_CONNECTION` and instruct the user to | ||
| run `agent-relay cloud connect claude` (or codex, etc.). Then stop. | ||
| 4. The current working directory must be usable as a relayfile workspace. Call | ||
| `cloud.local-mount.status { localDir: CWD }` to read the current mount | ||
| state. A stopped mount is not a failure; `cloud.local-mount.ensure` starts | ||
| it in the procedure below. If the MCP returns `NEEDS_RELAYFILE_SETUP`, | ||
| surface that code and instruct the user to run | ||
| `relayfile setup --local-dir <CWD>`. Then stop. | ||
|
|
||
| All four remediation strings must be quoted verbatim so downstream tooling | ||
| can match on them. | ||
|
|
||
| ## Procedure | ||
|
|
||
| Once prereqs pass: | ||
|
|
||
| 1. Ensure local mount. Call | ||
| `cloud.local-mount.ensure { localDir: CWD }`. The response shape is | ||
| `{ mountPath, workspaceId, status: "running" | "started" }`. Record | ||
| `workspaceId`; every spawned agent must be pinned to this workspace so | ||
| their sandbox mounts the same tree the host is editing. | ||
|
|
||
| 2. Spawn workers. Loop N times calling `cloud.agent.spawn` with the user's | ||
| requested `cli`, `model`, `task`, and the `workspaceId` from step 1. On | ||
| `QUOTA_EXCEEDED` from the server, wait 10 seconds and retry the failed | ||
| spawn; allow up to 3 backoff cycles total for that worker before failing. | ||
| Surface a one-line note to the assistant for each successful spawn including | ||
| the returned agent name and sandbox URL. | ||
|
|
||
| 3. Monitor. Until every spawned agent has exited: | ||
| - Every 5 seconds call `cloud.local-mount.status { localDir: CWD }`. | ||
| If `conflictCount > 0`, surface a one-line warning naming the conflict | ||
| count and tell the user to inspect `.relay/conflicts/` in the local | ||
| working directory. | ||
| If `running` flips to false, surface a critical warning and stop the | ||
| poll loop; the mount has died and the workers can no longer see host | ||
| edits. | ||
| - Every 5 seconds call `cloud.agent.list`. When an agent transitions to | ||
| a terminal status (`completed`, `failed`, `killed`), report it once. | ||
| - Stop polling when the count of non-terminal agents reaches zero. | ||
|
|
||
| 4. Teardown. When the swarm has exited, ask the user one question: | ||
| "Persist the local mount for next time, or stop it now? [persist]" The | ||
| default (no answer / Enter / "persist") is to leave the mount running so | ||
| the next spawn is faster. On an explicit "stop" / "no" / "tear down", | ||
| call `cloud.local-mount.stop { localDir: CWD }` and confirm the response | ||
| shape `{ stopped: true }`. | ||
|
|
||
| ## Error Handling | ||
|
|
||
| | Error code | Where it surfaces | Verbatim remediation | | ||
| |---|---|---| | ||
| | `NEEDS_CLOUD_LOGIN` | Prereq 2 | "Run `agent-relay cloud login` to authenticate." | | ||
| | `NEEDS_CLI_CONNECTION` | Prereq 3 | "Run `agent-relay cloud connect claude` (or your CLI) to register a provider." | | ||
| | `NEEDS_RELAYFILE_SETUP` | Prereq 4 | "Run `relayfile setup --local-dir <CWD>` to register this directory as a relayfile workspace." | | ||
| | `MCP_LOCAL_MOUNT_TOOLS_MISSING` | Prereq 1 | "Upgrade `@relaycast/mcp` to a build that includes `cloud.local-mount.*` (see relaycast PR `feat/cloud-local-mount-tools`)." | | ||
| | `MOUNT_FAILED` | Step 1 | Report the MCP-returned error string verbatim and instruct the user to inspect `relayfile status`. Do not retry; the user's environment needs attention. | | ||
| | `QUOTA_EXCEEDED` | Step 2 | Internal: wait 10s and retry up to 3 cycles for the failed worker. If still failing, surface "Your cloud plan's concurrent worker limit was reached. Wait for a worker to finish or upgrade your plan." | | ||
| | `MOUNT_DIED` | Step 3 | "The local mount stopped while workers were running. Their later writes did not reach your host. Run `relayfile status` and `cloud.local-mount.ensure` before re-spawning." | | ||
| | `STOP_FAILED` | Step 4 | Report verbatim, then suggest `relayfile stop <workspaceId>` as a manual fallback. | | ||
|
|
||
| ## Boundaries | ||
|
|
||
| This skill does not: | ||
|
|
||
| - Spin up the relayfile mount process itself. That is the | ||
| `cloud.local-mount.ensure` MCP tool's job; the skill only orchestrates. | ||
| - Make decisions about how many workers to spawn. The user or calling | ||
| assistant chooses N. | ||
| - Persist any state between runs other than what relayfile / cloud already | ||
| persist. The skill is stateless across invocations. | ||
| - Edit any file in the user's working directory. Reads only what it needs | ||
| to confirm prereqs, and that read goes through the MCP, not direct fs. | ||
| - Touch the `cloud.*` MCP surface beyond `cloud.local-mount.ensure`, | ||
| `cloud.local-mount.status`, `cloud.local-mount.stop`, `cloud.agent.spawn`, | ||
| and `cloud.agent.list`. | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Prereq 1 claims to verify "every tool this skill drives" but only checks 3 of 5 tools
Prereq 1 opens with "The relaycast MCP surface exposes every tool this skill drives" then only verifies
cloud.local-mount.ensure,cloud.local-mount.status, andcloud.local-mount.stop. However, the Boundaries section (skills/spawn-cloud-swarm/SKILL.md:126-128) confirms the skill also drivescloud.agent.spawn(step 2) andcloud.agent.list(step 3). If either agent tool is missing from the MCP surface, the prereq check passes, the mount gets started in step 1, and the skill then fails at step 2 with no structured error code or verbatim remediation — leaving the user with an active mount but a broken spawn flow.Was this helpful? React with 👍 or 👎 to provide feedback.