agent-runner

Warning

Agent Runner is early experimental software. Its CLI, config schema, manifest schema, hooks, daemon API, web UI behavior, and internal workflow contracts may change without compatibility guarantees. Expect migrations and breaking changes while the project is still taking shape.

agent-runner runs a coding agent against a structured checklist and keeps the run going, re-prompting and retrying, until every task is done or blocked, or the run is out of retries. Instead of trusting the agent's final "all done" message, it inspects the task list the agent updated and acts on what it finds.

Every run is recorded durably: per-task final status, the agent's notes, and an audit trail of how the run got there. When a run ends you get a structured pass/fail result and exit code, not a chat log to re-read.

Want to keep working with the agent outside agent-runner? Copy the backend session id from the run's detail pane in the dashboard and resume the conversation directly in the underlying tool — Claude, Codex, and the other backends keep their own native sessions.

An optional local daemon serves a web dashboard for live run state — the runs board, run detail, and a mobile layout.

[Screenshots: board view, list view, mobile]

Why

If you've used a coding agent for any non-trivial task, you've seen this loop:

You give the agent a list of things to do.
The agent confidently announces "all done!"
You check, and two of the five things weren't actually done.
You write another prompt: "you didn't finish X and Y, try again."
Repeat.

agent-runner wraps that loop. The task list is structured — each task has a stable id, a title, and a status the agent updates in place. The runner inspects state after every turn, and a partial completion becomes another iteration with a programmatic nudge instead of a hand-typed follow-up. When the agent gets it right, the run ends and the runner emits a structured record with the per-task final statuses and the agent's notes.
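As a rough illustration of that loop, here is a toy shell model. It is not agent-runner's implementation: simulate_run is a hypothetical function, and each simulated turn simply completes one pending task, so it only demonstrates the inspect-then-retry control flow and the two ways it can end.

```shell
# Toy model of an inspect-and-retry loop (illustrative only).
# Usage: simulate_run MAX_ATTEMPTS TASK_ID...
# Assumption: every simulated turn completes exactly one pending task.
simulate_run() {
  max_attempts=$1
  shift                      # remaining arguments are the pending task ids
  attempt=0
  # Keep iterating while tasks remain and attempts are left.
  while [ "$#" -gt 0 ] && [ "$attempt" -lt "$max_attempts" ]; do
    attempt=$((attempt + 1))
    # Stand-in for one agent turn: the first pending task is marked done.
    shift
  done
  if [ "$#" -eq 0 ]; then
    echo "completed after $attempt attempt(s)"
    return 0
  fi
  echo "retries exhausted with $# task(s) incomplete"
  return 1
}
```

Calling simulate_run 5 t1 t2 t3 finishes in three attempts, while simulate_run 2 t1 t2 t3 stops with one task left over and a non-zero status.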

It is also a useful primitive for orchestration: an outer agent can compose an assignment, hand it to agent-runner, and get back a structured success/failure without parsing free-form chat output.

Two ways to use it

Sidecar mode — initialize and inspect runs here, but drive the work from your existing interactive coding tool, updating task state through the task CLI. You get task tracking, briefs, attachments, and durable run state without handing execution over to agent-runner.
Active backend mode — execute the run through a supported backend (claude, codex, cursor, opencode, pi). agent-runner performs the run/retry loop itself and validates whether tasks were actually marked complete before the run is treated as done. Backend-native capabilities (skills, subagents, MCP servers, custom slash commands) keep working — agent-runner controls when and how the backend is invoked, not what it does once running.

Validation today is task-state based: did the worker actually complete the checklist it was given? Assignments can also declare deterministic hooks that run at prepare time, around attempts, or during task transitions — see Beyond the basics.

How it works

Two definitions and one record:

An agent supplies the backend, model, and role instructions.
An assignment supplies a reusable task list and work context.
A run is the persisted execution instance created from one agent and (optionally) one assignment. Task state is canonical in the run's run.json manifest; workers mutate it through the agent-runner task CLI, not workspace files.

Start with docs/concepts.md for the full mental model.

Standalone CLI or local daemon

agent-runner works as a standalone CLI with no daemon. Commands run embedded — they read and write run state directly on the filesystem, so a single terminal needs nothing else running.

A local daemon is optional. agent-runner serve adds the browser dashboard and broadcasts run changes in real time, so multiple terminals and the web UI stay in sync. The recommended local setup runs one daemon and points the CLI at it:

export AGENT_RUNNER_LISTEN=ws://127.0.0.1:4773/
export AGENT_RUNNER_CONNECT=ws://127.0.0.1:4773/
export AGENT_RUNNER_MAX_CALL_DEPTH=2

For persistence, put those exports in the startup file for the shell or service that launches agent-runner: ~/.bashrc for interactive Bash, ~/.zshrc for Zsh, ~/.config/fish/config.fish for Fish, or a systemd user service / environment file when agent-runner serve is supervised. See docs/configuration.md.

AGENT_RUNNER_LISTEN is the address agent-runner serve binds; AGENT_RUNNER_CONNECT routes CLI commands through that daemon so its projections and the dashboard stay current. AGENT_RUNNER_MAX_CALL_DEPTH=2 raises the recursion cap one level so a run can launch a nested agent-runner run, which is useful for orchestration. With AGENT_RUNNER_CONNECT exported, keep agent-runner serve running — connected commands fail fast if the daemon is unreachable. For pure standalone use, leave AGENT_RUNNER_CONNECT unset.
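A small helper can make it obvious which mode a terminal is in before you run anything. This is a sketch under assumptions: agent_runner_mode is a hypothetical function, not part of agent-runner, and it treats an empty AGENT_RUNNER_CONNECT the same as an unset one, which may not match how the CLI itself interprets an empty value.

```shell
# Report whether this shell would use standalone or connected mode.
# Assumption: an empty AGENT_RUNNER_CONNECT is treated like an unset one.
agent_runner_mode() {
  if [ -n "${AGENT_RUNNER_CONNECT:-}" ]; then
    echo "connected via $AGENT_RUNNER_CONNECT"
  else
    echo "standalone"
  fi
}
```

With the recommended exports in place it prints connected via ws://127.0.0.1:4773/, which is a reminder that agent-runner serve has to be running.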

Scope and direction

agent-runner is an orchestration and state-tracking layer for agent runs, not an interactive coding environment. Upstream design and ideation happen elsewhere; agent-runner takes a plan (or the requirements to produce one), runs it, and surfaces durable state, audit, and structured handoffs at each user gate. It does not replace the interactive tool you use to converse with an agent in the moment.

See docs/scope.md for the full product stance, including a triage heuristic for evaluating feature requests against scope.

Install

Requirements:

Node.js 20.19+ or 22.12+
a supported backend when you want live execution: claude, codex (or a Codex app-server), cursor-agent, opencode, or pi

Option 1: local build / linked development install

npm install
npm run build
npm link --workspace @kcosr/agent-runner

The built CLI entrypoint is node apps/cli/dist/cli.js. The workspace also exposes npm run agent-runner -- <args>.

Option 2: package-style invocation

Once agent-runner is published as a package, the intended no-install path is:

npx @kcosr/agent-runner <args>

Quickstart

Set up your config directory

Named definitions and trusted extension code — agents, assignments, tasks, and hooks — live in your config directory (~/.config/agent-runner/ by default). Copy the bundled definitions there from the repository root so you can refer to them by name from any directory:

mkdir -p ~/.config/agent-runner
cp -R agents assignments tasks hooks ~/.config/agent-runner/

--agent implementer and --assignment repo-orientation now resolve by name, and the same directory is where you author your own definitions. A --agent / --assignment value is instead treated as a file path when it is absolute or starts with ./ or ../, so you can still point at a definition outside the config directory.
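The name-versus-path rule can be sketched as a single case statement. definition_ref_kind is a hypothetical helper written for this illustration, not an agent-runner command, and it assumes POSIX-style paths where "absolute" means a leading slash.

```shell
# Classify a --agent / --assignment value according to the rule above.
# Assumption: "absolute" means a POSIX path starting with "/".
definition_ref_kind() {
  case "$1" in
    /* | ./* | ../*) echo "path" ;;   # absolute, or starts with ./ or ../
    *) echo "name" ;;                 # resolved by name in the config directory
  esac
}
```

Under this reading, implementer is a name, ./agents/implementer/agent.md is a path, and a bare relative value such as agents/implementer still counts as a name because it lacks the leading ./ prefix.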

The bundled definitions are working examples shaped to this project's own development workflow — several assignments perform real side effects such as creating git worktrees, pushing branches, opening pull requests, or merging after approval. Read what an agent or assignment does before you run it, and treat the bundled set as a starting point for definitions of your own rather than a required path.

The repository also ships skills under skills/ for the coding agent that drives agent-runner — they seed plan-feature and plan-implement-feature runs. Copy them into your coding agent's skills directory (for example ~/.agents/skills/) to make them available:

mkdir -p ~/.agents/skills
cp -R skills/* ~/.agents/skills/

The same caveat applies — read a skill before using it and adapt it to your own setup.

Smoke-check without a backend

agent-runner init --backend passive --assignment test --name smoke
agent-runner run brief <run-id>
agent-runner task list <run-id>

This does not invoke Claude, Codex, or any other backend. init prints the new run id; use it in the follow-up commands to confirm definition loading, run creation, brief rendering, and task-state reads.

Run an agent against an assignment

agent-runner run --agent implementer --assignment repo-orientation

This requires the selected backend from agents/implementer/agent.md to be installed and authenticated. The runner executes that backend, inspects task state after each turn, retries incomplete work, and exits with a status code reflecting the outcome. The text output prints the new run id; use that id with the inspection commands below.

Inspect a run

agent-runner status                       # system / environment context
agent-runner run status <run-id>          # lifecycle and task state
agent-runner run brief <run-id>           # the worker handoff
agent-runner run audit <run-id>           # persisted audit history
agent-runner task list <run-id>
agent-runner task show <run-id> <task-id>

Prepare a run without executing it

agent-runner init --agent implementer --assignment repo-orientation

agent-runner run ready <run-id>
agent-runner run --resume-run <run-id>

init creates the run in the initialized state, run ready promotes it to ready, and run --resume-run starts it. An initialized run can still be adjusted before it starts with run reconfigure — see docs/cli.md.

Drive a run yourself (sidecar mode)

agent-runner init \
  --backend passive \
  --assignment plan-feature \
  --name "Web dashboard" \
  "Design the dashboard work"

agent-runner run brief <run-id>
agent-runner task set <run-id> <task-id> --status in_progress
agent-runner task append-notes <run-id> <task-id> --text "Observed ..."
agent-runner task set <run-id> <task-id> --status completed

A passive run invokes no backend. You do the work in your own tool and report progress through the task CLI; agent-runner keeps the durable record. See docs/backends.md for the full passive / external-driver workflow.
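If you report progress this way often, the three task commands above can be wrapped in one function. This is a convenience sketch, not something agent-runner ships: complete_task and the AR_BIN variable are hypothetical names, and the function uses only the task set and task append-notes invocations shown above.

```shell
# Mark a task in progress, record a note, then mark it completed.
# Usage: complete_task RUN_ID TASK_ID NOTE
# AR_BIN is a hypothetical override for the CLI binary (handy for dry runs).
complete_task() {
  run_id=$1
  task_id=$2
  note=$3
  # Stop at the first failing command so a failed step is not reported as done.
  "${AR_BIN:-agent-runner}" task set "$run_id" "$task_id" --status in_progress &&
    "${AR_BIN:-agent-runner}" task append-notes "$run_id" "$task_id" --text "$note" &&
    "${AR_BIN:-agent-runner}" task set "$run_id" "$task_id" --status completed
}
```

Setting AR_BIN=echo turns the call into a dry run that prints the three commands instead of executing them.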

Use the local daemon

For the live dashboard and run state shared across terminals, export the recommended environment and start one daemon:

agent-runner serve
# Serves the daemon and web dashboard; open the printed HTTP URL.

Leave that running. In another terminal — with the same environment exported — ordinary commands route through that daemon (via AGENT_RUNNER_CONNECT), so the dashboard and every CLI client stay in sync:

agent-runner run --agent implementer --assignment repo-orientation
agent-runner run status <run-id>

Run only one daemon. The web UI talks to that same daemon and is not a standalone app. See docs/web-dashboard.md.

Beyond the basics

agent-runner has a deeper feature set than the quickstart shows. Each row links to the section that documents it:

Topic	What it does	Where it's documented
Hooks	Deterministic checks around attempts and task transitions	docs/hooks.md
Custom backends	Author your own backend module	docs/custom-backends.md
Launchers	Wrap subprocess backends (e.g. SSH into a worker)	docs/agents-and-assignments.md (Launcher definitions)
Container environments	Run inside a managed container	docs/execution-environments.md
Scheduling	One-time and recurring (cron) runs	docs/runs.md (Scheduled runs)
Queued messages	Queue resume messages for a live run	docs/resume.md (Queued resume messages)
Attachments	File handoff between runs	docs/attachments.md
Dependencies	Gate a run until upstream runs succeed	docs/dependencies.md
Connected mode	Route CLI commands through the daemon, optionally over SSH	docs/daemon.md (CLI clients)

Command index

Command	Purpose
`run`	Execute a fresh run, promote an initialized run to ready, start a ready run, or resume
`init`	Prepare a run workspace without invoking the backend
`serve`	Start the local daemon (WS JSON-RPC + HTTP/SSE + web UI)
`status`	Print system/environment status
`run status|brief|audit`	Print run state, the composed worker handoff, or persisted audit history
`run environment status|validate|cleanup`	Inspect, validate, or clean up a run execution environment
`task list|show|set|append-notes|add`	Run task-state inspection and mutation
`attachment add|list|download|remove`	Attachment management
`list agents|assignments|launchers|environments|tasks|runs`	Enumerate definitions and runs
`show agent|assignment|launcher|environment|task`	Render a single definition
`run reconfigure`	Patch vars/message on an unarchived initialized run
`run queue-message|queued-messages|remove-queued-message`	Manage queued resume messages for live runs
`run reset|archive|unarchive|delete`	Lifecycle mutations
`run schedule`, `run schedule enable|disable|clear`	Schedule mutations
`run set-name`	Set/clear persisted display name
`run set-note|clear-note`	Set/clear persisted human note metadata
`run pin|unpin`	Set/clear persisted pin metadata
`run set-backend-session|clear-backend-session`	Passive-only session metadata
`run set-group|clear-group`	Set/clear a run's group
`run add-dep|remove-dep|clear-deps`	Dependency graph mutations

See docs/cli.md for the full flag-by-flag reference, including per-command rules and JSON output shapes.

Documentation

Start with docs/concepts.md for the mental model. The rest are focused topic pages:

Doc	Topic
docs/concepts.md	Mental model — agents, assignments, runs, briefs
docs/agents-and-assignments.md	Definition format, locked fields, prompt composition
docs/hooks.md	Hook phases, built-in hooks, the authoring API
docs/tasks.md	Task model, status values, task CLI, mutation rules
docs/runs.md	Workspace layout, manifest, lifecycle, capabilities
docs/scope.md	Product scope, non-goals, and feature triage stance
docs/variables.md	Typed vars, resolution, interpolation, redaction
docs/resume.md	Resume rules, ready-start, retry nudges
docs/dependencies.md	Dependency graph and execution gate
docs/attachments.md	File handoff, run group scope, limits
docs/backends.md	Built-in backends, selection, per-backend notes
docs/custom-backends.md	Authoring a custom backend module
docs/execution-environments.md	Container execution environments — definition and lifecycle
docs/configuration.md	Env vars, XDG roots, manifest upgrades
docs/cli.md	Full CLI reference — every command and flag
docs/daemon.md	Control plane, HTTP/SSE, JSON-RPC
docs/web-dashboard.md	Bundled browser UI
docs/examples.md	Bundled agents and assignments

Exit codes

Code	Meaning
`0`	Success (all tasks completed, or initialized run)
`1`	Retries exhausted with incomplete tasks
`2`	One or more tasks reported blocked
`3`	Validation, config, or daemon connectivity error
`4`	Backend or runtime failure
`130`	User cancellation / confirmed interrupt
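For an outer script or orchestrating agent, the table above maps naturally onto a case statement. This is a minimal sketch: describe_exit_code is a hypothetical helper, and the messages are paraphrased from the table rather than taken from agent-runner's own output.

```shell
# Translate an agent-runner exit code into a short description.
describe_exit_code() {
  case "$1" in
    0) echo "success" ;;
    1) echo "retries exhausted with incomplete tasks" ;;
    2) echo "one or more tasks reported blocked" ;;
    3) echo "validation, config, or daemon connectivity error" ;;
    4) echo "backend or runtime failure" ;;
    130) echo "user cancellation or confirmed interrupt" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

# Example use after a real run (not executed here):
#   agent-runner run --agent implementer --assignment repo-orientation
#   describe_exit_code "$?"
```

Branching on the code instead of the message keeps a caller independent of any text the CLI prints.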

Environment variables

Variable	Effect
`AGENT_RUNNER_CONFIG_DIR`	Agent/assignment definitions root
`AGENT_RUNNER_STATE_DIR`	Run workspaces root
`AGENT_RUNNER_CMD`	CLI command string injected into generated worker instructions and child-run templates
`AGENT_RUNNER_CONNECT`	Route client commands through a daemon
`AGENT_RUNNER_CONNECT_HOST`	SSH host used to create an invocation-scoped local forward for connected commands
`AGENT_RUNNER_CONNECT_LOCAL_PORT`	Loopback port for the `AGENT_RUNNER_CONNECT_HOST` SSH forward
`AGENT_RUNNER_LISTEN`	Daemon listen URL
`AGENT_RUNNER_DAEMON_AUTH_ENABLED`	Set to `true` in the daemon environment to require bearer-token auth for daemon API and WebSocket access
`AGENT_RUNNER_DAEMON_TOKEN`	Shared daemon bearer token for auth-enabled daemon servers and clients
`AGENT_RUNNER_WEB_BASE_PATH`	External mount path for the bundled web dashboard when served behind a reverse proxy, for example `/agent-runner`
`AGENT_RUNNER_DAEMON_FILESYSTEM_LOCKS`	Set to `true` to make daemon projection refreshes wait on task-state filesystem locks
`AGENT_RUNNER_PARENT_RUN_ID`	Default lineage parent for fresh runs when `--parent-run` is omitted; detached daemon children notify this parent on completion unless opted out, and `--no-inherit-run-group` keeps lineage while starting a singleton run group
`AGENT_RUNNER_RUN_ID`	Active run id provided to backend wrapper processes
`AGENT_RUNNER_RUN_GROUP_ID`	Default run group for fresh runs when `--group-id` is omitted; active run group id provided to backend wrapper processes
`AGENT_RUNNER_CWD`	Active backend attempt cwd provided to backend wrapper processes
`AGENT_RUNNER_CLAUDE_BIN`	Claude CLI binary
`AGENT_RUNNER_CODEX_BIN`	Codex stdio binary
`AGENT_RUNNER_CODEX_UDS_PATH`	Default WebSocket-over-UDS transport socket path for fresh Codex runs when no explicit `backendConfig.codex.transport` was authored
`AGENT_RUNNER_CODEX_WS_URL`	Default websocket transport for fresh Codex runs when no explicit `backendConfig.codex.transport` was authored
`AGENT_RUNNER_CURSOR_BIN`	Cursor CLI binary
`AGENT_RUNNER_OPENCODE_BIN`	OpenCode CLI binary
`AGENT_RUNNER_OPENCODE_DATA_DIR`	OpenCode data directory for session-history validation/sync; falls back to `OPENCODE_DATA_DIR`
`AGENT_RUNNER_CAPTURE_BACKEND_STDOUT`	Write raw backend stdout sidecars to `attempts/NN.stdout.log` for local debugging
`AGENT_RUNNER_BACKEND_SESSION_SYNC`	Set to `false`, `0`, `no`, or `off` to disable backend-owned session history import/sync
`AGENT_RUNNER_MIN_SCHEDULE_DELAY_SEC`	Minimum accepted one-time schedule delay (default `300`)
`AGENT_RUNNER_MIN_RECURRENCE_INTERVAL_SEC`	Minimum accepted recurring schedule interval, sampled across cron occurrences (default `300`)
`AGENT_RUNNER_PI_BIN`	Pi CLI binary
`PI_HOME`	Pi session storage root (default `~/.pi`)
`AGENT_RUNNER_MAX_CALL_DEPTH`	Recursion cap (default `1`)

See docs/configuration.md for XDG resolution and full details.
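As one example of reading these variables from a wrapper script, the sketch below mirrors the disabling values listed for AGENT_RUNNER_BACKEND_SESSION_SYNC. session_sync_enabled is a hypothetical helper, and two details are assumptions: matching is case-sensitive, and any other value (or no value) leaves sync enabled.

```shell
# Succeeds (status 0) when backend session sync would be enabled.
# Assumptions: exact lowercase matching; unset or other values mean enabled.
session_sync_enabled() {
  case "${AGENT_RUNNER_BACKEND_SESSION_SYNC:-}" in
    false | 0 | no | off) return 1 ;;   # the four disabling values from the table
    *) return 0 ;;
  esac
}
```

A wrapper could call it as: if session_sync_enabled; then echo "sync on"; fi.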

Bundled definitions

Agents (under agents/):

generic, planner, implementer, code-reviewer, doc-reviewer, test

Assignments (under assignments/):

repo-orientation, test, plan-feature, plan-implement-feature, plan-review, code-review, code-review-direct, doc-review, familiarize

Shared task definitions (under tasks/):

review/architecture through review/docs-drift: reusable code-review dimensions used by both code-review assignments
feature-plan/* and feature-implement/*: reusable task definitions used by the bundled feature-planning and single-run implementation flows
Inspect reusable task definitions with agent-runner list tasks and agent-runner show task <name|path>. These are read-only definition surfaces, distinct from the agent-runner task ... commands that read and mutate run task state.

Walkthrough in docs/examples.md.

Development

npm install
npm run build
npm run lint
npm run lint:fix
npm run format
npm run format:check
npm run imports:fix
npm run imports:check
npm run test:node
npm run test:web
npm run test:all:local
npm run check:knip
npm run check

npm run lint runs Biome linting with warnings treated as failures, and npm run lint:fix applies Biome lint autofixes. npm run format writes Biome formatting, and npm run format:check checks formatting without writing. npm run imports:fix applies Biome import organization, and npm run imports:check verifies import organization without writing. npm run test:all:local runs the Node and web tests locally. npm run check:knip runs the unused-file/export/dependency baseline. npm test runs build plus tests. npm run check is the standard pre-commit gate: it runs build, lint, format-check, import-check, and tests. Set AGENT_RUNNER_TEST_REMOTE_HOST to sync the worktree and run the test gate on a remote host; otherwise tests run locally.

Primary entry points:

apps/cli/src/cli.ts
apps/cli/src/daemon/
packages/core/src/core/run/run-loop.ts
packages/core/src/core/run/manifest.ts
packages/core/src/core/commands/service.ts

Roadmap

Directions in which agent-runner is likely to grow next. Nothing below is implemented yet; the list is here so you can see where things are heading and flag anything that conflicts with how you use agent-runner today.

More backends — Gemini, ACP-style integrations, and an in-process SDK client (or similar) so callers can embed a backend directly instead of always shelling out to a CLI or RPC server.
Pluggable storage backend — today the run manifest and workspace live on the filesystem. A sqlite or postgres backend would make larger run populations, richer queries, and multi-host scenarios tractable.
Definitions and run creation from the web dashboard — browse and manage agents and assignment templates in the UI, and kick off new runs from there instead of dropping to the CLI.
Richer attachment previews in the web dashboard — text and Mermaid previews render inline today; image and PDF previews would close the loop.
Improved run provenance tracking — today manifests record which host/controller executed the latest session (execution.hostMode, execution.controller.daemonInstanceId). A richer audit trail — who/what originally launched the run, which parent spawned which nested child, which caller issued which mid-run mutation — would make cross-run forensics and orchestration replay tractable.
External webhook support — emit run lifecycle events (run_started, run_finished, attempt_started, per-task updates, etc.) to configured external HTTP endpoints so agent-runner can notify CI systems, chat-ops bots, or dashboards without each consumer having to subscribe over SSE directly.
Agent and assignment inheritance — let an agent or assignment declare an extends: parent and inherit frontmatter defaults, locked fields, role instructions, and (for assignments) task lists. Child definitions could override individual fields, append tasks, or redact inherited locks, avoiding today's copy-paste when you want a family of related agents/assignments that share a base.

Related project

agent-pack is a lighter, more standalone context and task management tool that borrows and improves on concepts from agent-runner.

Acknowledgements

agent-runner referenced these projects during early design and for style bootstrapping:

earendil-works/pi
pingdotgg/t3code
openai/codex
The web dashboard Diffs view is powered by Pierre's @pierre/diffs and @pierre/trees packages.
