feat: parallel multi-agent runs, stop button, grid layout by proofmancer · Pull Request #17 · proofmancer/argus

proofmancer · 2026-05-25T11:31:02Z

Closes #6.

Summary

Grid layout: workspace page renders agents in a 2-column grid on xl+ screens, single column below. Hit Run on multiple agents and watch them stream side-by-side.
Process registry: new src/lib/run-registry.ts is a process-local Map<runId, ChildProcess>, cached on globalThis for hot-reload survival. Auto-cleans on child exit. Single-process only (revisit if Argus ever runs multi-process).
Detached child + group kill: runAgent now spawns with detached: true so SIGTERM hits the whole process group, killing any tool subprocesses claude itself spawned.
onChild callback: runAgent exposes the spawned child via an optional onChild callback. The run route uses it to register the child and to detect SIGTERM-vs-natural-exit (so it can record status='cancelled' instead of 'failed').
Stop endpoint: new POST /api/runs/[id]/stop. Looks up the runId, SIGTERMs the process group, returns { ok, stopped }. Idempotent: stopping an already-finished run returns 200 with stopped: false.
UI: AgentPanel captures the runId from the run:start SSE frame. While running, a red Stop button appears next to Run. Cancelled runs render in amber in both the live event log and the history panel.

What's still single-process

The registry is in-memory, so a next start restart loses the map and any leftover child becomes an orphan (cleaned up by the OS or by exiting on its own). Documented in the file. Fine for v1 local single-user; revisit if argus ever runs distributed.

Test plan

Create two agents in the same workspace. Hit Run on both back-to-back. Both panels stream output independently.
On a long-running agent, click Stop. The output ends with run cancelled in amber, and the history row shows cancelled (not failed).
Stop, then try Stop again (after the run already exited): the endpoint returns 200 without erroring.
Workspace with 3+ agents lays out as 2 columns on a wide screen, stacks to 1 below xl breakpoint.

Summary by CodeRabbit

Release Notes

New Features
- Added Stop button to cancel running agents mid-execution
- New endpoint to stop and cancel active runs
Improvements
- Redesigned agents section with responsive grid layout and descriptive text
- Run status tracking now distinguishes cancelled runs with proper visual styling (amber)
- Stop action is idempotent—safely works on both active and completed runs

coderabbitai · 2026-05-25T11:31:16Z

📝 Walkthrough

Walkthrough

This PR implements run termination by adding a process registry to track spawned child processes, updating the run endpoint to register children and track cancellation, adding a stop API endpoint, and updating the frontend to trigger termination and display cancelled status.

Changes

Run lifecycle tracking and termination

Layer / File(s)	Summary
Run registry infrastructure `src/lib/run-registry.ts`	Process-local in-memory registry stored on `globalThis` maps `runId` to spawned child processes. Provides `registerRun` (with auto-cleanup on exit), `unregisterRun`, `isRunning`, and `killRun` (sends SIGTERM to process group).
Process group configuration for termination `src/lib/claude.ts`	`AgentRunOptions` now accepts optional `onChild` callback. Child process spawned with `detached: true` to run in own process group; `runAgent` invokes `onChild(child)` after spawn.
Run endpoint integration with registry `src/app/api/agents/[id]/run/route.ts`	Imports registry functions, initializes `cancelled` flag, wires `onChild` handler to register child and listen for SIGTERM exit. Computes final `status` as `cancelled` if flag is set; otherwise `completed`/`failed` based on exit code.
Stop endpoint for termination `src/app/api/runs/[id]/stop/route.ts`	New `POST /api/runs/[id]/stop` checks `isRunning(id)`, calls `killRun(id)` if running, returns `{ ok: true, stopped: boolean }` with idempotent semantics.
Frontend stop control and run tracking `src/components/AgentPanel.tsx`	Tracks current `runId` state. Parses `run:start` events to extract `runId`. Implements `stop()` to POST to stop endpoint using `runId`. Clears `runId` on completion. Stop button renders conditionally when `running` and enabled when `runId` is known.
Status display for cancelled runs `src/components/AgentPanel.tsx`, `src/components/RunHistory.tsx`	`EventLine` computes `color` from `status` field. `RunHistory` applies amber styling when `status` is `running` or `cancelled`.
Workspace grid layout for multiple agents `src/app/workspace/[id]/page.tsx`	Agents section adds descriptive paragraph and switches container from vertical flex to responsive two-column grid. Empty-state card spans two columns on extra-large screens.

Sequence Diagram(s)

The PR implements a complete run lifecycle from spawn through potential termination. The backend spans a detached child process, registers it, and tracks cancellation state. The frontend captures the run ID from the first SSE event, enables a Stop button, and can trigger termination via the stop endpoint.

sequenceDiagram
  participant AgentPanel as Frontend: AgentPanel
  participant RunAPI as Backend: run endpoint
  participant Registry as Registry
  participant Child as Child Process
  participant StopAPI as Backend: stop endpoint
  
  AgentPanel->>RunAPI: POST /api/agents/[id]/run (SSE)
  RunAPI->>Child: spawn(claude, { detached: true, onChild })
  Child->>Registry: registerRun(runId, child)
  Registry->>Child: attach exit listener
  RunAPI->>AgentPanel: run:start { runId }
  AgentPanel->>AgentPanel: capture runId state
  AgentPanel->>AgentPanel: enable Stop button
  AgentPanel->>StopAPI: POST /api/runs/[id]/stop
  StopAPI->>Registry: isRunning(runId)
  Registry-->>StopAPI: true
  StopAPI->>Registry: killRun(runId)
  Registry->>Child: process.kill(-pid, SIGTERM)
  Child->>Registry: exit with signal=SIGTERM
  Registry->>Registry: set cancelled flag
  RunAPI->>AgentPanel: run:end { status: cancelled }
  AgentPanel->>AgentPanel: clear runId

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 Processes spawned in groups, now tracked and tamed,
With SIGTERM's gentle hand and a registry named,
Stop buttons blooming on grids of green,
The finest parallel agents ever seen! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly reflects the main changes: parallel multi-agent runs (grid layout enabling side-by-side), stop button (per-pane process control), and grid layout (workspace UI redesign).
Linked Issues check	✅ Passed	All core requirements from issue `#6` are met: backend spawns per-run child processes with independent SSE channels [run route], UI shows grid layout for concurrent viewing [workspace page], and Stop button kills process groups [stop endpoint + AgentPanel UI].
Out of Scope Changes check	✅ Passed	All changes directly support issue `#6` objectives: run registry and detached spawning enable process control, RunHistory styling supports cancelled state visibility, and component refactors support the stop functionality.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/parallel-runs

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/components/RunHistory.tsx (1)

135-137: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Render an explicit cancelled label instead of exit … for cancelled runs.

The amber color helps, but showing exit … hides the cancellation outcome in text.

💡 Proposed fix

         <span className={`font-mono text-[10px] ${statusColor}`}>
-          {status === 'running' ? 'running' : `exit ${exit ?? '?'}`}
+          {status === 'running'
+            ? 'running'
+            : status === 'cancelled'
+              ? 'cancelled'
+              : `exit ${exit ?? '?'}`}
         </span>

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/components/RunHistory.tsx` around lines 135 - 137, Update the span that
currently renders `{status === 'running' ? 'running' : `exit ${exit ?? '?'`}` in
the RunHistory.tsx component to explicitly render "cancelled" when `status ===
'cancelled'` (using the same `statusColor` class); otherwise fall back to the
existing `exit ${exit ?? '?'}` behavior. Locate the JSX span that uses `status`,
`exit`, and `statusColor` and add the `status === 'cancelled'` branch so
cancelled runs display the text "cancelled" instead of an "exit …" message.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/app/api/agents/`[id]/run/route.ts:
- Around line 131-135: The code sets status and implicitly treats exitCode === 0
as success even when cancelled, which misleads history/live output; update the
logic around the status and exitCode handling (the variables status, cancelled,
exitCode and the places where you persist/emit run results) so that when
cancelled is true you set status = 'cancelled' and avoid emitting or persisting
a success exit code (e.g., set exitCode to null/undefined or a dedicated
cancelled marker and ensure emit/persist code checks cancelled before using
exitCode); apply the same change to the other occurrences that compute
status/emit results using cancelled and exitCode.

In `@src/lib/claude.ts`:
- Line 117: The call to opts.onChild(child) is unguarded and if that callback
throws the spawned child process can be left running; wrap the invocation of
opts.onChild in a try/catch around the site where the child is created (the code
that calls opts.onChild(child) in src/lib/claude.ts) and on error ensure you
clean up the child (e.g., child.kill()/child.removeAllListeners() or call the
same shutdown path used elsewhere) and rethrow or handle the error; this keeps
the child from leaking while preserving the original error behavior.

In `@src/lib/run-registry.ts`:
- Around line 55-60: The current killRun function uses process.kill(-child.pid,
...) which fails on Windows; update killRun (and its use of registry, child.pid,
process.kill) to branch by platform: on POSIX keep sending signal to the
negative pid to target the process group, but on Windows call
process.kill(child.pid, 'SIGTERM') and then implement or call a cross-platform
descendant-termination routine (e.g., a recursive child-enumeration/tree-kill
utility) to ensure the whole subprocess tree is terminated; ensure errors are
propagated/logged rather than silently swallowed so callers can know if stopping
actually failed.

---

Outside diff comments:
In `@src/components/RunHistory.tsx`:
- Around line 135-137: Update the span that currently renders `{status ===
'running' ? 'running' : `exit ${exit ?? '?'`}` in the RunHistory.tsx component
to explicitly render "cancelled" when `status === 'cancelled'` (using the same
`statusColor` class); otherwise fall back to the existing `exit ${exit ?? '?'}`
behavior. Locate the JSX span that uses `status`, `exit`, and `statusColor` and
add the `status === 'cancelled'` branch so cancelled runs display the text
"cancelled" instead of an "exit …" message.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 275f7b1f-cb18-49a1-b520-b4a7106579a2

📥 Commits

Reviewing files that changed from the base of the PR and between 614ad51 and e412edd.

📒 Files selected for processing (7)

src/app/api/agents/[id]/run/route.ts
src/app/api/runs/[id]/stop/route.ts
src/app/workspace/[id]/page.tsx
src/components/AgentPanel.tsx
src/components/RunHistory.tsx
src/lib/claude.ts
src/lib/run-registry.ts

coderabbitai · 2026-05-25T11:36:18Z

+      const status: 'completed' | 'failed' | 'cancelled' = cancelled
+        ? 'cancelled'
+        : exitCode === 0
+          ? 'completed'
+          : 'failed'


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid reporting successful-looking exit codes for cancelled runs.

When status is cancelled, persisting/emitting a normal exit code is misleading in history and live output.

💡 Proposed fix

const status: 'completed' | 'failed' | 'cancelled' = cancelled ? 'cancelled' : exitCode === 0 ? 'completed' : 'failed' + const effectiveExitCode = status === 'cancelled' ? null : exitCode try { await db .update(schema.runs) .set({ output: collected.join('\n'), status, - exitCode: exitCode ?? -1, + exitCode: effectiveExitCode ?? -1, endedAt: new Date(), }) .where(eq(schema.runs.id, run.id)) } catch { // best-effort persistence; ignore. } - send({ type: 'run:end', runId: run.id, status, exitCode }) + send({ type: 'run:end', runId: run.id, status, exitCode: effectiveExitCode })

Also applies to: 140-143, 150-150

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/app/api/agents/`[id]/run/route.ts around lines 131 - 135, The code sets status and implicitly treats exitCode === 0 as success even when cancelled, which misleads history/live output; update the logic around the status and exitCode handling (the variables status, cancelled, exitCode and the places where you persist/emit run results) so that when cancelled is true you set status = 'cancelled' and avoid emitting or persisting a success exit code (e.g., set exitCode to null/undefined or a dedicated cancelled marker and ensure emit/persist code checks cancelled before using exitCode); apply the same change to the other occurrences that compute status/emit results using cancelled and exitCode.

coderabbitai · 2026-05-25T11:36:18Z

    yield { type: 'exit', code: 1 }
    return
  }
+  if (opts.onChild) opts.onChild(child)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard onChild callback failures to prevent leaked child processes.

If Line 117 throws, the stream setup aborts but the spawned process can keep running unmanaged.

💡 Proposed fix

- if (opts.onChild) opts.onChild(child) + if (opts.onChild) { + try { + opts.onChild(child) + } catch (err) { + try { + child.kill('SIGTERM') + } catch { + // ignore best-effort cleanup failure + } + yield { + type: 'error', + message: `onChild callback failed: ${err instanceof Error ? err.message : String(err)}`, + } + yield { type: 'exit', code: 1 } + return + } + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (opts.onChild) opts.onChild(child)

if (opts.onChild) {

try {

opts.onChild(child)

} catch (err) {

try {

child.kill('SIGTERM')

} catch {

// ignore best-effort cleanup failure

}

yield {

type: 'error',

message: `onChild callback failed: ${err instanceof Error ? err.message : String(err)}`,

}

yield { type: 'exit', code: 1 }

return

}

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/claude.ts` at line 117, The call to opts.onChild(child) is unguarded and if that callback throws the spawned child process can be left running; wrap the invocation of opts.onChild in a try/catch around the site where the child is created (the code that calls opts.onChild(child) in src/lib/claude.ts) and on error ensure you clean up the child (e.g., child.kill()/child.removeAllListeners() or call the same shutdown path used elsewhere) and rethrow or handle the error; this keeps the child from leaking while preserving the original error behavior.

coderabbitai · 2026-05-25T11:36:18Z

+export function killRun(runId: string): boolean {
+  const child = registry.get(runId)
+  if (!child || child.pid == null) return false
+  try {
+    process.kill(-child.pid, 'SIGTERM')
+    return true


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In Node.js, does process.kill() support negative PIDs (process group signaling) on Windows, and what is the recommended cross-platform approach to terminate a child process tree?

💡 Result:

No. On Windows, Node’s process.kill does not support killing a “process group” via a negative PID; the Node docs state that Windows will throw an error if the pid is used to kill a process group [1]. Recommended cross-platform approach to terminate a child process tree: - Don’t rely on negative-PID / process-group signaling for tree termination; it’s Unix-oriented and not supported on Windows by Node [1]. - Instead, track the spawned child and explicitly terminate descendants in a platform-appropriate way (e.g., enumerate children and kill them recursively, or use a cross-platform library that does this). For example, libraries like node-tree-kill/execa’s “tree kill” approach exist specifically because Windows and Unix have very different process/tree and signaling semantics [2]. - If you control the child process, implement cooperative shutdown (listen for a shutdown message/signal and exit), and/or use an out-of-band watchdog so the parent/manager can kill the entire subtree when the parent is terminated [3]. If you want a practical pattern in Node: spawn with child_process (optionally {detached: true} only if you understand the lifecycle implications), then perform explicit recursive cleanup rather than process.kill(-pid) on Windows; use a maintained process-tree kill utility so behavior is consistent across OSes [2][4].

Citations:

1: https://nodejs.org/docs/latest-v18.x/api/process.html

2: Ability to kill all descendents of the child process sindresorhus/execa#96

3: https://stackoverflow.com/questions/79931431/is-there-a-cross-platform-way-to-terminate-all-child-and-grandchild-processes

4: https://stackoverflow.com/questions/63470352/kill-detached-child-process-in-node

Fix Windows process termination: avoid negative-PID process-group kill in killRun

File: src/lib/run-registry.ts (lines 55-60)

export function killRun(runId: string): boolean { const child = registry.get(runId) if (!child || child.pid == null) return false try { process.kill(-child.pid, 'SIGTERM') return true

process.kill(-child.pid, ...) relies on Unix process-group semantics; Node does not support this “negative PID” process-group signaling on Windows (it will error / not behave as intended). Since this function catches errors and returns false, Stop can report stopped: false while the run is still alive.

export function killRun(runId: string): boolean { const child = registry.get(runId) if (!child || child.pid == null) return false try { - process.kill(-child.pid, 'SIGTERM') + if (process.platform === 'win32') { + // Best-effort fallback: terminate the child process itself. + // (Process-group kill via negative PID is POSIX-specific.) + process.kill(child.pid, 'SIGTERM') + } else { + process.kill(-child.pid, 'SIGTERM') + } return true } catch {

This fallback avoids the Windows negative-PID issue but only targets the parent process; if the goal is to stop the whole process tree consistently across OSes, implement cross-platform descendant termination (e.g., recursive child enumeration or a maintained “tree kill” utility) instead of relying on process-group signaling.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/run-registry.ts` around lines 55 - 60, The current killRun function uses process.kill(-child.pid, ...) which fails on Windows; update killRun (and its use of registry, child.pid, process.kill) to branch by platform: on POSIX keep sending signal to the negative pid to target the process group, but on Windows call process.kill(child.pid, 'SIGTERM') and then implement or call a cross-platform descendant-termination routine (e.g., a recursive child-enumeration/tree-kill utility) to ensure the whole subprocess tree is terminated; ensure errors are propagated/logged rather than silently swallowed so callers can know if stopping actually failed.

feat: parallel multi-agent runs with stop + grid layout (closes #6)

e412edd

coderabbitai Bot reviewed May 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: parallel multi-agent runs, stop button, grid layout#17

feat: parallel multi-agent runs, stop button, grid layout#17
proofmancer wants to merge 1 commit into
mainfrom
feat/parallel-runs

proofmancer commented May 25, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 25, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 25, 2026

Uh oh!

coderabbitai Bot May 25, 2026

Uh oh!

coderabbitai Bot May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-  if (opts.onChild) opts.onChild(child)
+  if (opts.onChild) {
+    try {
+      opts.onChild(child)
+    } catch (err) {
+      try {
+        child.kill('SIGTERM')
+      } catch {
+        // ignore best-effort cleanup failure
+      }
+      yield {
+        type: 'error',
+        message: `onChild callback failed: ${err instanceof Error ? err.message : String(err)}`,
+      }
+      yield { type: 'exit', code: 1 }
+      return
+    }
+  }

Conversation

proofmancer commented May 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's still single-process

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

proofmancer commented May 25, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 25, 2026 •

edited

Loading