Skip to content

feat: parallel multi-agent runs, stop button, grid layout#17

Open
proofmancer wants to merge 1 commit into
mainfrom
feat/parallel-runs
Open

feat: parallel multi-agent runs, stop button, grid layout#17
proofmancer wants to merge 1 commit into
mainfrom
feat/parallel-runs

Conversation

@proofmancer
Copy link
Copy Markdown
Owner

@proofmancer proofmancer commented May 25, 2026

Closes #6.

Summary

  • Grid layout: workspace page renders agents in a 2-column grid on xl+ screens, single column below. Hit Run on multiple agents and watch them stream side-by-side.
  • Process registry: new src/lib/run-registry.ts is a process-local Map<runId, ChildProcess>, cached on globalThis for hot-reload survival. Auto-cleans on child exit. Single-process only (revisit if Argus ever runs multi-process).
  • Detached child + group kill: runAgent now spawns with detached: true so SIGTERM hits the whole process group, killing any tool subprocesses claude itself spawned.
  • onChild callback: runAgent exposes the spawned child via an optional onChild callback. The run route uses it to register the child and to detect SIGTERM-vs-natural-exit (so it can record status='cancelled' instead of 'failed').
  • Stop endpoint: new POST /api/runs/[id]/stop. Looks up the runId, SIGTERMs the process group, returns { ok, stopped }. Idempotent: stopping an already-finished run returns 200 with stopped: false.
  • UI: AgentPanel captures the runId from the run:start SSE frame. While running, a red Stop button appears next to Run. Cancelled runs render in amber in both the live event log and the history panel.

What's still single-process

The registry is in-memory, so a next start restart loses the map and any leftover child becomes an orphan (cleaned up by the OS or by exiting on its own). Documented in the file. Fine for v1 local single-user; revisit if argus ever runs distributed.

Test plan

  • Create two agents in the same workspace. Hit Run on both back-to-back. Both panels stream output independently.
  • On a long-running agent, click Stop. The output ends with run cancelled in amber, and the history row shows cancelled (not failed).
  • Stop, then try Stop again (after the run already exited): the endpoint returns 200 without erroring.
  • Workspace with 3+ agents lays out as 2 columns on a wide screen, stacks to 1 below xl breakpoint.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Stop button to cancel running agents mid-execution
    • New endpoint to stop and cancel active runs
  • Improvements

    • Redesigned agents section with responsive grid layout and descriptive text
    • Run status tracking now distinguishes cancelled runs with proper visual styling (amber)
    • Stop action is idempotent—safely works on both active and completed runs

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

📝 Walkthrough

Walkthrough

This PR implements run termination by adding a process registry to track spawned child processes, updating the run endpoint to register children and track cancellation, adding a stop API endpoint, and updating the frontend to trigger termination and display cancelled status.

Changes

Run lifecycle tracking and termination

Layer / File(s) Summary
Run registry infrastructure
src/lib/run-registry.ts
Process-local in-memory registry stored on globalThis maps runId to spawned child processes. Provides registerRun (with auto-cleanup on exit), unregisterRun, isRunning, and killRun (sends SIGTERM to process group).
Process group configuration for termination
src/lib/claude.ts
AgentRunOptions now accepts optional onChild callback. Child process spawned with detached: true to run in own process group; runAgent invokes onChild(child) after spawn.
Run endpoint integration with registry
src/app/api/agents/[id]/run/route.ts
Imports registry functions, initializes cancelled flag, wires onChild handler to register child and listen for SIGTERM exit. Computes final status as cancelled if flag is set; otherwise completed/failed based on exit code.
Stop endpoint for termination
src/app/api/runs/[id]/stop/route.ts
New POST /api/runs/[id]/stop checks isRunning(id), calls killRun(id) if running, returns { ok: true, stopped: boolean } with idempotent semantics.
Frontend stop control and run tracking
src/components/AgentPanel.tsx
Tracks current runId state. Parses run:start events to extract runId. Implements stop() to POST to stop endpoint using runId. Clears runId on completion. Stop button renders conditionally when running and enabled when runId is known.
Status display for cancelled runs
src/components/AgentPanel.tsx, src/components/RunHistory.tsx
EventLine computes color from status field. RunHistory applies amber styling when status is running or cancelled.
Workspace grid layout for multiple agents
src/app/workspace/[id]/page.tsx
Agents section adds descriptive paragraph and switches container from vertical flex to responsive two-column grid. Empty-state card spans two columns on extra-large screens.

Sequence Diagram(s)

The PR implements a complete run lifecycle from spawn through potential termination. The backend spans a detached child process, registers it, and tracks cancellation state. The frontend captures the run ID from the first SSE event, enables a Stop button, and can trigger termination via the stop endpoint.

sequenceDiagram
  participant AgentPanel as Frontend: AgentPanel
  participant RunAPI as Backend: run endpoint
  participant Registry as Registry
  participant Child as Child Process
  participant StopAPI as Backend: stop endpoint
  
  AgentPanel->>RunAPI: POST /api/agents/[id]/run (SSE)
  RunAPI->>Child: spawn(claude, { detached: true, onChild })
  Child->>Registry: registerRun(runId, child)
  Registry->>Child: attach exit listener
  RunAPI->>AgentPanel: run:start { runId }
  AgentPanel->>AgentPanel: capture runId state
  AgentPanel->>AgentPanel: enable Stop button
  AgentPanel->>StopAPI: POST /api/runs/[id]/stop
  StopAPI->>Registry: isRunning(runId)
  Registry-->>StopAPI: true
  StopAPI->>Registry: killRun(runId)
  Registry->>Child: process.kill(-pid, SIGTERM)
  Child->>Registry: exit with signal=SIGTERM
  Registry->>Registry: set cancelled flag
  RunAPI->>AgentPanel: run:end { status: cancelled }
  AgentPanel->>AgentPanel: clear runId
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 Processes spawned in groups, now tracked and tamed,
With SIGTERM's gentle hand and a registry named,
Stop buttons blooming on grids of green,
The finest parallel agents ever seen!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly reflects the main changes: parallel multi-agent runs (grid layout enabling side-by-side), stop button (per-pane process control), and grid layout (workspace UI redesign).
Linked Issues check ✅ Passed All core requirements from issue #6 are met: backend spawns per-run child processes with independent SSE channels [run route], UI shows grid layout for concurrent viewing [workspace page], and Stop button kills process groups [stop endpoint + AgentPanel UI].
Out of Scope Changes check ✅ Passed All changes directly support issue #6 objectives: run registry and detached spawning enable process control, RunHistory styling supports cancelled state visibility, and component refactors support the stop functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/parallel-runs

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/components/RunHistory.tsx (1)

135-137: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Render an explicit cancelled label instead of exit … for cancelled runs.

The amber color helps, but showing exit … hides the cancellation outcome in text.

💡 Proposed fix
         <span className={`font-mono text-[10px] ${statusColor}`}>
-          {status === 'running' ? 'running' : `exit ${exit ?? '?'}`}
+          {status === 'running'
+            ? 'running'
+            : status === 'cancelled'
+              ? 'cancelled'
+              : `exit ${exit ?? '?'}`}
         </span>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/components/RunHistory.tsx` around lines 135 - 137, Update the span that
currently renders `{status === 'running' ? 'running' : `exit ${exit ?? '?'`}` in
the RunHistory.tsx component to explicitly render "cancelled" when `status ===
'cancelled'` (using the same `statusColor` class); otherwise fall back to the
existing `exit ${exit ?? '?'}` behavior. Locate the JSX span that uses `status`,
`exit`, and `statusColor` and add the `status === 'cancelled'` branch so
cancelled runs display the text "cancelled" instead of an "exit …" message.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/app/api/agents/`[id]/run/route.ts:
- Around line 131-135: The code sets status and implicitly treats exitCode === 0
as success even when cancelled, which misleads history/live output; update the
logic around the status and exitCode handling (the variables status, cancelled,
exitCode and the places where you persist/emit run results) so that when
cancelled is true you set status = 'cancelled' and avoid emitting or persisting
a success exit code (e.g., set exitCode to null/undefined or a dedicated
cancelled marker and ensure emit/persist code checks cancelled before using
exitCode); apply the same change to the other occurrences that compute
status/emit results using cancelled and exitCode.

In `@src/lib/claude.ts`:
- Line 117: The call to opts.onChild(child) is unguarded and if that callback
throws the spawned child process can be left running; wrap the invocation of
opts.onChild in a try/catch around the site where the child is created (the code
that calls opts.onChild(child) in src/lib/claude.ts) and on error ensure you
clean up the child (e.g., child.kill()/child.removeAllListeners() or call the
same shutdown path used elsewhere) and rethrow or handle the error; this keeps
the child from leaking while preserving the original error behavior.

In `@src/lib/run-registry.ts`:
- Around line 55-60: The current killRun function uses process.kill(-child.pid,
...) which fails on Windows; update killRun (and its use of registry, child.pid,
process.kill) to branch by platform: on POSIX keep sending signal to the
negative pid to target the process group, but on Windows call
process.kill(child.pid, 'SIGTERM') and then implement or call a cross-platform
descendant-termination routine (e.g., a recursive child-enumeration/tree-kill
utility) to ensure the whole subprocess tree is terminated; ensure errors are
propagated/logged rather than silently swallowed so callers can know if stopping
actually failed.

---

Outside diff comments:
In `@src/components/RunHistory.tsx`:
- Around line 135-137: Update the span that currently renders `{status ===
'running' ? 'running' : `exit ${exit ?? '?'`}` in the RunHistory.tsx component
to explicitly render "cancelled" when `status === 'cancelled'` (using the same
`statusColor` class); otherwise fall back to the existing `exit ${exit ?? '?'}`
behavior. Locate the JSX span that uses `status`, `exit`, and `statusColor` and
add the `status === 'cancelled'` branch so cancelled runs display the text
"cancelled" instead of an "exit …" message.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 275f7b1f-cb18-49a1-b520-b4a7106579a2

📥 Commits

Reviewing files that changed from the base of the PR and between 614ad51 and e412edd.

📒 Files selected for processing (7)
  • src/app/api/agents/[id]/run/route.ts
  • src/app/api/runs/[id]/stop/route.ts
  • src/app/workspace/[id]/page.tsx
  • src/components/AgentPanel.tsx
  • src/components/RunHistory.tsx
  • src/lib/claude.ts
  • src/lib/run-registry.ts

Comment on lines +131 to +135
const status: 'completed' | 'failed' | 'cancelled' = cancelled
? 'cancelled'
: exitCode === 0
? 'completed'
: 'failed'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid reporting successful-looking exit codes for cancelled runs.

When status is cancelled, persisting/emitting a normal exit code is misleading in history and live output.

💡 Proposed fix
       const status: 'completed' | 'failed' | 'cancelled' = cancelled
         ? 'cancelled'
         : exitCode === 0
           ? 'completed'
           : 'failed'
+      const effectiveExitCode = status === 'cancelled' ? null : exitCode

       try {
         await db
           .update(schema.runs)
           .set({
             output: collected.join('\n'),
             status,
-            exitCode: exitCode ?? -1,
+            exitCode: effectiveExitCode ?? -1,
             endedAt: new Date(),
           })
           .where(eq(schema.runs.id, run.id))
       } catch {
         // best-effort persistence; ignore.
       }

-      send({ type: 'run:end', runId: run.id, status, exitCode })
+      send({ type: 'run:end', runId: run.id, status, exitCode: effectiveExitCode })

Also applies to: 140-143, 150-150

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/app/api/agents/`[id]/run/route.ts around lines 131 - 135, The code sets
status and implicitly treats exitCode === 0 as success even when cancelled,
which misleads history/live output; update the logic around the status and
exitCode handling (the variables status, cancelled, exitCode and the places
where you persist/emit run results) so that when cancelled is true you set
status = 'cancelled' and avoid emitting or persisting a success exit code (e.g.,
set exitCode to null/undefined or a dedicated cancelled marker and ensure
emit/persist code checks cancelled before using exitCode); apply the same change
to the other occurrences that compute status/emit results using cancelled and
exitCode.

Comment thread src/lib/claude.ts
yield { type: 'exit', code: 1 }
return
}
if (opts.onChild) opts.onChild(child)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard onChild callback failures to prevent leaked child processes.

If Line 117 throws, the stream setup aborts but the spawned process can keep running unmanaged.

💡 Proposed fix
-  if (opts.onChild) opts.onChild(child)
+  if (opts.onChild) {
+    try {
+      opts.onChild(child)
+    } catch (err) {
+      try {
+        child.kill('SIGTERM')
+      } catch {
+        // ignore best-effort cleanup failure
+      }
+      yield {
+        type: 'error',
+        message: `onChild callback failed: ${err instanceof Error ? err.message : String(err)}`,
+      }
+      yield { type: 'exit', code: 1 }
+      return
+    }
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (opts.onChild) opts.onChild(child)
if (opts.onChild) {
try {
opts.onChild(child)
} catch (err) {
try {
child.kill('SIGTERM')
} catch {
// ignore best-effort cleanup failure
}
yield {
type: 'error',
message: `onChild callback failed: ${err instanceof Error ? err.message : String(err)}`,
}
yield { type: 'exit', code: 1 }
return
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/claude.ts` at line 117, The call to opts.onChild(child) is unguarded
and if that callback throws the spawned child process can be left running; wrap
the invocation of opts.onChild in a try/catch around the site where the child is
created (the code that calls opts.onChild(child) in src/lib/claude.ts) and on
error ensure you clean up the child (e.g.,
child.kill()/child.removeAllListeners() or call the same shutdown path used
elsewhere) and rethrow or handle the error; this keeps the child from leaking
while preserving the original error behavior.

Comment thread src/lib/run-registry.ts
Comment on lines +55 to +60
export function killRun(runId: string): boolean {
const child = registry.get(runId)
if (!child || child.pid == null) return false
try {
process.kill(-child.pid, 'SIGTERM')
return true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In Node.js, does process.kill() support negative PIDs (process group signaling) on Windows, and what is the recommended cross-platform approach to terminate a child process tree?

💡 Result:

No. On Windows, Node’s process.kill does not support killing a “process group” via a negative PID; the Node docs state that Windows will throw an error if the pid is used to kill a process group [1]. Recommended cross-platform approach to terminate a child process tree: - Don’t rely on negative-PID / process-group signaling for tree termination; it’s Unix-oriented and not supported on Windows by Node [1]. - Instead, track the spawned child and explicitly terminate descendants in a platform-appropriate way (e.g., enumerate children and kill them recursively, or use a cross-platform library that does this). For example, libraries like node-tree-kill/execa’s “tree kill” approach exist specifically because Windows and Unix have very different process/tree and signaling semantics [2]. - If you control the child process, implement cooperative shutdown (listen for a shutdown message/signal and exit), and/or use an out-of-band watchdog so the parent/manager can kill the entire subtree when the parent is terminated [3]. If you want a practical pattern in Node: spawn with child_process (optionally {detached: true} only if you understand the lifecycle implications), then perform explicit recursive cleanup rather than process.kill(-pid) on Windows; use a maintained process-tree kill utility so behavior is consistent across OSes [2][4].

Citations:


Fix Windows process termination: avoid negative-PID process-group kill in killRun

File: src/lib/run-registry.ts (lines 55-60)

export function killRun(runId: string): boolean {
  const child = registry.get(runId)
  if (!child || child.pid == null) return false
  try {
    process.kill(-child.pid, 'SIGTERM')
    return true

process.kill(-child.pid, ...) relies on Unix process-group semantics; Node does not support this “negative PID” process-group signaling on Windows (it will error / not behave as intended). Since this function catches errors and returns false, Stop can report stopped: false while the run is still alive.

 export function killRun(runId: string): boolean {
   const child = registry.get(runId)
   if (!child || child.pid == null) return false
   try {
-    process.kill(-child.pid, 'SIGTERM')
+    if (process.platform === 'win32') {
+      // Best-effort fallback: terminate the child process itself.
+      // (Process-group kill via negative PID is POSIX-specific.)
+      process.kill(child.pid, 'SIGTERM')
+    } else {
+      process.kill(-child.pid, 'SIGTERM')
+    }
     return true
   } catch {

This fallback avoids the Windows negative-PID issue but only targets the parent process; if the goal is to stop the whole process tree consistently across OSes, implement cross-platform descendant termination (e.g., recursive child enumeration or a maintained “tree kill” utility) instead of relying on process-group signaling.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/run-registry.ts` around lines 55 - 60, The current killRun function
uses process.kill(-child.pid, ...) which fails on Windows; update killRun (and
its use of registry, child.pid, process.kill) to branch by platform: on POSIX
keep sending signal to the negative pid to target the process group, but on
Windows call process.kill(child.pid, 'SIGTERM') and then implement or call a
cross-platform descendant-termination routine (e.g., a recursive
child-enumeration/tree-kill utility) to ensure the whole subprocess tree is
terminated; ensure errors are propagated/logged rather than silently swallowed
so callers can know if stopping actually failed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

v0.3: parallel multi-agent runs, side-by-side output

1 participant