feat: parallel multi-agent runs, stop button, grid layout#17
feat: parallel multi-agent runs, stop button, grid layout#17proofmancer wants to merge 1 commit into
Conversation
📝 WalkthroughWalkthroughThis PR implements run termination by adding a process registry to track spawned child processes, updating the run endpoint to register children and track cancellation, adding a stop API endpoint, and updating the frontend to trigger termination and display cancelled status. ChangesRun lifecycle tracking and termination
Sequence Diagram(s)The PR implements a complete run lifecycle from spawn through potential termination. The backend spans a detached child process, registers it, and tracks cancellation state. The frontend captures the run ID from the first SSE event, enables a Stop button, and can trigger termination via the stop endpoint. sequenceDiagram
participant AgentPanel as Frontend: AgentPanel
participant RunAPI as Backend: run endpoint
participant Registry as Registry
participant Child as Child Process
participant StopAPI as Backend: stop endpoint
AgentPanel->>RunAPI: POST /api/agents/[id]/run (SSE)
RunAPI->>Child: spawn(claude, { detached: true, onChild })
Child->>Registry: registerRun(runId, child)
Registry->>Child: attach exit listener
RunAPI->>AgentPanel: run:start { runId }
AgentPanel->>AgentPanel: capture runId state
AgentPanel->>AgentPanel: enable Stop button
AgentPanel->>StopAPI: POST /api/runs/[id]/stop
StopAPI->>Registry: isRunning(runId)
Registry-->>StopAPI: true
StopAPI->>Registry: killRun(runId)
Registry->>Child: process.kill(-pid, SIGTERM)
Child->>Registry: exit with signal=SIGTERM
Registry->>Registry: set cancelled flag
RunAPI->>AgentPanel: run:end { status: cancelled }
AgentPanel->>AgentPanel: clear runId
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/components/RunHistory.tsx (1)
135-137:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRender an explicit
cancelledlabel instead ofexit …for cancelled runs.The amber color helps, but showing
exit …hides the cancellation outcome in text.💡 Proposed fix
<span className={`font-mono text-[10px] ${statusColor}`}> - {status === 'running' ? 'running' : `exit ${exit ?? '?'}`} + {status === 'running' + ? 'running' + : status === 'cancelled' + ? 'cancelled' + : `exit ${exit ?? '?'}`} </span>🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/components/RunHistory.tsx` around lines 135 - 137, Update the span that currently renders `{status === 'running' ? 'running' : `exit ${exit ?? '?'`}` in the RunHistory.tsx component to explicitly render "cancelled" when `status === 'cancelled'` (using the same `statusColor` class); otherwise fall back to the existing `exit ${exit ?? '?'}` behavior. Locate the JSX span that uses `status`, `exit`, and `statusColor` and add the `status === 'cancelled'` branch so cancelled runs display the text "cancelled" instead of an "exit …" message.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/app/api/agents/`[id]/run/route.ts:
- Around line 131-135: The code sets status and implicitly treats exitCode === 0
as success even when cancelled, which misleads history/live output; update the
logic around the status and exitCode handling (the variables status, cancelled,
exitCode and the places where you persist/emit run results) so that when
cancelled is true you set status = 'cancelled' and avoid emitting or persisting
a success exit code (e.g., set exitCode to null/undefined or a dedicated
cancelled marker and ensure emit/persist code checks cancelled before using
exitCode); apply the same change to the other occurrences that compute
status/emit results using cancelled and exitCode.
In `@src/lib/claude.ts`:
- Line 117: The call to opts.onChild(child) is unguarded and if that callback
throws the spawned child process can be left running; wrap the invocation of
opts.onChild in a try/catch around the site where the child is created (the code
that calls opts.onChild(child) in src/lib/claude.ts) and on error ensure you
clean up the child (e.g., child.kill()/child.removeAllListeners() or call the
same shutdown path used elsewhere) and rethrow or handle the error; this keeps
the child from leaking while preserving the original error behavior.
In `@src/lib/run-registry.ts`:
- Around line 55-60: The current killRun function uses process.kill(-child.pid,
...) which fails on Windows; update killRun (and its use of registry, child.pid,
process.kill) to branch by platform: on POSIX keep sending signal to the
negative pid to target the process group, but on Windows call
process.kill(child.pid, 'SIGTERM') and then implement or call a cross-platform
descendant-termination routine (e.g., a recursive child-enumeration/tree-kill
utility) to ensure the whole subprocess tree is terminated; ensure errors are
propagated/logged rather than silently swallowed so callers can know if stopping
actually failed.
---
Outside diff comments:
In `@src/components/RunHistory.tsx`:
- Around line 135-137: Update the span that currently renders `{status ===
'running' ? 'running' : `exit ${exit ?? '?'`}` in the RunHistory.tsx component
to explicitly render "cancelled" when `status === 'cancelled'` (using the same
`statusColor` class); otherwise fall back to the existing `exit ${exit ?? '?'}`
behavior. Locate the JSX span that uses `status`, `exit`, and `statusColor` and
add the `status === 'cancelled'` branch so cancelled runs display the text
"cancelled" instead of an "exit …" message.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 275f7b1f-cb18-49a1-b520-b4a7106579a2
📒 Files selected for processing (7)
src/app/api/agents/[id]/run/route.tssrc/app/api/runs/[id]/stop/route.tssrc/app/workspace/[id]/page.tsxsrc/components/AgentPanel.tsxsrc/components/RunHistory.tsxsrc/lib/claude.tssrc/lib/run-registry.ts
| const status: 'completed' | 'failed' | 'cancelled' = cancelled | ||
| ? 'cancelled' | ||
| : exitCode === 0 | ||
| ? 'completed' | ||
| : 'failed' |
There was a problem hiding this comment.
Avoid reporting successful-looking exit codes for cancelled runs.
When status is cancelled, persisting/emitting a normal exit code is misleading in history and live output.
💡 Proposed fix
const status: 'completed' | 'failed' | 'cancelled' = cancelled
? 'cancelled'
: exitCode === 0
? 'completed'
: 'failed'
+ const effectiveExitCode = status === 'cancelled' ? null : exitCode
try {
await db
.update(schema.runs)
.set({
output: collected.join('\n'),
status,
- exitCode: exitCode ?? -1,
+ exitCode: effectiveExitCode ?? -1,
endedAt: new Date(),
})
.where(eq(schema.runs.id, run.id))
} catch {
// best-effort persistence; ignore.
}
- send({ type: 'run:end', runId: run.id, status, exitCode })
+ send({ type: 'run:end', runId: run.id, status, exitCode: effectiveExitCode })Also applies to: 140-143, 150-150
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/app/api/agents/`[id]/run/route.ts around lines 131 - 135, The code sets
status and implicitly treats exitCode === 0 as success even when cancelled,
which misleads history/live output; update the logic around the status and
exitCode handling (the variables status, cancelled, exitCode and the places
where you persist/emit run results) so that when cancelled is true you set
status = 'cancelled' and avoid emitting or persisting a success exit code (e.g.,
set exitCode to null/undefined or a dedicated cancelled marker and ensure
emit/persist code checks cancelled before using exitCode); apply the same change
to the other occurrences that compute status/emit results using cancelled and
exitCode.
| yield { type: 'exit', code: 1 } | ||
| return | ||
| } | ||
| if (opts.onChild) opts.onChild(child) |
There was a problem hiding this comment.
Guard onChild callback failures to prevent leaked child processes.
If Line 117 throws, the stream setup aborts but the spawned process can keep running unmanaged.
💡 Proposed fix
- if (opts.onChild) opts.onChild(child)
+ if (opts.onChild) {
+ try {
+ opts.onChild(child)
+ } catch (err) {
+ try {
+ child.kill('SIGTERM')
+ } catch {
+ // ignore best-effort cleanup failure
+ }
+ yield {
+ type: 'error',
+ message: `onChild callback failed: ${err instanceof Error ? err.message : String(err)}`,
+ }
+ yield { type: 'exit', code: 1 }
+ return
+ }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (opts.onChild) opts.onChild(child) | |
| if (opts.onChild) { | |
| try { | |
| opts.onChild(child) | |
| } catch (err) { | |
| try { | |
| child.kill('SIGTERM') | |
| } catch { | |
| // ignore best-effort cleanup failure | |
| } | |
| yield { | |
| type: 'error', | |
| message: `onChild callback failed: ${err instanceof Error ? err.message : String(err)}`, | |
| } | |
| yield { type: 'exit', code: 1 } | |
| return | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/claude.ts` at line 117, The call to opts.onChild(child) is unguarded
and if that callback throws the spawned child process can be left running; wrap
the invocation of opts.onChild in a try/catch around the site where the child is
created (the code that calls opts.onChild(child) in src/lib/claude.ts) and on
error ensure you clean up the child (e.g.,
child.kill()/child.removeAllListeners() or call the same shutdown path used
elsewhere) and rethrow or handle the error; this keeps the child from leaking
while preserving the original error behavior.
| export function killRun(runId: string): boolean { | ||
| const child = registry.get(runId) | ||
| if (!child || child.pid == null) return false | ||
| try { | ||
| process.kill(-child.pid, 'SIGTERM') | ||
| return true |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Node.js, does process.kill() support negative PIDs (process group signaling) on Windows, and what is the recommended cross-platform approach to terminate a child process tree?
💡 Result:
No. On Windows, Node’s process.kill does not support killing a “process group” via a negative PID; the Node docs state that Windows will throw an error if the pid is used to kill a process group [1]. Recommended cross-platform approach to terminate a child process tree: - Don’t rely on negative-PID / process-group signaling for tree termination; it’s Unix-oriented and not supported on Windows by Node [1]. - Instead, track the spawned child and explicitly terminate descendants in a platform-appropriate way (e.g., enumerate children and kill them recursively, or use a cross-platform library that does this). For example, libraries like node-tree-kill/execa’s “tree kill” approach exist specifically because Windows and Unix have very different process/tree and signaling semantics [2]. - If you control the child process, implement cooperative shutdown (listen for a shutdown message/signal and exit), and/or use an out-of-band watchdog so the parent/manager can kill the entire subtree when the parent is terminated [3]. If you want a practical pattern in Node: spawn with child_process (optionally {detached: true} only if you understand the lifecycle implications), then perform explicit recursive cleanup rather than process.kill(-pid) on Windows; use a maintained process-tree kill utility so behavior is consistent across OSes [2][4].
Citations:
- 1: https://nodejs.org/docs/latest-v18.x/api/process.html
- 2: Ability to kill all descendents of the child process sindresorhus/execa#96
- 3: https://stackoverflow.com/questions/79931431/is-there-a-cross-platform-way-to-terminate-all-child-and-grandchild-processes
- 4: https://stackoverflow.com/questions/63470352/kill-detached-child-process-in-node
Fix Windows process termination: avoid negative-PID process-group kill in killRun
File: src/lib/run-registry.ts (lines 55-60)
export function killRun(runId: string): boolean {
const child = registry.get(runId)
if (!child || child.pid == null) return false
try {
process.kill(-child.pid, 'SIGTERM')
return trueprocess.kill(-child.pid, ...) relies on Unix process-group semantics; Node does not support this “negative PID” process-group signaling on Windows (it will error / not behave as intended). Since this function catches errors and returns false, Stop can report stopped: false while the run is still alive.
export function killRun(runId: string): boolean {
const child = registry.get(runId)
if (!child || child.pid == null) return false
try {
- process.kill(-child.pid, 'SIGTERM')
+ if (process.platform === 'win32') {
+ // Best-effort fallback: terminate the child process itself.
+ // (Process-group kill via negative PID is POSIX-specific.)
+ process.kill(child.pid, 'SIGTERM')
+ } else {
+ process.kill(-child.pid, 'SIGTERM')
+ }
return true
} catch {This fallback avoids the Windows negative-PID issue but only targets the parent process; if the goal is to stop the whole process tree consistently across OSes, implement cross-platform descendant termination (e.g., recursive child enumeration or a maintained “tree kill” utility) instead of relying on process-group signaling.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/run-registry.ts` around lines 55 - 60, The current killRun function
uses process.kill(-child.pid, ...) which fails on Windows; update killRun (and
its use of registry, child.pid, process.kill) to branch by platform: on POSIX
keep sending signal to the negative pid to target the process group, but on
Windows call process.kill(child.pid, 'SIGTERM') and then implement or call a
cross-platform descendant-termination routine (e.g., a recursive
child-enumeration/tree-kill utility) to ensure the whole subprocess tree is
terminated; ensure errors are propagated/logged rather than silently swallowed
so callers can know if stopping actually failed.
Closes #6.
Summary
src/lib/run-registry.tsis a process-local Map<runId, ChildProcess>, cached on globalThis for hot-reload survival. Auto-cleans on child exit. Single-process only (revisit if Argus ever runs multi-process).runAgentnow spawns withdetached: trueso SIGTERM hits the whole process group, killing any tool subprocesses claude itself spawned.runAgentexposes the spawned child via an optionalonChildcallback. The run route uses it to register the child and to detect SIGTERM-vs-natural-exit (so it can recordstatus='cancelled'instead of'failed').POST /api/runs/[id]/stop. Looks up the runId, SIGTERMs the process group, returns{ ok, stopped }. Idempotent: stopping an already-finished run returns 200 withstopped: false.run:startSSE frame. While running, a red Stop button appears next to Run. Cancelled runs render in amber in both the live event log and the history panel.What's still single-process
The registry is in-memory, so a
next startrestart loses the map and any leftover child becomes an orphan (cleaned up by the OS or by exiting on its own). Documented in the file. Fine for v1 local single-user; revisit if argus ever runs distributed.Test plan
run cancelledin amber, and the history row showscancelled(notfailed).xlbreakpoint.Summary by CodeRabbit
Release Notes
New Features
Improvements