feat: add local code-node bridge for external-agent dev#2767
Conversation

🦋 Changeset detected. Latest commit: 58c8005. The changes in this PR will be included in the next version bump. This PR includes changesets to release 10 packages.
Medium-priority — solid feature addition with a few issues worth addressing before merge.
The code-node bridge is well-structured: clean A2A compliance, good test coverage (blocking, streaming, cancel, workspace guard), and proper shutdown semantics. The main concerns are a default port collision with OTLP, a zsh fallback that will fail on most CI/server environments, unbounded growth of the taskRecords map, and per-request runnerArgs overrides accepting arbitrary argument injection from callers.
```ts
.description('Start a local A2A coding-agent node for local development')
.requiredOption('--workspace <path>', 'Workspace root the node may access')
.option('--host <host>', 'Host to bind the local server to', '127.0.0.1')
.option('--port <port>', 'Port to run the local server on', '4318')
```
Port 4318 is the well-known OTLP/HTTP port. The SDK docs (agents-docs/content/get-started/traces.mdx) and packages/agents-sdk/src/telemetry-provider.ts both reference localhost:4318 as the OTEL collector endpoint. Running `inkeep code-node` on the same machine as a local SigNoz/OTEL collector will produce a silent port conflict.
Consider a different default (e.g., 4320 or 14318) to avoid the collision.
```diff
-.option('--port <port>', 'Port to run the local server on', '4318')
+.option('--port <port>', 'Port to run the local server on', '4320')
```
```ts
verification = {
  command: options.verificationCommand,
  ...(await runCommand(
    process.env.SHELL || 'zsh',
```
zsh as a fallback is fragile — most CI containers, minimal Docker images, and many Linux distributions don't ship zsh. POSIX sh (or /bin/sh) is universally available.
```diff
-process.env.SHELL || 'zsh',
+process.env.SHELL || '/bin/sh',
```
```ts
const beforeStatus = snapshotGitStatus(workspace);
const rawPrompt = getPromptText(params.message);
const prompt = buildPrompt(rawPrompt, workspace, mode);
const runnerArgs = metadata.runnerArgs?.length ? metadata.runnerArgs : options.runnerArgs;
```
Per-request `runnerArgs` overrides from untrusted callers let anyone inject arbitrary arguments into the spawned process. Since the `runnerCommand` is controlled by the server operator, the impact is limited — but a caller could still pass args that change behavior in unexpected ways (e.g., `--dangerously-skip-permissions` for Claude Code). If this is intentional, a brief code comment explaining the trust model would help future readers; otherwise, consider dropping per-request arg overrides or gating them behind an opt-in flag.
```ts
export async function startCodeNodeServer(
  options: CodeNodeServerOptions
): Promise<StartedCodeNodeServer> {
  const taskRecords = new Map<string, TaskRecord>();
```
taskRecords is never pruned — completed tasks accumulate indefinitely. For a long-running dev server this is a slow memory leak. Consider evicting terminal-state tasks after a TTL (e.g., 5 minutes) or capping the map size.
```ts
let stdout = '';
let stderr = '';
const timeout = setTimeout(() => {
  record.canceled = true;
  terminateChild(child);
}, timeoutMs);
```
stdout/stderr are accumulated without any size bound. A pathological runner could produce gigabytes of output, exhausting process memory. Consider capping the accumulated buffer size (e.g., to MAX_OUTPUT_CHARS) and discarding excess.
```ts
  throw new Error('Write mode requested, but this node was started without write access');
}

const timeoutMs = metadata.timeoutMs ?? options.defaultTimeoutMs;
```
metadata.timeoutMs comes from the untrusted request. A caller can set `timeoutMs: Number.MAX_SAFE_INTEGER` to effectively disable the timeout. Consider clamping to a server-side maximum (e.g., `Math.min(metadata.timeoutMs, options.defaultTimeoutMs * 10)`).
```ts
  )
  : root;

if (resolved !== root && !resolved.startsWith(`${root}${path.sep}`)) {
```
The startsWith check uses path.resolve, which does not resolve symlinks. A symlink inside the workspace root that points outside it would pass this check. Since this is a local-dev tool the risk is low, but worth noting for the trust model.
```ts
id: "local-code-node",
name: "Local Code Node",
description: "Delegates coding tasks into a local workspace during development",
baseUrl: "http://127.0.0.1:4318",
```
If the default port changes (see port-conflict comment), update this to match.
```ts
id: 'local-code-node',
name: 'Local Code Node',
description: 'Delegates coding tasks into a locally running code-node bridge',
baseUrl: 'http://127.0.0.1:4318',
```
Same as above — keep this in sync if the default port changes.
TL;DR — Adds an experimental local code-node bridge for external-agent development.
Key changes
Summary | 10 files | 3 commits | base:
| Method | Behavior |
|---|---|
| `message/send` (blocking) | Awaits runner completion, returns final task |
| `message/send` (non-blocking) | Returns immediately in `working` state, poll via `tasks/get` |
| `message/stream` | Streams `status-update` and `artifact-update` SSE events |
Security is enforced by `resolveWorkspace()`, which validates that any per-request workspace override resolves within the configured root, preventing directory traversal. The `ExecutionMode` type (`read` / `write`) gates whether the runner is allowed to modify files, controlled by the `--write` flag or per-request metadata.
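For illustration, a blocking `message/send` call against a locally running node could look like the sketch below. The endpoint path (`/a2a`) and default port come from the PR itself; the payload shape mirrors the JSON-RPC examples in the test suite, and `sendBlocking` is a hypothetical helper, not part of the SDK.

```typescript
// JSON-RPC payload for a blocking message/send, mirroring the test suite's shape.
const payload = {
  jsonrpc: '2.0',
  id: 1,
  method: 'message/send',
  params: {
    message: {
      kind: 'message',
      messageId: 'msg-1',
      role: 'user',
      contextId: 'ctx-1',
      parts: [{ kind: 'text', text: 'Say hello' }],
    },
    configuration: { blocking: true, acceptedOutputModes: ['text'] },
  },
};

// Hypothetical helper: POST the payload to a node started with the PR's defaults.
async function sendBlocking(baseUrl = 'http://127.0.0.1:4318') {
  const response = await fetch(`${baseUrl}/a2a`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  return (await response.json()) as { result?: { status?: { state?: string } } };
}
```

With `blocking: true` the response carries the final task; with `blocking: false` the caller would instead poll `tasks/get` with the returned task id.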
How does per-request metadata work?
Callers can pass a `codeNode` key in `message` or `params` metadata to override `workspace`, `timeoutMs`, `mode`, and `runnerArgs` per request. The `getMetadata()` function merges params-level and message-level metadata, with message-level taking precedence.
agents-cli/src/utils/code-node-server.ts · agents-cli/src/commands/code-node.ts · agents-cli/src/index.ts
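The precedence rule can be sketched roughly like this (the interface shape and function name here are assumptions for illustration; the actual `getMetadata()` implementation may differ):

```typescript
// Hypothetical shape of the per-request override metadata.
interface CodeNodeMetadata {
  workspace?: string;
  timeoutMs?: number;
  mode?: 'read' | 'write';
  runnerArgs?: string[];
}

// Sketch: merge params-level and message-level `codeNode` metadata.
// Spread order means message-level keys win on conflict.
function mergeCodeNodeMetadata(
  paramsMeta?: CodeNodeMetadata,
  messageMeta?: CodeNodeMetadata
): CodeNodeMetadata {
  return { ...paramsMeta, ...messageMeta };
}

const merged = mergeCodeNodeMetadata(
  { timeoutMs: 60_000, mode: 'read' },
  { mode: 'write' }
);
// merged.mode is 'write' (message-level), merged.timeoutMs is 60000 (params-level)
```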
code-node-server.test.ts — test suite
Before: No test coverage for code-node functionality.
After: Five tests validate the full A2A lifecycle using mocked child processes and real temp directories.
Tests cover: blocking message/send with artifact validation, write-mode git diff detection with verification output, workspace-escape rejection, non-blocking send with tasks/cancel + child process kill, and SSE message/stream event shape. Child process mocking uses an EventEmitter-based spawn mock with configurable MockSpawnBehavior and queued spawnSync responses for git status.
agents-cli/src/__tests__/code-node-server.test.ts · agents-cli/src/__tests__/cli.test.ts
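An EventEmitter-based spawn mock of the kind described above can be sketched as follows. The `MockSpawnBehavior` fields and helper names here are assumptions for illustration, not the exact shapes in the test file:

```typescript
import { EventEmitter } from 'node:events';

// Assumed shape: what the fake child process should do when spawned.
interface MockSpawnBehavior {
  stdout?: string;
  exitCode?: number;
  autoClose?: boolean; // when false, the "process" stays alive until killed
}

class MockChildProcess extends EventEmitter {
  stdout = new EventEmitter();
  stderr = new EventEmitter();
  killed = false;
  kill(_signal?: string): boolean {
    this.killed = true;
    this.emit('close', null);
    return true;
  }
}

// Sketch of a spawn replacement that plays back queued behaviors.
function makeMockSpawn(queue: MockSpawnBehavior[]) {
  return (_command: string, _args: string[]): MockChildProcess => {
    const behavior = queue.shift() ?? {};
    const child = new MockChildProcess();
    // Emit output and exit on the next tick so listeners attach first.
    setImmediate(() => {
      if (behavior.stdout) child.stdout.emit('data', Buffer.from(behavior.stdout));
      if (behavior.autoClose !== false) child.emit('close', behavior.exitCode ?? 0);
    });
    return child;
  };
}
```

Deferring the emissions with `setImmediate` matters: the server attaches its `data`/`close` listeners synchronously after spawning, so emitting in the same tick would drop events.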
Reference agent + documentation
Before: No example of how to use a local code-node bridge with the agent framework.
After: A `localCodeNodeAgent` test agent shows the `externalAgent()` → `subAgent()` → `agent()` wiring, and docs explain the CLI flags and SDK registration.
The test agent defines three layers: an externalAgent pointing at http://127.0.0.1:4318, a coordinator subAgent that routes coding tasks to it, and a top-level agent that uses the coordinator as its default. Documentation is added to both the external-agents MDX page and the root README.
test-agents/agents/local-code-node-agent.ts · agents-docs/content/typescript-sdk/external-agents.mdx · README.md
PR Review Summary
7 Total Issues | Risk: Medium
🟠 Major (4)
Inline Comments:
- 🟠 Major: `code-node-server.ts:473` Task records Map grows unbounded without cleanup
- 🟠 Major: `code-node-server.ts:85-105` Path traversal via symlinks can bypass workspace guard
- 🟠 Major: `code-node-server.ts:241-248` Request body reading has no size limit
- 🟠 Major: `code-node-server.test.ts:188-444` Missing test coverage for write mode enforcement and failure paths
🟡 Minor (3)
Inline Comments:
- 🟡 Minor: `code-node-server.ts:353` User-supplied timeoutMs lacks validation bounds
- 🟡 Minor: `code-node-server.ts:370-382` stdout/stderr buffers grow unbounded during execution
🟡 1) code-node-server.ts:598-618 Missing server-side error logging
Issue: Errors during task execution (both streaming and non-blocking modes) are not logged server-side. Errors are only sent to clients or update task state silently.
Why: For incident debugging, operators need server-side visibility into failures. If client connections are lost, failures become invisible.
Fix: Add console.error('Task failed:', taskId, error) in error handlers at lines 598-607 and 616-618.
Refs: code-node-server.ts:598-618
💭 Consider (4)
💭 1) index.ts:38-58 Command registration position
Issue: code-node is registered at the top of the command list, before lifecycle commands like add, init, push.
Fix: Consider moving near the dev command (~line 203) to group development utilities together.
💭 2) code-node.ts:55-68 Output format lacks emoji prefix
Issue: Other server commands like dev use emoji prefixes (e.g., 🚀). code-node uses plain chalk output.
Fix: Add emoji for visual consistency: console.log(chalk.green('🚀 Local code node listening at...')).
💭 3) code-node.ts:70-80 Shutdown handler lacks feedback
Issue: The dev command logs "🛑 Stopping dashboard server..." during shutdown. code-node exits silently.
Fix: Add shutdown message before await started.close().
💭 4) external-agents.mdx:56 Missing CLI options in docs
Issue: The --timeout-ms flag (default 2 min) is not documented. Users with long-running tasks will hit timeouts without knowing the option exists.
Fix: Add note: "Use --timeout-ms for tasks longer than 2 minutes. See inkeep code-node --help for all options."
💡 APPROVE WITH SUGGESTIONS
Summary: This is a well-structured implementation of a local A2A bridge with good test coverage for the happy paths. However, there are several resource management and security concerns that should be addressed:
- Memory leak from unbounded task record storage needs cleanup logic
- Path traversal via symlinks could bypass the workspace security boundary
- Unbounded request body could enable DoS
- Missing failure path tests for write-mode enforcement and runner failures
The fixes are straightforward and the PR is close to ready. The architectural approach is sound — the concerns are implementation details rather than design issues.
Discarded (5)
| Location | Issue | Reason Discarded |
|---|---|---|
| `code-node-server.ts:357` | runnerArgs override from request | Intentional per PR description ("Allow per-request code-node runner args") |
| `code-node-server.ts:412-422` | Verification shares timeout with main task | Acceptable for local dev; separate timeout would over-complicate |
| `code-node-server.ts:542-557` | Race condition between cancel and completion | Minor race with safe fallback; acceptable for local dev |
| `index.ts:42` | Host 127.0.0.1 vs localhost | Functionally equivalent; 127.0.0.1 is more explicit |
| `index.ts:43` | Port 4318 conflicts with OTEL | Valid concern but 4318 is common for local dev servers |
Reviewers (5)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| pr-review-standards | 5 | 0 | 0 | 0 | 4 | 0 | 1 |
| pr-review-sre | 8 | 1 | 0 | 0 | 2 | 0 | 5 |
| pr-review-tests | 5 | 0 | 0 | 0 | 1 | 0 | 4 |
| pr-review-consistency | 6 | 0 | 3 | 0 | 0 | 0 | 3 |
| pr-review-docs | 2 | 0 | 1 | 0 | 0 | 0 | 1 |
| Total | 26 | 1 | 4 | 0 | 7 | 0 | 14 |
```ts
export async function startCodeNodeServer(
  options: CodeNodeServerOptions
): Promise<StartedCodeNodeServer> {
  const taskRecords = new Map<string, TaskRecord>();
```
🟠 MAJOR: Task records Map grows unbounded
Issue: The taskRecords Map stores all tasks indefinitely with no cleanup. Each message/send or message/stream creates a new task record that is never removed, even after completion.
Why: In long-running deployments or under sustained load, this causes memory growth proportional to task count. Each record contains the full Task object with artifacts, history, and metadata.
Fix: Add TTL-based cleanup for completed tasks. Options:
- Periodic interval that removes terminal-state tasks older than a TTL (e.g., 1 hour)
- LRU cache with max size limit
- Clean up immediately on task completion with optional retention for debugging
```ts
// Option 1: Periodic cleanup
const TASK_RETENTION_MS = 60 * 60 * 1000; // 1 hour
setInterval(() => {
  const now = Date.now();
  for (const [id, record] of taskRecords) {
    if (isTerminalState(record.task.status.state)) {
      const timestamp = new Date(record.task.status.timestamp).getTime();
      if (now - timestamp > TASK_RETENTION_MS) {
        taskRecords.delete(id);
      }
    }
  }
}, 60_000).unref();
```
```ts
function resolveWorkspace(workspaceRoot: string, requestedWorkspace?: string): string {
  const root = path.resolve(workspaceRoot);
  const resolved = requestedWorkspace
    ? path.resolve(
        path.isAbsolute(requestedWorkspace)
          ? requestedWorkspace
          : path.join(root, requestedWorkspace)
      )
    : root;

  if (resolved !== root && !resolved.startsWith(`${root}${path.sep}`)) {
    throw new Error(`Workspace must be within ${root}`);
  }

  const stats = statSync(resolved, { throwIfNoEntry: false });
  if (!stats || !stats.isDirectory()) {
    throw new Error(`Workspace does not exist or is not a directory: ${resolved}`);
  }

  return resolved;
}
```
🟠 MAJOR: Path traversal via symlinks can bypass workspace guard
Issue: The resolveWorkspace function uses path.resolve() which does not resolve symlinks. An attacker could create a symlink inside the workspace root that points outside it, then request that symlinked path as the workspace.
Why: The string-based containment check on line 95 would pass because the resolved path starts with the root, but the actual filesystem target could be outside the allowed workspace. This bypasses the security boundary intended to restrict file access.
Fix: Use fs.realpathSync() to resolve symlinks before performing the path containment check:
```ts
import { realpathSync } from 'node:fs';

function resolveWorkspace(workspaceRoot: string, requestedWorkspace?: string): string {
  const root = realpathSync(path.resolve(workspaceRoot));
  const resolved = requestedWorkspace
    ? path.resolve(
        path.isAbsolute(requestedWorkspace)
          ? requestedWorkspace
          : path.join(root, requestedWorkspace)
      )
    : root;

  // Resolve symlinks in the requested path too
  let realResolved: string;
  try {
    realResolved = realpathSync(resolved);
  } catch {
    throw new Error(`Workspace does not exist: ${resolved}`);
  }

  if (realResolved !== root && !realResolved.startsWith(`${root}${path.sep}`)) {
    throw new Error(`Workspace must be within ${root}`);
  }

  // ... rest of function
}
```
```ts
async function readBody(request: IncomingMessage): Promise<string> {
  return await new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    request.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    request.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
    request.on('error', reject);
  });
}
```
🟠 MAJOR: Request body reading has no size limit
Issue: The readBody function accumulates all request data into memory without a size limit. A malicious or buggy client could send an arbitrarily large payload, causing memory exhaustion before JSON parsing is even attempted.
Why: While the server binds to 127.0.0.1 by default, it could be misconfigured to bind publicly. Even locally, a runaway script could DoS the server.
Fix: Add a maximum body size check and abort if exceeded:
```ts
const MAX_BODY_SIZE = 10 * 1024 * 1024; // 10MB

async function readBody(request: IncomingMessage): Promise<string> {
  return await new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    let totalSize = 0;
    request.on('data', (chunk) => {
      totalSize += chunk.length;
      if (totalSize > MAX_BODY_SIZE) {
        request.destroy(new Error('Request body too large'));
        reject(new Error('Request body too large'));
        return;
      }
      chunks.push(Buffer.from(chunk));
    });
    request.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
    request.on('error', reject);
  });
}
```
Then handle the error in the request handler to return 413 Payload Too Large.
```ts
  throw new Error('Write mode requested, but this node was started without write access');
}

const timeoutMs = metadata.timeoutMs ?? options.defaultTimeoutMs;
```
🟡 Minor: User-supplied timeoutMs lacks validation bounds
Issue: The timeoutMs value from request metadata is used directly without validation. A client could pass timeoutMs: 0 (immediate cancellation), negative values (undefined setTimeout behavior), or extremely large values (disabling timeout protection).
Why: Could lead to DoS via hung processes or unexpected immediate task cancellation.
Fix:
```diff
-const timeoutMs = metadata.timeoutMs ?? options.defaultTimeoutMs;
+const timeoutMs = Math.max(1000, Math.min(metadata.timeoutMs ?? options.defaultTimeoutMs, 86400000));
```
This bounds timeout to 1 second minimum, 24 hours maximum.
```ts
let stdout = '';
let stderr = '';
const timeout = setTimeout(() => {
  record.canceled = true;
  terminateChild(child);
}, timeoutMs);

child.stdout.on('data', (chunk) => {
  stdout += chunk.toString();
});
child.stderr.on('data', (chunk) => {
  stderr += chunk.toString();
});
```
🟡 Minor: stdout/stderr buffers grow unbounded during execution
Issue: The stdout and stderr variables accumulate all output from the child process without a size limit. A misbehaving runner could produce gigabytes of output, leading to memory exhaustion.
Why: While output is truncated to MAX_OUTPUT_CHARS before being sent in the artifact, the full output is buffered in memory first.
Fix: Track buffer size and stop accumulating once a threshold is reached:
```ts
const MAX_BUFFER_SIZE = MAX_OUTPUT_CHARS * 2; // 40KB
child.stdout.on('data', (chunk) => {
  if (stdout.length < MAX_BUFFER_SIZE) {
    stdout += chunk.toString().slice(0, MAX_BUFFER_SIZE - stdout.length);
  }
});
child.stderr.on('data', (chunk) => {
  if (stderr.length < MAX_BUFFER_SIZE) {
    stderr += chunk.toString().slice(0, MAX_BUFFER_SIZE - stderr.length);
  }
});
```
```ts
describe('code node server', () => {
  it('serves an agent card and handles blocking message/send', async () => {
    const workspace = await makeWorkspace();
    mockChildProcessState.gitStatuses.push('', '');
    mockChildProcessState.spawnBehaviors.push({ stdout: 'Runner finished' });

    const started = await startCodeNodeServer({
      host: '127.0.0.1',
      port: 0,
      workspaceRoot: workspace,
      runnerCommand: 'node',
      runnerArgs: ['-e', 'process.stdout.write(process.argv[1])', '{prompt}'],
      allowWrite: false,
      defaultTimeoutMs: 5_000,
      name: 'Test Code Node',
      description: 'Test bridge',
      version: 'test',
    });
    startedServers.push(started);

    const agentCard = JSON.parse(
      (await requestText(`${started.baseUrl}/.well-known/agent.json`)).body
    );
    expect(agentCard.url).toBe(`${started.baseUrl}/a2a`);
    expect(agentCard.skills[0].id).toBe('local-code-execution');

    const result = await postJson(`${started.baseUrl}/a2a`, {
      jsonrpc: '2.0',
      id: 1,
      method: 'message/send',
      params: {
        message: {
          kind: 'message',
          messageId: 'msg-1',
          role: 'user',
          contextId: 'ctx-1',
          parts: [{ kind: 'text', text: 'Say hello' }],
          metadata: {
            codeNode: {
              runnerArgs: ['-e', 'process.stdout.write(process.argv[1])', 'override {prompt}'],
            },
          },
        },
        configuration: {
          blocking: true,
          acceptedOutputModes: ['text'],
        },
      },
    });

    expect(result.result.status.state).toBe('completed');
    expect(result.result.artifacts[0].parts[0].text).toContain('Runner finished');
    expect(mockChildProcessState.spawnCalls[0].command).toBe('node');
    expect(mockChildProcessState.spawnCalls[0].args[2]).toContain('override');
    expect(mockChildProcessState.spawnCalls[0].args[2]).toContain('Task:\n\nSay hello');
  });

  it('includes changed files and verification output in a successful write run', async () => {
    const workspace = await makeWorkspace();
    mockChildProcessState.gitStatuses.push('', '?? generated.txt\n');
    mockChildProcessState.spawnBehaviors.push({ stdout: 'done' }, { stdout: 'lint ok' });

    const started = await startCodeNodeServer({
      host: '127.0.0.1',
      port: 0,
      workspaceRoot: workspace,
      runnerCommand: 'node',
      runnerArgs: ['-e', "process.stdout.write('done')"],
      verificationCommand: 'pnpm lint',
      allowWrite: true,
      defaultTimeoutMs: 5_000,
      name: 'Writer Node',
      description: 'Writes files',
      version: 'test',
    });
    startedServers.push(started);

    const result = await postJson(`${started.baseUrl}/a2a`, {
      jsonrpc: '2.0',
      id: 2,
      method: 'message/send',
      params: {
        message: {
          kind: 'message',
          messageId: 'msg-2',
          role: 'user',
          contextId: 'ctx-2',
          parts: [{ kind: 'text', text: 'Create a file' }],
          metadata: { codeNode: { mode: 'write' } },
        },
        configuration: {
          blocking: true,
          acceptedOutputModes: ['text'],
        },
      },
    });

    const dataPart = result.result.artifacts[0].parts.find((part: any) => part.kind === 'data');
    expect(dataPart.data.changedFiles).toEqual(['generated.txt']);
    expect(dataPart.data.verification.command).toBe('pnpm lint');
    expect(dataPart.data.verification.stdout).toContain('lint ok');
    expect(mockChildProcessState.spawnCalls[1].args).toEqual(['-lc', 'pnpm lint']);
  });

  it('rejects workspaces outside the configured root', async () => {
    const workspace = await makeWorkspace();
    const outside = await makeWorkspace();
    mockChildProcessState.gitStatuses.push('', '');

    const started = await startCodeNodeServer({
      host: '127.0.0.1',
      port: 0,
      workspaceRoot: workspace,
      runnerCommand: 'node',
      runnerArgs: ['-e', "process.stdout.write('ok')"],
      allowWrite: false,
      defaultTimeoutMs: 5_000,
      name: 'Guarded Node',
      description: 'Guards workspace',
      version: 'test',
    });
    startedServers.push(started);

    const result = await postJson(`${started.baseUrl}/a2a`, {
      jsonrpc: '2.0',
      id: 3,
      method: 'message/send',
      params: {
        message: {
          kind: 'message',
          messageId: 'msg-3',
          role: 'user',
          contextId: 'ctx-3',
          parts: [{ kind: 'text', text: 'Outside root' }],
          metadata: { codeNode: { workspace: outside } },
        },
        configuration: {
          blocking: true,
          acceptedOutputModes: ['text'],
        },
      },
    });

    expect(result.result.status.state).toBe('failed');
    expect(result.result.status.message.parts[0].text).toContain('Workspace must be within');
    expect(mockChildProcessState.spawnCalls).toHaveLength(0);
  });

  it('supports non-blocking send plus cancel', async () => {
    const workspace = await makeWorkspace();
    mockChildProcessState.gitStatuses.push('', '');
    mockChildProcessState.spawnBehaviors.push({ autoClose: false });

    const started = await startCodeNodeServer({
      host: '127.0.0.1',
      port: 0,
      workspaceRoot: workspace,
      runnerCommand: 'node',
      runnerArgs: ['-e', "setTimeout(() => process.stdout.write('done'), 60000)"],
      allowWrite: false,
      defaultTimeoutMs: 60_000,
      name: 'Cancelable Node',
      description: 'Cancelable run',
      version: 'test',
    });
    startedServers.push(started);

    const startResult = await postJson(`${started.baseUrl}/a2a`, {
      jsonrpc: '2.0',
      id: 4,
      method: 'message/send',
      params: {
        message: {
          kind: 'message',
          messageId: 'msg-4',
          role: 'user',
          contextId: 'ctx-4',
          parts: [{ kind: 'text', text: 'Long task' }],
        },
        configuration: {
          blocking: false,
          acceptedOutputModes: ['text'],
        },
      },
    });

    expect(startResult.result.status.state).toBe('working');
    const taskId = startResult.result.id;

    const cancelResult = await postJson(`${started.baseUrl}/a2a`, {
      jsonrpc: '2.0',
      id: 5,
      method: 'tasks/cancel',
      params: { id: taskId },
    });

    expect(cancelResult.result.status.state).toBe('canceled');
    expect(mockChildProcessState.spawnCalls[0].child.kill).toHaveBeenCalledWith('SIGTERM');

    await new Promise((resolve) => setTimeout(resolve, 0));
    const taskResult = await postJson(`${started.baseUrl}/a2a`, {
      jsonrpc: '2.0',
      id: 6,
      method: 'tasks/get',
      params: { id: taskId },
    });

    expect(taskResult.result.status.state).toBe('canceled');
  });

  it('streams status and artifact events for message/stream', async () => {
    const workspace = await makeWorkspace();
    mockChildProcessState.gitStatuses.push('', '');
    mockChildProcessState.spawnBehaviors.push({ stdout: 'streamed result' });

    const started = await startCodeNodeServer({
      host: '127.0.0.1',
      port: 0,
      workspaceRoot: workspace,
      runnerCommand: 'node',
      runnerArgs: ['-e', "process.stdout.write('streamed result')"],
      allowWrite: false,
      defaultTimeoutMs: 5_000,
      name: 'Streaming Node',
      description: 'Streams progress',
      version: 'test',
    });
    startedServers.push(started);

    const response = await requestText(`${started.baseUrl}/a2a`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        jsonrpc: '2.0',
        id: 7,
        method: 'message/stream',
        params: {
          message: {
            kind: 'message',
            messageId: 'msg-7',
            role: 'user',
            contextId: 'ctx-7',
            parts: [{ kind: 'text', text: 'Stream this' }],
          },
          configuration: {
            acceptedOutputModes: ['text'],
          },
        },
      }),
    });

    expect(response.headers['content-type']).toContain('text/event-stream');
    expect(response.body).toContain('"kind":"status-update"');
    expect(response.body).toContain('"kind":"artifact-update"');
    expect(response.body).toContain('"state":"completed"');
  });
});
```
🟠 MAJOR: Missing test coverage for critical failure paths
Issue: Several important code paths lack test coverage:
- Write mode enforcement (security-critical): `code-node-server.ts:349-351` rejects write mode when `allowWrite: false`, but no test verifies this.
- Runner failure handling: `code-node-server.ts:458-463` transitions to `Failed` state on non-zero exit, but all tests exercise success paths only.
- Verification command failure: Tests only cover verification success, not what happens when lint/test commands fail.
Why: Without coverage, regressions could allow unauthorized file modifications or incorrectly report failed executions as completed.
Fix: Add these test cases:
```ts
it('rejects write mode when allowWrite is false', async () => {
  const started = await startCodeNodeServer({
    // ... allowWrite: false
  });
  const result = await postJson(`${started.baseUrl}/a2a`, {
    // ... metadata: { codeNode: { mode: 'write' } }
  });
  expect(result.result.status.state).toBe('failed');
  expect(result.result.status.message.parts[0].text).toContain('Write mode requested');
});

it('reports failed state when runner exits non-zero', async () => {
  mockChildProcessState.spawnBehaviors.push({ exitCode: 1, stderr: 'error' });
  // ... assert status.state === 'failed'
});
```
Summary
Add a local `inkeep code-node` bridge for external-agent development so coding tasks can run against a checked-out repo during local dev.

Changes

- `inkeep code-node` CLI command that starts a local A2A-compatible bridge
- `codeNode.runnerArgs` overrides so different callers can use different local runner invocations
- `localCodeNodeAgent` test agent and documentation on how to point `externalAgent()` at the local bridge
- `code-node-server` test suite changes so it does not depend on ambient `global.fetch` state from other test files

Why
WS6 Phase 1 is the local execution layer for parallel local dev. This makes it possible to point an external agent at a local repo checkout and get back task output plus file and verification metadata, without waiting for the durable executor work.
Test Plan
- `pnpm format`
- `pnpm --filter @inkeep/agents-cli test`
- `INKEEP_AGENTS_MANAGE_DATABASE_URL=postgresql://localhost:5432/test INKEEP_AGENTS_RUN_DATABASE_URL=postgresql://localhost:5433/test pnpm --filter @inkeep/agents-api test`
- `INKEEP_AGENTS_MANAGE_DATABASE_URL=postgresql://localhost:5432/test INKEEP_AGENTS_RUN_DATABASE_URL=postgresql://localhost:5433/test pnpm check`

Notes
This PR is the `agents` side of the WS6 local-dev flow. The `/ship` integration lives in `team-skills` as a separate companion change.

The final monorepo `pnpm check` was attempted multiple times, but this worktree hit unrelated flaky failures outside this diff in `packages/agents-core` and `agents-manage-ui`.