Skip to content

feat: add local code-node bridge for external-agent dev#2767

Open
vnv-varun wants to merge 3 commits intomainfrom
varun/ws6-phase1-claude-code-node
Open

feat: add local code-node bridge for external-agent dev#2767
vnv-varun wants to merge 3 commits intomainfrom
varun/ws6-phase1-claude-code-node

Conversation

@vnv-varun
Copy link
Contributor

Summary

Add a local inkeep code-node bridge for external-agent development so coding tasks can run against a checked-out repo during local dev.

Changes

  • add a new inkeep code-node CLI command that starts a local A2A-compatible bridge
  • execute incoming tasks inside a configured workspace with workspace guards, blocking send, streaming, cancelation, changed-file reporting, and optional verification output
  • allow per-request codeNode.runnerArgs overrides so different callers can use different local runner invocations
  • add a reference localCodeNodeAgent and document how to point externalAgent() at the local bridge
  • harden the new code-node-server test suite so it does not depend on ambient global.fetch state from other test files

Why

WS6 Phase 1 is the local execution layer for parallel local dev. This makes it possible to point an external agent at a local repo checkout and get back task output plus file and verification metadata, without waiting for the durable executor work.

Test Plan

  • pnpm format
  • pnpm --filter @inkeep/agents-cli test
  • INKEEP_AGENTS_MANAGE_DATABASE_URL=postgresql://localhost:5432/test INKEEP_AGENTS_RUN_DATABASE_URL=postgresql://localhost:5433/test pnpm --filter @inkeep/agents-api test
  • manual live-server smoke for blocking send, streaming, cancelation, changed files, verification output, and workspace guard behavior
  • INKEEP_AGENTS_MANAGE_DATABASE_URL=postgresql://localhost:5432/test INKEEP_AGENTS_RUN_DATABASE_URL=postgresql://localhost:5433/test pnpm check

Notes

This PR is the agents side of the WS6 local-dev flow. The /ship integration lives in team-skills as a separate companion change.

The final monorepo pnpm check was attempted multiple times, but this worktree hit unrelated flaky failures outside this diff in packages/agents-core and agents-manage-ui.

@vercel
Copy link

vercel bot commented Mar 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Mar 19, 2026 0:49am
agents-docs Ready Ready Preview, Comment Mar 19, 2026 0:49am
agents-manage-ui Ready Ready Preview, Comment Mar 19, 2026 0:49am

Request Review

@changeset-bot
Copy link

changeset-bot bot commented Mar 19, 2026

🦋 Changeset detected

Latest commit: 58c8005

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name Type
@inkeep/agents-cli Patch
@inkeep/agents-api Patch
@inkeep/agents-manage-ui Patch
@inkeep/agents-core Patch
@inkeep/agents-email Patch
@inkeep/agents-mcp Patch
@inkeep/agents-sdk Patch
@inkeep/agents-work-apps Patch
@inkeep/ai-sdk-provider Patch
@inkeep/create-agents Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Contributor

@pullfrog pullfrog bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium-priority — solid feature addition with a few issues worth addressing before merge.

The code-node bridge is well-structured: clean A2A compliance, good test coverage (blocking, streaming, cancel, workspace guard), and proper shutdown semantics. The main concerns are a default port collision with OTLP, a zsh fallback that will fail on most CI/server environments, unbounded growth of the taskRecords map, and per-request runnerArgs overrides accepting arbitrary argument injection from callers.

Pullfrog  | Fix all ➔Fix 👍s ➔View workflow runpullfrog.com𝕏

.description('Start a local A2A coding-agent node for local development')
.requiredOption('--workspace <path>', 'Workspace root the node may access')
.option('--host <host>', 'Host to bind the local server to', '127.0.0.1')
.option('--port <port>', 'Port to run the local server on', '4318')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Port 4318 is the well-known OTLP/HTTP port. The SDK docs (agents-docs/content/get-started/traces.mdx) and packages/agents-sdk/src/telemetry-provider.ts both reference localhost:4318 as the OTEL collector endpoint. Running inkeep code-node on the same machine as a local SigNoz/OTEL collector will produce a silent port conflict.

Consider a different default (e.g., 4320 or 14318) to avoid the collision.

Suggested change
.option('--port <port>', 'Port to run the local server on', '4318')
.option('--port <port>', 'Port to run the local server on', '4320')

verification = {
command: options.verificationCommand,
...(await runCommand(
process.env.SHELL || 'zsh',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zsh as a fallback is fragile — most CI containers, minimal Docker images, and many Linux distributions don't ship zsh. POSIX sh (or /bin/sh) is universally available.

Suggested change
process.env.SHELL || 'zsh',
process.env.SHELL || '/bin/sh',

const beforeStatus = snapshotGitStatus(workspace);
const rawPrompt = getPromptText(params.message);
const prompt = buildPrompt(rawPrompt, workspace, mode);
const runnerArgs = metadata.runnerArgs?.length ? metadata.runnerArgs : options.runnerArgs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per-request runnerArgs overrides from untrusted callers let anyone inject arbitrary arguments into the spawned process. Since the runnerCommand is controlled by the server operator, the impact is limited — but a caller could still pass args that change behavior in unexpected ways (e.g., --dangerously-skip-permissions for Claude Code). If this is intentional, a brief code comment explaining the trust model would help future readers; otherwise, consider dropping per-request arg overrides or gating them behind an opt-in flag.

export async function startCodeNodeServer(
options: CodeNodeServerOptions
): Promise<StartedCodeNodeServer> {
const taskRecords = new Map<string, TaskRecord>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taskRecords is never pruned — completed tasks accumulate indefinitely. For a long-running dev server this is a slow memory leak. Consider evicting terminal-state tasks after a TTL (e.g., 5 minutes) or capping the map size.

Comment on lines +370 to +375
let stdout = '';
let stderr = '';
const timeout = setTimeout(() => {
record.canceled = true;
terminateChild(child);
}, timeoutMs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stdout/stderr are accumulated without any size bound. A pathological runner could produce gigabytes of output, exhausting process memory. Consider capping the accumulated buffer size (e.g., to MAX_OUTPUT_CHARS) and discarding excess.

throw new Error('Write mode requested, but this node was started without write access');
}

const timeoutMs = metadata.timeoutMs ?? options.defaultTimeoutMs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metadata.timeoutMs comes from the untrusted request. A caller can set timeoutMs: Number.MAX_SAFE_INTEGER to effectively disable the timeout. Consider clamping to a server-side maximum (e.g., Math.min(metadata.timeoutMs, options.defaultTimeoutMs * 10)).

)
: root;

if (resolved !== root && !resolved.startsWith(`${root}${path.sep}`)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The startsWith check uses path.resolve which follows symlinks. A symlink inside the workspace root that points outside it would pass this check. Since this is a local-dev tool the risk is low, but worth noting for the trust model.

id: "local-code-node",
name: "Local Code Node",
description: "Delegates coding tasks into a local workspace during development",
baseUrl: "http://127.0.0.1:4318",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the default port changes (see port-conflict comment), update this to match.

id: 'local-code-node',
name: 'Local Code Node',
description: 'Delegates coding tasks into a locally running code-node bridge',
baseUrl: 'http://127.0.0.1:4318',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above — keep this in sync if the default port changes.

@pullfrog
Copy link
Contributor

pullfrog bot commented Mar 19, 2026

TL;DR — Adds an experimental inkeep code-node CLI command that starts a local A2A-compliant HTTP server, bridging Inkeep agents to a local CLI runner (defaulting to claude) for delegating coding tasks against a checked-out repository. The server handles task lifecycle, git change detection, optional verification commands, and reports structured results back through the A2A protocol.

Key changes

  • code-node-server.ts — New ~690-line A2A HTTP server that spawns a configurable CLI runner, tracks git diffs before/after execution, runs optional verification commands, and exposes message/send, message/stream, tasks/get, and tasks/cancel JSON-RPC methods alongside an Agent Card endpoint.
  • code-node.ts command + CLI registration — Wires the server into the CLI as inkeep code-node with options for --workspace, --runner-command, --runner-arg, --verification-command, --write, and more.
  • code-node-server.test.ts — Comprehensive test suite covering blocking/non-blocking sends, SSE streaming, workspace security validation, task cancellation, git change detection, and verification output capture.
  • local-code-node-agent.ts — Reference test agent demonstrating how to register the local bridge as an external agent via externalAgent() from the SDK.
  • external-agents.mdx + README.md — Documentation for the new command and its integration pattern.

Summary | 10 files | 3 commits | base: mainvarun/ws6-phase1-claude-code-node


code-node-server.ts — A2A bridge server

Before: No way to delegate coding tasks from Inkeep agents to a local CLI tool.
After: A full A2A-compliant HTTP server spawns a configurable runner (e.g. claude -p {prompt}), captures stdout/stderr, diffs git status before and after execution, optionally runs a verification command, and returns structured artifacts with changedFiles, exit codes, and verification output.

The server supports three execution paths:

Method Behavior
message/send (blocking) Awaits runner completion, returns final task
message/send (non-blocking) Returns immediately in working state, poll via tasks/get
message/stream Streams status-update and artifact-update SSE events

Security is enforced by resolveWorkspace(), which validates that any per-request workspace override resolves within the configured root — preventing directory traversal. The ExecutionMode type (read / write) gates whether the runner is allowed to modify files, controlled by the --write flag or per-request metadata.

How does per-request metadata work?

Callers can pass a codeNode key in message or params metadata to override workspace, timeoutMs, mode, and runnerArgs per request. The getMetadata() function merges params-level and message-level metadata, with message-level taking precedence.

agents-cli/src/utils/code-node-server.ts · agents-cli/src/commands/code-node.ts · agents-cli/src/index.ts


code-node-server.test.ts — test suite

Before: No test coverage for code-node functionality.
After: Five tests validate the full A2A lifecycle using mocked child processes and real temp directories.

Tests cover: blocking message/send with artifact validation, write-mode git diff detection with verification output, workspace-escape rejection, non-blocking send with tasks/cancel + child process kill, and SSE message/stream event shape. Child process mocking uses an EventEmitter-based spawn mock with configurable MockSpawnBehavior and queued spawnSync responses for git status.

agents-cli/src/__tests__/code-node-server.test.ts · agents-cli/src/__tests__/cli.test.ts


Reference agent + documentation

Before: No example of how to use a local code-node bridge with the agent framework.
After: A localCodeNodeAgent test agent shows the externalAgent()subAgent()agent() wiring, and docs explain the CLI flags and SDK registration.

The test agent defines three layers: an externalAgent pointing at http://127.0.0.1:4318, a coordinator subAgent that routes coding tasks to it, and a top-level agent that uses the coordinator as its default. Documentation is added to both the external-agents MDX page and the root README.

test-agents/agents/local-code-node-agent.ts · agents-docs/content/typescript-sdk/external-agents.mdx · README.md

Pullfrog  | View workflow run | Triggered by Pullfrogpullfrog.com𝕏

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(7) Total Issues | Risk: Medium

🟠⚠️ Major (4) 🟠⚠️

Inline Comments:

  • 🟠 Major: code-node-server.ts:473 Task records Map grows unbounded without cleanup
  • 🟠 Major: code-node-server.ts:85-105 Path traversal via symlinks can bypass workspace guard
  • 🟠 Major: code-node-server.ts:241-248 Request body reading has no size limit
  • 🟠 Major: code-node-server.test.ts:188-444 Missing test coverage for write mode enforcement and failure paths

🟡 Minor (3) 🟡

Inline Comments:

  • 🟡 Minor: code-node-server.ts:353 User-supplied timeoutMs lacks validation bounds
  • 🟡 Minor: code-node-server.ts:370-382 stdout/stderr buffers grow unbounded during execution

🟡 1) code-node-server.ts:598-618 Missing server-side error logging

Issue: Errors during task execution (both streaming and non-blocking modes) are not logged server-side. Errors are only sent to clients or update task state silently.
Why: For incident debugging, operators need server-side visibility into failures. If client connections are lost, failures become invisible.
Fix: Add console.error('Task failed:', taskId, error) in error handlers at lines 598-607 and 616-618.
Refs: code-node-server.ts:598-618

💭 Consider (4) 💭

💭 1) index.ts:38-58 Command registration position
Issue: code-node is registered at the top of the command list, before lifecycle commands like add, init, push.
Fix: Consider moving near the dev command (~line 203) to group development utilities together.

💭 2) code-node.ts:55-68 Output format lacks emoji prefix
Issue: Other server commands like dev use emoji prefixes (e.g., 🚀). code-node uses plain chalk output.
Fix: Add emoji for visual consistency: console.log(chalk.green('🚀 Local code node listening at...')).

💭 3) code-node.ts:70-80 Shutdown handler lacks feedback
Issue: The dev command logs "🛑 Stopping dashboard server..." during shutdown. code-node exits silently.
Fix: Add shutdown message before await started.close().

💭 4) external-agents.mdx:56 Missing CLI options in docs
Issue: The --timeout-ms flag (default 2 min) is not documented. Users with long-running tasks will hit timeouts without knowing the option exists.
Fix: Add note: "Use --timeout-ms for tasks longer than 2 minutes. See inkeep code-node --help for all options."


💡 APPROVE WITH SUGGESTIONS

Summary: This is a well-structured implementation of a local A2A bridge with good test coverage for the happy paths. However, there are several resource management and security concerns that should be addressed:

  1. Memory leak from unbounded task record storage needs cleanup logic
  2. Path traversal via symlinks could bypass the workspace security boundary
  3. Unbounded request body could enable DoS
  4. Missing failure path tests for write-mode enforcement and runner failures

The fixes are straightforward and the PR is close to ready. The architectural approach is sound — the concerns are implementation details rather than design issues.

Discarded (5)
Location Issue Reason Discarded
code-node-server.ts:357 runnerArgs override from request Intentional per PR description ("Allow per-request code-node runner args")
code-node-server.ts:412-422 Verification shares timeout with main task Acceptable for local dev; separate timeout would over-complicate
code-node-server.ts:542-557 Race condition between cancel and completion Minor race with safe fallback; acceptable for local dev
index.ts:42 Host 127.0.0.1 vs localhost Functionally equivalent; 127.0.0.1 is more explicit
index.ts:43 Port 4318 conflicts with OTEL Valid concern but 4318 is common for local dev servers
Reviewers (5)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 5 0 0 0 4 0 1
pr-review-sre 8 1 0 0 2 0 5
pr-review-tests 5 0 0 0 1 0 4
pr-review-consistency 6 0 3 0 0 0 3
pr-review-docs 2 0 1 0 0 0 1
Total 26 1 4 0 7 0 14

export async function startCodeNodeServer(
options: CodeNodeServerOptions
): Promise<StartedCodeNodeServer> {
const taskRecords = new Map<string, TaskRecord>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Task records Map grows unbounded

Issue: The taskRecords Map stores all tasks indefinitely with no cleanup. Each message/send or message/stream creates a new task record that is never removed, even after completion.

Why: In long-running deployments or under sustained load, this causes memory growth proportional to task count. Each record contains the full Task object with artifacts, history, and metadata.

Fix: Add TTL-based cleanup for completed tasks. Options:

  1. Periodic interval that removes terminal-state tasks older than a TTL (e.g., 1 hour)
  2. LRU cache with max size limit
  3. Clean up immediately on task completion with optional retention for debugging
// Option 1: Periodic cleanup
const TASK_RETENTION_MS = 60 * 60 * 1000; // 1 hour
setInterval(() => {
  const now = Date.now();
  for (const [id, record] of taskRecords) {
    if (isTerminalState(record.task.status.state)) {
      const timestamp = new Date(record.task.status.timestamp).getTime();
      if (now - timestamp > TASK_RETENTION_MS) {
        taskRecords.delete(id);
      }
    }
  }
}, 60_000).unref();

Refs:

Comment on lines +85 to +105
function resolveWorkspace(workspaceRoot: string, requestedWorkspace?: string): string {
const root = path.resolve(workspaceRoot);
const resolved = requestedWorkspace
? path.resolve(
path.isAbsolute(requestedWorkspace)
? requestedWorkspace
: path.join(root, requestedWorkspace)
)
: root;

if (resolved !== root && !resolved.startsWith(`${root}${path.sep}`)) {
throw new Error(`Workspace must be within ${root}`);
}

const stats = statSync(resolved, { throwIfNoEntry: false });
if (!stats || !stats.isDirectory()) {
throw new Error(`Workspace does not exist or is not a directory: ${resolved}`);
}

return resolved;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Path traversal via symlinks can bypass workspace guard

Issue: The resolveWorkspace function uses path.resolve() which does not resolve symlinks. An attacker could create a symlink inside the workspace root that points outside it, then request that symlinked path as the workspace.

Why: The string-based containment check on line 95 would pass because the resolved path starts with the root, but the actual filesystem target could be outside the allowed workspace. This bypasses the security boundary intended to restrict file access.

Fix: Use fs.realpathSync() to resolve symlinks before performing the path containment check:

import { realpathSync } from 'node:fs';

function resolveWorkspace(workspaceRoot: string, requestedWorkspace?: string): string {
  const root = realpathSync(path.resolve(workspaceRoot));
  const resolved = requestedWorkspace
    ? path.resolve(
        path.isAbsolute(requestedWorkspace)
          ? requestedWorkspace
          : path.join(root, requestedWorkspace)
      )
    : root;

  // Resolve symlinks in the requested path too
  let realResolved: string;
  try {
    realResolved = realpathSync(resolved);
  } catch {
    throw new Error(`Workspace does not exist: ${resolved}`);
  }

  if (realResolved !== root && !realResolved.startsWith(`${root}${path.sep}`)) {
    throw new Error(`Workspace must be within ${root}`);
  }

  // ... rest of function
}

Refs:

Comment on lines +241 to +248
async function readBody(request: IncomingMessage): Promise<string> {
return await new Promise((resolve, reject) => {
const chunks: Buffer[] = [];
request.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
request.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
request.on('error', reject);
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Request body reading has no size limit

Issue: The readBody function accumulates all request data into memory without a size limit. A malicious or buggy client could send an arbitrarily large payload, causing memory exhaustion before JSON parsing even attempts.

Why: While the server binds to 127.0.0.1 by default, it could be misconfigured to bind publicly. Even locally, a runaway script could DoS the server.

Fix: Add a maximum body size check and abort if exceeded:

const MAX_BODY_SIZE = 10 * 1024 * 1024; // 10MB

async function readBody(request: IncomingMessage): Promise<string> {
  return await new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    let totalSize = 0;
    
    request.on('data', (chunk) => {
      totalSize += chunk.length;
      if (totalSize > MAX_BODY_SIZE) {
        request.destroy(new Error('Request body too large'));
        reject(new Error('Request body too large'));
        return;
      }
      chunks.push(Buffer.from(chunk));
    });
    request.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
    request.on('error', reject);
  });
}

Then handle the error in the request handler to return 413 Payload Too Large.

throw new Error('Write mode requested, but this node was started without write access');
}

const timeoutMs = metadata.timeoutMs ?? options.defaultTimeoutMs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: User-supplied timeoutMs lacks validation bounds

Issue: The timeoutMs value from request metadata is used directly without validation. A client could pass timeoutMs: 0 (immediate cancellation), negative values (undefined setTimeout behavior), or extremely large values (disabling timeout protection).

Why: Could lead to DoS via hung processes or unexpected immediate task cancellation.

Fix:

Suggested change
const timeoutMs = metadata.timeoutMs ?? options.defaultTimeoutMs;
const timeoutMs = Math.max(1000, Math.min(metadata.timeoutMs ?? options.defaultTimeoutMs, 86400000));

This bounds timeout to 1 second minimum, 24 hours maximum.

Comment on lines +370 to +382
let stdout = '';
let stderr = '';
const timeout = setTimeout(() => {
record.canceled = true;
terminateChild(child);
}, timeoutMs);

child.stdout.on('data', (chunk) => {
stdout += chunk.toString();
});
child.stderr.on('data', (chunk) => {
stderr += chunk.toString();
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: stdout/stderr buffers grow unbounded during execution

Issue: The stdout and stderr variables accumulate all output from the child process without size limit. A misbehaving runner could produce gigabytes of output, leading to memory exhaustion.

Why: While output is truncated to MAX_OUTPUT_CHARS before being sent in the artifact, the full output is buffered in memory first.

Fix: Track buffer size and stop accumulating once a threshold is reached:

const MAX_BUFFER_SIZE = MAX_OUTPUT_CHARS * 2; // 40KB

child.stdout.on('data', (chunk) => {
  if (stdout.length < MAX_BUFFER_SIZE) {
    stdout += chunk.toString().slice(0, MAX_BUFFER_SIZE - stdout.length);
  }
});
child.stderr.on('data', (chunk) => {
  if (stderr.length < MAX_BUFFER_SIZE) {
    stderr += chunk.toString().slice(0, MAX_BUFFER_SIZE - stderr.length);
  }
});

Comment on lines +188 to +444
describe('code node server', () => {
it('serves an agent card and handles blocking message/send', async () => {
const workspace = await makeWorkspace();
mockChildProcessState.gitStatuses.push('', '');
mockChildProcessState.spawnBehaviors.push({ stdout: 'Runner finished' });

const started = await startCodeNodeServer({
host: '127.0.0.1',
port: 0,
workspaceRoot: workspace,
runnerCommand: 'node',
runnerArgs: ['-e', 'process.stdout.write(process.argv[1])', '{prompt}'],
allowWrite: false,
defaultTimeoutMs: 5_000,
name: 'Test Code Node',
description: 'Test bridge',
version: 'test',
});
startedServers.push(started);

const agentCard = JSON.parse(
(await requestText(`${started.baseUrl}/.well-known/agent.json`)).body
);
expect(agentCard.url).toBe(`${started.baseUrl}/a2a`);
expect(agentCard.skills[0].id).toBe('local-code-execution');

const result = await postJson(`${started.baseUrl}/a2a`, {
jsonrpc: '2.0',
id: 1,
method: 'message/send',
params: {
message: {
kind: 'message',
messageId: 'msg-1',
role: 'user',
contextId: 'ctx-1',
parts: [{ kind: 'text', text: 'Say hello' }],
metadata: {
codeNode: {
runnerArgs: ['-e', 'process.stdout.write(process.argv[1])', 'override {prompt}'],
},
},
},
configuration: {
blocking: true,
acceptedOutputModes: ['text'],
},
},
});

expect(result.result.status.state).toBe('completed');
expect(result.result.artifacts[0].parts[0].text).toContain('Runner finished');
expect(mockChildProcessState.spawnCalls[0].command).toBe('node');
expect(mockChildProcessState.spawnCalls[0].args[2]).toContain('override');
expect(mockChildProcessState.spawnCalls[0].args[2]).toContain('Task:\n\nSay hello');
});

it('includes changed files and verification output in a successful write run', async () => {
const workspace = await makeWorkspace();
mockChildProcessState.gitStatuses.push('', '?? generated.txt\n');
mockChildProcessState.spawnBehaviors.push({ stdout: 'done' }, { stdout: 'lint ok' });

const started = await startCodeNodeServer({
host: '127.0.0.1',
port: 0,
workspaceRoot: workspace,
runnerCommand: 'node',
runnerArgs: ['-e', "process.stdout.write('done')"],
verificationCommand: 'pnpm lint',
allowWrite: true,
defaultTimeoutMs: 5_000,
name: 'Writer Node',
description: 'Writes files',
version: 'test',
});
startedServers.push(started);

const result = await postJson(`${started.baseUrl}/a2a`, {
jsonrpc: '2.0',
id: 2,
method: 'message/send',
params: {
message: {
kind: 'message',
messageId: 'msg-2',
role: 'user',
contextId: 'ctx-2',
parts: [{ kind: 'text', text: 'Create a file' }],
metadata: { codeNode: { mode: 'write' } },
},
configuration: {
blocking: true,
acceptedOutputModes: ['text'],
},
},
});

const dataPart = result.result.artifacts[0].parts.find((part: any) => part.kind === 'data');
expect(dataPart.data.changedFiles).toEqual(['generated.txt']);
expect(dataPart.data.verification.command).toBe('pnpm lint');
expect(dataPart.data.verification.stdout).toContain('lint ok');
expect(mockChildProcessState.spawnCalls[1].args).toEqual(['-lc', 'pnpm lint']);
});

it('rejects workspaces outside the configured root', async () => {
const workspace = await makeWorkspace();
const outside = await makeWorkspace();
mockChildProcessState.gitStatuses.push('', '');

const started = await startCodeNodeServer({
host: '127.0.0.1',
port: 0,
workspaceRoot: workspace,
runnerCommand: 'node',
runnerArgs: ['-e', "process.stdout.write('ok')"],
allowWrite: false,
defaultTimeoutMs: 5_000,
name: 'Guarded Node',
description: 'Guards workspace',
version: 'test',
});
startedServers.push(started);

const result = await postJson(`${started.baseUrl}/a2a`, {
jsonrpc: '2.0',
id: 3,
method: 'message/send',
params: {
message: {
kind: 'message',
messageId: 'msg-3',
role: 'user',
contextId: 'ctx-3',
parts: [{ kind: 'text', text: 'Outside root' }],
metadata: { codeNode: { workspace: outside } },
},
configuration: {
blocking: true,
acceptedOutputModes: ['text'],
},
},
});

expect(result.result.status.state).toBe('failed');
expect(result.result.status.message.parts[0].text).toContain('Workspace must be within');
expect(mockChildProcessState.spawnCalls).toHaveLength(0);
});

it('supports non-blocking send plus cancel', async () => {
const workspace = await makeWorkspace();
mockChildProcessState.gitStatuses.push('', '');
mockChildProcessState.spawnBehaviors.push({ autoClose: false });

const started = await startCodeNodeServer({
host: '127.0.0.1',
port: 0,
workspaceRoot: workspace,
runnerCommand: 'node',
runnerArgs: ['-e', "setTimeout(() => process.stdout.write('done'), 60000)"],
allowWrite: false,
defaultTimeoutMs: 60_000,
name: 'Cancelable Node',
description: 'Cancelable run',
version: 'test',
});
startedServers.push(started);

const startResult = await postJson(`${started.baseUrl}/a2a`, {
jsonrpc: '2.0',
id: 4,
method: 'message/send',
params: {
message: {
kind: 'message',
messageId: 'msg-4',
role: 'user',
contextId: 'ctx-4',
parts: [{ kind: 'text', text: 'Long task' }],
},
configuration: {
blocking: false,
acceptedOutputModes: ['text'],
},
},
});

expect(startResult.result.status.state).toBe('working');
const taskId = startResult.result.id;

const cancelResult = await postJson(`${started.baseUrl}/a2a`, {
jsonrpc: '2.0',
id: 5,
method: 'tasks/cancel',
params: { id: taskId },
});

expect(cancelResult.result.status.state).toBe('canceled');
expect(mockChildProcessState.spawnCalls[0].child.kill).toHaveBeenCalledWith('SIGTERM');

await new Promise((resolve) => setTimeout(resolve, 0));
const taskResult = await postJson(`${started.baseUrl}/a2a`, {
jsonrpc: '2.0',
id: 6,
method: 'tasks/get',
params: { id: taskId },
});

expect(taskResult.result.status.state).toBe('canceled');
});

it('streams status and artifact events for message/stream', async () => {
const workspace = await makeWorkspace();
mockChildProcessState.gitStatuses.push('', '');
mockChildProcessState.spawnBehaviors.push({ stdout: 'streamed result' });

const started = await startCodeNodeServer({
host: '127.0.0.1',
port: 0,
workspaceRoot: workspace,
runnerCommand: 'node',
runnerArgs: ['-e', "process.stdout.write('streamed result')"],
allowWrite: false,
defaultTimeoutMs: 5_000,
name: 'Streaming Node',
description: 'Streams progress',
version: 'test',
});
startedServers.push(started);

const response = await requestText(`${started.baseUrl}/a2a`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
jsonrpc: '2.0',
id: 7,
method: 'message/stream',
params: {
message: {
kind: 'message',
messageId: 'msg-7',
role: 'user',
contextId: 'ctx-7',
parts: [{ kind: 'text', text: 'Stream this' }],
},
configuration: {
acceptedOutputModes: ['text'],
},
},
}),
});

expect(response.headers['content-type']).toContain('text/event-stream');
expect(response.body).toContain('"kind":"status-update"');
expect(response.body).toContain('"kind":"artifact-update"');
expect(response.body).toContain('"state":"completed"');
});
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Missing test coverage for critical failure paths

Issue: Several important code paths lack test coverage:

  1. Write mode enforcement (security-critical): code-node-server.ts:349-351 rejects write mode when allowWrite: false, but no test verifies this.

  2. Runner failure handling: code-node-server.ts:458-463 transitions to Failed state on non-zero exit, but all tests exercise success paths only.

  3. Verification command failure: Tests only cover verification success, not what happens when lint/test commands fail.

Why: Without coverage, regressions could allow unauthorized file modifications or incorrectly report failed executions as completed.

Fix: Add these test cases:

it('rejects write mode when allowWrite is false', async () => {
  const started = await startCodeNodeServer({
    // ... allowWrite: false
  });
  const result = await postJson(`${started.baseUrl}/a2a`, {
    // ... metadata: { codeNode: { mode: 'write' } }
  });
  expect(result.result.status.state).toBe('failed');
  expect(result.result.status.message.parts[0].text).toContain('Write mode requested');
});

it('reports failed state when runner exits non-zero', async () => {
  mockChildProcessState.spawnBehaviors.push({ exitCode: 1, stderr: 'error' });
  // ... assert status.state === 'failed'
});

@github-actions github-actions bot deleted a comment from claude bot Mar 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant