Conversation
- Polymorphic agent deserialization in EventService (type(self.stored.agent) instead of hardcoded Agent) - Eager import of ACPAgent in api.py to register in discriminated union - Add close() lifecycle method to AgentBase, call from LocalConversation.close() - Await conversation close in EventService to ensure subprocess cleanup - Pre-install claude-code-acp in Docker image for ACPAgent support - Add remote runtime example for ACPAgent Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
API breakage checks (Griffe)Result: Passed |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Elegant solution that eliminates special cases
KEY INSIGHT: The polymorphic deserialization (agent_cls = type(self.stored.agent)) is textbook good design - it preserves type information naturally without conditionals, making ACPAgent work seamlessly alongside the base Agent class.
What I like:
- Data structure wins: Using Python's type system to handle polymorphism instead of switch statements
- Proper lifecycle management: The
close()hook is clean extensibility (no-op by default, override where needed) - Correctness fix: The await on
run_in_executor(line 609) - you actually need to wait for the cleanup - Solves real problem: ACPAgent subprocess cleanup is necessary, not theoretical
Minor notes:
- The eager import in api.py is a necessary hack for Pydantic discriminated unions - acceptable pragmatism
- Could add a type guard on
agent_cls, but givenstored.agentis pre-validated, this is safe
✅ LGTM - Core logic is sound, changes are minimal and well-targeted.
| assert isinstance(workspace, LocalWorkspace) | ||
| Path(workspace.working_dir).mkdir(parents=True, exist_ok=True) | ||
| agent = Agent.model_validate( | ||
| agent_cls = type(self.stored.agent) |
There was a problem hiding this comment.
🟢 Good taste: This polymorphic deserialization is elegant - preserves the concrete type naturally instead of hardcoding Agent.model_validate().
The pattern agent_cls = type(obj) → agent_cls.model_validate() eliminates special cases and makes ACPAgent work seamlessly.
| if self._conversation: | ||
| loop = asyncio.get_running_loop() | ||
| loop.run_in_executor(None, self._conversation.close) | ||
| await loop.run_in_executor(None, self._conversation.close) |
There was a problem hiding this comment.
🟢 Correctness fix: This await was missing - run_in_executor returns a Future that must be awaited. Good catch.
| """ | ||
| return None | ||
|
|
||
| def close(self) -> None: |
There was a problem hiding this comment.
🟢 Clean extensibility: Hook pattern done right - no-op by default, subclasses override where needed. No forced coupling.
| from openhands.agent_server.tool_router import tool_router | ||
| from openhands.agent_server.vscode_router import vscode_router | ||
| from openhands.agent_server.vscode_service import get_vscode_service | ||
| from openhands.sdk.agent import ACPAgent # noqa: F401 — register in DiscriminatedUnionMixin |
There was a problem hiding this comment.
🟡 Pragmatic hack: The # noqa: F401 import is necessary for Pydantic discriminated union registration. Not elegant, but solves a real problem. Acceptable trade-off.
The claude-code-acp npm install fails on golang and java base images that don't have npm. Make it conditional on npm availability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…yloads With ACPAgent registered in the discriminated union, agent payloads now require an explicit "kind" field. Updated 6 test payloads and renumbered the ACP example from 07 to 09 to avoid duplicate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Coverage Report •
|
||||||||||||||||||||||||||||||||||||||||||||||||||
The CI Agent Server workflow builds binary target images, not source. Update the example to use target_type="binary" accordingly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 'agent_type' input (default/acp) to run-eval.yml and pass it through to the evaluation repo's eval-job workflow dispatch payload. This enables dispatching SWE-bench evaluations using ACPAgent (Claude Code via ACP). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SWE-bench prebaked images use Python-only base images without Node.js. The conditional `if command -v npm` silently skipped claude-code-acp installation. Now installs Node.js from nodesource when npm is absent. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ACPAgent now emits a FinishAction ActionEvent + ObservationEvent after each step() completes. This makes it compatible with evaluation frameworks (SWE-bench, etc.) that detect task completion by scanning for FinishAction in the event history. Without this, the benchmarks fake_user_response loop didn't detect completion and sent up to 10 "please continue" messages, and the AgentFinishedCritic marked results as failed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code ACP defaults to "ask" mode for tool permissions when no settings file exists. This causes the agent to hang waiting for interactive permission approval in headless/eval environments. Add /etc/claude-code/managed-settings.json with allow rules for Edit, Read, and Bash tools so Claude Code can operate without human approval. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code's internal permission system blocks tools like Write
when running in default mode, causing the agent to hang waiting
for interactive approval. This happens even when the ACP server's
settings allow the tools, because Claude Code checks permissions
before making MCP tool calls.
Call set_session_mode("bypassPermissions") after creating the
session so Claude Code skips all permission checks in headless
mode.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add acp_prompt_timeout field (default 1800s) to ACPAgent that wraps the prompt() call with AsyncExecutor's timeout. This prevents the eval from hanging forever when claude-code-acp fails to send the JSON-RPC response after completing its work. Root cause analysis: claude-code-acp's prompt() method awaits sessionUpdate notifications end-to-end through the stdout pipe. If the response is never sent (e.g., due to stdout backpressure or an unhandled result subtype), the Python ACP client blocks indefinitely. Also fixes test assertions for FinishAction emission added in d72fd08. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The default asyncio.StreamReader limit (64 KiB) is too small for ACP session_update notifications that carry large tool-call outputs. When a JSON-RPC line exceeds the limit, readline() raises LimitOverrunError which silently kills the filter pipeline, leaving the prompt() future unresolved forever and eventually deadlocking the subprocess stdout pipe. Root cause confirmed on v10 eval: 59,496 bytes of unread data stuck in the subprocess pipe, both processes sleeping in ep_poll. Increase both the subprocess pipe and filtered_reader limits to 100 MiB. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Auto-detect ACP server type from InitializeResponse.agent_info.name - Map session modes: claude-code → bypassPermissions, codex-acp → full-access - Add acp_session_mode field for manual override - Install codex-acp alongside claude-code-acp in Docker image - Add acp-codex to run-eval.yml agent_type choices Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
codex-acp requires an explicit `conn.authenticate()` call after initialize and before session creation. Auto-detect the auth method from environment variables (CODEX_API_KEY or OPENAI_API_KEY). Without this, codex-acp returns "Authentication required" when trying to create a session in the remote runtime container. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Aligns naming with 'acp-codex': both now follow the
acp-{provider} pattern, making the agent type self-documenting.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add elapsed time tracking to _record_usage() via add_response_latency() for both main prompt and fork (ask_agent) paths - Capture agent_version from InitializeResponse.agent_info alongside agent_name - Expose agent_name and agent_version as public properties on ACPAgent so benchmarks can include ACP server identity in eval output Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The _BYPASS_MODE_MAP only matched "claude-code" but the actual agent name is "@zed-industries/claude-agent-acp", which doesn't contain that substring. Add "claude-agent" as an additional key so the mode correctly resolves to "bypassPermissions" instead of falling through to the default "full-access" (which claude-agent-acp rejects). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
4176666 to
b9c42fd
Compare
The npm package was renamed from @zed-industries/claude-code-acp to @zed-industries/claude-agent-acp. Update all references: bypass mode map key, Dockerfile, examples, tests, and docstrings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Test plan
test_acp_agent.py+test_agent_loading.py🤖 Generated with Claude Code
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:2b3a7ae-pythonRun
All tags pushed for this build
About Multi-Architecture Support
2b3a7ae-python) is a multi-arch manifest supporting both amd64 and arm642b3a7ae-python-amd64) are also available if needed