fix(swe): resolve tool server startup failures with port allocation, retries, and shell fallback #18

Merged
echobt merged 4 commits into main from fix/tool-server-port-allocation-health-check
Feb 18, 2026

echobt commented Feb 18, 2026

Summary

Fix the Docker sandbox tool server failing to start in all containers during concurrent pipeline execution, eliminating the pervasive "Tool server health check failed after 3s, tools may not work" warnings.

Changes

Port Allocation (docker_sandbox.rs)

  • Replace timestamp-based port derivation with a global AtomicU16 counter (NEXT_PORT) that guarantees unique ports across all concurrent containers
  • Eliminate the as u16 truncation bug and the TOCTOU race conditions that occurred when multiple sandboxes started simultaneously under --network=host
  • Port range: 10,000–60,000 with atomic wrap-around
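
A minimal sketch of such an allocator, for reviewers who want to see the idea in isolation. NEXT_PORT and the 10,000–60,000 range come from this PR; the name allocate_port is inferred from the test names, and the PORT_MIN/PORT_MAX constants and the use of fetch_update are assumptions, not necessarily what docker_sandbox.rs does.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

// Bounds taken from the PR summary; the constant names are assumptions.
const PORT_MIN: u16 = 10_000;
const PORT_MAX: u16 = 60_000;

// One counter shared by every sandbox created in this process.
static NEXT_PORT: AtomicU16 = AtomicU16::new(PORT_MIN);

/// Returns the current counter value and atomically advances the counter,
/// wrapping from PORT_MAX back to PORT_MIN.
fn allocate_port() -> u16 {
    let result = NEXT_PORT.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |p| {
        Some(if p >= PORT_MAX { PORT_MIN } else { p + 1 })
    });
    // The closure always returns Some, so both arms carry the previous value.
    match result {
        Ok(prev) | Err(prev) => prev,
    }
}
```

In this sketch two callers can never observe the same value until the counter wraps, which takes 50,001 allocations. It does not check whether another process on the host already holds the port.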

Tool Server Robustness (docker_sandbox.rs, tool_server.rs)

  • Add HTTPServer.allow_reuse_address = True in the embedded Python server to reduce Address already in use errors
  • Wrap HTTPServer() creation in try/except to catch and log OSError on port bind failure
  • Increase the health check window from 6×500ms (3s) to 12×500ms (6s) to handle slow container startup under load
  • Add full retry loop (up to 2 attempts) that kills stale processes and re-writes the server script
  • Verify script file size after writing to detect truncated stdin pipe transfers
  • Change start_tool_server() to return bool indicating success, tracked via tool_server_ok field
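
The polling and retry structure described above can be sketched roughly as follows. The function names, the closure-based signatures, and the exact sleep placement are hypothetical; in the PR the start step kills stale processes and re-writes the server script, and the probe is the health check request.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Polls `probe` up to `polls` times, sleeping `interval` after each failure.
fn wait_healthy(mut probe: impl FnMut() -> bool, polls: u32, interval: Duration) -> bool {
    for _ in 0..polls {
        if probe() {
            return true;
        }
        sleep(interval);
    }
    false
}

/// Runs up to `attempts` full start-then-poll cycles and reports success,
/// mirroring the bool now returned by start_tool_server().
fn start_with_retries(
    mut start: impl FnMut(u32),
    mut probe: impl FnMut() -> bool,
    attempts: u32,
    polls: u32,
    interval: Duration,
) -> bool {
    for attempt in 1..=attempts {
        // Hypothetical hook: kill stale processes, re-write the script, launch.
        start(attempt);
        if wait_healthy(&mut probe, polls, interval) {
            return true;
        }
    }
    false
}
```

With attempts = 2, polls = 12, and interval = 500ms, the worst case in this sketch is about 12 seconds of waiting before the caller learns the server is unavailable.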

Shell-Based Tool Fallback (test_generator.rs)

  • Add shell_fallback() function implementing all 5 tools (read_file, list_dir, grep_files, search_files, apply_patch) via direct shell commands
  • When tool server is unavailable or returns connection errors, transparently fall back to shell execution
  • Extract parse_tool_response() into a standalone function for cleaner code organization
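
The fallback decision can be illustrated with a small sketch. ToolError, call_tool, and the closure parameters are hypothetical names introduced here; the real code in test_generator.rs may be structured differently. The point is that only an unavailable server or a connection error triggers the shell path, while an ordinary tool error is returned unchanged.

```rust
#[derive(Debug, PartialEq)]
enum ToolError {
    /// The tool server could not be reached.
    Connection(String),
    /// The tool ran but reported a failure (for example, file not found).
    Tool(String),
}

/// Tries the tool server first and falls back to the shell implementation
/// when the server is known to be down or the request fails to connect.
fn call_tool(
    tool_server_ok: bool,
    via_server: impl FnOnce() -> Result<String, ToolError>,
    via_shell: impl FnOnce() -> Result<String, ToolError>,
) -> Result<String, ToolError> {
    if !tool_server_ok {
        return via_shell();
    }
    match via_server() {
        Err(ToolError::Connection(_)) => via_shell(),
        other => other,
    }
}
```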

Error Handling Improvements (docker_sandbox.rs)

  • Upgrade git install failed from WARN-and-continue to anyhow::bail!, since a sandbox without git is unusable
  • Upgrade checkout failed from WARN-and-continue to anyhow::bail!, since running tests against the wrong commit produces invalid results
  • Both failures now properly destroy the container before returning the error
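
A rough sketch of the fail-fast pattern, using only the standard library so it stands alone. The PR itself uses anyhow::bail!; the helper name run_required_step, the String error type, and the destroy closure are assumptions made for illustration.

```rust
/// Runs a mandatory setup step. On failure, destroys the container first and
/// then returns the error, instead of logging a warning and continuing.
fn run_required_step(
    name: &str,
    step: impl FnOnce() -> Result<(), String>,
    destroy: impl FnOnce(),
) -> Result<(), String> {
    if let Err(e) = step() {
        // Clean up before propagating so no half-configured container leaks.
        destroy();
        return Err(format!("{} failed: {}", name, e));
    }
    Ok(())
}
```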

Tests

  • Add test_allocate_port_returns_valid_range and test_allocate_port_sequential_unique unit tests for the new port allocator

…l fallback

The tool server inside Docker sandbox containers was failing to start reliably,
producing "Tool server health check failed after 3s, tools may not work" warnings
for every container. This was caused by port collisions (timestamp-based port
derivation with --network=host), insufficient health check timeout, no retry
mechanism, and no fallback when the server was unavailable.

Port allocation: Replaced timestamp-based port derivation (which suffered from
u16 truncation and TOCTOU races) with a global AtomicU16 counter that guarantees
unique ports across all concurrent containers. Each sandbox atomically increments
the counter, wrapping from 60000 back to 10000.

Tool server Python code (tool_server.rs): Added HTTPServer.allow_reuse_address = True
to reduce EADDRINUSE errors, and wrapped server creation in try/except to log port
binding failures before exiting cleanly.

Health check robustness (docker_sandbox.rs): Increased health check from 6x500ms (3s)
to 12x500ms (6s). Added full retry loop (up to 2 attempts) that kills stale processes,
re-writes the script, and restarts on failure. Verifies script was written correctly
by checking file size. Changed start_tool_server() to return bool indicating success,
exposed via has_tool_server() for callers.
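
One way the file size verification could work, sketched under assumptions: the commit does not say how the size is obtained, so this assumes the container reports it as wc -c style output and that the check is an exact byte-count match. The function name script_size_matches is hypothetical.

```rust
/// Returns true when the first whitespace-separated token of `wc_output`
/// parses as a byte count equal to the length of the script that was sent.
fn script_size_matches(script: &str, wc_output: &str) -> bool {
    wc_output
        .split_whitespace()
        .next()
        .and_then(|tok| tok.parse::<usize>().ok())
        .map_or(false, |n| n == script.len())
}
```

A mismatch here would indicate a truncated stdin pipe transfer, which is the case the retry loop is meant to recover from by re-writing the script.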

Error handling improvements (docker_sandbox.rs): git install failure now aborts sandbox
creation (was a silent warning). git checkout failure now aborts sandbox creation
instead of silently continuing on HEAD, which would produce invalid test results.

Shell-based tool fallback (test_generator.rs): When the tool server is unavailable
or returns connection errors, tools (read_file, list_dir, grep_files, search_files,
apply_patch) now fall back to equivalent shell commands. Extracted parse_tool_response()
helper for cleaner code. This ensures the agent loop continues even if the tool server
never starts.
…ction

Add validate_file_path() checks for file/directory path arguments and
sanitize_shell_arg() escaping for pattern/glob arguments in the
shell_fallback() function. Previously, LLM-provided values containing
single quotes could break out of shell quoting and execute arbitrary
commands inside the Docker container.
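
For illustration, a minimal sketch of what the two helpers might look like. The function names come from this commit, but the bodies are assumptions: the single-quote rewrite is the standard POSIX shell idiom, and the allow-list in validate_file_path is a deliberately conservative guess rather than the rules actually merged.

```rust
/// Assumed behaviour: wrap the value in single quotes and rewrite each
/// embedded single quote as '\'' (close quote, escaped quote, reopen quote).
fn sanitize_shell_arg(arg: &str) -> String {
    format!("'{}'", arg.replace('\'', "'\\''"))
}

/// Assumed behaviour: accept only non-empty paths built from ASCII letters,
/// digits, '/', '.', '_' and '-', and reject everything else.
fn validate_file_path(path: &str) -> Result<(), String> {
    if path.is_empty() {
        return Err("empty path".to_string());
    }
    let allowed = |c: char| c.is_ascii_alphanumeric() || matches!(c, '/' | '.' | '_' | '-');
    if let Some(bad) = path.chars().find(|c| !allowed(*c)) {
        return Err(format!("disallowed character {:?} in path", bad));
    }
    Ok(())
}
```

With this quoting, a pattern such as a'b reaches the shell as 'a'\''b', so the embedded quote can no longer terminate the argument.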
echobt merged commit abb69c6 into main on Feb 18, 2026
10 checks passed
echobt deleted the fix/tool-server-port-allocation-health-check branch on February 18, 2026 19:15