Skip to content

fix(agent): widen MCP client-session timeout for cold first runs#338

Open
harshalmore31 wants to merge 1 commit into
IBM:mainfrom
harshalmore31:fix/335-mcp-init-timeout
Open

fix(agent): widen MCP client-session timeout for cold first runs#338
harshalmore31 wants to merge 1 commit into
IBM:mainfrom
harshalmore31:fix/335-mcp-init-timeout

Conversation

@harshalmore31
Copy link
Copy Markdown

Closes #335.

Summary

  • openai-agent runner now passes client_session_timeout_seconds=30 to MCPServerStdio(...) (was the SDK default of 5s).
  • deep-agent runner now passes session_kwargs={"read_timeout_seconds": timedelta(seconds=30)} per connection.
  • Each runner defines a module-level _MCP_CLIENT_TIMEOUT_SECONDS = 30 constant alongside _DEFAULT_MODEL.
  • claude-agent is intentionally untouched — its Python-side initialize timeout already defaults to 60s, and the stdio MCP servers it spawns are managed by the bundled Claude Code CLI subprocess (no Python-side knob).

Why 30s

Cold-cache first-spawn timings measured on a freshly synced env: vibration server ~12s (one-time, scipy + numpy + DSP submodule imports). Warm handshakes across all six servers: 0.26s–0.89s. 30s leaves substantial margin without making a genuinely misconfigured server feel slow to fail.

Test plan

  • uv run pytest src/agent/ -k "not integration" — 132/132 pass.
  • Manual smoke: uv run openai-agent --model-id gpt-4.1 "What is the current date and time?" — all 6 servers spawn cleanly, agent returns the expected answer.
  • Maintainer to confirm the timeout choice / constant location matches repo conventions.

The openai-agent and deep-agent runners left MCP stdio client-session
timeouts at the SDK / library defaults, so a heavy server's cold-cache
first spawn (e.g. vibration importing scipy + numpy + DSP submodules)
could exceed the budget and abort the entire agent run with
McpError: Timed out ... Waited 5.0 seconds during session.initialize().

Plumb a 30s timeout through both runners.  Warm handshakes are
sub-second across all six servers so there is no operational downside.

claude-agent is unaffected: its Python-side initialize timeout already
defaults to 60s, and the stdio MCP servers it spawns are managed by
the bundled Claude Code CLI subprocess.

Closes IBM#335.

Signed-off-by: harshalmore31 <harshalmore2468@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cold-start MCP-init timeout aborts agent run on fresh install

2 participants