Failure
Right after a fresh uv sync, the very first openai-agent run aborted with:
mcp.shared.exceptions.McpError: Timed out while waiting for response to ClientRequest. Waited 5.0 seconds.
The traceback ends in agents/mcp/server.py:726 → connect → session.initialize. One of the six MCP servers spawned via uv run <name>-mcp-server exceeded the 5s handshake window during cold start, and the whole run died.
Only reproducible on the genuinely first invocation on a cold machine — warm runs are fine. After uv sync, the OS page cache is empty and the first uv run <server> pays both uv run setup cost and a full import cost for that server's heavy modules. I measured a one-time cold start of ~12s for the vibration server in isolation (scipy + numpy + DSP submodules), down to 0.89s once warm. Warm handshake latencies across all six servers were 0.26–0.89s.
Root cause
The runners never set the MCP client-session timeout, so the SDK default of 5s bites on cold start:
src/agent/openai_agent/runner.py:91 — MCPServerStdio(...) constructed without client_session_timeout_seconds. SDK default is 5 (openai-agents agents/mcp/server.py:1055, 1176, 1311).
src/agent/deep_agent/runner.py:199 — MultiServerMCPClient(connections) constructed without session_kwargs, so ClientSession(read_timeout_seconds=...) falls back to the library default.
src/agent/claude_agent/runner.py:65-72 — McpStdioServerConfig dicts have no timeout knob in the TypedDict surface; behaviour is internal to claude-agent-sdk and didn't crash in my testing, but the same pattern of leaving SDK defaults unset applies.
This is the worst possible moment for a failure — the user's first run on a fresh clone. Cold-cache flakiness on a one-line knob isn't a real bug story.
Proposed solution
Plumb a generous client-session timeout through all three SDK runners. ~30s is fine for cold-start; warm cases are well under 1s so there's no operational downside.
Concretely:
openai-agent: pass client_session_timeout_seconds=30 to MCPServerStdio(...).
deep-agent: pass session_kwargs={"read_timeout_seconds": timedelta(seconds=30)} per connection in _build_mcp_connections.
claude-agent: best handled via whatever McpStdioServerConfig extension or session-level config the SDK exposes — needs a quick look at the SDK surface.
Optionally expose the value as an env var (MCP_INIT_TIMEOUT_SECONDS) or a --mcp-timeout CLI flag for users on slow machines.
Happy to send a PR.
Failure
Right after a fresh
uv sync, the very firstopenai-agentrun aborted with:The traceback ends in
agents/mcp/server.py:726 → connect → session.initialize. One of the six MCP servers spawned viauv run <name>-mcp-serverexceeded the 5s handshake window during cold start, and the whole run died.Only reproducible on the genuinely first invocation on a cold machine — warm runs are fine. After
uv sync, the OS page cache is empty and the firstuv run <server>pays bothuv runsetup cost and a full import cost for that server's heavy modules. I measured a one-time cold start of ~12s for thevibrationserver in isolation (scipy + numpy + DSP submodules), down to 0.89s once warm. Warm handshake latencies across all six servers were 0.26–0.89s.Root cause
The runners never set the MCP client-session timeout, so the SDK default of 5s bites on cold start:
src/agent/openai_agent/runner.py:91—MCPServerStdio(...)constructed withoutclient_session_timeout_seconds. SDK default is 5 (openai-agentsagents/mcp/server.py:1055, 1176, 1311).src/agent/deep_agent/runner.py:199—MultiServerMCPClient(connections)constructed withoutsession_kwargs, soClientSession(read_timeout_seconds=...)falls back to the library default.src/agent/claude_agent/runner.py:65-72—McpStdioServerConfigdicts have no timeout knob in the TypedDict surface; behaviour is internal to claude-agent-sdk and didn't crash in my testing, but the same pattern of leaving SDK defaults unset applies.This is the worst possible moment for a failure — the user's first run on a fresh clone. Cold-cache flakiness on a one-line knob isn't a real bug story.
Proposed solution
Plumb a generous client-session timeout through all three SDK runners. ~30s is fine for cold-start; warm cases are well under 1s so there's no operational downside.
Concretely:
openai-agent: passclient_session_timeout_seconds=30toMCPServerStdio(...).deep-agent: passsession_kwargs={"read_timeout_seconds": timedelta(seconds=30)}per connection in_build_mcp_connections.claude-agent: best handled via whateverMcpStdioServerConfigextension or session-level config the SDK exposes — needs a quick look at the SDK surface.Optionally expose the value as an env var (
MCP_INIT_TIMEOUT_SECONDS) or a--mcp-timeoutCLI flag for users on slow machines.Happy to send a PR.