
fix(stdio): resolve BrokenResourceError race condition on quick exit #2079

Open
gspeter-max wants to merge 4 commits into modelcontextprotocol:main from gspeter-max:peter_contribution

Conversation

@gspeter-max

ROOT CAUSE:
The stdio_client async context manager had a race condition when exiting quickly (before the subprocess finished emitting output). The cleanup code in the finally block closed the memory streams while the background tasks (stdout_reader and stdin_writer) were still using them, resulting in BrokenResourceError.

Timeline of the bug:

  1. User code exits the async with stdio_client(...) context
  2. The finally block executes
  3. Streams are closed immediately
  4. Background tasks are still running and trying to send/receive data
  5. Tasks encounter closed streams → BrokenResourceError
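For illustration, the race can be reproduced with plain anyio memory object streams, independent of the SDK (a minimal sketch, not the actual stdio_client code; writer stands in for stdout_reader):

```python
import anyio


async def main() -> None:
    # Zero-capacity stream: send() blocks until a receiver takes the item.
    send, receive = anyio.create_memory_object_stream(0)

    async def writer() -> None:
        # Stands in for stdout_reader forwarding subprocess output.
        await send.send("late message")

    async with anyio.create_task_group() as tg:
        tg.start_soon(writer)
        await anyio.sleep(0.1)   # let writer block inside send()
        await receive.aclose()   # "quick exit": close the stream under it


# Crashes with BrokenResourceError (wrapped in an ExceptionGroup on anyio 4).
anyio.run(main)
```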

CHANGES:

  1. Added BrokenResourceError to exception handlers in both client and server

    • src/mcp/client/stdio.py: Lines 161, 177 (stdout_reader, stdin_writer)
    • src/mcp/server/stdio.py: Lines 67, 77 (stdin_reader, stdout_writer)
    • This allows tasks to exit gracefully if streams close mid-operation (see the sketch after this list)
  2. Added task cancellation before stream closure in client transport

    • src/mcp/client/stdio.py: Line 210 (after process cleanup)
    • tg.cancel_scope.cancel() sends a cancellation signal to the background tasks
    • Tasks unwind at their next checkpoint instead of touching the streams again
    • Only then are the streams closed (no task is still using them)
  3. Added regression test

    • tests/client/test_stdio.py: test_stdio_client_quick_exit_race_condition
    • Verifies that quick context exits don't cause ExceptionGroup crashes
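Putting the changes together, the shape of the fix looks roughly like this (a sketch with toy memory streams and illustrative names; toy_stdio_client is hypothetical, not the SDK's actual code):

```python
from contextlib import asynccontextmanager

import anyio


@asynccontextmanager
async def toy_stdio_client():
    send, receive = anyio.create_memory_object_stream(0)

    async def stdout_reader() -> None:
        try:
            await send.send("parsed message")
        except (anyio.BrokenResourceError, anyio.ClosedResourceError):
            pass  # change 1: stream closed during shutdown; exit gracefully

    async with anyio.create_task_group() as tg:
        tg.start_soon(stdout_reader)
        try:
            yield receive
        finally:
            # change 2 (as first proposed; revised in a later commit):
            # cancel background tasks before closing the streams they use
            tg.cancel_scope.cancel()
            await receive.aclose()


async def main() -> None:
    # change 3, in spirit: exiting immediately must not raise
    async with toy_stdio_client() as _:
        pass


anyio.run(main)
```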

IMPACT:

  • No more ExceptionGroup crashes when exiting quickly
  • Graceful task shutdown with proper cancellation
  • Backward compatible - all existing tests pass
  • Better resource cleanup - tasks finish before streams close

TECHNICAL NOTES:

  • Server transport only needed the exception handler changes (not task cancellation) because it doesn't manage a subprocess lifecycle
  • The fix uses defense-in-depth: both proper coordination AND graceful handling
  • anyio.BrokenResourceError is raised when a resource becomes unusable through no fault of the caller (e.g. the peer end of a stream was closed), distinct from ClosedResourceError, which is raised when using a resource that was closed from the caller's own side
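A quick way to see the distinction (using anyio's memory object streams, which is what the transport uses internally):

```python
import anyio


async def main() -> None:
    # ClosedResourceError: we closed our own end, then tried to use it.
    send, receive = anyio.create_memory_object_stream(1)
    await send.aclose()
    try:
        await send.send("x")
    except anyio.ClosedResourceError:
        print("closed from this side")

    # BrokenResourceError: the *peer* end went away underneath us.
    send, receive = anyio.create_memory_object_stream(1)
    await receive.aclose()
    try:
        await send.send("x")
    except anyio.BrokenResourceError:
        print("peer end closed")


anyio.run(main)
```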

FILES MODIFIED:

  • src/mcp/client/stdio.py
  • src/mcp/server/stdio.py
  • tests/client/test_stdio.py

Motivation and Context

How Has This Been Tested?

Breaking Changes

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Pyright reports an error when variables are assigned but never used. Changed (read_stream, write_stream) to (_, _) to indicate these are intentionally unused in the race condition test.
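For reference, the pattern in question looks like this (sketched; server_params and the test body are illustrative, assuming the SDK's usual imports, not the test's actual contents):

```python
from mcp import StdioServerParameters
from mcp.client.stdio import stdio_client


async def test_stdio_client_quick_exit_race_condition() -> None:
    server_params = StdioServerParameters(command="python", args=["-c", "pass"])
    # (_, _) instead of (read_stream, write_stream): Pyright's unused-variable
    # check does not flag underscore names, and the streams are never used.
    async with stdio_client(server_params) as (_, _):
        pass  # exit immediately to exercise the quick-exit path
```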
ROOT CAUSE:
Manual tg.cancel_scope.cancel() was interfering with process cleanup in the
async with block, causing CancelledError and ProcessLookupError during
process termination.

CHANGES:
- Removed tg.cancel_scope.cancel() call from finally block
- The async with block already handles task cancellation when exiting

IMPACT:
- Fixes test_stdio_client_sigint_only_process failure
- Process cleanup now completes without interference
- Background tasks still properly cancelled by task group exit
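The resulting shutdown sequence, revising the earlier sketch (toy streams again; the key difference is the absence of the manual cancel):

```python
import anyio


async def transport() -> None:
    send, receive = anyio.create_memory_object_stream(0)

    async def writer() -> None:
        try:
            await send.send("message")
        except (anyio.BrokenResourceError, anyio.ClosedResourceError):
            pass  # still the graceful-exit handler from the first commit

    async with anyio.create_task_group() as tg:
        tg.start_soon(writer)
        try:
            pass  # ... session runs, then a quick exit ...
        finally:
            # No tg.cancel_scope.cancel() here: closing the streams unblocks
            # writer with BrokenResourceError, which it handles, so the task
            # group drains on its own without disturbing process cleanup.
            await receive.aclose()
            await send.aclose()


anyio.run(transport)
```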

FILES MODIFIED:
- src/mcp/client/stdio.py