
Python: fix(python): prevent MCP message_handler deadlock on notification reload#4866

Draft
giles17 wants to merge 2 commits into microsoft:main from giles17:fix/mcp-message-handler-deadlock-4828

Conversation


@giles17 giles17 commented Mar 23, 2026

Problem

When an MCP server (e.g. MATLAB's matlab-mcp-core-server) sends a notifications/tools/list_changed notification during or after tool execution, MCPTool.message_handler() previously awaited load_tools()/load_prompts() directly. Since the handler runs on the MCP SDK's single-threaded _receive_loop, this caused a deadlock:

  1. message_handler awaits load_tools()
  2. load_tools() calls session.list_tools(), which sends a request and waits for a response
  3. The response can never arrive because the receive loop is blocked inside the handler
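The three steps can be reduced to a self-contained toy (no MCP SDK involved; the names are illustrative, and a wait_for timeout stands in for the call_tool() timeout so the snippet terminates instead of hanging):

```python
import asyncio

async def demo_deadlock() -> bool:
    """Toy reproduction of the deadlock: the receive loop awaits a call
    that itself waits on the receive loop."""
    response_ready = asyncio.Event()

    async def list_tools() -> None:
        # Sends a request, then waits for the receive loop to deliver
        # the response -- but the receive loop is what called us.
        await response_ready.wait()

    async def receive_loop() -> None:
        # Simplified receive loop: handles one notification inline.
        await list_tools()    # blocked here forever...
        response_ready.set()  # ...so the "response" is never delivered

    try:
        await asyncio.wait_for(receive_loop(), timeout=0.1)
        return False
    except asyncio.TimeoutError:
        return True  # deadlocked, as expected
```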

This manifested as a timeout in call_tool(), which then surfaced as "Error: Function failed." to the model instead of the real tool output.

Fix

Schedule reloads as background asyncio.Tasks via a new _schedule_reload() helper method, freeing the receive loop to continue processing responses immediately. Strong references are kept in _pending_reload_tasks to prevent garbage collection before completion, and errors are logged rather than propagated into the receive loop.
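A minimal sketch of that pattern, reusing the names from this description (_schedule_reload, _pending_reload_tasks) but otherwise illustrative rather than the actual diff:

```python
import asyncio
import logging
from collections.abc import Coroutine
from typing import Any

logger = logging.getLogger(__name__)

class MCPToolSketch:
    """Illustrative only: schedules reloads off the receive loop."""

    def __init__(self) -> None:
        self._pending_reload_tasks: set[asyncio.Task[None]] = set()

    def _schedule_reload(self, coro: Coroutine[Any, Any, None]) -> None:
        # create_task returns immediately, so the caller (the MCP
        # receive loop) is free to keep delivering responses.
        task = asyncio.create_task(self._safe_reload(coro))
        # Keep a strong reference so the task isn't garbage-collected
        # before it finishes; drop it in a done-callback.
        self._pending_reload_tasks.add(task)
        task.add_done_callback(self._pending_reload_tasks.discard)

    async def _safe_reload(self, coro: Coroutine[Any, Any, None]) -> None:
        try:
            await coro
        except asyncio.CancelledError:
            raise
        except Exception:
            # Log instead of letting the error escape into the receive loop.
            logger.warning("Background reload failed", exc_info=True)
```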

Changes

  • python/packages/core/agent_framework/_mcp.py: Add _pending_reload_tasks set, _schedule_reload() helper; change message_handler to use it instead of direct await
  • python/packages/core/tests/core/test_mcp.py: Update existing notification test; add regression test for deadlock scenario; add test for reload failure logging

Testing

All 89 existing MCP tests pass (2 skipped). New tests verify:

  • message_handler returns immediately even when load_tools blocks (deadlock regression)
  • Background reload errors are logged, not raised

Fixes #4828

When an MCP server sends a notifications/tools/list_changed or
notifications/prompts/list_changed notification, the message_handler
previously awaited load_tools()/load_prompts() directly. Since the
handler runs on the MCP SDK's single-threaded receive loop, this
caused a deadlock: load_tools() sends a list_tools request and waits
for its response, but the receive loop cannot deliver that response
while blocked in the handler.

This manifested as a timeout in call_tool(), which then surfaced as
"Error: Function failed." to the model instead of the real tool
output. The MATLAB MCP server reliably triggers this because it sends
a tools/list_changed notification during tool execution.

Fix: schedule reloads as background asyncio.Tasks via a new
_schedule_reload() helper, freeing the receive loop immediately.

Fixes microsoft#4828

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 23, 2026 19:24
@github-actions github-actions bot changed the title fix(python): prevent MCP message_handler deadlock on notification reload Python: fix(python): prevent MCP message_handler deadlock on notification reload Mar 23, 2026

markwallace-microsoft commented Mar 23, 2026

Python Test Coverage

Python Test Coverage Report

File                             Stmts   Miss  Cover  Missing
packages/core/agent_framework
   _mcp.py                         546     76    86%  96–97, 107–112, 123, 128, 171, 180, 192–193, 244, 253, 316, 324, 383, 497, 532, 545, 547–550, 569–570, 583–586, 588–589, 593, 650, 685, 687, 691–692, 694–695, 749, 764, 782, 834, 840, 861, 971, 998, 1011–1016, 1040, 1099–1100, 1106–1108, 1127, 1152–1153, 1157–1161, 1178–1182, 1329
TOTAL                           27318   3224    88%

Python Unit Test Overview

Tests   Skipped   Failures   Errors   Time
5356    20        0          0        1m 24s

@giles17 giles17 marked this pull request as draft March 23, 2026 19:27

Copilot AI left a comment


Pull request overview

Prevents a deadlock in the Python MCP tool integration by ensuring MCP notification handling doesn’t synchronously await reload operations on the SDK’s single-threaded receive loop.

Changes:

  • Add background-task scheduling for tool/prompt reloads triggered by MCP list_changed notifications.
  • Track pending reload tasks to keep them alive until completion.
  • Update and extend MCP tests to cover the non-blocking/deadlock regression scenario and reload-failure behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

  • python/packages/core/agent_framework/_mcp.py: Schedules reloads via background tasks instead of awaiting them in message_handler; tracks pending tasks.
  • python/packages/core/tests/core/test_mcp.py: Updates notification tests for async scheduling; adds regression tests for deadlock scenario and reload failure handling.


@giles17 giles17 left a comment


Automated Code Review

Reviewers: 4 | Confidence: 90%

✓ Correctness

The fix correctly addresses a real deadlock where message_handler (running on the MCP SDK's single-threaded receive loop) previously awaited load_tools()/load_prompts() which themselves need the receive loop to process responses. The solution uses asyncio.create_task() to schedule reloads as background tasks, with proper strong-reference tracking via _pending_reload_tasks and a done-callback for cleanup. Exception handling in the background task prevents errors from propagating to the receive loop. Tests cover the non-blocking behavior, task lifecycle, and error logging. The implementation follows standard asyncio fire-and-forget patterns correctly.

✓ Security Reliability

The fix correctly addresses a real deadlock by scheduling reload coroutines as background tasks instead of awaiting them inside the MCP SDK's single-threaded receive loop. The implementation is sound: exceptions are caught and logged, tasks are prevented from GC via a strong-reference set, and the done-callback cleanup pattern is correct. The one reliability concern is that close() / _close_on_owner() does not cancel or await pending reload tasks before tearing down the session, which could cause reload tasks to run against a None session and produce confusing warnings or mask real errors during shutdown.

✓ Test Coverage

The test coverage for this fix is good. The PR adds three well-targeted tests: the existing notification test is updated to account for the async scheduling (with asyncio.sleep(0)), a new deadlock regression test uses an asyncio.Event to verify the handler returns immediately even when the reload blocks, and a failure-logging test confirms exceptions don't propagate. However, two minor gaps exist: the reload-failure test doesn't verify the task is cleaned up from _pending_reload_tasks after the error, and the load_prompts notification path lacks a dedicated non-blocking test (only load_tools is tested in the deadlock scenario).

✓ Design Approach

The fix correctly addresses a real deadlock: message_handler ran on the MCP SDK's single-threaded receive loop and previously awaited load_tools()/load_prompts(), which in turn called session.list_tools() — a request whose response could never arrive because the receive loop was blocked. Scheduling reloads as background tasks via asyncio.create_task() with a strong-reference set is the right approach. The tests are well-structured, especially the event-based non-blocking regression test. One design concern: if the server sends a burst of notifications/tools/list_changed notifications, multiple concurrent reload tasks are each created, all capturing existing_names as a local snapshot of self._functions at task-start time. Two concurrent tasks can race to append the same newly-advertised tool (the existing_names dedup check is local, not atomic with the append), producing duplicate entries in _functions. Additionally, coro: Any is looser than necessary; Coroutine[Any, Any, None] would let type checkers catch misuse. Neither blocks the merge — the duplicate-tool race is an edge case unlikely to cause user-visible harm, and the type annotation is minor — but both are worth a follow-up.

Suggestions

  • In test_mcp_tool_message_handler_reload_failure_is_logged, assert that _pending_reload_tasks is empty after the failed task completes (e.g. assert len(tool._pending_reload_tasks) == 0) to verify the done-callback cleanup works on the error path, consistent with the pattern in the deadlock test.
  • Tighten the coro: Any parameter type in _schedule_reload to Coroutine[Any, Any, None] (from collections.abc) so type checkers can flag accidental misuse, even though usage is internal-only.
  • Consider adding the prompts/list_changed notification to test_mcp_tool_message_handler_does_not_block_receive_loop (or a sibling test) to ensure both notification paths are covered for non-blocking behavior, not just tools/list_changed.
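The event-based non-blocking check suggested above can be sketched as a standalone toy (all names here are hypothetical, not the actual test_mcp.py code):

```python
import asyncio

async def demo_non_blocking_handler() -> bool:
    release = asyncio.Event()
    pending: set[asyncio.Task[None]] = set()

    async def blocking_load_tools() -> None:
        # Stands in for load_tools() waiting on a response that only
        # the receive loop can deliver.
        await release.wait()

    async def message_handler() -> None:
        # Fire-and-forget, as in the fix: the reload is scheduled,
        # not awaited.
        task = asyncio.create_task(blocking_load_tools())
        pending.add(task)
        task.add_done_callback(pending.discard)

    # The handler must return promptly even though the reload is blocked.
    await asyncio.wait_for(message_handler(), timeout=1.0)
    handler_returned_while_blocked = len(pending) == 1

    release.set()
    await asyncio.gather(*pending)
    return handler_returned_while_blocked
```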

Automated review by giles17's agents

…cleanup, tests

- Fix exc_info=exc -> exc_info=True in _schedule_reload and message_handler
- Tighten _schedule_reload param type from Any to Coroutine[Any, Any, None]
- Coalesce reloads: cancel-and-replace per reload kind to prevent unbounded growth
- Cancel pending reload tasks in _close_on_owner before tearing down session
- Re-raise CancelledError in _safe_reload to respect task cancellation
- Replace flaky asyncio.sleep(0) with asyncio.wait_for/gather in tests
- Add caplog assertions to verify reload failure is actually logged
- Assert _pending_reload_tasks cleanup on error path
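The coalescing bullet above (cancel-and-replace per reload kind) can be illustrated with a hypothetical helper, not the PR's actual code: cancel any in-flight reload of the same kind before scheduling the replacement, so a notification burst leaves at most one pending task per kind.

```python
import asyncio
from collections.abc import Coroutine
from typing import Any

class ReloadCoalescer:
    """Illustrative cancel-and-replace: at most one pending reload per kind."""

    def __init__(self) -> None:
        self._pending: dict[str, asyncio.Task[None]] = {}

    def schedule(self, kind: str, coro: Coroutine[Any, Any, None]) -> None:
        existing = self._pending.get(kind)
        if existing is not None and not existing.done():
            # A newer notification supersedes the in-flight reload.
            existing.cancel()
        task = asyncio.create_task(coro, name=f"reload-{kind}")

        def _cleanup(done: asyncio.Task[None]) -> None:
            # Only drop the entry if it still points at this task; a
            # superseded task must not remove its replacement.
            if self._pending.get(kind) is done:
                del self._pending[kind]

        task.add_done_callback(_cleanup)
        self._pending[kind] = task
```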

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@giles17 giles17 left a comment


Automated Code Review

Reviewers: 4 | Confidence: 91%

✗ Correctness

The core deadlock fix is correct and well-motivated: scheduling reloads as background tasks via asyncio.create_task instead of awaiting them inline in message_handler prevents the classic deadlock on the MCP SDK's single-threaded receive loop. The cancel-and-replace logic, strong-reference set, and done-callback cleanup are all sound. Tests are comprehensive. However, the change from exc_info=message to exc_info=True on the error-logging line is a regression: because message_handler is NOT called from within an except block (the MCP SDK passes the exception as a parameter at lines 358 and 516 of shared/session.py), sys.exc_info() returns (None, None, None) and the exception type/traceback info is lost. The original exc_info=message correctly lets Python's logging extract (type(message), message, message.__traceback__) from the exception instance directly.

✗ Security Reliability

The core fix — scheduling MCP reloads as background tasks via asyncio.create_task instead of awaiting them inline in message_handler — is correct and addresses a real deadlock. However, the change from exc_info=message to exc_info=True on the error-logging line is a regression: message_handler is not inside an except block, so sys.exc_info() (used when exc_info=True) returns (None, None, None), silently discarding the exception's traceback. The original exc_info=message correctly passes the exception instance, which Python's logging (3.5+) converts to a full traceback tuple. Additionally, _close_on_owner cancels pending reload tasks but does not await them, which can produce 'Task was destroyed but it is pending' warnings and allows cancelled tasks to race against session teardown.

✓ Test Coverage

The PR adds three new/updated tests covering the core deadlock fix: non-blocking behavior of message_handler, background task scheduling, and reload-failure logging. These tests are well-structured and meaningfully validate the primary fix. However, the new cancel-and-replace logic in _schedule_reload (which cancels a previous pending reload of the same kind when a new notification arrives) and the pending-task cleanup in _close_on_owner both lack dedicated test coverage. These are behavioral additions that could harbor bugs (e.g., incorrect task name comparison) without any test exercising them.

✗ Design Approach

The core fix — scheduling reloads as background tasks via asyncio.create_task to avoid deadlocking the MCP receive loop — is correct and well-motivated. The cancel-and-replace deduplication, the _pending_reload_tasks strong-reference set, the explicit CancelledError re-raise, and the teardown cleanup in _close_on_owner are all sound improvements. However, the diff introduces one clear regression: changing exc_info=message to exc_info=True in the exception-logging branch of message_handler. Since that code is not inside an except block, exc_info=True resolves to sys.exc_info() which returns (None, None, None) — silently dropping the exception traceback. The original exc_info=message correctly passes the exception object directly to the logger, which Python 3.x logging accepts and uses to attach the traceback. This should be reverted.

Flagged Issues

  • Changing exc_info=message to exc_info=True on the error-logging line (line 803) loses exception information. message_handler is called by the MCP SDK with the exception as a parameter—NOT from within an except block (see mcp/shared/session.py lines 358, 516)—so sys.exc_info() returns (None, None, None) and no traceback is ever logged. The original exc_info=message was correct: Python 3.x logging accepts an exception instance and extracts (type, value, __traceback__) from it. Revert this line.
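The flagged behavior is easy to demonstrate outside the SDK. This standalone snippet (hypothetical logger name) shows that passing the exception instance attaches the exception info, while exc_info=True outside an except block formats as "NoneType: None":

```python
import io
import logging

def demo_exc_info() -> str:
    logger = logging.getLogger("exc_info_demo")
    logger.propagate = False
    stream = io.StringIO()
    logger.addHandler(logging.StreamHandler(stream))

    # The exception arrives as a parameter; nothing is being raised
    # here, so sys.exc_info() is empty.
    message = ValueError("boom")

    # Passing the instance: logging builds
    # (type(message), message, message.__traceback__) itself.
    logger.error("Error from MCP server: %s", message, exc_info=message)

    # exc_info=True falls back to sys.exc_info() -> (None, None, None),
    # which formats as "NoneType: None" instead of a traceback.
    logger.error("Error from MCP server: %s", message, exc_info=True)

    return stream.getvalue()
```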

Suggestions

  • In _close_on_owner, await the cancelled reload tasks (e.g. await asyncio.gather(*tasks, return_exceptions=True)) before proceeding to session teardown. Without this, cancelled tasks may still be in-flight when self.session is set to None, producing 'Task was destroyed but it is pending' warnings and a subtle race condition.
  • Add a test for the cancel-and-replace behavior: send two rapid notifications/tools/list_changed notifications with a blocking load_tools (via asyncio.Event), verify the first task is cancelled and only one task remains pending. This exercises the existing.cancel() path and the task-naming scheme in _schedule_reload.
  • Add a test for _close_on_owner cleanup: create an MCPTool with a pending reload task, call _close_on_owner, and verify the task is cancelled and _pending_reload_tasks is empty. The existing test mocks the entire method, so the new cancellation code has no coverage.
  • Ensure the caplog-based version of test_mcp_tool_message_handler_reload_failure_is_logged is what ships. The on-disk version without caplog only checks the mock was called and doesn't assert the warning was actually logged, so a regression that silently swallows errors would go undetected.

Automated review by giles17's agents

"""
if isinstance(message, Exception):
logger.error("Error from MCP server: %s", message, exc_info=message)
logger.error("Error from MCP server: %s", message, exc_info=True)

Bug: message_handler is called by the MCP SDK with the exception as a parameter—NOT from within an except block (see mcp/shared/session.py:359 and :516). With exc_info=True, sys.exc_info() returns (None, None, None), so the exception type and traceback are silently lost. The original exc_info=message correctly passes the exception instance, letting Python's logging extract the full traceback directly.

Suggested change
-        logger.error("Error from MCP server: %s", message, exc_info=True)
+        logger.error("Error from MCP server: %s", message, exc_info=message)

Comment on lines 968 to +972
async def _close_on_owner(self) -> None:
    # Cancel any pending reload tasks before tearing down the session.
    for task in list(self._pending_reload_tasks):
        task.cancel()
    self._pending_reload_tasks.clear()

Reliability concern: task.cancel() only requests cancellation. Without awaiting the tasks, they may still be in-flight when _safe_close_exit_stack() nullifies self.session, and Python may emit 'Task was destroyed but it is pending' warnings. Consider gathering the cancelled tasks before proceeding.

Suggested change
 async def _close_on_owner(self) -> None:
     # Cancel any pending reload tasks before tearing down the session.
-    for task in list(self._pending_reload_tasks):
-        task.cancel()
-    self._pending_reload_tasks.clear()
+    tasks = list(self._pending_reload_tasks)
+    for task in tasks:
+        task.cancel()
+    self._pending_reload_tasks.clear()
+    if tasks:
+        await asyncio.gather(*tasks, return_exceptions=True)

assert reload_warnings[0].levelname == "WARNING"
assert reload_warnings[0].exc_info is not None



Consider adding a companion test for the cancel-and-replace logic: send two notifications/tools/list_changed in quick succession with a blocking load_tools (via asyncio.Event), and assert that the first task gets cancelled and only one task remains in _pending_reload_tasks. This is the only way to verify the existing.cancel() path and the task-naming scheme work correctly.

assert len(tool._pending_reload_tasks) == 0


async def test_mcp_tool_message_handler_reload_failure_is_logged(caplog: pytest.LogCaptureFixture):

This test verifies the exception doesn't propagate, but (in its current on-disk form without caplog) doesn't assert the warning was actually logged. The diff version with caplog and log-record assertions is a meaningful improvement—make sure that version is what ships. Without the log assertion, a regression that silently swallows errors without logging would go undetected.



Successfully merging this pull request may close these issues.

Python: [Bug]: MCPStdioTool drops successful MATLAB MCP results and sends "Error: Function failed." instead

3 participants