FIX: Release GIL during blocking ODBC connect/disconnect calls#497
Open
saurabh500 wants to merge 5 commits intomainfrom
Open
FIX: Release GIL during blocking ODBC connect/disconnect calls#497saurabh500 wants to merge 5 commits intomainfrom
saurabh500 wants to merge 5 commits intomainfrom
Conversation
Release the Python GIL during SQLDriverConnect and SQLDisconnect calls so that other Python threads can run concurrently during connection I/O (DNS resolution, TCP handshake, TLS negotiation, SQL Server auth). Changes: - connection.cpp: Wrap SQLDriverConnect_ptr in py::gil_scoped_release - connection.cpp: Conditionally release GIL around SQLDisconnect_ptr (guarded by PyGILState_Check for destructor safety) - connection.cpp: Remove disconnect() from reset() failure paths to avoid GIL release inside pool mutex; callers handle cleanup - connection_pool.cpp: Move connect() outside pool mutex in acquire() (reserve slot inside mutex, connect outside, rollback on failure) - connection_pool.cpp: Move disconnect() outside mutex in release() - connection_pool.cpp: Release _manager_mutex before calling pool acquire/release/close to prevent mutex/GIL lock-ordering deadlocks - Add test_021_concurrent_connection_perf.py with stress tests proving concurrent connection parallelism and no GIL starvation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📊 Code Coverage Report
Diff CoverageDiff: main...HEAD, staged and unstaged changes
Summary
mssql_python/pybind/connection/connection.cppLines 156-165 156 py::gil_scoped_release release;
157 ret = SQLDisconnect_ptr(_dbcHandle->get());
158 } else {
159 // Destructor / shutdown path — GIL is not held, call directly.
! 160 ret = SQLDisconnect_ptr(_dbcHandle->get());
! 161 }
162 // In destructor/shutdown paths, suppress errors to avoid
163 // std::terminate() if this throws during stack unwinding.
164 if (hasGil) {
165 checkError(ret);Lines 163-172 163 // std::terminate() if this throws during stack unwinding.
164 if (hasGil) {
165 checkError(ret);
166 } else if (!SQL_SUCCEEDED(ret)) {
! 167 LOG("SQLDisconnect failed in destructor/shutdown path; ignoring error");
! 168 }
169 // triggers SQLFreeHandle via destructor, if last owner
170 _dbcHandle.reset();
171 } else {
172 LOG("No connection handle to disconnect");mssql_python/pybind/connection/connection_pool.cppLines 103-122 103 if (_pool.size() < _max_size) {
104 conn->updateLastUsed();
105 _pool.push_back(conn);
106 } else {
! 107 should_disconnect = true;
! 108 }
109 }
110 // Disconnect outside the mutex to avoid holding it during the
111 // blocking ODBC call (which releases the GIL).
112 if (should_disconnect) {
! 113 try {
! 114 conn->disconnect();
! 115 } catch (const std::exception& ex) {
! 116 LOG("ConnectionPool::release: disconnect failed: %s", ex.what());
! 117 }
! 118 std::lock_guard<std::mutex> lock(_mutex);
119 if (_current_size > 0)
120 --_current_size;
121 }
122 }📋 Files Needing Attention📉 Files with overall lowest coverage (click to expand)mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 67.8%
mssql_python.row.py: 70.5%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.pybind.ddbc_bindings.cpp: 73.9%
mssql_python.pybind.connection.connection.cpp: 75.6%
mssql_python.__init__.py: 77.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.connection.py: 85.2%🔗 Quick Links
|
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add test_pool_release_overflow_disconnects_outside_mutex to exercise the ConnectionPool::release() overflow path where a connection is returned to a pool that is already at max_size. This triggers the should_disconnect branch that disconnects outside the mutex. The disconnect-without-GIL path (connection.cpp lines 160-161) is a C++ destructor safety guard only reachable during interpreter shutdown when the last shared_ptr drops without the GIL held. This cannot be reliably exercised from a Python test since __del__/GC always holds the GIL. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
004d32d to
83995e4
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
This PR improves multi-threaded throughput by releasing the Python GIL around the blocking ODBC connect (SQLDriverConnect) and disconnect (SQLDisconnect) calls, and restructures connection pooling/manager locking to avoid mutex↔GIL deadlocks during those operations.
Changes:
- Release the GIL during
Connection::connect()and (conditionally) duringConnection::disconnect(). - Restructure
ConnectionPool/ConnectionPoolManagerto avoid holding C++ mutexes across connect/disconnect operations. - Add new stress/performance tests for concurrent connect/disconnect behavior, plus a pooling regression test.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
mssql_python/pybind/connection/connection.cpp |
Releases GIL during ODBC connect/disconnect; adjusts reset failure handling. |
mssql_python/pybind/connection/connection_pool.cpp |
Moves connect/disconnect work outside pool/manager mutexes to prevent GIL/mutex deadlocks. |
tests/test_021_concurrent_connection_perf.py |
Adds stress tests to validate parallelism and absence of starvation/deadlocks under load. |
tests/test_009_pooling.py |
Adds a test intended to cover pool release overflow/disconnect behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- connection.cpp: Suppress checkError() in destructor/shutdown path to avoid std::terminate() during stack unwinding - connection_pool.cpp release(): Move _current_size decrement after successful disconnect to keep pool accounting consistent on failure - connection_pool.cpp closePools(): Keep _manager_mutex held during close to prevent races with new pool creation - test_009_pooling.py: Fix overflow test to use max_size=2 for acquisition then shrink to max_size=1 before release, matching actual pool semantics where max_size limits concurrent connections Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Disconnect on localhost is sub-millisecond (~0.3ms), so thread scheduling overhead dominates and speedup ratios are meaningless. Changed the test to verify all concurrent disconnects complete without errors or deadlocks instead of asserting a speedup threshold. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The Python GIL is held during the entire ODBC connection establishment (
SQLDriverConnect) and teardown (SQLDisconnect) calls. These are blocking I/O operations involving DNS resolution, TCP handshake, TLS negotiation, and SQL Server authentication that can take hundreds of milliseconds to seconds. While the GIL is held, all other Python threads are blocked, even those doing unrelated work.This causes severe throughput degradation in multi-threaded applications that establish connections concurrently.
Relates to #491
Benchmark Results
A/B benchmark with 10 threads concurrently establishing connections (pooling disabled, SQL Server 2025 on localhost):
Before this change, 10 concurrent threads take the same time as 10 serial connections — the GIL completely serializes them. After, threads overlap their I/O and achieve ~6× speedup.
Raw benchmark output (click to expand)
Before (GIL held during ODBC calls):
After (GIL released during ODBC calls):
Solution
1. GIL Release in
connection.cppConnection::connect(): WrapSQLDriverConnect_ptrin apy::gil_scoped_releaseblock. Allpy::dictaccess (applyAttrsBefore) completes before the release scope, andcheckError()runs after GIL is re-acquired.Connection::disconnect(): Conditionally release GIL aroundSQLDisconnect_ptr, guarded byPyGILState_Check()to safely handle destructor/shutdown paths where the GIL may not be held.Connection::reset(): Removed internaldisconnect()calls from failure paths. The pool caller already adds failed connections toto_disconnectfor cleanup outside the mutex, so the internal disconnect was both redundant and problematic (it would release the GIL while the pool mutex was held).2. Pool Lock Restructuring in
connection_pool.cppReleasing the GIL inside ODBC calls while holding C++ mutexes creates a mutex/GIL lock-ordering deadlock. The pool changes are a necessary prerequisite for the GIL release to be safe.
Why the pool changes are required
Without restructuring, we get this deadlock:
How the fix works —
ConnectionPool::acquire()The solution separates mutex-protected bookkeeping from blocking ODBC calls:
If
connect()fails, the reserved slot is rolled back:The same pattern applies to all pool/manager methods
Pool::acquire()connect()inside_mutexconnect()outsidePool::release()disconnect()inside_mutexdisconnect()outsideManager::acquireConnection()pool->acquire()inside_manager_mutexacquire()outsideManager::returnConnection()pool->release()inside_manager_mutexrelease()outsideManager::closePools()pool->close()inside_manager_mutexclose()outsideThe invariant is: never hold a C++ mutex across a scope that releases the GIL.
Testing
Added
test_021_concurrent_connection_perf.pywith three@pytest.mark.stresstests:test_concurrent_connection_gil_release— 10 threads establish connections concurrently (pooling disabled); asserts >2× speedup over serial baseline, proving I/O parallelism.test_concurrent_disconnect_gil_release— 10 threads disconnect concurrently; asserts >1.5× speedup.test_mixed_connect_disconnect_under_load— I/O threads (connect/disconnect) run alongside CPU-bound Python threads for 5 seconds; verifies no starvation or deadlocks.Added
test_pool_release_overflow_disconnects_outside_mutexintest_009_pooling.pyto cover the pool release overflow path.Added
benchmarks/bench_concurrent_connect.py— standalone A/B benchmark script for reproducing the results above.Safety
LOG()macro acquires GIL internally viapy::gil_scoped_acquire— safe to call from any contextcheckError()is pure C++ (throwsstd::runtime_error) — caught by pybind11 at the boundary where GIL is heldPyGILState_Check()guard indisconnect()handles destructor paths safely