Fix litellm connection pool limiting concurrent_requests#1190

Open
sihyeonn wants to merge 1 commit into huggingface:main from sihyeonn:fix/litellm-concurrent-requests-pool-limit

Conversation

@sihyeonn

Problem

Setting concurrent_requests above 100 (e.g. 128) doesn't actually increase parallelism to the vLLM server — requests get bottlenecked at 100.

The root cause is httpx's default max_connections=100 on its connection pool. Since litellm uses httpx internally for litellm.completion(), the ThreadPoolExecutor happily spawns 128 threads but they all block waiting for one of the 100 available connections.
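The bottleneck described above can be reproduced in a self-contained sketch (the constants are illustrative, and a `threading.Semaphore` stands in for httpx's connection pool): 128 worker threads contend for 100 pool slots, so observed concurrency never exceeds 100.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_LIMIT = 100   # httpx's default max_connections
WORKERS = 128      # e.g. concurrent_requests: 128

# Model the connection pool as a semaphore: a thread must hold a
# "connection" slot for the duration of its simulated request.
pool = threading.Semaphore(POOL_LIMIT)
lock = threading.Lock()
in_flight = 0
peak = 0

def request(_):
    global in_flight, peak
    with pool:                      # blocks once all 100 slots are taken
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.05)            # simulated server round-trip
        with lock:
            in_flight -= 1

with ThreadPoolExecutor(max_workers=WORKERS) as ex:
    list(ex.map(request, range(WORKERS)))

print(f"peak concurrency: {peak}")  # capped at POOL_LIMIT, not WORKERS
```

Raising `POOL_LIMIT` to match `WORKERS` is exactly what the fix below does for the real httpx pool.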

Fix

Set litellm.client_session to an httpx.Client whose pool limits match config.concurrent_requests. This is litellm's officially documented approach for providing a custom HTTP session.

No new dependencies — httpx is already a transitive dep of litellm.

Testing

Verified locally against a vLLM server with concurrent_requests: 128; active connections now exceed 100 as expected. Default config (concurrent_requests: 10) works as before.

…efault pool limit

Configure litellm's global HTTP client session with connection pool limits
matching the user-specified concurrent_requests value, bypassing the default
httpx max_connections=100 cap.

Fixes huggingface#1100

Signed-off-by: Sihyeon Jang <sihyeon.jang@navercorp.com>
