Fix concurrency leak during enforceContextLimit and during stall retry #445

Open
sirn wants to merge 2 commits into mcowger:main from sirn:fix-context-limit-leak

@sirn sirn commented May 20, 2026

This PR fixes two concurrency leaks.

enforceContextLimit leaks concurrency slot

When "Enforce Limits" is on for an alias, an API call that exceeds the limit permanently leaks a concurrency slot.

Steps to reproduce

  1. Configure an alias with enforce_limit: true and a small context_length (e.g. 100).
  2. Send a request to that alias that exceeds context_length.
  3. The request fails with context_length_exceeded, and the concurrency count is permanently increased by 1.

Root cause

In dispatcher.ts, acquire() was called before the context length check (enforceContextLimit). When the check fails, it throws ContextLengthExceededError, and execution never reaches the path where doRelease is called. The acquired slot is therefore never released, and the concurrency count is permanently leaked.

Fix

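Move enforceContextLimit before acquire() so the context length is checked before a concurrency slot is acquired. A request that is rejected for its size then never holds a slot.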
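
The effect of the reordering can be illustrated with a minimal, synchronous sketch. This is not the code from dispatcher.ts: SlotCounter, dispatchLeaky, dispatchFixed and countAfterOversizedRequest are hypothetical names, and the token counts are made up. Only enforceContextLimit, acquire, and ContextLengthExceededError are names taken from the description above, and their signatures here are assumptions.

```typescript
// Hypothetical stand-ins; the real dispatcher.ts is not shown in this PR.
class ContextLengthExceededError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ContextLengthExceededError";
  }
}

class SlotCounter {
  inFlight = 0;
  acquire(): void { this.inFlight += 1; }
  release(): void { this.inFlight -= 1; }
}

// Assumed signature: throws when the prompt is larger than the limit.
function enforceContextLimit(promptTokens: number, contextLength: number): void {
  if (promptTokens > contextLength) {
    throw new ContextLengthExceededError(
      `prompt of ${promptTokens} tokens exceeds limit of ${contextLength}`,
    );
  }
}

type Dispatch = (slots: SlotCounter, promptTokens: number, contextLength: number) => string;

// Leaky ordering: the slot is taken first, and the throw skips the release path.
const dispatchLeaky: Dispatch = (slots, promptTokens, contextLength) => {
  slots.acquire();
  enforceContextLimit(promptTokens, contextLength); // throws before any release
  try {
    return "ok"; // stands in for the upstream call
  } finally {
    slots.release();
  }
};

// Fixed ordering: validate first, so a rejected request never holds a slot.
const dispatchFixed: Dispatch = (slots, promptTokens, contextLength) => {
  enforceContextLimit(promptTokens, contextLength);
  slots.acquire();
  try {
    return "ok";
  } finally {
    slots.release();
  }
};

// Sends one oversized request (500 tokens against a limit of 100)
// and reports how many slots are still held afterwards.
function countAfterOversizedRequest(dispatch: Dispatch): number {
  const slots = new SlotCounter();
  try {
    dispatch(slots, 500, 100);
  } catch (err) {
    if (!(err instanceof Error) || err.name !== "ContextLengthExceededError") throw err;
  }
  return slots.inFlight;
}

console.log(countAfterOversizedRequest(dispatchLeaky)); // 1 slot still held
console.log(countAfterOversizedRequest(dispatchFixed)); // 0 slots held
```

An alternative would be to keep the original order and wrap the check in try/finally, but validating first avoids taking a slot for a request that can never be sent.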

TTFB stall timeout leaks concurrency slot

When a TTFB stall timeout is configured and an alias has multiple targets that all stall, the concurrency slot of every target except the last is permanently leaked. With two targets, that is the first provider's slot.

Steps to reproduce

  1. Configure global stall detection with a TTFB timeout and a TTFB byte threshold (e.g. 5s timeout/50B threshold).
  2. Configure an alias with two providers, both pointing to a slow provider that takes longer than the TTFB timeout to respond.
  3. Send a request to that alias with "stream": true.
  4. The concurrency counter for the first provider is permanently increased by 1 for each failed request.
$ cat <<EOF | tee slow-proxy.ts
Bun.serve({
    port: 9876,
    async fetch(req) {
        await Bun.sleep(6000);
        return new Response("ok", { status: 200 });
    },
});

console.log("Slow mock on :9876");
EOF

$ bun run slow-proxy.ts
Slow mock on :9876

# Configure Plexus per instructions above, then:
$ curl -H "Authorization: Bearer sk-78f8aff4-ab34-4ebe-98b0-876495443a34" -H "Content-Type: application/json" -d '{"model":"test","messages":[{"role": "user","content": "hi"}],"stream":true}' http://localhost:4000/v1/chat/completions

Root cause

After TTFB stall detection aborts the fetch and failover is configured (canRetryStall is true), the inner catch block skips the throw and calls continue. Since doRelease() is only called in the outer catch block, it is never called for any provider except the last one. Each earlier provider's concurrency slot is therefore permanently leaked.

Fix

Add doRelease() in the inner catch block too.
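
The failover path can be sketched as follows. This is a simplified, synchronous model and not the dispatcher's actual code: the real flow is asynchronous and splits the logic across an inner and an outer catch block, which this sketch collapses into one. StallError, callProvider, dispatchWithFailover, leakedSlots, and the releaseOnStall flag are hypothetical; only canRetryStall is a name from the description above.

```typescript
// Hypothetical names throughout; this is not the code from dispatcher.ts.
class StallError extends Error {
  constructor(provider: string) {
    super(`TTFB stall on ${provider}`);
    this.name = "StallError";
  }
}

type Counters = Map<string, number>;

function acquire(counters: Counters, provider: string): void {
  counters.set(provider, (counters.get(provider) ?? 0) + 1);
}

function release(counters: Counters, provider: string): void {
  counters.set(provider, (counters.get(provider) ?? 0) - 1);
}

// Stand-in for a fetch that TTFB stall detection aborts.
function callProvider(provider: string, stalled: Set<string>): string {
  if (stalled.has(provider)) throw new StallError(provider);
  return `response from ${provider}`;
}

function dispatchWithFailover(
  counters: Counters,
  providers: string[],
  stalled: Set<string>,
  releaseOnStall: boolean, // false models the bug, true models the fix
): string {
  let lastError: unknown = new Error("no providers configured");
  for (let i = 0; i < providers.length; i++) {
    const provider = providers[i];
    acquire(counters, provider);
    try {
      const body = callProvider(provider, stalled);
      release(counters, provider);
      return body;
    } catch (err) {
      lastError = err;
      const canRetryStall = i < providers.length - 1;
      if (canRetryStall) {
        // The fix: give the slot back before failing over to the next provider.
        if (releaseOnStall) release(counters, provider);
        continue;
      }
      // Models the outer catch, which only ever sees the last provider.
      release(counters, provider);
      throw err;
    }
  }
  throw lastError;
}

// Both providers stall; returns the slots still held for the first one.
function leakedSlots(releaseOnStall: boolean): number {
  const counters: Counters = new Map();
  const providers = ["first", "second"];
  try {
    dispatchWithFailover(counters, providers, new Set(providers), releaseOnStall);
  } catch (err) {
    if (!(err instanceof Error) || err.name !== "StallError") throw err;
  }
  return counters.get("first") ?? 0;
}

console.log(leakedSlots(false)); // 1: the first provider's slot is never released
console.log(leakedSlots(true)); // 0: released before continue
```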

sirn added 2 commits May 20, 2026 18:17
Pre-dispatch context limit enforcement now runs before the
concurrency slot is acquired. Previously, if enforceContextLimit
threw a ContextLengthExceededError, the acquired slot was never
released — leaking exactly one slot per oversized request until
the provider deadlocked at maxConcurrency.

When the TTFB stall detection aborts the fetch request, the catch
block either continues the failover loop or throws the stall error
without calling doRelease(). This leaks the concurrency slot
permanently. Each stalled request increments the count by one,
eventually deadlocking the provider at maxConcurrency.