Skip to content

fix: detect and kill zombie connections where response write stalls#392

Merged
polaz merged 9 commits intomainfrom
fix/#391-fix-detect-and-kill-zombie-connections-where-respo
Apr 5, 2026
Merged

fix: detect and kill zombie connections where response write stalls#392
polaz merged 9 commits intomainfrom
fix/#391-fix-detect-and-kill-zombie-connections-where-respo

Conversation

@polaz
Copy link
Copy Markdown
Member

@polaz polaz commented Apr 5, 2026

Summary

  • Add response write timeout middleware that destroys sockets when HTTP response writes stall due to dead TCP peers (laptop sleep, network change)
  • Configurable via GITLAB_RESPONSE_WRITE_TIMEOUT_MS (default 10s, set 0 to disable)
  • SSE streams excluded — handled by existing heartbeat mechanism

Problem

POST responses hang for ~125s (TCP retransmit timeout) when the downstream client disconnects mid-flight. GitLab responds in <200ms but res.write()/res.end() blocks until TCP gives up retransmitting to the dead peer. The server accumulates hanging requests, each holding a Node event loop callback for 125s.

Solution

Express middleware intercepts writeHead() and starts a timer. If the response finish event doesn't fire within 10s, the socket is destroyed. The write_timeout close reason is logged for observability.

Test plan

  • 8 unit tests covering: timeout destroys socket, finish/close clears timer, SSE excluded, disabled when 0, no double-destroy, writeTimedOut flag, multiple writeHead
  • Existing 4883 tests pass
  • Lint clean, build clean

Closes #391

Add response write timeout middleware that destroys sockets when HTTP
response writes stall due to dead TCP peers. When a client disconnects
mid-flight (laptop sleep, network change), the TCP connection stays
half-open and res.write()/res.end() blocks for ~125s until TCP gives up.

The middleware intercepts writeHead() and starts a timer (default 10s).
If the response doesn't finish within the timeout, the socket is
destroyed. SSE streams are excluded (handled by existing heartbeat).

- Add GITLAB_RESPONSE_WRITE_TIMEOUT_MS config (default 10s)
- Add responseWriteTimeoutMiddleware to Express pipeline
- Add 'write_timeout' to ConnectionCloseReason for logging
- Set res.locals.writeTimedOut flag for close handler reason detection

Closes #391
Copilot AI review requested due to automatic review settings April 5, 2026 15:07
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 225eb492-4fd1-42e2-9c96-850e676dc6c0

📥 Commits

Reviewing files that changed from the base of the PR and between c0f8106 and b334cc1.

📒 Files selected for processing (2)
  • src/middleware/response-write-timeout.ts
  • src/server.ts

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • New environment variable GITLAB_RESPONSE_WRITE_TIMEOUT_MS (default 10000ms) to detect and close stalled non-SSE responses; set to 0 to disable. SSE responses are excluded.
  • Documentation

    • README updated with the new connection-resilience option and behavior.
  • Tests

    • Added tests covering response write timeout behavior and related edge cases.

Walkthrough

Adds a configurable response-write timeout (env GITLAB_RESPONSE_WRITE_TIMEOUT_MS, default 10000ms). New Express middleware starts a one-time timer after headers are written (skips SSE); on timeout it marks res.locals.writeTimedOut, logs a 'write_timeout' warning and destroys the response. Config, types, server wiring, docs, and tests updated.

Changes

Cohort / File(s) Summary
Documentation
README.md, README.md.in
Documented new GITLAB_RESPONSE_WRITE_TIMEOUT_MS env var (default 10000) controlling max time to flush non-SSE responses before terminating stalled connections; 0 disables.
Configuration
src/config.ts
Added exported RESPONSE_WRITE_TIMEOUT_MS (default 10000) and extended parseTimerMs(..., allowZero) so 0 can be accepted to disable the timeout.
Logging types
src/logging/types.ts
Extended ConnectionCloseReason union with new literal 'write_timeout'.
Middleware export surface
src/middleware/index.ts
Re-exported responseWriteTimeoutMiddleware.
Response write timeout middleware
src/middleware/response-write-timeout.ts
New middleware that monkey-patches res.writeHead to start a single timeout for non-SSE responses; on timeout sets res.locals.writeTimedOut=true, logs a warning with { reason: 'write_timeout' } and metadata, calls res.destroy(), and clears the timer on finish/close.
Server integration
src/server.ts
Wires middleware into startup (registered before express.json() in dual mode), centralizes close-reason resolution via resolveCloseReason(...), treats write_timeout with precedence over heartbeat failure, and detects res.locals.writeTimedOut on res.on('close') to close tracked request stacks with 'write_timeout'.
Unit tests
tests/unit/middleware/response-write-timeout.test.ts, tests/unit/server.test.ts
Added comprehensive middleware tests (timeout firing, logging, SSE exclusion, disabling via 0, cleanup, single-timer semantics) and updated server tests to mock RESPONSE_WRITE_TIMEOUT_MS and adjust middleware probing logic.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client (MCP)
  participant Server as Express Server
  participant Logger as Logger

  Client->>Server: send HTTP request
  Server->>Server: handler runs\nheaders written (res.writeHead)
  Server->>Server: start write timeout timer (N ms)
  alt response finishes before timeout
    Server->>Server: res.finish / res.close\nclear timer
  else timeout fires
    Server->>Logger: logWarn({ reason: "write_timeout", timeoutMs, method, path, session })
    Server->>Server: set res.locals.writeTimedOut = true
    Server->>Client: destroy socket (res.destroy)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 70.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately and clearly describes the main change—adding zombie connection detection and termination for stalled response writes.
Description check ✅ Passed The PR description is directly related to the changeset, explaining the problem, solution, and testing approach for the response write timeout feature.
Linked Issues check ✅ Passed The PR fulfills all primary coding objectives from issue #391: implements configurable response write timeout middleware, excludes SSE, destroys stalled connections, and includes comprehensive tests.
Out of Scope Changes check ✅ Passed All changes are directly in scope: response write timeout middleware, config variables, types, tests, and server integration. No unrelated changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/#391-fix-detect-and-kill-zombie-connections-where-respo

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 5, 2026

Test Coverage Report

Overall Coverage: 96.49%

Metric Percentage
Statements 95.86%
Branches 86.98%
Functions 94.89%
Lines 96.49%

View detailed coverage report

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 5, 2026

Codecov Report

❌ Patch coverage is 79.62963% with 11 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/middleware/response-write-timeout.ts 82.85% 1 Missing and 5 partials ⚠️
src/server.ts 66.66% 3 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an Express middleware to detect stalled HTTP response writes (zombie TCP peers) and proactively destroy the connection after a configurable timeout, improving server resilience and observability.

Changes:

  • Introduce responseWriteTimeoutMiddleware() with GITLAB_RESPONSE_WRITE_TIMEOUT_MS configuration (default 10s, 0 disables).
  • Wire middleware into server startup and enrich disconnect reason logging with write_timeout.
  • Add unit tests and document the new environment variable.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/middleware/response-write-timeout.ts New middleware that starts a timer on writeHead and destroys stalled responses.
src/middleware/index.ts Exports the new middleware from the middleware barrel.
src/server.ts Registers the middleware early and logs write_timeout as a disconnect reason.
src/logging/types.ts Extends ConnectionCloseReason union with write_timeout.
src/config.ts Adds RESPONSE_WRITE_TIMEOUT_MS config parsed from GITLAB_RESPONSE_WRITE_TIMEOUT_MS.
tests/unit/middleware/response-write-timeout.test.ts New unit tests validating timeout behavior, cleanup, SSE skip, and disable flag.
tests/unit/server.test.ts Updates config mocks and middleware selection logic to account for the new middleware.
README.md.in Documents GITLAB_RESPONSE_WRITE_TIMEOUT_MS in the config table.
README.md Documents GITLAB_RESPONSE_WRITE_TIMEOUT_MS in the config table.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/config.ts`:
- Around line 449-452: RESPONSE_WRITE_TIMEOUT_MS currently uses parseTimerMs
which rejects zero, preventing setting GITLAB_RESPONSE_WRITE_TIMEOUT_MS=0 to
disable the timeout; update parseTimerMs to accept an allowZero boolean (default
false) that forwards to parseStrictInt, then call
parseTimerMs(process.env.GITLAB_RESPONSE_WRITE_TIMEOUT_MS, 10000, true) when
setting RESPONSE_WRITE_TIMEOUT_MS so zero is preserved; reference parseTimerMs,
parseStrictInt, and RESPONSE_WRITE_TIMEOUT_MS (and confirm
response-write-timeout.ts checks for <= 0 remain effective).

In `@src/middleware/response-write-timeout.ts`:
- Around line 51-54: Replace the unsafe any[] rest parameter for the
monkey-patched writeHead with a proper typed tuple: declare a local alias type
using Parameters<Response['writeHead']> (e.g., WriteHeadArgs) and change the
patched signature in the writeHead assignment from ...args: any[] to ...args:
WriteHeadArgs while keeping the return type Response and preserving the existing
logic inside the function.
- Around line 55-83: The middleware's patched writeHead handler uses a loose
any[] signature and checks res.getHeader('content-type') before calling
originalWriteHead, so Content-Type passed as an argument can be missed; change
the handler signature to match Node's writeHead types instead of ...args: any[]
(use the proper IncomingHttpHeaders/number/string overload types), call
originalWriteHead(...args) first (or compute the effective headers by merging
res.getHeader with headers passed in args) and only then determine isSSE by
inspecting the final Content-Type before starting writeTimer, and keep the
existing logic that sets res.locals.writeTimedOut and calls res.destroy() on
timeout (referencing originalWriteHead, writeTimer, res.getHeader, args,
RESPONSE_WRITE_TIMEOUT_MS, res.locals.writeTimedOut).

In `@tests/unit/middleware/response-write-timeout.test.ts`:
- Around line 7-10: Update the header comment in the response-write-timeout test
to reflect the actual default timeout: change the text fragment "default 30s" to
"default 10s" (matching the PR behavior of 10000ms) so the comment accurately
documents the middleware's default timeout.
- Around line 201-217: The test for responseWriteTimeoutMiddleware should also
assert that only one timer is registered when writeHead is called multiple
times: spy on global.setTimeout (e.g., jest.spyOn(global, 'setTimeout')) before
invoking middleware(req, res, next), call res.writeHead twice, then expect
setTimeout toHaveBeenCalledTimes(1) in addition to the existing
expect(res.destroy).toHaveBeenCalledTimes(1); restore the spy after the test to
avoid cross-test pollution. This references responseWriteTimeoutMiddleware and
res.writeHead to locate the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1a9414c7-1cd0-4392-b0cb-f9de424473d9

📥 Commits

Reviewing files that changed from the base of the PR and between 76d1d09 and c81a8ad.

📒 Files selected for processing (9)
  • README.md
  • README.md.in
  • src/config.ts
  • src/logging/types.ts
  • src/middleware/index.ts
  • src/middleware/response-write-timeout.ts
  • src/server.ts
  • tests/unit/middleware/response-write-timeout.test.ts
  • tests/unit/server.test.ts

- Call originalWriteHead() before reading Content-Type so headers
  passed via writeHead(200, {headers}) are visible to SSE detection
- Replace any[] with Parameters<typeof res.writeHead> for type safety
- Add allowZero to parseTimerMs so GITLAB_RESPONSE_WRITE_TIMEOUT_MS=0
  actually disables the timeout
- Correct stale default value in test docblock (30s -> 10s)
- Add setTimeout spy assertion in double-writeHead test
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/middleware/response-write-timeout.ts (1)

61-63: ⚠️ Potential issue | 🟠 Major

Normalize Content-Type before SSE detection.

Line 62 compares media type case-sensitively. A valid header like Text/Event-Stream is then treated as non-SSE, so the middleware can incorrectly destroy long-lived SSE responses.

Suggested fix
-        const contentType = res.getHeader('content-type');
-        const isSSE = typeof contentType === 'string' && contentType.includes('text/event-stream');
+        const contentType = res.getHeader('content-type');
+        const normalizedContentType =
+          typeof contentType === 'string'
+            ? contentType.toLowerCase()
+            : Array.isArray(contentType)
+              ? contentType.join(',').toLowerCase()
+              : '';
+        const isSSE = normalizedContentType.includes('text/event-stream');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/middleware/response-write-timeout.ts` around lines 61 - 63, The SSE
detection is case-sensitive; update the content-type check so it normalizes the
header before testing: get the header into contentType (as currently) but
compute isSSE by coercing/normalizing the value to a lowercase string (e.g. if
Array.isArray(contentType) join it, or use String(contentType).toLowerCase())
and then check .includes('text/event-stream') so headers like
"Text/Event-Stream" are recognized; change the isSSE expression accordingly
(referencing the contentType and isSSE symbols and the 'text/event-stream'
literal).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/middleware/response-write-timeout.test.ts`:
- Around line 140-145: Add a regression test to response-write-timeout.test.ts
that exercises SSE delivered via res.writeHead(status, headers) so the
middleware sees the 'text/event-stream' header when headers are passed in the
writeHead call (not just via getHeader()); specifically, create a test that uses
responseWriteTimeoutMiddleware() with a mock res whose writeHead is invoked with
a headers object containing 'content-type': 'text/event-stream', and assert the
SSE path is skipped (mirroring the existing GET SSE test); update or add a
second test covering the same scenario for the lines noted (149-155) to ensure
the fix for originalWriteHead/header visibility is covered.

---

Duplicate comments:
In `@src/middleware/response-write-timeout.ts`:
- Around line 61-63: The SSE detection is case-sensitive; update the
content-type check so it normalizes the header before testing: get the header
into contentType (as currently) but compute isSSE by coercing/normalizing the
value to a lowercase string (e.g. if Array.isArray(contentType) join it, or use
String(contentType).toLowerCase()) and then check .includes('text/event-stream')
so headers like "Text/Event-Stream" are recognized; change the isSSE expression
accordingly (referencing the contentType and isSSE symbols and the
'text/event-stream' literal).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 09a96849-9844-45a5-96d0-317c21891978

📥 Commits

Reviewing files that changed from the base of the PR and between c81a8ad and 839c422.

📒 Files selected for processing (3)
  • src/config.ts
  • src/middleware/response-write-timeout.ts
  • tests/unit/middleware/response-write-timeout.test.ts

- Use toLowerCase() before checking for text/event-stream so
  mixed-case headers like Text/Event-Stream are recognized
- Handle array Content-Type values (Node.js allows string[])
- Add regression test for SSE via writeHead(200, {headers})
- Add test for mixed-case Content-Type header
@coderabbitai coderabbitai bot added the javascript Pull requests that update javascript code label Apr 5, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/middleware/response-write-timeout.ts`:
- Around line 78-79: The code directly casts req.headers['mcp-session-id'] to
string | undefined and passes it to logWarn; instead create a Zod schema (e.g.,
z.string().optional()) to validate and normalize the header (handle string[] by
selecting the first element or rejecting), use schema.safeParse or parse to
obtain a validated sessionId, and then pass that validated value into the
logWarn call in response-write-timeout.ts (update the variable referenced as
sessionId and the logWarn invocation accordingly).

In `@tests/unit/middleware/response-write-timeout.test.ts`:
- Around line 188-202: Add a new Jest test to cover the array-valued
Content-Type branch: call responseWriteTimeoutMiddleware() with a GET request
via createMockReq and a response from createMockRes whose
getHeader('content-type') is mocked to return ['text/event-stream']; invoke
middleware(req, res, next), call res.writeHead(200), advance timers
(jest.advanceTimersByTime) past the timeout, and assert res.destroy was not
called to ensure SSE is correctly detected when Content-Type is an array; place
this alongside the existing SSE tests that use mixed-case/string headers and
mirror their structure (reuse middleware, req, res, next names).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 09594f6f-abb3-43a7-a250-2cba09fb90f7

📥 Commits

Reviewing files that changed from the base of the PR and between 839c422 and 7c57d06.

📒 Files selected for processing (2)
  • src/middleware/response-write-timeout.ts
  • tests/unit/middleware/response-write-timeout.test.ts

- Replace unsafe 'as string' cast on mcp-session-id header with
  runtime typeof check handling string | string[] | undefined
- Add test for array-valued Content-Type SSE detection
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/middleware/response-write-timeout.ts`:
- Around line 63-68: The nested ternary that computes normalizedContentType from
contentType is hard to read; extract that logic into a small helper function
(e.g., normalizeContentType or getNormalizedContentType) that accepts
contentType and returns the lowercase string (handle string, Array, or other ->
''). Replace the ternary expression that assigns normalizedContentType with a
call to the new helper and add a short unit test or inline comment to document
behavior for string, array, and non-string inputs.

In `@tests/unit/middleware/response-write-timeout.test.ts`:
- Line 265: The test currently spies on the Node-only global using
jest.spyOn(global, 'setTimeout') which SonarCloud flags; replace the reference
to use the standard ES2020 globalThis (i.e., change the spy target from global
to globalThis) so setTimeoutSpy is created via jest.spyOn(globalThis,
'setTimeout') for cross-environment compatibility while keeping the existing spy
name and assertions intact.
- Line 12: The import of the built-in EventEmitter should use the Node.js
built-in module protocol; locate the import that brings in EventEmitter and
change its module specifier from 'events' to use the node:events protocol so the
test imports the built-in module via node:events rather than the bare 'events'
specifier.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5a85342a-cd04-407c-86f9-3046059d147f

📥 Commits

Reviewing files that changed from the base of the PR and between 7c57d06 and 9c95027.

📒 Files selected for processing (2)
  • src/middleware/response-write-timeout.ts
  • tests/unit/middleware/response-write-timeout.test.ts

- Extract normalizeContentType() from nested ternary in middleware
- Extract setupMiddleware() helper in tests to eliminate repeated
  middleware/req/res/next setup across 11 test cases
- Consolidate 4 SSE detection tests into it.each + 1 writeHead test
- Use node: protocol for built-in module import
- Use globalThis instead of global for setTimeout spy
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Move responseWriteTimeoutMiddleware() before express.json() so that
parse/size error responses (4xx) are also protected against stalled
writes to dead TCP peers.
@polaz
Copy link
Copy Markdown
Member Author

polaz commented Apr 5, 2026

@coderabbitai full review

@polaz polaz requested a review from Copilot April 5, 2026 16:52
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 5, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md.in`:
- Line 102: The documentation for GITLAB_RESPONSE_WRITE_TIMEOUT_MS should
explicitly state that setting it to 0 disables the timeout; update the README
entry for GITLAB_RESPONSE_WRITE_TIMEOUT_MS to mention "set to 0 to disable
timeout" (or equivalent), so operators know 0 turns off enforcement. Reference
the environment variable name GITLAB_RESPONSE_WRITE_TIMEOUT_MS in the wording.

In `@src/server.ts`:
- Around line 621-627: Extract the duplicated nested ternary that resolves
disconnect reasons into a single shared helper (e.g., getDisconnectReason(res))
and replace both occurrences with calls to that helper; implement
getDisconnectReason to accept the response/connection object and return one of
the strings 'write_timeout', 'heartbeat_failed', 'destroyed', or
'client_disconnect' by checking res.locals?.writeTimedOut,
res.locals?.heartbeatFailed, and res.destroyed in that priority order, and
update both call sites to use the helper to improve readability and
maintainability.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 787b073f-3e8b-4bc0-a3dc-47dcb12dad59

📥 Commits

Reviewing files that changed from the base of the PR and between 76d1d09 and 8a9ea1c.

📒 Files selected for processing (9)
  • README.md
  • README.md.in
  • src/config.ts
  • src/logging/types.ts
  • src/middleware/index.ts
  • src/middleware/response-write-timeout.ts
  • src/server.ts
  • tests/unit/middleware/response-write-timeout.test.ts
  • tests/unit/server.test.ts

- Extract duplicated nested ternary for disconnect reason into
  resolveCloseReason() shared helper (fixes SonarCloud warning)
- Clear mockSkipFn after probe in access log middleware test
- Document that GITLAB_RESPONSE_WRITE_TIMEOUT_MS=0 disables timeout
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md.in`:
- Line 102: Update the README row for GITLAB_RESPONSE_WRITE_TIMEOUT_MS to note
that SSE streams are excluded: change the description for
GITLAB_RESPONSE_WRITE_TIMEOUT_MS (currently "Max time to flush response to
client before destroying zombie connection (`0` to disable)") to explicitly
state that responses with Content-Type "text/event-stream" (SSE) are skipped and
remain on the heartbeat path, so the timeout does not apply to SSE connections.

In `@tests/unit/server.test.ts`:
- Around line 369-375: The test uses explicit any casts for mockJsonFn and
middleware function handling which violates strict TypeScript; replace the any
casts by typing mockExpress.json() as the Express JSON middleware type and type
middlewareCalls elements as tuples matching middleware signatures, then update
the .filter/.map/.find pipeline to use properly typed predicates and callbacks
(referencing mockJsonFn, middlewareCalls, accessLogMiddleware, and
shouldSkipAccessLogRequest) so the probe finds the access log middleware without
using any casts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0b3e9784-ac3a-400c-9d9b-6f0438a90ee9

📥 Commits

Reviewing files that changed from the base of the PR and between 8a9ea1c and f565c54.

📒 Files selected for processing (4)
  • README.md
  • README.md.in
  • src/server.ts
  • tests/unit/server.test.ts

- Document that RESPONSE_WRITE_TIMEOUT_MS applies to non-SSE responses
  only (SSE uses heartbeat-based detection)
- Replace any casts in server test middleware probe with typed aliases
@polaz
Copy link
Copy Markdown
Member Author

polaz commented Apr 5, 2026

@coderabbitai full review

@polaz polaz requested a review from Copilot April 5, 2026 17:49
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 5, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

@polaz
Copy link
Copy Markdown
Member Author

polaz commented Apr 5, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 5, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/middleware/response-write-timeout.ts (1)

80-81: ⚠️ Potential issue | 🟡 Minor

Use a Zod schema for mcp-session-id header normalization.

Line 80-Line 81 narrows untrusted header data with typeof/array indexing, but this still bypasses the project rule requiring Zod-based validation for external input.

Suggested fix
 import type { Request, Response, NextFunction } from 'express';
+import { z } from 'zod';
 import { RESPONSE_WRITE_TIMEOUT_MS } from '../config';
 import { logWarn } from '../logger';
+
+const mcpSessionIdSchema = z
+  .union([z.string(), z.array(z.string())])
+  .optional()
+  .transform((value) => (Array.isArray(value) ? value[0] : value));
@@
-              const rawSessionId = req.headers['mcp-session-id'];
-              const sessionId = typeof rawSessionId === 'string' ? rawSessionId : rawSessionId?.[0];
+              const parsedSessionId = mcpSessionIdSchema.safeParse(req.headers['mcp-session-id']);
+              const sessionId = parsedSessionId.success ? parsedSessionId.data : undefined;
#!/bin/bash
# Verify whether external header extraction is currently validated via Zod in this middleware.
fd -t f response-write-timeout.ts src | while read -r f; do
  echo "== $f =="
  rg -n -C2 "mcp-session-id|safeParse|parse\\(|from 'zod'|from \"zod\"" "$f"
done

As per coding guidelines, “All external data must be validated with Zod schemas”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/middleware/response-write-timeout.ts` around lines 80 - 81, Replace the
ad-hoc header normalization for rawSessionId/sessionId with a Zod-based
validation: introduce a Zod schema (e.g., using z.preprocess to accept string or
string[] and return the first string, then z.string().min(1)) and use
schema.safeParse or schema.parse on req.headers['mcp-session-id'] to produce a
validated sessionId; update the code where rawSessionId and sessionId are
referenced to use the parsed value and handle validation failure (log/return)
per existing middleware error handling. Ensure you import Zod (from 'zod') and
reference the schema name (e.g., mcpSessionIdSchema) so reviewers can locate the
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/middleware/response-write-timeout.ts`:
- Around line 80-81: Replace the ad-hoc header normalization for
rawSessionId/sessionId with a Zod-based validation: introduce a Zod schema
(e.g., using z.preprocess to accept string or string[] and return the first
string, then z.string().min(1)) and use schema.safeParse or schema.parse on
req.headers['mcp-session-id'] to produce a validated sessionId; update the code
where rawSessionId and sessionId are referenced to use the parsed value and
handle validation failure (log/return) per existing middleware error handling.
Ensure you import Zod (from 'zod') and reference the schema name (e.g.,
mcpSessionIdSchema) so reviewers can locate the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 57559e88-f077-4859-880d-8e80c9226ce9

📥 Commits

Reviewing files that changed from the base of the PR and between 76d1d09 and c0f8106.

📒 Files selected for processing (9)
  • README.md
  • README.md.in
  • src/config.ts
  • src/logging/types.ts
  • src/middleware/index.ts
  • src/middleware/response-write-timeout.ts
  • src/server.ts
  • tests/unit/middleware/response-write-timeout.test.ts
  • tests/unit/server.test.ts

…ess log

- Validate mcp-session-id header via Zod schema instead of typeof cast
- Log write-timed-out responses as 'write_timeout' error in access log
  instead of treating as normal completion (observability gap)
- Add code note explaining why timer starts at writeHead (intentional)
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Apr 5, 2026

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

@polaz polaz merged commit 9ee114b into main Apr 5, 2026
23 of 24 checks passed
@polaz polaz deleted the fix/#391-fix-detect-and-kill-zombie-connections-where-respo branch April 5, 2026 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

javascript Pull requests that update javascript code released

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: detect and kill zombie connections where response write stalls

2 participants