Wire session-aware backend routing in proxy transports (RC-12)#4318
Wire session-aware backend routing in proxy transports (RC-12)#4318
Conversation
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
PR size has been reduced below the XL threshold. Thank you for splitting this up!
|
✅ PR size has been reduced below the XL threshold. The size review has been dismissed and this PR can now proceed with normal review. Thank you for splitting this up! |
Adds Option/WithSessionStorage functional option to the streamable HTTP proxy, enabling injection of a custom session storage backend for multi-replica deployments. Updates NewHTTPProxy signature from variadic middlewares to explicit slice + variadic opts, and fixes all call sites. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-12) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… in Rewrite (RC-12) On initialize responses that return Mcp-Session-Id, RoundTrip now creates the session via AddSession and stores backend_url=targetURI in metadata. The Rewrite closure in Start looks up that metadata to override the outbound URL, enabling session-aware routing to originating backend pods. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…wn session (RC-12) Guard added in RoundTrip: requests carrying an Mcp-Session-Id that is not found in the session store are rejected with HTTP 400 immediately, unless the request is an initialize call. Two tests added to cover both branches. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4318 +/- ##
==========================================
- Coverage 69.24% 68.35% -0.89%
==========================================
Files 478 478
Lines 48157 48423 +266
==========================================
- Hits 33346 33101 -245
- Misses 12224 12388 +164
- Partials 2587 2934 +347 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
💡 Codex ReviewIn Line 623 in a58edb9
toolhive/pkg/skills/skillsvc/skillsvc.go Line 265 in a58edb9 After resolving from registry, ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
Pull request overview
This PR implements RC-12 session-sticky routing capabilities in the proxy transports by allowing an injectable session storage backend and using session metadata in the transparent proxy to route follow-up requests back to the originating backend.
Changes:
- Add
WithSessionStorage(session.Storage)option support to the Streamable HTTP, HTTP-SSE, and Transparent proxies (and adjust constructors to accept options). - Transparent proxy now records
backend_urlin session metadata on initialize and uses it duringReverseProxy.Rewriteto steer subsequent requests. - Add/extend unit/integration tests for session storage injection and transparent backend routing behavior.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/transport/stdio.go | Updates proxy constructor calls for new middleware parameter shape; currently no plumbing for session storage injection. |
| pkg/transport/proxy/transparent/transparent_proxy.go | Adds session storage option, unknown-session guard, session metadata persistence, and Rewrite-based backend routing. |
| pkg/transport/proxy/transparent/backend_routing_test.go | New tests covering backend routing via backend_url and 400s for unknown sessions. |
| pkg/transport/proxy/transparent/transparent_test.go | Adds a unit test ensuring WithSessionStorage initializes a session manager. |
| pkg/transport/proxy/streamable/streamable_proxy.go | Introduces options pattern and WithSessionStorage for Streamable proxy; updates constructor signature. |
| pkg/transport/proxy/streamable/streamable_proxy_test.go | Updates constructor call sites and adds a WithSessionStorage unit test. |
| pkg/transport/proxy/streamable/streamable_proxy_spec_test.go | Updates constructor call sites for new signature. |
| pkg/transport/proxy/streamable/streamable_proxy_mcp_client_integration_test.go | Updates constructor call sites for new signature. |
| pkg/transport/proxy/streamable/streamable_proxy_integration_test.go | Updates constructor call sites for new signature. |
| pkg/transport/proxy/httpsse/http_proxy.go | Introduces options pattern and WithSessionStorage for HTTP-SSE proxy; updates constructor signature. |
| pkg/transport/proxy/httpsse/http_proxy_test.go | Updates constructor call sites and adds a WithSessionStorage unit test. |
| pkg/transport/proxy/httpsse/http_proxy_integration_test.go | Updates constructor call sites for new signature. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // WithSessionStorage injects a custom storage backend into the session manager. | ||
| // When not provided, the proxy uses in-memory LocalStorage (single-replica default). | ||
| func WithSessionStorage(storage session.Storage) Option { | ||
| return func(p *HTTPSSEProxy) { | ||
| if storage == nil { | ||
| return | ||
| } | ||
| if p.sessionManager != nil { | ||
| _ = p.sessionManager.Stop() | ||
| } | ||
| sseFactory := func(id string) session.Session { return session.NewSSESession(id) } | ||
| p.sessionManager = session.NewManagerWithStorage(session.DefaultSessionTTL, sseFactory, storage) | ||
| } |
There was a problem hiding this comment.
WithSessionStorage allows swapping the session manager to use arbitrary session.Storage backends (e.g. Redis), but HTTPSSEProxy’s runtime logic depends on sessionManager.Range() to (a) detect whether any clients are connected (ForwardResponseToClients) and (b) broadcast SSE events (sendSSEEvent) / disconnect clients (Stop). Manager.Range() is a no-op for non-LocalStorage backends, so using RedisStorage here would result in messages never being delivered to connected SSE clients and sessions not being disconnected/cleaned up correctly. Consider either rejecting non-local storage for HTTPSSEProxy, or keeping a separate in-memory registry for live SSE connections while using shared storage only for minimal routing metadata/session existence checks.
jerm-dro
left a comment
There was a problem hiding this comment.
Manager.Range() is a no-op for non-local storage backends, and HTTPSSEProxy depends on it for core functionality. This needs to be resolved before Redis storage can be safely injected. See inline comments for details and two additional suggestions.
| } | ||
| sseFactory := func(id string) session.Session { return session.NewSSESession(id) } | ||
| p.sessionManager = session.NewManagerWithStorage(session.DefaultSessionTTL, sseFactory, storage) | ||
| } |
There was a problem hiding this comment.
blocker: Manager.Range() is a no-op for non-LocalStorage backends (manager.go:264-268). HTTPSSEProxy relies on Range() in three critical paths: disconnecting all clients on Stop (line 223), detecting connected clients for message forwarding (line 278), and broadcasting SSE events to all sessions (line 443). If Redis storage is injected here, all three silently stop working — messages are never delivered and sessions are never disconnected.
This should be addressed before Redis storage is integrated. Ideally we'd remove Range() from Manager altogether — it's a LocalStorage-specific concept that doesn't translate to distributed backends. The SSE proxy needs a different mechanism for tracking live connections that doesn't depend on iterating the session store.
| @@ -428,7 +467,9 @@ func (t *tracingTransport) RoundTrip(req *http.Request) (*http.Response, error) | |||
| slog.Debug("detected Mcp-Session-Id header", "session_id", ct) | |||
| internalID := normalizeSessionID(ct) | |||
| if _, ok := t.p.sessionManager.Get(internalID); !ok { | |||
There was a problem hiding this comment.
nitpick: "backend_url" is used as a metadata key in three places: stored here, read in the Rewrite closure (line 555), and asserted in tests. Consider extracting this to a const to avoid silent breakage if one occurrence is changed without the others.
| @@ -428,7 +467,9 @@ func (t *tracingTransport) RoundTrip(req *http.Request) (*http.Response, error) | |||
| slog.Debug("detected Mcp-Session-Id header", "session_id", ct) | |||
| internalID := normalizeSessionID(ct) | |||
| if _, ok := t.p.sessionManager.Get(internalID); !ok { | |||
There was a problem hiding this comment.
suggestion: Rather than storing the proxy's own targetURI here and overriding scheme/host in the Rewrite closure on every request, could we resolve the backend pod's stable DNS name via the K8s API (e.g., http://mcp-server-0.mcp-server.default.svc:8080) before creating the session? If the session's backend_url is already the pod's full address, the Rewrite closure becomes unnecessary — the reverse proxy can target the session's URL directly.
Summary
Implements RC-12 from the horizontal scaling RFC (THV-0047, epic #263):
WithSessionStorage(session.Storage) Optionto all three proxy transports (streamable, HTTP-SSE, transparent), allowing a Redis-backed session store to be injected for multi-replica deploymentsbackend_url = targetURIin session metadata when aninitializeresponse returnsMcp-Session-Id, and routes subsequent requests to the originating pod via theRewriteclosureFixes #4212
Type of change
Test plan
task test)task test-e2e)task lint-fix)Changes
Does this introduce a user-facing change?
Special notes for reviewers
Large PR Justification
This is a complete PR with an isolated functionality and complete tests. Cannot be split.