Skip to content

Wire session-aware backend routing in proxy transports (RC-12)#4318

Open
yrobla wants to merge 6 commits intomainfrom
issue-4212
Open

Wire session-aware backend routing in proxy transports (RC-12)#4318
yrobla wants to merge 6 commits intomainfrom
issue-4212

Conversation

@yrobla
Copy link
Contributor

@yrobla yrobla commented Mar 23, 2026

Summary

Implements RC-12 from the horizontal scaling RFC (THV-0047, epic #263):

  • Adds WithSessionStorage(session.Storage) Option to all three proxy transports (streamable, HTTP-SSE, transparent), allowing a Redis-backed session store to be injected for multi-replica deployments
  • Transparent proxy now stores backend_url = targetURI in session metadata when an initialize response returns Mcp-Session-Id, and routes subsequent requests to the originating pod via the Rewrite closure
  • Non-initialize requests with an unknown session ID are rejected with HTTP 400 rather than silently forwarded

Fixes #4212

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

  • Unit tests (task test)
  • E2E tests (task test-e2e)
  • Linting (task lint-fix)
  • Manual testing (describe below)

Changes

File Change

Does this introduce a user-facing change?

Special notes for reviewers

Large PR Justification

This is a complete PR with an isolated functionality and complete tests. Cannot be split.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Mar 23, 2026
@github-actions github-actions bot removed the size/XL Extra large PR: 1000+ lines changed label Mar 23, 2026
@github-actions github-actions bot dismissed their stale review March 23, 2026 13:18

PR size has been reduced below the XL threshold. Thank you for splitting this up!

@github-actions github-actions bot added the size/L Large PR: 600-999 lines changed label Mar 23, 2026
@github-actions
Copy link
Contributor

✅ PR size has been reduced below the XL threshold. The size review has been dismissed and this PR can now proceed with normal review. Thank you for splitting this up!

taskbot and others added 5 commits March 23, 2026 14:21
Adds Option/WithSessionStorage functional option to the streamable HTTP
proxy, enabling injection of a custom session storage backend for
multi-replica deployments. Updates NewHTTPProxy signature from variadic
middlewares to explicit slice + variadic opts, and fixes all call sites.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-12)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… in Rewrite (RC-12)

On initialize responses that return Mcp-Session-Id, RoundTrip now creates
the session via AddSession and stores backend_url=targetURI in metadata.
The Rewrite closure in Start looks up that metadata to override the outbound
URL, enabling session-aware routing to originating backend pods.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…wn session (RC-12)

Guard added in RoundTrip: requests carrying an Mcp-Session-Id that is not
found in the session store are rejected with HTTP 400 immediately, unless
the request is an initialize call. Two tests added to cover both branches.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov
Copy link

codecov bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 52.94118% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.35%. Comparing base (6dd3eaf) to head (93a4fbd).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
pkg/transport/stdio.go 0.00% 11 Missing ⚠️
pkg/transport/proxy/httpsse/http_proxy.go 41.66% 3 Missing and 4 partials ⚠️
pkg/transport/proxy/streamable/streamable_proxy.go 41.66% 3 Missing and 4 partials ⚠️
...g/transport/proxy/transparent/transparent_proxy.go 78.78% 3 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4318      +/-   ##
==========================================
- Coverage   69.24%   68.35%   -0.89%     
==========================================
  Files         478      478              
  Lines       48157    48423     +266     
==========================================
- Hits        33346    33101     -245     
- Misses      12224    12388     +164     
- Partials     2587     2934     +347     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@yrobla yrobla requested a review from Copilot March 23, 2026 13:23
@github-actions github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 23, 2026
@chatgpt-codex-connector
Copy link

💡 Codex Review


P1 Badge Preserve path/query when rewriting to backend_url

In Rewrite, assigning pr.Out.URL = parsed overwrites the URL that pr.SetURL(targetURL) already built from the inbound request, which drops the original path/query (for example /mcp). Once a session has backend_url metadata, routed requests can be sent to the backend root instead of the intended MCP endpoint, causing 404s or wrong-handler behavior. Keep the inbound path/query and only swap scheme/host (or re-run SetURL with the backend target) when applying session-aware routing.


provider, err := registry.GetDefaultProviderWithConfig(config.NewDefaultProvider())

P2 Badge Use non-interactive registry lookup in serve mode

lazySkillLookup.SearchSkills creates the registry provider without registry.WithInteractive(false), unlike serve-mode registry routes (pkg/api/v1/registry.go) which explicitly disable browser OAuth flows. In API/server deployments with OAuth-protected registries and no cached token, plain-name install lookup can trigger interactive auth behavior (or fail indirectly) instead of returning a deterministic auth-required response, making registry-backed installs unreliable in headless environments.


return result, s.registerSkillInGroup(ctx, opts.Group, opts.Name)

P2 Badge Add resolved registry installs to groups by skill name

After resolving from registry, opts.Name is rewritten to the OCI reference (ref.String()), and this value is then passed to registerSkillInGroup. Group filtering later matches group entries against InstalledSkill.Metadata.Name, so storing the OCI reference causes registry-resolved installs to be omitted from group-scoped results and can leave stale group entries on uninstall. Register the extracted skill name (for example from result.Skill.Metadata.Name) instead of the OCI reference string.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements RC-12 session-sticky routing capabilities in the proxy transports by allowing an injectable session storage backend and using session metadata in the transparent proxy to route follow-up requests back to the originating backend.

Changes:

  • Add WithSessionStorage(session.Storage) option support to the Streamable HTTP, HTTP-SSE, and Transparent proxies (and adjust constructors to accept options).
  • Transparent proxy now records backend_url in session metadata on initialize and uses it during ReverseProxy.Rewrite to steer subsequent requests.
  • Add/extend unit/integration tests for session storage injection and transparent backend routing behavior.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
pkg/transport/stdio.go Updates proxy constructor calls for new middleware parameter shape; currently no plumbing for session storage injection.
pkg/transport/proxy/transparent/transparent_proxy.go Adds session storage option, unknown-session guard, session metadata persistence, and Rewrite-based backend routing.
pkg/transport/proxy/transparent/backend_routing_test.go New tests covering backend routing via backend_url and 400s for unknown sessions.
pkg/transport/proxy/transparent/transparent_test.go Adds a unit test ensuring WithSessionStorage initializes a session manager.
pkg/transport/proxy/streamable/streamable_proxy.go Introduces options pattern and WithSessionStorage for Streamable proxy; updates constructor signature.
pkg/transport/proxy/streamable/streamable_proxy_test.go Updates constructor call sites and adds a WithSessionStorage unit test.
pkg/transport/proxy/streamable/streamable_proxy_spec_test.go Updates constructor call sites for new signature.
pkg/transport/proxy/streamable/streamable_proxy_mcp_client_integration_test.go Updates constructor call sites for new signature.
pkg/transport/proxy/streamable/streamable_proxy_integration_test.go Updates constructor call sites for new signature.
pkg/transport/proxy/httpsse/http_proxy.go Introduces options pattern and WithSessionStorage for HTTP-SSE proxy; updates constructor signature.
pkg/transport/proxy/httpsse/http_proxy_test.go Updates constructor call sites and adds a WithSessionStorage unit test.
pkg/transport/proxy/httpsse/http_proxy_integration_test.go Updates constructor call sites for new signature.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Mar 23, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@yrobla yrobla requested a review from Copilot March 23, 2026 15:14
@github-actions github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Mar 23, 2026
@github-actions github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Mar 23, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Mar 23, 2026
@yrobla yrobla requested a review from Copilot March 23, 2026 15:24
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +95 to +107
// WithSessionStorage injects a custom storage backend into the session manager.
// When not provided, the proxy uses in-memory LocalStorage (single-replica default).
func WithSessionStorage(storage session.Storage) Option {
return func(p *HTTPSSEProxy) {
if storage == nil {
return
}
if p.sessionManager != nil {
_ = p.sessionManager.Stop()
}
sseFactory := func(id string) session.Session { return session.NewSSESession(id) }
p.sessionManager = session.NewManagerWithStorage(session.DefaultSessionTTL, sseFactory, storage)
}
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WithSessionStorage allows swapping the session manager to use arbitrary session.Storage backends (e.g. Redis), but HTTPSSEProxy’s runtime logic depends on sessionManager.Range() to (a) detect whether any clients are connected (ForwardResponseToClients) and (b) broadcast SSE events (sendSSEEvent) / disconnect clients (Stop). Manager.Range() is a no-op for non-LocalStorage backends, so using RedisStorage here would result in messages never being delivered to connected SSE clients and sessions not being disconnected/cleaned up correctly. Consider either rejecting non-local storage for HTTPSSEProxy, or keeping a separate in-memory registry for live SSE connections while using shared storage only for minimal routing metadata/session existence checks.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@jerm-dro jerm-dro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manager.Range() is a no-op for non-local storage backends, and HTTPSSEProxy depends on it for core functionality. This needs to be resolved before Redis storage can be safely injected. See inline comments for details and two additional suggestions.

}
sseFactory := func(id string) session.Session { return session.NewSSESession(id) }
p.sessionManager = session.NewManagerWithStorage(session.DefaultSessionTTL, sseFactory, storage)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocker: Manager.Range() is a no-op for non-LocalStorage backends (manager.go:264-268). HTTPSSEProxy relies on Range() in three critical paths: disconnecting all clients on Stop (line 223), detecting connected clients for message forwarding (line 278), and broadcasting SSE events to all sessions (line 443). If Redis storage is injected here, all three silently stop working — messages are never delivered and sessions are never disconnected.

This should be addressed before Redis storage is integrated. Ideally we'd remove Range() from Manager altogether — it's a LocalStorage-specific concept that doesn't translate to distributed backends. The SSE proxy needs a different mechanism for tracking live connections that doesn't depend on iterating the session store.

@@ -428,7 +467,9 @@ func (t *tracingTransport) RoundTrip(req *http.Request) (*http.Response, error)
slog.Debug("detected Mcp-Session-Id header", "session_id", ct)
internalID := normalizeSessionID(ct)
if _, ok := t.p.sessionManager.Get(internalID); !ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: "backend_url" is used as a metadata key in three places: stored here, read in the Rewrite closure (line 555), and asserted in tests. Consider extracting this to a const to avoid silent breakage if one occurrence is changed without the others.

@@ -428,7 +467,9 @@ func (t *tracingTransport) RoundTrip(req *http.Request) (*http.Response, error)
slog.Debug("detected Mcp-Session-Id header", "session_id", ct)
internalID := normalizeSessionID(ct)
if _, ok := t.p.sessionManager.Get(internalID); !ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Rather than storing the proxy's own targetURI here and overriding scheme/host in the Rewrite closure on every request, could we resolve the backend pod's stable DNS name via the K8s API (e.g., http://mcp-server-0.mcp-server.default.svc:8080) before creating the session? If the session's backend_url is already the pod's full address, the Rewrite closure becomes unnecessary — the reverse proxy can target the session's URL directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/M Medium PR: 300-599 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wire session-aware backend routing and LRU cache-size in proxy transports (RC-12, RC-13)

4 participants