
RFC: Session-scoped MCP client lifecycle management#38

Open
yrobla wants to merge 2 commits into main from session-scoped-mcp-client


@yrobla commented Feb 4, 2026

The RFC captures the complete architectural vision from the discussion, providing a roadmap for simplifying the client pooling implementation while maintaining all current functionality.


Copilot AI left a comment


Pull request overview

Adds an RFC proposing a shift from per-request/lazy MCP backend client creation to session-scoped client lifecycle management to simplify the current pooling approach while preserving backend state across tool calls.

Changes:

  • Introduces a new RFC describing a session-scoped backend client map owned by VMCPSession.
  • Specifies eager client initialization during session setup (AfterInitialize) and cleanup during session close.
  • Outlines phased implementation steps, testing strategy, and alternatives considered.


@yrobla force-pushed the session-scoped-mcp-client branch from f935166 to 6e503d4 on February 4, 2026 13:40

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.



@yrobla force-pushed the session-scoped-mcp-client branch 3 times, most recently from 840b512 to ddf4273 on February 4, 2026 15:26
@yrobla requested a review from Copilot on February 4, 2026 15:26

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.



@yrobla force-pushed the session-scoped-mcp-client branch 2 times, most recently from 50f4742 to ec70d08 on February 4, 2026 15:51
@yrobla requested a review from Copilot on February 4, 2026 15:52

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.



**Connection Failures During Tool Calls**:
- Return error to client (existing behavior)
- Health monitor marks backend unhealthy (existing behavior)
- Client initialization attempts continue for unhealthy backends (health-based filtering is future work, see Phase 4)

Copilot AI Feb 4, 2026


The error-handling bullets for tool calls say "Client initialization attempts continue for unhealthy backends", but the RFC’s proposed design removes per-request client creation/initialization (clients are created once during session setup). This is internally inconsistent and affects failure-recovery semantics. Please clarify whether tool calls will (a) never re-initialize and always reuse the same client, or (b) attempt to recreate/re-handshake a client on demand when missing/errored/unhealthy (and where that logic lives).

Suggested change (replacing the commented line):
Before: - Client initialization attempts continue for unhealthy backends (health-based filtering is future work, see Phase 4)
After: - No automatic client re-initialization during tool calls; unhealthy backends remain unavailable for the lifetime of the session (health-based recovery/filtering is future work, see Phase 4)
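If the RFC chooses option (b), the re-handshake logic could live inside the session's own client map. The following is a minimal sketch under that assumption; `sessionClients`, `clientFactory`, `backendClient`, and `markBroken` are hypothetical names, not part of the RFC or vMCP. A client is reused while healthy and recreated only after a tool call marks it broken.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// backendClient is a hypothetical stand-in for an initialized MCP client.
type backendClient struct {
	backendID string
	broken    bool
}

// clientFactory creates and initializes a client for one backend (hypothetical).
type clientFactory func(backendID string) (*backendClient, error)

// sessionClients holds one client per backend for a single session and
// recreates a client only when it is missing or was marked broken.
type sessionClients struct {
	mu      sync.Mutex
	clients map[string]*backendClient
	create  clientFactory
	created int // number of clients created so far, for illustration
}

func (s *sessionClients) get(backendID string) (*backendClient, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if c, ok := s.clients[backendID]; ok && !c.broken {
		return c, nil // normal path: reuse the session-scoped client
	}
	c, err := s.create(backendID) // re-handshake on demand
	if err != nil {
		return nil, fmt.Errorf("recreating client for %s: %w", backendID, err)
	}
	s.clients[backendID] = c
	s.created++
	return c, nil
}

// markBroken would be called after a transport-level failure on a tool call.
func (s *sessionClients) markBroken(backendID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if c, ok := s.clients[backendID]; ok {
		c.broken = true
	}
}

func main() {
	sc := &sessionClients{
		clients: map[string]*backendClient{},
		create: func(id string) (*backendClient, error) {
			if id == "" {
				return nil, errors.New("empty backend id")
			}
			return &backendClient{backendID: id}, nil
		},
	}
	a, _ := sc.get("github")
	b, _ := sc.get("github") // reused
	sc.markBroken("github")
	c, _ := sc.get("github") // recreated after the failure
	fmt.Println(a == b, b == c, sc.created)
}
```

Option (a) is the same structure without `markBroken`: the map is filled once at session setup and never changes, which is simpler but leaves a failed backend unusable until the client opens a new session.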

Comment on lines +240 to +264
### Secrets Management

**No Changes**: Outgoing auth secrets are still retrieved via `OutgoingAuthRegistry` during client creation. The timing changes (session init vs first request) but the mechanism is identical.

Copilot AI Feb 4, 2026


In "Secrets Management" this states outgoing auth secrets are retrieved during client creation and only the timing changes. With session-scoped clients, any short-lived/outgoing credentials resolved at client-creation time (e.g., expiring tokens) could become stale mid-session, whereas per-request client creation would naturally refresh them. The RFC should explicitly describe how credential refresh/reauth is handled for long-lived clients (per-request header injection, token refresh hooks, client recreation on auth failures, etc.).

@yrobla force-pushed the session-scoped-mcp-client branch from ec70d08 to 54dd1b4 on February 4, 2026 16:00
@yrobla force-pushed the session-scoped-mcp-client branch from 54dd1b4 to d714819 on February 4, 2026 16:28

@jerm-dro left a comment


I left two bigger comments elaborating on my thoughts. To summarize, I think there are larger problems with how the session concept is handled within vMCP. Namely, we have session concerns scattered throughout the codebase and all of these concerns are tightly coupled. As a solution, I think we should create an interface that encapsulates these concerns:

// Session represents an active MCP session with all its capabilities and resources.
// This is a pure domain object: it has no protocol concerns and can be exercised without spinning up a server.
type Session interface {
    ID() string

    // Capabilities - returns discovered tools/resources for this session
    // Perhaps these could be combined into one "GetCapabilities" method.
    Tools() []Tool
    Resources() []Resource

    // MCP operations - routing logic is encapsulated here
    CallTool(ctx context.Context, name string, arguments map[string]any) (*ToolResult, error)
    ReadResource(ctx context.Context, uri string) (*ResourceResult, error)
    GetPrompt(ctx context.Context, name string, arguments map[string]any) (*PromptResult, error)

    // Lifecycle
    Close() error
}

That would make solving this client-lifetime problem trivial.


Move MCP backend client lifecycle from per-request creation/destruction to session-scoped management. Clients will be created once during session initialization, reused throughout the session lifetime, and closed during session cleanup. This simplifies the architecture and ensures consistent backend state preservation.

## Problem Statement

I think this is just one symptom of a larger problem: scattered management of session concerns. "Session concerns" is a bit broad, but I think there are two components:

  1. Creating objects with session lifetimes.
  2. Wiring our concept of a session up to the MCP sdk, so we can actually communicate over the MCP protocol.

I had Claude generate the following timeline of when all these things happen:

The Tangled Story of Session Initialization

Act 1: Discovery Happens Before Session Exists

A client sends an HTTP POST to initialize an MCP session. The request has no Mcp-Session-Id header yet.

The discovery middleware intercepts the request first. It sees there's no session ID, so it triggers capability discovery:

capabilities, err := manager.Discover(discoveryCtx, backends)

This queries all backends, resolves conflicts, and builds the routing table. These are session-scoped objects - they'll live for the duration of this client's session.

But here's the problem: the session doesn't exist yet. The SDK hasn't created it. So the middleware stuffs the capabilities into the request context and hopes someone downstream will put them in the right place:

ctx = WithDiscoveredCapabilities(ctx, capabilities)

The request continues to the next handler.


Act 2: The SDK Creates a Session (But It's Empty)

The request reaches the SDK's StreamableHTTPServer. The SDK sees this is an initialize request and needs to create a session. It calls our sessionIDAdapter.Generate():

func (a *sessionIDAdapter) Generate() string {
    sessionID := uuid.New().String()
    if err := a.manager.AddWithID(sessionID); err != nil {
        // ...
    }
    return sessionID
}

This creates an empty VMCPSession in the transport session manager. The session exists now, but it's a hollow shell - no routing table, no tools, no clients.

Notice: Generate() has no access to the capabilities that were discovered upstream. It can't populate the session. It just creates an empty container and returns an ID to the SDK.


Act 3: The Hook Tries to Glue It Together

The SDK fires the OnRegisterSession hook. We registered a handler for this:

hooks.AddOnRegisterSession(func(ctx context.Context, session server.ClientSession) {
    srv.handleSessionRegistration(ctx, session, sessionManager)
})

Now handleSessionRegistration runs. It has to:

  1. Fish the capabilities back out of the context (where middleware stashed them)
  2. Find the VMCPSession that Generate() created
  3. Copy the capabilities into the session

caps := discovery.DiscoveredCapabilitiesFromContext(ctx)
// ...
vmcpSess.SetRoutingTable(caps.RoutingTable)
vmcpSess.SetTools(caps.Tools)

This is where routing table and tools finally get stored in the session.

But wait - we're also registering tools with the SDK here via AddSessionTools. This is SDK wiring, not session object creation. Both concerns happen in the same function.


Stating the client problem another way: "it's hard & unintuitive to create objects in vMCP with a session lifetime."

These tangled concerns produce other unfortunate effects. The first that comes to mind is the lack of integration tests below the server level. Because we can't create sessions without spinning up the whole vMCP server, we are stuck testing either very small units (e.g. the discovery module) or the whole server (e.g. Ginkgo/integration tests). Related issue: stacklok/toolhive#2852.

I think the solution should look something like the following (generated with Claude's assistance):

Session Architecture: Decoupling and Encapsulation

This proposal addresses two architectural problems in the vMCP session system:

  1. Decoupling session creation from SDK wiring: Today, session construction is spread across middleware, adapters, and hooks—tightly coupled to the MCP SDK's lifecycle callbacks. A SessionFactory separates the concern of building a session from integrating with the SDK.

  2. Encapsulating session behavior: Today's VMCPSession is a passive data container read and written from many locations. The proposed Session interface is an active domain object that owns its clients, encapsulates routing logic, and manages its own lifecycle.


Background: How Session Routing Works Today

The current design routes requests via context, not session object lookup.

  1. Middleware stuffs capabilities into context before the SDK sees the request: discovery/middleware.go:109-110

  2. Router is stateless - it extracts the routing table from context on every request: router/default_router.go:54

    capabilities, ok := discovery.DiscoveredCapabilitiesFromContext(ctx)
    target := capabilities.RoutingTable.Tools[toolName]
  3. Handler factory uses router to find backend target, then calls a shared backend client: adapter/handler_factory.go:103-125

    target, err := f.router.RouteTool(ctx, toolName)
    result, err := f.backendClient.CallTool(ctx, target, toolName, args, meta)

The flow is: request → middleware (stuff ctx) → handler → router (read ctx) → backend client

There's no sessions.Load(sessionID) - routing data flows through context. The backend client is shared across all sessions and creates new MCP connections per request.

Current Objects Are Data Containers, Not Domain Objects

The existing VMCPSession is a passive data container with getters and setters, not an object with encapsulated behavior:

type VMCPSession struct {
    *transportsession.StreamableSession
    routingTable *vmcp.RoutingTable
    tools        []vmcp.Tool
    clientPool   interface{}
    mu           sync.RWMutex
}

func (s *VMCPSession) SetRoutingTable(rt *vmcp.RoutingTable) { ... }
func (s *VMCPSession) RoutingTable() *vmcp.RoutingTable      { ... }
func (s *VMCPSession) SetTools(tools []vmcp.Tool)            { ... }
func (s *VMCPSession) Tools() []vmcp.Tool                    { ... }

This creates cognitive load because:

  • Data is written in one place, read in another: The routing table is set in OnRegisterSession (server.go:989) but read by the router via context, not from the session object.

  • No single source of truth: Session state is scattered across context values, the VMCPSession struct, and the transport layer's StreamableSession. To understand what a "session" contains, you must trace through middleware, hooks, and multiple storage locations.

  • Objects don't do anything: The router, handler factory, and backend client are separate stateless components that operate on data pulled from context. The session object doesn't route requests or manage clients—it just holds references that other code reads.

The result is that understanding "how does a tool call get routed?" requires reading middleware, router, handler factory, and backend client code—four separate locations that coordinate via context threading rather than method calls on a domain object.


Core Interfaces

Both the session factory and session itself should be interfaces for testability:

// SessionFactory creates fully-formed sessions from configuration and runtime inputs.
type SessionFactory interface {
    // MakeSession constructs a session with all its dependencies.
    // This is where capability discovery, client creation, and resource allocation happen.
    MakeSession(
        ctx context.Context,
        identity *auth.Identity,
        backends []Backend,
    ) (Session, error)
}

// Session represents an active MCP session with all its capabilities and resources.
// This is a pure domain object - no protocol concerns.
type Session interface {
    ID() string

    // Capabilities - returns discovered tools/resources for this session
    // Perhaps these could be combined into one "GetCapabilities" method.
    Tools() []Tool
    Resources() []Resource

    // MCP operations - routing logic is encapsulated here
    CallTool(ctx context.Context, name string, arguments map[string]any) (*ToolResult, error)
    ReadResource(ctx context.Context, uri string) (*ResourceResult, error)
    GetPrompt(ctx context.Context, name string, arguments map[string]any) (*PromptResult, error)

    // Lifecycle
    Close() error
}

Session Factory Implementation

type defaultSessionFactory struct {
    aggregator    Aggregator
    clientFactory ClientFactory
    config        *Config
}

func (f *defaultSessionFactory) MakeSession(
    ctx context.Context,
    identity *auth.Identity,
    backends []Backend,
) (Session, error) {
    // 1. Discover capabilities (queries all backends, resolves conflicts)
    capabilities, err := f.aggregator.AggregateCapabilities(ctx, backends)
    if err != nil {
        return nil, fmt.Errorf("capability discovery failed: %w", err)
    }

    // 2. Create MCP clients for all backends in routing table
    clients := make(map[string]*Client)
    for _, target := range capabilities.RoutingTable.UniqueBackends() {
        c, err := f.clientFactory.CreateAndInitialize(ctx, target)
        if err != nil {
            closeAll(clients)
            return nil, fmt.Errorf("client creation failed for %s: %w", target.WorkloadID, err)
        }
        clients[target.WorkloadID] = c
    }

    // 3. Return fully-formed session
    return &defaultSession{
        id:           uuid.New().String(),
        identity:     identity,
        routingTable: capabilities.RoutingTable,
        tools:        capabilities.Tools,
        resources:    capabilities.Resources,
        clients:      clients,
    }, nil
}
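The factory sketch leaves `UniqueBackends()` and `closeAll` undefined. One plausible reading, assuming simplified `RoutingTable` / `BackendTarget` / `Client` types (the real vMCP types may differ): deduplicate targets by `WorkloadID` so each backend gets exactly one client, and roll back already-created clients when a later backend fails.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Simplified stand-ins; UniqueBackends and closeAll are hypothetical helpers.
type BackendTarget struct {
	WorkloadID   string
	OriginalName string
}

type RoutingTable struct {
	Tools     map[string]*BackendTarget
	Resources map[string]*BackendTarget
}

// UniqueBackends returns one target per WorkloadID, since many tools and
// resources usually route to the same backend and need only one client.
func (rt *RoutingTable) UniqueBackends() []*BackendTarget {
	seen := map[string]*BackendTarget{}
	for _, m := range []map[string]*BackendTarget{rt.Tools, rt.Resources} {
		for _, t := range m {
			if _, ok := seen[t.WorkloadID]; !ok {
				seen[t.WorkloadID] = t
			}
		}
	}
	out := make([]*BackendTarget, 0, len(seen))
	for _, t := range seen {
		out = append(out, t)
	}
	// Deterministic order keeps logs and tests stable.
	sort.Slice(out, func(i, j int) bool { return out[i].WorkloadID < out[j].WorkloadID })
	return out
}

type Client struct {
	id     string
	closed bool
}

func (c *Client) Close() error {
	c.closed = true
	return nil
}

// closeAll releases clients created before a later backend failed.
func closeAll(clients map[string]*Client) {
	for _, c := range clients {
		_ = c.Close() // best effort during rollback
	}
}

func main() {
	rt := &RoutingTable{Tools: map[string]*BackendTarget{
		"create_issue": {WorkloadID: "github", OriginalName: "create_issue"},
		"list_prs":     {WorkloadID: "github", OriginalName: "list_prs"},
		"search":       {WorkloadID: "jira", OriginalName: "search"},
	}}
	clients := map[string]*Client{}
	var err error
	for _, t := range rt.UniqueBackends() {
		if t.WorkloadID == "jira" { // simulate a failing backend
			err = errors.New("jira unreachable")
			closeAll(clients)
			break
		}
		clients[t.WorkloadID] = &Client{id: t.WorkloadID}
	}
	fmt.Println(len(rt.UniqueBackends()), err, clients["github"].closed)
}
```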

Session Implementation

The session encapsulates routing logic and holds pre-created clients:

type defaultSession struct {
    id           string
    identity     *auth.Identity
    routingTable *RoutingTable
    tools        []Tool
    resources    []Resource
    clients      map[string]*Client // backendID -> client
}

func (s *defaultSession) ID() string            { return s.id }
func (s *defaultSession) Tools() []Tool         { return s.tools }
func (s *defaultSession) Resources() []Resource { return s.resources }

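func (s *defaultSession) CallTool(ctx context.Context, name string, args map[string]any) (*ToolResult, error) {
    target, ok := s.routingTable.Tools[name]
    if !ok {
        return nil, fmt.Errorf("tool not found: %s", name)
    }

    // Guard against the routing table and client map disagreeing instead of
    // dereferencing a nil *Client.
    client, ok := s.clients[target.WorkloadID]
    if !ok {
        return nil, fmt.Errorf("no client for backend %s", target.WorkloadID)
    }
    return client.CallTool(ctx, target.OriginalName, args)
}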

func (s *defaultSession) Close() error {
    var errs []error
    for id, c := range s.clients {
        if err := c.Close(); err != nil {
            errs = append(errs, fmt.Errorf("closing client %s: %w", id, err))
        }
    }
    s.clients = nil
    return errors.Join(errs...)
}

Wiring into the MCP SDK

The mark3labs/mcp-go SDK defines SessionIdManager:

type SessionIdManager interface {
    Generate() string
    Validate(sessionID string) (isTerminated bool, err error)
    Terminate(sessionID string) (isNotAllowed bool, err error)
}
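The two return values distinguish an unknown ID from a known ID whose session was terminated; the `Validate` sketch later in this comment always reports `isTerminated == false`. A minimal dependency-free sketch that tracks both cases, assuming an in-memory store. `memoryIDs` is hypothetical, and the interface is redeclared locally with the same method set rather than imported from mcp-go. A real implementation would also need to expire terminated IDs instead of keeping them forever.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// sessionIdManager has the same method set as the SDK interface above,
// declared locally so the example has no dependencies.
type sessionIdManager interface {
	Generate() string
	Validate(sessionID string) (isTerminated bool, err error)
	Terminate(sessionID string) (isNotAllowed bool, err error)
}

// memoryIDs is a hypothetical in-memory implementation. Generate receives
// no context, so it cannot see request-scoped data such as the
// capabilities that discovery middleware stored.
type memoryIDs struct {
	mu    sync.Mutex
	state map[string]bool // sessionID -> terminated
}

var _ sessionIdManager = (*memoryIDs)(nil)

func (m *memoryIDs) Generate() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	id := hex.EncodeToString(b)
	m.mu.Lock()
	defer m.mu.Unlock()
	m.state[id] = false
	return id
}

func (m *memoryIDs) Validate(id string) (bool, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	terminated, ok := m.state[id]
	if !ok {
		return false, errors.New("unknown session")
	}
	return terminated, nil
}

func (m *memoryIDs) Terminate(id string) (bool, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.state[id]; !ok {
		return false, errors.New("unknown session")
	}
	m.state[id] = true // remember the ID so Validate can report termination
	return false, nil
}

func main() {
	m := &memoryIDs{state: map[string]bool{}}
	id := m.Generate()
	t1, err1 := m.Validate(id)
	m.Terminate(id)
	t2, err2 := m.Validate(id)
	fmt.Println(t1, err1, t2, err2)
}
```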

A single SessionManager implements the SDK session lifecycle and also provides adapted tools/resources:

// SessionManager implements SDK session lifecycle and provides adapted capabilities.
type SessionManager struct {
    factory  SessionFactory
    sessions sync.Map // map[string]Session
}

// --- SessionIdManager interface (SDK lifecycle) ---

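// NOTE: the SDK's SessionIdManager.Generate() shown above takes no arguments, so this
// context-aware signature would not satisfy that interface as written. It assumes the
// request context (identity, backends) can be made available at this point.
func (m *SessionManager) Generate(ctx context.Context) string {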
    identity, _ := auth.IdentityFromContext(ctx)
    backends := getBackendsFromContext(ctx)

    session, err := m.factory.MakeSession(ctx, identity, backends)
    if err != nil {
        logger.Errorf("session creation failed: %v", err)
        return ""
    }

    m.sessions.Store(session.ID(), session)
    return session.ID()
}

func (m *SessionManager) Validate(sessionID string) (isTerminated bool, err error) {
    _, ok := m.sessions.Load(sessionID)
    if !ok {
        return false, fmt.Errorf("session not found")
    }
    return false, nil
}

func (m *SessionManager) Terminate(sessionID string) (isNotAllowed bool, err error) {
    val, ok := m.sessions.LoadAndDelete(sessionID)
    if !ok {
        return false, nil
    }

    session := val.(Session)
    if err := session.Close(); err != nil {
        logger.Warnf("error closing session %s: %v", sessionID, err)
    }
    return false, nil
}

// --- Adapted capabilities for SDK registration ---

func (m *SessionManager) GetAdaptedTools(sessionID string) []server.ServerTool {
    val, ok := m.sessions.Load(sessionID)
    if !ok {
        return nil
    }
    session := val.(Session)

    // mcp.NewTool only builds the tool definition; the handler is paired with
    // it in a server.ServerTool, which is what AddSessionTools accepts.
    var sdkTools []server.ServerTool
    for _, tool := range session.Tools() {
        sdkTools = append(sdkTools, server.ServerTool{
            Tool: mcp.NewTool(
                tool.Name,
                mcp.WithDescription(tool.Description),
                mcp.WithToolInputSchema(tool.InputSchema),
            ),
            Handler: m.createToolHandler(sessionID, tool.Name),
        })
    }
    return sdkTools
}

// --- Internal handler creation ---

func (m *SessionManager) createToolHandler(sessionID, toolName string) server.ToolHandlerFunc {
    return func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
        // Look the session up at call time so a terminated session is rejected.
        val, ok := m.sessions.Load(sessionID)
        if !ok {
            return nil, fmt.Errorf("session not found: %s", sessionID)
        }
        session := val.(Session)

        args, _ := req.Params.Arguments.(map[string]any)
        result, err := session.CallTool(ctx, toolName, args)
        if err != nil {
            return mcp.NewToolResultError(err.Error()), nil
        }

        return toSDKResult(result), nil
    }
}

Registration with the SDK:

// 1. Session lifecycle management
streamableServer := server.NewStreamableHTTPServer(
    mcpServer,
    server.WithSessionIdManager(sessionManager),
)

// 2. In OnRegisterSession hook, register adapted tools/resources
hooks.AddOnRegisterSession(func(ctx context.Context, sdkSession server.ClientSession) {
    // The SDK calls sessionManager.Generate() first, which creates the session and
    // returns its ID. The SDK then fires this hook, passing that same ID via
    // sdkSession.SessionID(). This allows us to look up the session we created
    // in Generate() and register its tools/resources with the SDK.
    sessionID := sdkSession.SessionID()

    mcpServer.AddSessionTools(sessionID, sessionManager.GetAdaptedTools(sessionID)...)
    mcpServer.AddSessionResources(sessionID, sessionManager.GetAdaptedResources(sessionID)...)
})
