
[Gap]: Document MCP Optimizer (Context Optimization) for vMCP #500

@yrobla

Description

What needs documentation?

The MCP Optimizer is a context optimization feature for Virtual MCP Servers that
dramatically reduces token usage when aggregating multiple MCP backends. Instead of
exposing all backend tools directly to LLM clients (which can consume 10,000+ tokens for
50+ tools), the optimizer exposes only two meta-tools:

  1. find_tool: Semantic search to discover relevant tools on-demand
  2. call_tool: Dynamic invocation of any backend tool by name

This achieves 70-95% token reduction depending on the number of aggregated backends.
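
For illustration, a hypothetical exchange could look like the sketch below. The find_tool and call_tool names come from this feature, and the tools/call envelope is standard MCP JSON-RPC, but every argument and result field name is an assumption until the real schemas are documented (item 4 below):

```jsonc
// Hypothetical exchange; argument and result field names are illustrative
// assumptions, not the actual find_tool/call_tool schemas.
// 1. Discover relevant tools with a semantic query
{
  "jsonrpc": "2.0", "id": 1, "method": "tools/call",
  "params": { "name": "find_tool", "arguments": { "query": "create a GitHub issue" } }
}
// -> returns a small ranked list, e.g. [{ "name": "github.create_issue", "score": 0.91 }]

// 2. Invoke the discovered backend tool by name
{
  "jsonrpc": "2.0", "id": 2, "method": "tools/call",
  "params": {
    "name": "call_tool",
    "arguments": { "name": "github.create_issue", "arguments": { "title": "Example" } }
  }
}
```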

Key aspects to document:

  1. Configuration:
    - spec.config.optimizer.embeddingService field in VirtualMCPServer CRD
    - How to reference an EmbeddingServer resource
    - Example configurations
  2. Functionality:
    - How the optimizer works (two-tool workflow)
    - Token savings mechanism and typical reduction percentages
    - What happens to original tool definitions (indexed but not exposed)
    - Interaction with resources and prompts (not affected)
  3. Embedding service integration:
    - EmbeddingServer CRD and deployment
    - Supported models (sentence-transformers, BAAI, intfloat, etc.)
    - OpenAI-compatible API contract (see the request sketch after this list)
    - Service naming and networking requirements
  4. find_tool and call_tool specifications:
    - Input/output schemas for both tools
    - How to use them effectively
    - Token metrics reporting (baseline vs actual)
  5. Limitations and requirements:
    - Platform requirements (macOS/Windows, Docker/Rancher Desktop)
    - Current limitations (Linux, Podman on Windows, network isolation)
    - Kubernetes cluster requirement
  6. Complete working example (a minimal sketch follows this list):
    - EmbeddingServer deployment
    - VirtualMCPServer with optimizer enabled
    - Example client interaction showing find_tool → call_tool workflow
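
The complete working example (item 6) might be sketched along these lines. Only the spec.config.optimizer.embeddingService field path is taken from this issue; the API group/version and the EmbeddingServer spec fields are placeholders to be verified against the actual CRDs:

```yaml
# Minimal sketch only: API group/version and the EmbeddingServer fields are
# assumptions; spec.config.optimizer.embeddingService is the field named in this issue.
apiVersion: toolhive.stacklok.dev/v1alpha1   # assumed group/version
kind: EmbeddingServer
metadata:
  name: embedding-server
spec:
  model: BAAI/bge-small-en-v1.5              # one of the supported model families
---
apiVersion: toolhive.stacklok.dev/v1alpha1   # assumed group/version
kind: VirtualMCPServer
metadata:
  name: my-vmcp
spec:
  config:
    optimizer:
      embeddingService: embedding-server     # references the EmbeddingServer above
```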
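
Similarly, the OpenAI-compatible API contract (item 3) reduces to a single embeddings route. A hypothetical in-cluster request, assuming the EmbeddingServer's Service is named embedding-server and listens on port 80:

```bash
# Hypothetical request: the Service name/port and model are assumptions.
# HuggingFace TEI exposes an OpenAI-compatible POST /v1/embeddings route.
curl -s http://embedding-server/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "BAAI/bge-small-en-v1.5", "input": "list open pull requests"}'
# Expected shape: {"object":"list","data":[{"object":"embedding","index":0,"embedding":[...]}],...}
```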

Context and references

Related commits:

Use case

Problem: When organizations aggregate multiple MCP backends (e.g., 10 backends with 200+
tools in total), sending all tool definitions to the LLM client on every request consumes
a massive number of tokens. For example:

  • 200 tools × ~200 tokens per definition = ~40,000 tokens
  • This happens on EVERY request to the LLM
  • Costs scale linearly with the number of backends

Solution needed: Users deploying VirtualMCPServer resources in Kubernetes need to
understand:

  • How to enable the optimizer to reduce token usage by 70-95%
  • What embedding service to deploy and how to configure it
  • How the find_tool/call_tool workflow works
  • Platform requirements and current limitations
  • Expected token savings for their use case

Target audience:

  • Platform engineers deploying vMCP in Kubernetes
  • Teams aggregating 5+ MCP backends
  • Organizations optimizing LLM API costs
  • Developers building LLM applications with large toolsets

Documentation goals:

  1. Help users deploy and configure the optimizer correctly
  2. Set clear expectations about token savings
  3. Explain the embedding service requirement and options
  4. Provide complete working examples
  5. Document limitations upfront to avoid deployment issues

Additional context

Current status (important to document):

  • ✅ Feature is implemented and working
  • ✅ E2E tests passing
  • ⚠️ Experimental status (requires feature flag in UI)
  • ⚠️ Platform limitations (Linux, Podman on Windows, network isolation)
  • ✅ Production-ready embedding service (HuggingFace TEI)

Documentation structure recommendation:

  1. Create new guide: docs/toolhive/guides-vmcp/mcp-optimizer.mdx
  2. Add to sidebar after composite-tools.mdx
  3. Cross-reference from:
    - configuration.mdx (mention optimizer config option)
    - tool-aggregation.mdx (mention as optimization technique)
    - Tutorials section (link to complete tutorial)

Metadata

Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)
