What needs documentation?
The MCP Optimizer is a context optimization feature for Virtual MCP Servers that
dramatically reduces token usage when aggregating multiple MCP backends. Instead of
exposing all backend tools directly to LLM clients (which can consume 10,000+ tokens for
50+ tools), the optimizer exposes only two meta-tools:
- find_tool: Semantic search to discover relevant tools on-demand
- call_tool: Dynamic invocation of any backend tool by name
This achieves 70-95% token reduction depending on the number of aggregated backends.
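To make the two-tool workflow concrete, the guide could open with a short illustration of the exchange. The following is a hedged sketch only: the argument fields, result fields, and backend tool name are placeholders, not the actual find_tool/call_tool schemas (those still need to be pulled from the implementation).

```yaml
# Illustrative find_tool → call_tool exchange. All field names and values are
# placeholders; the real input/output schemas must be confirmed against the code.

# 1. The client asks for a capability instead of reading 200+ tool definitions.
find_tool:
  arguments:
    query: "create an issue in a GitHub repository"
  result:
    matches:
      - name: github.create_issue
        description: "Create an issue in a repository"
        score: 0.91        # semantic similarity from the embedding service

# 2. The client invokes the discovered backend tool by name.
call_tool:
  arguments:
    name: github.create_issue
    arguments:
      repo: "example-org/example-repo"
      title: "Example issue"
  result:
    content: "Issue created"
```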
Key aspects to document:
- Configuration:
  - spec.config.optimizer.embeddingService field in VirtualMCPServer CRD
  - How to reference an EmbeddingServer resource
  - Example configurations
- Functionality:
  - How the optimizer works (two-tool workflow)
  - Token savings mechanism and typical reduction percentages
  - What happens to original tool definitions (indexed but not exposed)
  - Interaction with resources and prompts (not affected)
- Embedding service integration:
  - EmbeddingServer CRD and deployment
  - Supported models (sentence-transformers, BAAI, intfloat, etc.)
  - OpenAI-compatible API contract (see the endpoint sketch after this list)
  - Service naming and networking requirements
- find_tool and call_tool specifications:
  - Input/output schemas for both tools
  - How to use them effectively
  - Token metrics reporting (baseline vs. actual)
- Limitations and requirements:
  - Platform requirements (macOS/Windows, Docker/Rancher Desktop)
  - Current limitations (Linux, Podman on Windows, network isolation)
  - Kubernetes cluster requirement
- Complete working example (a manifest sketch follows this list):
  - EmbeddingServer deployment
  - VirtualMCPServer with optimizer enabled
  - Example client interaction showing the find_tool → call_tool workflow
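For the "OpenAI-compatible API contract" item, the guide will probably want a minimal request/response shape. The sketch below assumes the standard OpenAI embeddings endpoint (/v1/embeddings); whether the optimizer requires any additional fields should be confirmed against the implementation, and the model name and service URL are placeholders.

```yaml
# Sketch of an OpenAI-style embeddings exchange (placeholder values).
# POST http://<embedding-service>.<namespace>.svc.cluster.local/v1/embeddings
request:
  model: BAAI/bge-small-en-v1.5
  input:
    - "create an issue in a GitHub repository"
response:
  object: list
  data:
    - object: embedding
      index: 0
      embedding: [0.012, -0.034, 0.078]   # truncated; length depends on the model
  model: BAAI/bge-small-en-v1.5
```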
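For the complete working example, a minimal pair of manifests could look roughly like this. It is a sketch only: spec.config.optimizer.embeddingService is the one field named in this issue, while the API group/version, the EmbeddingServer spec fields, and the model are assumptions to verify against the CRDs.

```yaml
# Hypothetical manifests — verify field names and API versions against the CRDs.
apiVersion: toolhive.stacklok.dev/v1alpha1   # assumed API group/version
kind: EmbeddingServer
metadata:
  name: embeddings
spec:
  model: sentence-transformers/all-MiniLM-L6-v2   # placeholder; any supported model
---
apiVersion: toolhive.stacklok.dev/v1alpha1   # assumed API group/version
kind: VirtualMCPServer
metadata:
  name: vmcp
spec:
  config:
    optimizer:
      embeddingService: embeddings   # references the EmbeddingServer above
```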
Context and references
Related commits:
- stacklok/toolhive@26a4013b - Initial optimizer implementation (Jan 26, 2026)
- stacklok/toolhive@614c4b2e - Refactor to interface-based architecture
- stacklok/toolhive@53029a03 - Marked E2E test as pending (requires real embedding service)
- stacklok/toolhive@75a99235 - Removed implementation files (refactoring)
Use case
Problem: When organizations aggregate multiple MCP backends (e.g., 10 backends with 200+
tools total), sending all tool definitions to the LLM client in every request consumes
a massive number of tokens. For example:
- 200 tools × ~200 tokens per definition = ~40,000 tokens
- This happens on EVERY request to the LLM
- Costs scale linearly with the number of backends
Solution needed: Users deploying VirtualMCPServer resources in Kubernetes need to
understand:
- How to enable the optimizer to reduce token usage by 70-95%
- What embedding service to deploy and how to configure it
- How the find_tool/call_tool workflow works
- Platform requirements and current limitations
- Expected token savings for their use case
Target audience:
- Platform engineers deploying vMCP in Kubernetes
- Teams aggregating 5+ MCP backends
- Organizations optimizing LLM API costs
- Developers building LLM applications with large toolsets
Documentation goals:
- Help users deploy and configure the optimizer correctly
- Set clear expectations about token savings
- Explain the embedding service requirement and options
- Provide complete working examples
- Document limitations upfront to avoid deployment issues
Additional context
Current status (important to document):
- ✅ Feature is implemented and working
- ✅ E2E tests passing
- ⚠️ Experimental status (requires feature flag in UI)
- ⚠️ Platform limitations (Linux, Podman on Windows, network isolation)
- ✅ Production-ready embedding service (HuggingFace TEI)
Documentation structure recommendation:
- Create new guide: docs/toolhive/guides-vmcp/mcp-optimizer.mdx
- Add to sidebar after composite-tools.mdx
- Cross-reference from:
  - configuration.mdx (mention optimizer config option)
  - tool-aggregation.mdx (mention as optimization technique)
  - Tutorials section (link to complete tutorial)