What needs documentation?
The MCP Optimizer is a context optimization feature for Virtual MCP Servers that
dramatically reduces token usage when aggregating multiple MCP backends. Instead of
exposing all backend tools directly to LLM clients (which can consume 10,000+ tokens for
50+ tools), the optimizer exposes only two meta-tools:
- find_tool: Semantic search to discover relevant tools on-demand
- call_tool: Dynamic invocation of any backend tool by name
This achieves 70-95% token reduction depending on the number of aggregated backends.
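To make the two-tool workflow concrete, the guide could open with a short illustration of the exchange. The following is a hedged sketch only: the argument fields, result fields, and backend tool name are placeholders, not the actual find_tool/call_tool schemas (those still need to be pulled from the implementation).

```yaml
# Illustrative find_tool → call_tool exchange. All field names and values are
# placeholders; the real input/output schemas must be confirmed against the code.

# 1. The client asks for a capability instead of reading 200+ tool definitions.
find_tool:
  arguments:
    query: "create an issue in a GitHub repository"
  result:
    matches:
      - name: github.create_issue
        description: "Create an issue in a repository"
        score: 0.91        # semantic similarity from the embedding service

# 2. The client invokes the discovered backend tool by name.
call_tool:
  arguments:
    name: github.create_issue
    arguments:
      repo: "example-org/example-repo"
      title: "Example issue"
  result:
    content: "Issue created"
```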
Key aspects to document:
- Configuration:
  - spec.config.optimizer.embeddingService field in VirtualMCPServer CRD
  - How to reference an EmbeddingServer resource
  - Example configurations
- Functionality:
  - How the optimizer works (two-tool workflow)
  - Token savings mechanism and typical reduction percentages
  - What happens to original tool definitions (indexed but not exposed)
  - Interaction with resources and prompts (not affected)
- Embedding service integration:
  - EmbeddingServer CRD and deployment
  - Supported models (sentence-transformers, BAAI, intfloat, etc.)
  - OpenAI-compatible API contract (see the endpoint sketch after this list)
  - Service naming and networking requirements
- find_tool and call_tool specifications:
  - Input/output schemas for both tools
  - How to use them effectively
  - Token metrics reporting (baseline vs. actual)
- Limitations and requirements:
  - Platform requirements (macOS/Windows, Docker/Rancher Desktop)
  - Current limitations (Linux, Podman on Windows, network isolation)
  - Kubernetes cluster requirement
- Complete working example (a manifest sketch follows this list):
  - EmbeddingServer deployment
  - VirtualMCPServer with optimizer enabled
  - Example client interaction showing the find_tool → call_tool workflow
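For the "OpenAI-compatible API contract" item, the guide will probably want a minimal request/response shape. The sketch below assumes the standard OpenAI embeddings endpoint (/v1/embeddings); whether the optimizer requires any additional fields should be confirmed against the implementation, and the model name and service URL are placeholders.

```yaml
# Sketch of an OpenAI-style embeddings exchange (placeholder values).
# POST http://<embedding-service>.<namespace>.svc.cluster.local/v1/embeddings
request:
  model: BAAI/bge-small-en-v1.5
  input:
    - "create an issue in a GitHub repository"
response:
  object: list
  data:
    - object: embedding
      index: 0
      embedding: [0.012, -0.034, 0.078]   # truncated; length depends on the model
  model: BAAI/bge-small-en-v1.5
```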
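For the complete working example, a minimal pair of manifests could look roughly like this. It is a sketch only: spec.config.optimizer.embeddingService is the one field named in this issue, while the API group/version, the EmbeddingServer spec fields, and the model are assumptions to verify against the CRDs.

```yaml
# Hypothetical manifests — verify field names and API versions against the CRDs.
apiVersion: toolhive.stacklok.dev/v1alpha1   # assumed API group/version
kind: EmbeddingServer
metadata:
  name: embeddings
spec:
  model: sentence-transformers/all-MiniLM-L6-v2   # placeholder; any supported model
---
apiVersion: toolhive.stacklok.dev/v1alpha1   # assumed API group/version
kind: VirtualMCPServer
metadata:
  name: vmcp
spec:
  config:
    optimizer:
      embeddingService: embeddings   # references the EmbeddingServer above
```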
Context and references
Related commits:
- stacklok/toolhive@26a4013b - Initial optimizer implementation (Jan 26, 2026)
- stacklok/toolhive@614c4b2e - Refactor to interface-based architecture
- stacklok/toolhive@53029a03 - Marked E2E test as pending (requires real embedding service)
- stacklok/toolhive@75a99235 - Removed implementation files (refactoring)
Use case
Problem: When organizations aggregate multiple MCP backends (e.g., 10 backends with 200+
tools total), sending all tool definitions to the LLM client in every request consumes
a massive number of tokens. For example:
- 200 tools × ~200 tokens per definition = ~40,000 tokens
- This happens on EVERY request to the LLM
- Costs scale linearly with the number of backends
Solution needed: Users deploying VirtualMCPServer resources in Kubernetes need to
understand:
- How to enable the optimizer to reduce token usage by 70-95%
- What embedding service to deploy and how to configure it
- How the find_tool/call_tool workflow works
- Platform requirements and current limitations
- Expected token savings for their use case
Target audience:
- Platform engineers deploying vMCP in Kubernetes
- Teams aggregating 5+ MCP backends
- Organizations optimizing LLM API costs
- Developers building LLM applications with large toolsets
Documentation goals:
- Help users deploy and configure the optimizer correctly
- Set clear expectations about token savings
- Explain the embedding service requirement and options
- Provide complete working examples
- Document limitations upfront to avoid deployment issues
Additional context
Current status (important to document):
- ✅ Feature is implemented and working
- ✅ E2E tests passing
- ⚠️ Experimental status (requires feature flag in UI)
- ⚠️ Platform limitations (Linux, Podman on Windows, network isolation)
- ✅ Production-ready embedding service (HuggingFace TEI)
Documentation structure recommendation:
- Create new guide: docs/toolhive/guides-vmcp/mcp-optimizer.mdx
- Add to sidebar after composite-tools.mdx
- Cross-reference from:
  - configuration.mdx (mention optimizer config option)
  - tool-aggregation.mdx (mention as optimization technique)
  - Tutorials section (link to complete tutorial)