MCP Resource Limits Configuration Proposal #2073

@aiob3


Problem Statement

Docker MCP Gateway / cagent currently has no built-in mechanism to limit:

  • Maximum concurrent MCP instances running in parallel
  • Memory per instance
  • Total memory consumption
  • CPU allocation per tool
  • Instance lifecycle (timeout, cleanup)

This leads to resource exhaustion on local development machines (especially WSL 2), causing:

  • 90%+ CPU usage from uncontrolled docker-mcp.exe spawning
  • Memory bloat
  • System lockups
  • Daemon crashes

Example of Current Issue

When adding multiple MCP tools (exa, fetch, filesystem, clickhouse, playwright, etc.), each tool call can spawn new instances without limits, resulting in 100+ orphaned processes consuming resources indefinitely.

Proposed Solution

Add a .mcp-limits.yaml configuration file that allows users to define resource constraints:

# .mcp-limits.yaml
mcp:
  global:
    max_concurrent_instances: 10          # Max total instances across all tools
    max_total_memory: 2048                # MB - total memory cap
    max_total_cpu: 80                     # Percentage (0-100)
    cleanup_orphans: true
    orphan_detection_interval: 30         # seconds
    instance_timeout: 600                 # seconds (10 minutes)

  tools:
    exa:
      max_instances: 2
      max_memory: 256                     # MB per instance
      max_cpu: 25                         # Percentage per instance
      timeout: 120                        # seconds

    fetch:
      max_instances: 3
      max_memory: 512
      max_cpu: 50
      timeout: 180

    playwright:
      max_instances: 1
      max_memory: 1024
      max_cpu: 50
      timeout: 300

    clickhouse:
      max_instances: 1
      max_memory: 512
      max_cpu: 40
      timeout: 600

  # Fallback for tools not explicitly configured
  default:
    max_instances: 2
    max_memory: 256
    max_cpu: 30
    timeout: 300
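A schema like the one above could map onto plain dataclasses. The sketch below is illustrative, not an existing cagent API; note that `global` is a reserved word in Python, so the parsed section is exposed as `global_limits` here, and the `default` block doubles as the source of field defaults:

```python
from dataclasses import dataclass

@dataclass
class ToolLimits:
    max_instances: int = 2
    max_memory: int = 256   # MB per instance
    max_cpu: int = 30       # percent per instance
    timeout: int = 300      # seconds

@dataclass
class GlobalLimits:
    max_concurrent_instances: int = 10
    max_total_memory: int = 2048   # MB
    max_total_cpu: int = 80        # percent
    cleanup_orphans: bool = True
    orphan_detection_interval: int = 30  # seconds
    instance_timeout: int = 600          # seconds

@dataclass
class MCPLimitsConfig:
    global_limits: GlobalLimits
    tools: dict           # tool name -> ToolLimits
    default: ToolLimits   # fallback for unconfigured tools

def parse_limits(raw: dict) -> MCPLimitsConfig:
    """Build a typed config from the already-loaded YAML dict."""
    mcp = raw["mcp"]
    return MCPLimitsConfig(
        global_limits=GlobalLimits(**mcp.get("global", {})),
        tools={name: ToolLimits(**cfg)
               for name, cfg in mcp.get("tools", {}).items()},
        default=ToolLimits(**mcp.get("default", {})),
    )
```

Unknown keys would raise a `TypeError` from the dataclass constructors, which gives basic validation for free.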

Implementation Details

1. Configuration Loading

# Locations checked in order:
~/.mcp-limits.yaml              # User home
.mcp-limits.yaml                # Project root
$CAGENT_CONFIG_DIR/limits.yaml  # Environment variable
/etc/cagent/mcp-limits.yaml     # System-wide (Linux/Mac)
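The first-match-wins lookup could look like this (a sketch; the function name and the treatment of an unset `CAGENT_CONFIG_DIR` are assumptions, not existing cagent behavior):

```python
import os
from pathlib import Path
from typing import Optional

def resolve_limits_path(project_root: str = ".") -> Optional[Path]:
    """Return the first existing limits file, in the documented order."""
    candidates = [
        Path.home() / ".mcp-limits.yaml",          # user home
        Path(project_root) / ".mcp-limits.yaml",   # project root
    ]
    config_dir = os.environ.get("CAGENT_CONFIG_DIR")
    if config_dir:                                 # only if the variable is set
        candidates.append(Path(config_dir) / "limits.yaml")
    candidates.append(Path("/etc/cagent/mcp-limits.yaml"))  # system-wide

    for path in candidates:
        if path.is_file():
            return path
    return None   # no file found -> all limits stay unlimited
```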

2. Instance Manager Enhancement

import time

class MCPInstanceManager:
    def __init__(self, config_path):
        # "global" is a Python keyword, so the parsed global section
        # is exposed as config.global_limits
        self.config = load_config(config_path)
        self.active_instances = {}
        self.start_orphan_cleanup_task()

    def spawn_instance(self, tool_name):
        """Spawn with resource limits enforced."""
        # Check global limits
        if len(self.active_instances) >= self.config.global_limits.max_concurrent_instances:
            raise MCPLimitExceeded("Max concurrent instances reached")

        tool_config = self.config.tools.get(tool_name, self.config.default)

        # Check tool-specific limits
        tool_instances = sum(1 for i in self.active_instances.values()
                             if i.tool == tool_name)
        if tool_instances >= tool_config.max_instances:
            raise MCPLimitExceeded(f"Max instances for {tool_name} reached")

        # Spawn with cgroup constraints; max_cpu is a percentage, so convert
        # it to a fraction of one CPU (the semantics of docker --cpus)
        instance = self._spawn_docker_container(
            tool_name,
            memory_limit=f"{tool_config.max_memory}M",
            cpu_limit=tool_config.max_cpu / 100,
        )

        self.active_instances[instance.id] = instance
        return instance

    def cleanup_orphans(self):
        """Periodically remove dead/timed-out instances."""
        now = time.time()
        to_remove = []

        for instance_id, instance in self.active_instances.items():
            if now - instance.created_at > instance.config.timeout:
                instance.terminate()
                to_remove.append(instance_id)

        for instance_id in to_remove:
            del self.active_instances[instance_id]

3. Monitoring & Alerts

class MCPResourceMonitor:
    def check_health(self):
        """Emit warnings when usage crosses 80% of a configured cap."""
        total_mem = sum(i.memory_usage for i in self.instances.values())
        total_cpu = sum(i.cpu_usage for i in self.instances.values())

        mem_cap = self.config.global_limits.max_total_memory
        if total_mem > mem_cap * 0.8:
            logger.warning(f"Memory usage at {total_mem}MB "
                           f"({total_mem / mem_cap:.0%} of the {mem_cap}MB cap)")

        if total_cpu > self.config.global_limits.max_total_cpu * 0.8:
            logger.warning(f"CPU usage at {total_cpu}%")

4. CLI Integration

# Show current limits
cagent mcp limits show

# Update limits
cagent mcp limits set --max-instances 5 --max-memory 2048

# Monitor usage in real-time
cagent mcp monitor

# Force cleanup
cagent mcp cleanup --force
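The command tree above could be wired up with nested `argparse` subparsers; a minimal sketch (handler wiring omitted, flag names taken from the examples above):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build the `cagent mcp ...` command tree from the proposal."""
    parser = argparse.ArgumentParser(prog="cagent")
    sub = parser.add_subparsers(dest="command", required=True)

    mcp = sub.add_parser("mcp", help="MCP resource management")
    mcp_sub = mcp.add_subparsers(dest="mcp_command", required=True)

    # cagent mcp limits show|set
    limits = mcp_sub.add_parser("limits", help="show or update limits")
    limits_sub = limits.add_subparsers(dest="limits_command", required=True)
    limits_sub.add_parser("show")
    set_cmd = limits_sub.add_parser("set")
    set_cmd.add_argument("--max-instances", type=int)
    set_cmd.add_argument("--max-memory", type=int, help="MB")

    # cagent mcp monitor / cagent mcp cleanup --force
    mcp_sub.add_parser("monitor", help="live usage view")
    cleanup = mcp_sub.add_parser("cleanup")
    cleanup.add_argument("--force", action="store_true")
    return parser
```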

Benefits

  1. Prevents resource exhaustion: No more 90%+ CPU spikes
  2. Production-ready: Scales safely in constrained environments
  3. User-friendly: Zero-config defaults, easy to customize
  4. Transparent: Monitor actual usage vs limits
  5. Safe: Graceful degradation instead of crashes

Testing Scenarios

# Test 1: spawn 20 concurrent calls; excess should queue or fail gracefully
for i in {1..20}; do cagent call exa --query "test$i" & done

# Test 2: Monitor that cleanup removes orphans
cagent mcp monitor  # Should show instances terminating after timeout

# Test 3: Verify memory cap respected
cagent mcp limits set --max-memory 512
# Large operation should fail or queue

Backward Compatibility

  • All limits default to unlimited when no .mcp-limits.yaml is found
  • Existing configs work unchanged
  • Environment variable overrides available for CI/CD
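The CI/CD overrides could be a thin merge layer on top of the loaded config. The variable names below are hypothetical (the proposal only says overrides should exist):

```python
import os

# Hypothetical variable names -> (config key, cast); not an existing cagent contract.
_ENV_OVERRIDES = {
    "CAGENT_MCP_MAX_INSTANCES": ("max_concurrent_instances", int),
    "CAGENT_MCP_MAX_MEMORY": ("max_total_memory", int),
}

def apply_env_overrides(global_limits: dict) -> dict:
    """Return a copy of the global limits with any set env vars applied."""
    merged = dict(global_limits)
    for var, (key, cast) in _ENV_OVERRIDES.items():
        if var in os.environ:
            merged[key] = cast(os.environ[var])
    return merged
```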

Related Issues

  • Similar pattern used by: Kubernetes (resource requests/limits), Docker (--memory, --cpus), Systemd (MemoryMax, CPUQuota)
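To make the Docker analogy concrete: a per-instance limit translates directly into the `mem_limit` and `nano_cpus` keyword arguments that docker-py's `containers.run()` accepts (the helper name here is hypothetical):

```python
def docker_run_kwargs(max_memory_mb: int, max_cpu_percent: int) -> dict:
    """Translate a tool's limits into docker-py container.run() kwargs.

    nano_cpus is CPUs * 1e9, so 50% of one CPU -> 500_000_000,
    equivalent to `docker run --memory 256m --cpus 0.5`.
    """
    return {
        "mem_limit": f"{max_memory_mb}m",
        "nano_cpus": int(max_cpu_percent / 100 * 1_000_000_000),
        "detach": True,
    }
```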

Files to Modify

  1. cagent/config/limits.py - New config parser
  2. cagent/mcp/instance_manager.py - Resource enforcement
  3. cagent/mcp/monitor.py - Health checks
  4. cagent/cli/commands/mcp.py - New CLI commands
  5. docs/mcp-configuration.md - Documentation
  6. examples/.mcp-limits.yaml - Example config

Implementation Priority

  • Phase 1: Global instance/memory limits (MVP)
  • Phase 2: Per-tool limits + cleanup
  • Phase 3: CLI monitoring + auto-tuning
  • Phase 4: Integration with Docker Desktop API for WSL 2

Author: aiob3 & Gordon (Security/Performance concern)
Date: 03/12/2026
Severity: Medium (Impacts usability, not security)
Type: Enhancement/Feature Request
