Note: This guide is for OpenClaw (formerly Moltbot/Clawdbot).
Goal: run OpenClaw with local AI models via Docker Model Runner for zero API costs and complete privacy.
This is the ideal setup when:
- You want complete privacy (no data leaves your machine)
- You want zero ongoing API costs
- You have capable hardware (Apple Silicon, NVIDIA GPU, or AMD GPU)
- You want to work offline
Related official docs:
- https://docs.docker.com/desktop/features/model-runner/
- https://docs.openclaw.ai/gateway/local-models
- Docker Blog: Private Personal AI with Clawdbot + DMR
- Docker Blog: OpenCode + Model Runner for Private AI Coding
Docker Model Runner (DMR) is Docker Desktop's built-in tool for running LLMs locally. It provides:
- OpenAI-compatible API at `http://model-runner.docker.internal/v1`
- Automatic resource management — models load on demand, unload when idle
- Multiple inference engines — llama.cpp (all platforms), vLLM (NVIDIA), Diffusers (image generation)
- Zero data collection option available
Plain English: Instead of sending your messages to Anthropic/OpenAI (and paying per token), you run a local AI model on your own computer. Docker Desktop handles all the complexity — downloading models, GPU acceleration, and serving them via an API that looks just like OpenAI's.
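Because the API is OpenAI-compatible, any OpenAI-style client (or plain curl) can talk to it. A minimal sketch, assuming the TCP port (12434) has been enabled and a model pulled, both covered in the setup steps below:

```bash
# Ask the local model a question via the OpenAI-compatible endpoint
curl http://localhost:12434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7-flash",
    "messages": [{"role": "user", "content": "Say hello in one short sentence."}]
  }'
```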
| Scenario | Recommendation |
|---|---|
| Maximum privacy | DMR — data never leaves your machine |
| Zero ongoing costs | DMR — one-time model download only |
| Best model quality | Cloud — Claude Opus and GPT-4o are still more capable |
| Offline operation | DMR — works without internet after setup |
| Limited hardware | Cloud — no GPU/RAM requirements |
| Quick setup | Cloud — just add API key |
| Production/commercial | Cloud — better reliability and quality guarantees |
Bottom line: DMR is excellent for privacy-conscious personal use, development/testing, and learning. For production workloads requiring highest-quality responses, cloud providers still have an edge.
Docker Model Runner has different requirements depending on your platform:
| Platform | Requirements |
|---|---|
| macOS | Apple Silicon (M1/M2/M3/M4) required |
| Windows | NVIDIA GPU (driver 576.57+) or Qualcomm Adreno |
| Linux | CPU, NVIDIA (CUDA), AMD (ROCm), or Vulkan |
| Model Size | RAM Required | Best For |
|---|---|---|
| 1-3B params | 8GB | Quick responses, simple tasks |
| 7-8B params | 16GB | General assistant, coding help |
| 13B+ params | 32GB+ | Complex reasoning, longer context |
Tip: If you're unsure, start with a 7B model like `ai/qwen2.5-coder` — it balances quality and resource usage well.
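Not sure how much RAM you have? A quick check with standard OS commands (nothing DMR-specific):

```bash
# macOS: total physical memory in bytes
sysctl -n hw.memsize

# Linux: human-readable memory summary
free -h
```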
Docker Model Runner requires Docker Desktop 4.40 or later.
macOS:

```bash
# Check current version
docker --version

# Update via brew (if installed via brew)
brew upgrade --cask docker

# Or download from https://www.docker.com/products/docker-desktop/
```

Windows/Linux: Download from https://www.docker.com/products/docker-desktop/
- Open Docker Desktop
- Go to Settings (gear icon)
- Navigate to Features in development → Beta features
- Enable Docker Model Runner
- Click Apply & restart
Alternative (CLI):

```bash
docker desktop enable model-runner --tcp 12434
```

Verify that Model Runner responds:

```bash
# List available models
docker model list
# Should show available models (empty initially)
```

Choose a model appropriate for your hardware and use case:
```bash
# Recommended: GLM 4.7 Flash (fast coding assistance)
docker model pull glm-4.7-flash

# Alternative: Qwen3 Coder (agentic coding workflows)
docker model pull qwen3-coder

# Alternative: GPT-OSS (complex reasoning & scheduling)
docker model pull gpt-oss
```

Other popular models:
```bash
# Qwen 2.5 Coder (proven coding model)
docker model pull ai/qwen2.5-coder

# Llama 3.2 (general purpose)
docker model pull ai/llama3.2

# Mistral (fast, efficient)
docker model pull ai/mistral
```

Test the model:

```bash
# Run a quick test
docker model run glm-4.7-flash "What is a recursive function?"
```

You should see a response generated locally.
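You can also chat interactively. Omitting the prompt drops `docker model run` into a chat session (exact behavior and exit command may vary by Docker Desktop version):

```bash
# Start an interactive chat session; type /bye to exit
docker model run glm-4.7-flash
```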
Next, configure OpenClaw to use Docker Model Runner as a provider:

```bash
# Add Docker Model Runner as a provider
openclaw config set models.providers.dmr.baseUrl http://model-runner.docker.internal/v1

# API key is not needed but the field must exist
openclaw config set models.providers.dmr.apiKey "not-needed"

# Add a model entry under the provider
openclaw config set models.providers.dmr.models '[{ "id": "glm-4.7-flash" }]'
```
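To confirm the values were written, read them back with the same `openclaw config get` command used in the troubleshooting section below (a sketch; adjust the key paths if your config layout differs):

```bash
# Read back the provider settings
openclaw config get models.providers.dmr.baseUrl
openclaw config get models.providers.dmr.models
```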
```bash
# Check status
openclaw status

# Test a message
openclaw message send --to self "Hello, are you running locally?"
```

Then run a deep security audit:

```bash
openclaw security audit --deep
```

If issues are found:
```bash
openclaw security audit --fix
```

Based on Docker's official blog post on private AI with Clawdbot + DMR:
| Model | Best For | Pull Command |
|---|---|---|
| `glm-4.7-flash` | Fast coding assistance and debugging | `docker model pull glm-4.7-flash` |
| `qwen3-coder` | Agentic coding workflows | `docker model pull qwen3-coder` |
| `gpt-oss` | Complex reasoning and scheduling | `docker model pull gpt-oss` |
| Model | Size | Best For | Pull Command |
|---|---|---|---|
| `ai/qwen2.5-coder` | 7B | Coding, technical tasks | `docker model pull ai/qwen2.5-coder` |
| `ai/llama3.2` | 3B/8B | General conversation | `docker model pull ai/llama3.2` |
| `ai/mistral` | 7B | Fast responses, balanced | `docker model pull ai/mistral` |
| `ai/gemma2` | 9B | Google's open model | `docker model pull ai/gemma2` |
| `ai/phi-4` | 14B | Microsoft's reasoning model | `docker model pull ai/phi-4` |
| `ai/deepseek-r1` | Various | Advanced reasoning (larger) | `docker model pull ai/deepseek-r1` |
For most users: Start with `glm-4.7-flash` for fast coding help or `qwen3-coder` for agentic workflows — these are Docker's current recommendations and well-suited for OpenClaw's technical assistant use cases.
Some models support configurable context windows. This is set via the model configuration:

```bash
# Check model details
docker model inspect glm-4.7-flash
```

DMR automatically uses available GPU acceleration:
- macOS: Metal (Apple Silicon)
- Windows/Linux: CUDA (NVIDIA) or ROCm (AMD)
No additional configuration needed — Docker Desktop detects and uses your GPU automatically.
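If you want to confirm Model Runner itself is active, there is a status command (what it prints varies by Docker Desktop version):

```bash
# Confirm Model Runner is running
docker model status
```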
Docker Desktop allows configuring resource limits for the Model Runner:
- Open Docker Desktop Settings
- Navigate to Resources
- Adjust Memory and CPU limits as needed
Recommended: Allocate at least 8GB RAM for 7B models, 16GB for 13B+ models.
You can configure OpenClaw to use local models for most requests while falling back to cloud providers for complex tasks.
Option 1: Manual switching

```bash
# Switch to local (assumes 'dmr' provider already configured)
openclaw config set agents.defaults.model.primary "dmr/glm-4.7-flash"

# Switch to cloud (replace 'anthropic' with your cloud provider name)
openclaw config set agents.defaults.model.primary "anthropic/claude-sonnet-4-20250514"
```

Option 2: Use profiles
```bash
# Create a local profile
OPENCLAW_PROFILE=local openclaw config set agents.defaults.model.primary "dmr/glm-4.7-flash"

# Create a cloud profile
OPENCLAW_PROFILE=cloud openclaw config set agents.defaults.model.primary "anthropic/claude-sonnet-4-20250514"

# Run with specific profile
OPENCLAW_PROFILE=local openclaw gateway run
```

Local models (7B-13B parameters) are generally less capable than frontier cloud models (Claude Opus, GPT-4o):
| Capability | Local 7B | Cloud Frontier |
|---|---|---|
| Simple Q&A | Good | Excellent |
| Code generation | Good | Excellent |
| Complex reasoning | Limited | Excellent |
| Long context | Limited (typically 4K-8K) | Large (100K+) |
| Tool use/function calling | Varies by model | Excellent |
- RAM: Models load into memory; expect 8-16GB+ usage
- Disk: Each model is 2-20GB on disk
- GPU VRAM: With GPU acceleration, the model must fit in available VRAM
- First response: Cold start takes 10-30 seconds as model loads
- Local models may occasionally produce lower-quality or inconsistent responses
- No automatic retry/fallback without additional configuration (see the sketch after this list)
- Model updates require manual pulls
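For the retry/fallback gap, OpenClaw's model settings may let you list fallback models. A hypothetical sketch (the `fallbacks` key name is an assumption; check the Reference docs for what your version actually supports):

```bash
# Hypothetical: fall back to a cloud model when the local one fails
# ("fallbacks" is an assumed key name; verify against the OpenClaw reference)
openclaw config set agents.defaults.model.fallbacks '["anthropic/claude-sonnet-4-20250514"]'
```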
- Verify Docker Desktop is running
- Verify Model Runner is enabled in Docker Desktop settings
- Check the model is pulled: `docker model list`
- Verify the API is accessible: `curl http://localhost:12434/v1/models`
- Check available RAM (model may be swapping to disk)
- Try a smaller/faster model (`glm-4.7-flash` is optimized for speed)
- Verify GPU acceleration is working: `docker model inspect glm-4.7-flash` and look for the "accelerator" field
Ensure the model name in OpenClaw config matches exactly what you pulled:
```bash
# List pulled models
docker model list

# Verify config
openclaw config get models.providers.dmr.models
```

Models stay loaded for fast responses. To unload:
```bash
# Stop all running models
docker model stop --all
```

Or configure the auto-unload timeout in Docker Desktop settings.
The special hostname `model-runner.docker.internal` only works from inside Docker containers. If OpenClaw runs directly on the host:

```bash
# Use localhost with the exposed TCP port instead
openclaw config set models.providers.dmr.baseUrl http://localhost:12434/v1
```

- No cloud transmission: Inference happens entirely on your machine
- No telemetry option: Docker Desktop can be configured to disable telemetry
- No API keys exposed: No secrets to manage for local inference
- Session transcripts: Still stored locally in `~/.openclaw/`
- Channel tokens: WhatsApp/Telegram tokens still needed
- Network exposure: Keep Gateway loopback-only
- Disk encryption: Enable FileVault/LUKS for credential protection
For the most private setup:

```bash
# Run Gateway loopback-only
openclaw config set gateway.bind loopback

# Use local model
openclaw config set models.providers.dmr.baseUrl http://model-runner.docker.internal/v1
openclaw config set models.providers.dmr.models '[{ "id": "glm-4.7-flash" }]'

# Enable Docker sandbox for tool execution
openclaw config set agents.defaults.sandbox.mode "all"
openclaw config set agents.defaults.sandbox.docker.network "none"
```

Combined with FileVault/LUKS disk encryption, this keeps all AI processing and data on hardware you control.
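To double-check that nothing is listening on external interfaces, a generic socket check (assumes the gateway process name contains "openclaw"; the command itself is standard on macOS/Linux):

```bash
# Listening sockets should show 127.0.0.1, not 0.0.0.0 or *
lsof -nP -iTCP -sTCP:LISTEN | grep -i openclaw
```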
- Docker Desktop 4.40+ installed
- Model Runner feature enabled
- Resource limits configured appropriately
- Docker Desktop telemetry disabled (if desired)
- `models.providers.dmr.baseUrl` points to Model Runner
- `models.providers.dmr.models` includes the target model
- `gateway.bind` set to `loopback`
- DM policy is `allowlist` or `pairing`
- Model pulled and verified working
- Model appropriate for available RAM
- GPU acceleration verified (if available)
- FileVault/LUKS disk encryption enabled
- `~/.openclaw/` permissions are 0700
- Shell history protection enabled
- `openclaw security audit --deep` passed
| Deployment | Setup Cost | Monthly Cost | Notes |
|---|---|---|---|
| DMR | Hardware (existing) | $0 | Model download bandwidth only |
| Cloud (Anthropic) | $0 | $5-50+ | Pay per token |
| VPS + Cloud | $0 | $6-20+ | VPS + API costs |
| Moltworker | $0 | $5-15 | Cloudflare Workers plan |
Bottom line: If you already have capable hardware (Apple Silicon Mac, gaming PC with GPU), DMR provides indefinite local AI at zero marginal cost.
After setup:
- Test tool execution — verify that web fetch and exec tools work correctly
- Experiment with models — try different models for your use cases
- Configure channels — set up WhatsApp/Telegram as usual
- Review transcripts — ensure response quality meets your needs
For channel setup, see: Pairing Guide
For local model optimization: https://docs.openclaw.ai/gateway/local-models