1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@ dist/
*.tsbuildinfo
.worktrees/
.superpowers/
coverage/
1 change: 1 addition & 0 deletions .prettierignore
@@ -2,3 +2,4 @@ dist/
node_modules/
pnpm-lock.yaml
charts/
coverage/
126 changes: 87 additions & 39 deletions README.md
@@ -1,6 +1,6 @@
# @copilotkit/llmock [![Unit Tests](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml/badge.svg)](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml) [![Drift Tests](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml/badge.svg)](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml) [![npm version](https://img.shields.io/npm/v/@copilotkit/llmock)](https://www.npmjs.com/package/@copilotkit/llmock)

Deterministic mock LLM server for testing. A real HTTP server on a real port — not an in-process interceptor — so every process in your stack (Playwright, Next.js, agent workers, microservices) can point at it via `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` and get reproducible, instant responses. Streams SSE in real OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, and Cohere API formats, driven entirely by fixtures. Zero runtime dependencies.

Mock infrastructure for AI application testing — LLM APIs, MCP tools, A2A agents, vector databases, search, and more. Real HTTP server on a real port, fixture-driven, zero runtime dependencies.

## Quick Start

@@ -23,72 +23,106 @@ const url = await mock.start();
await mock.stop();
```

## When to Use This vs MSW

[MSW (Mock Service Worker)](https://mswjs.io/) is a popular API mocking library, but it solves a different problem.

**The key difference is architecture.** llmock runs a real HTTP server on a port. MSW patches `http`/`https`/`fetch` modules inside a single Node.js process. MSW can only intercept requests from the process that calls `server.listen()` — child processes, separate services, and workers are unaffected.

This matters for E2E tests where multiple processes make LLM API calls:

```
Playwright test runner (Node)
└─ controls browser → Next.js app (separate process)
   └─ OPENAI_BASE_URL → llmock :5555
      ├─ Mastra agent workers
      ├─ LangGraph workers
      └─ CopilotKit runtime
```

MSW can't intercept any of those calls. llmock can — it's a real server on a real port.

**Use llmock when:**

- Multiple processes need to hit the same mock (E2E tests, agent frameworks, microservices)
- You want multi-provider SSE format out of the box (OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, Cohere)
- You prefer defining fixtures as JSON files rather than code
- You need a standalone CLI server

**Use MSW when:**

- All API calls originate from a single Node.js process (unit tests, SDK client tests)
- You're mocking many different APIs, not just OpenAI
- You want in-process interception without running a server

| Capability | llmock | MSW |
| ---------------------------- | --------------------- | ------------------------------------------------------------------------- |
| Cross-process interception | **Yes** (real server) | **No** (in-process only) |
| OpenAI Chat Completions SSE | **Built-in** | Manual — build `data: {json}\n\n` + `[DONE]` yourself |
| OpenAI Responses API SSE | **Built-in** | Manual — MSW's `sse()` sends `data:` events, not OpenAI's `event:` format |
| Claude Messages API SSE | **Built-in** | Manual — build `event:`/`data:` SSE yourself |
| Gemini streaming | **Built-in** | Manual — build `data:` SSE yourself |
| WebSocket APIs | **Built-in** | **No** |
| Fixture file loading (JSON) | **Yes** | **No** — handlers are code-only |
| Request journal / inspection | **Yes** | **No** — track requests manually |
| Non-streaming responses | **Yes** | **Yes** |
| Error injection (one-shot) | **Yes** | **Yes** (via `server.use()`) |
| CLI for standalone use | **Yes** | **No** |
| Zero dependencies | **Yes** | **No** (~300KB) |

## Usage Scenarios

### In-process testing

Use the programmatic API to start and stop the mock server in your test setup. Every test framework works — Vitest, Jest, Playwright, Mocha, anything.

```typescript
import { LLMock } from "@copilotkit/llmock";

const mock = new LLMock({ port: 5555 });
mock.loadFixtureDir("./fixtures");
const url = await mock.start();
process.env.OPENAI_BASE_URL = `${url}/v1`;

// ... run tests ...

await mock.stop();
```
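
In a real suite, the start/stop calls map onto the runner's lifecycle hooks. A minimal sketch for Vitest, assuming a shared setup file (the hook names are Vitest's; the `LLMock` calls are exactly the ones above):

```typescript
import { beforeAll, afterAll } from "vitest";
import { LLMock } from "@copilotkit/llmock";

const mock = new LLMock({ port: 5555 });

beforeAll(async () => {
  mock.loadFixtureDir("./fixtures");
  // Every OpenAI client created in this process now talks to the mock.
  const url = await mock.start();
  process.env.OPENAI_BASE_URL = `${url}/v1`;
});

afterAll(async () => {
  await mock.stop();
});
```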

### Running locally

Use the CLI with `--watch` to hot-reload fixtures as you edit them. Point your app at the mock and iterate without touching real APIs.

```bash
llmock -p 4010 -f ./fixtures --watch
```
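
With the mock running in watch mode, the only change your app needs is its base URL. For example (the `/v1` suffix mirrors the path used in the programmatic example above; `npm run dev` stands in for however you start your app):

```bash
OPENAI_BASE_URL=http://localhost:4010/v1 npm run dev
```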

### CI pipelines

Use the Docker image with `--strict` mode and record-and-replay for deterministic, zero-cost CI runs.

```yaml
# GitHub Actions example
- name: Start aimock
  run: |
    docker run -d --name aimock \
      -v ./fixtures:/fixtures \
      -p 4010:4010 \
      ghcr.io/copilotkit/aimock \
      llmock --strict -f /fixtures

- name: Run tests
  env:
    OPENAI_BASE_URL: http://localhost:4010/v1
  run: pnpm test

- name: Stop aimock
  run: docker stop aimock
```
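
Note that `docker run -d` returns before the server is necessarily accepting connections, so a CI job may want a readiness gate between the start and test steps. One way to do it, polling the `/metrics` endpoint listed under Features (the loop itself is illustrative):

```bash
# Block until the mock responds, then let the test step proceed.
until curl -sf http://localhost:4010/metrics > /dev/null; do
  sleep 0.2
done
```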

### Cross-language testing

The Docker image runs as a standalone HTTP server — any language that speaks HTTP can use it. Python, Go, Rust, Ruby, Java, anything.

```bash
docker run -d -p 4010:4010 ghcr.io/copilotkit/aimock llmock -f /fixtures

# Python
client = openai.OpenAI(base_url="http://localhost:4010/v1", api_key="mock")

# Go
client := openai.NewClient(option.WithBaseURL("http://localhost:4010/v1"))

# Rust
let client = Client::new().with_base_url("http://localhost:4010/v1");
```

## Features

- **[Multi-provider support](https://llmock.copilotkit.dev/compatible-providers.html)** — [OpenAI Chat Completions](https://llmock.copilotkit.dev/chat-completions.html), [Responses API](https://llmock.copilotkit.dev/responses-api.html), [Anthropic Claude](https://llmock.copilotkit.dev/claude-messages.html), [Google Gemini](https://llmock.copilotkit.dev/gemini.html), [AWS Bedrock](https://llmock.copilotkit.dev/aws-bedrock.html), [Azure OpenAI](https://llmock.copilotkit.dev/azure-openai.html), [Vertex AI](https://llmock.copilotkit.dev/vertex-ai.html), [Ollama](https://llmock.copilotkit.dev/ollama.html), [Cohere](https://llmock.copilotkit.dev/cohere.html)
- **[MCPMock](https://llmock.copilotkit.dev/mcp-mock.html)** — Mock MCP server with tools, resources, prompts, and session management
- **[A2AMock](https://llmock.copilotkit.dev/a2a-mock.html)** — Mock A2A protocol server with agent cards, message routing, and streaming
- **[VectorMock](https://llmock.copilotkit.dev/vector-mock.html)** — Mock vector database with Pinecone, Qdrant, and ChromaDB endpoints
- **[Services](https://llmock.copilotkit.dev/services.html)** — Built-in search (Tavily), rerank (Cohere), and moderation (OpenAI) mocks
- **[Chaos testing](https://llmock.copilotkit.dev/chaos-testing.html)** — Probabilistic failure injection: 500 errors, malformed JSON, mid-stream disconnects
- **[Prometheus metrics](https://llmock.copilotkit.dev/metrics.html)** — Request counts, latencies, and fixture match rates at `/metrics`
- **[Embeddings API](https://llmock.copilotkit.dev/embeddings.html)** — OpenAI-compatible embedding responses with configurable dimensions
- **[Structured output / JSON mode](https://llmock.copilotkit.dev/structured-output.html)** — `response_format`, `json_schema`, and function calling
- **[Sequential responses](https://llmock.copilotkit.dev/sequential-responses.html)** — Stateful multi-turn fixtures that return different responses on each call
- **[Streaming physics](https://llmock.copilotkit.dev/streaming-physics.html)** — Configurable `ttft`, `tps`, and `jitter` for realistic timing
- **[WebSocket APIs](https://llmock.copilotkit.dev/websocket.html)** — OpenAI Responses WS, Realtime API, and Gemini Live
- **[Error injection](https://llmock.copilotkit.dev/error-injection.html)** — One-shot errors, rate limiting, and provider-specific error formats
- **[Request journal](https://llmock.copilotkit.dev/docs.html)** — Record, inspect, and assert on every request
- **[Fixture validation](https://llmock.copilotkit.dev/fixtures.html)** — Schema validation at load time with `--validate-on-load`
- **CLI with hot-reload** — Standalone server with `--watch` for live fixture editing
- **[Docker + Helm](https://llmock.copilotkit.dev/docker.html)** — Container image and Helm chart for CI/CD pipelines
- **Record-and-replay** — VCR-style proxy-on-miss records real API responses as fixtures for deterministic replay
- **[Drift detection](https://llmock.copilotkit.dev/drift-detection.html)** — Daily CI runs against real APIs to catch response format changes
- **Claude Code integration** — `/write-fixtures` skill teaches your AI assistant how to write fixtures correctly
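
Most of these features are driven from the same fixture files. As a purely hypothetical illustration of how the knobs above might compose in one fixture (these field names are guesses for readability, not the real schema; see the fixtures doc for the actual format):

```typescript
// Hypothetical fixture shape for illustration only; consult the fixtures
// documentation for the real schema. `ttft`, `tps`, and `jitter` are the
// streaming-physics knobs named in the feature list above.
const exampleFixture = {
  match: { model: "gpt-4o", prompt: /weather/ },
  response: "It is sunny in Berlin today.",
  stream: { ttft: 200, tps: 40, jitter: 0.1 },
};
```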

## aimock CLI (Full-Stack Mock)

For projects that need more than LLM mocking, the `aimock` CLI reads a JSON config file and serves all mock services on one port:

```bash
aimock --config aimock.json --port 4010
```

See the [aimock documentation](https://llmock.copilotkit.dev/aimock-cli.html) for config file format and Docker usage.
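
The config format itself is documented there; purely to convey the idea of one file enumerating several mock services, a hypothetical shape could look like this (none of these field names are confirmed):

```typescript
// Hypothetical aimock.json contents for illustration only;
// the real schema is in the aimock documentation.
const exampleAimockConfig = {
  llm: { fixtures: "./fixtures" },
  mcp: { tools: ["search_docs"] },
  vector: { provider: "pinecone" },
};
```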

## CLI Quick Reference

```bash
llmock [options]
```

@@ -97,6 +131,7 @@

| Option | Short | Default | Description |
| -------------------- | ----- | ------------ | ------------------------------------------- |
| `--config` | | | Config file for aimock CLI |
| `--port` | `-p` | `4010` | Port to listen on |
| `--host` | `-h` | `127.0.0.1` | Host to bind to |
| `--fixtures` | `-f` | `./fixtures` | Path to fixtures directory or file |
@@ -137,6 +172,19 @@ Full API reference, fixture format, E2E patterns, and provider-specific guides:

**[https://llmock.copilotkit.dev/docs.html](https://llmock.copilotkit.dev/docs.html)**

## llmock vs MSW

[MSW (Mock Service Worker)](https://mswjs.io/) patches `http`/`https`/`fetch` inside a single Node.js process. llmock runs a real HTTP server on a real port that any process can reach — child processes, microservices, agent workers, Docker containers. MSW can't intercept any of those; llmock can. For a detailed comparison including other tools, see the [full comparison on the docs site](https://llmock.copilotkit.dev/#comparison).

| Capability | llmock | MSW |
| -------------------------- | ---------------------------- | ---------------------- |
| Cross-process interception | **Yes** (real server) | No (in-process only) |
| LLM SSE streaming | **Built-in** (13+ providers) | Manual for each format |
| Fixture files (JSON) | **Yes** | No (code-only) |
| Record & replay | **Yes** | No |
| WebSocket APIs | **Yes** | No |
| Zero dependencies | **Yes** | No (~300KB) |
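
The cross-process row is the one that matters for E2E suites. A minimal sketch of what it buys you, reusing the Quick Start API (the worker script name is hypothetical):

```typescript
import { spawn } from "node:child_process";
import { LLMock } from "@copilotkit/llmock";

const mock = new LLMock({ port: 5555 });
mock.loadFixtureDir("./fixtures");
const url = await mock.start();

// A separate OS process inherits the env var and reaches the mock over
// real HTTP. No in-process patching, which is why this works where an
// interceptor like MSW cannot.
spawn("node", ["agent-worker.js"], {
  env: { ...process.env, OPENAI_BASE_URL: `${url}/v1` },
  stdio: "inherit",
});
```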

## Real-World Usage

[CopilotKit](https://github.com/CopilotKit/CopilotKit) uses llmock across its test suite to verify AI agent behavior across multiple LLM providers without hitting real APIs.