1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@ dist/
*.tsbuildinfo
.worktrees/
.superpowers/
coverage/
1 change: 1 addition & 0 deletions .prettierignore
@@ -2,3 +2,4 @@ dist/
node_modules/
pnpm-lock.yaml
charts/
coverage/
126 changes: 87 additions & 39 deletions README.md
@@ -1,6 +1,6 @@
# @copilotkit/llmock [![Unit Tests](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml/badge.svg)](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml) [![Drift Tests](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml/badge.svg)](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml) [![npm version](https://img.shields.io/npm/v/@copilotkit/llmock)](https://www.npmjs.com/package/@copilotkit/llmock)

Deterministic mock LLM server for testing. A real HTTP server on a real port — not an in-process interceptor — so every process in your stack (Playwright, Next.js, agent workers, microservices) can point at it via `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` and get reproducible, instant responses. Streams SSE in real OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, and Cohere API formats, driven entirely by fixtures. Zero runtime dependencies.

Mock infrastructure for AI application testing — LLM APIs, MCP tools, A2A agents, vector databases, search, and more. Real HTTP server on a real port, fixture-driven, zero runtime dependencies.

## Quick Start

@@ -23,72 +23,106 @@ const url = await mock.start();
await mock.stop();
```

## When to Use This vs MSW

[MSW (Mock Service Worker)](https://mswjs.io/) is a popular API mocking library, but it solves a different problem.

**The key difference is architecture.** llmock runs a real HTTP server on a port. MSW patches `http`/`https`/`fetch` modules inside a single Node.js process. MSW can only intercept requests from the process that calls `server.listen()` — child processes, separate services, and workers are unaffected.

This matters for E2E tests where multiple processes make LLM API calls:

```
Playwright test runner (Node)
└─ controls browser → Next.js app (separate process)
   └─ OPENAI_BASE_URL → llmock :5555
      ├─ Mastra agent workers
      ├─ LangGraph workers
      └─ CopilotKit runtime
```

MSW can't intercept any of those calls. llmock can — it's a real server on a real port.

**Use llmock when:**

- Multiple processes need to hit the same mock (E2E tests, agent frameworks, microservices)
- You want multi-provider SSE format out of the box (OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, Cohere)
- You prefer defining fixtures as JSON files rather than code
- You need a standalone CLI server

**Use MSW when:**

- All API calls originate from a single Node.js process (unit tests, SDK client tests)
- You're mocking many different APIs, not just OpenAI
- You want in-process interception without running a server

| Capability | llmock | MSW |
| ---------------------------- | --------------------- | ------------------------------------------------------------------------- |
| Cross-process interception | **Yes** (real server) | **No** (in-process only) |
| OpenAI Chat Completions SSE | **Built-in** | Manual — build `data: {json}\n\n` + `[DONE]` yourself |
| OpenAI Responses API SSE | **Built-in** | Manual — MSW's `sse()` sends `data:` events, not OpenAI's `event:` format |
| Claude Messages API SSE | **Built-in** | Manual — build `event:`/`data:` SSE yourself |
| Gemini streaming | **Built-in** | Manual — build `data:` SSE yourself |
| WebSocket APIs | **Built-in** | **No** |
| Fixture file loading (JSON) | **Yes** | **No** — handlers are code-only |
| Request journal / inspection | **Yes** | **No** — track requests manually |
| Non-streaming responses | **Yes** | **Yes** |
| Error injection (one-shot) | **Yes** | **Yes** (via `server.use()`) |
| CLI for standalone use | **Yes** | **No** |
| Zero dependencies | **Yes** | **No** (~300KB) |

## Usage Scenarios

### In-process testing

Use the programmatic API to start and stop the mock server in your test setup. Every test framework works — Vitest, Jest, Playwright, Mocha, anything.

```typescript
import { LLMock } from "@copilotkit/llmock";

const mock = new LLMock({ port: 5555 });
mock.loadFixtureDir("./fixtures");
const url = await mock.start();
process.env.OPENAI_BASE_URL = `${url}/v1`;

// ... run tests ...

await mock.stop();
```
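
In a real suite, the start/stop calls map onto the runner's lifecycle hooks. A minimal sketch for Vitest, assuming a shared setup file (the hook names are Vitest's; the `LLMock` calls are exactly the ones above):

```typescript
import { beforeAll, afterAll } from "vitest";
import { LLMock } from "@copilotkit/llmock";

const mock = new LLMock({ port: 5555 });

beforeAll(async () => {
  mock.loadFixtureDir("./fixtures");
  // Every OpenAI client created in this process now talks to the mock.
  const url = await mock.start();
  process.env.OPENAI_BASE_URL = `${url}/v1`;
});

afterAll(async () => {
  await mock.stop();
});
```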

### Running locally

Use the CLI with `--watch` to hot-reload fixtures as you edit them. Point your app at the mock and iterate without touching real APIs.

```bash
llmock -p 4010 -f ./fixtures --watch
```
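
With the mock running in watch mode, the only change your app needs is its base URL. For example (the `/v1` suffix mirrors the path used in the programmatic example above; `npm run dev` stands in for however you start your app):

```bash
OPENAI_BASE_URL=http://localhost:4010/v1 npm run dev
```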

### CI pipelines

Use the Docker image with `--strict` mode and record-and-replay for deterministic, zero-cost CI runs.

```yaml
# GitHub Actions example
- name: Start aimock
  run: |
    docker run -d --name aimock \
      -v ./fixtures:/fixtures \
      -p 4010:4010 \
      ghcr.io/copilotkit/aimock \
      llmock --strict -f /fixtures

- name: Run tests
  env:
    OPENAI_BASE_URL: http://localhost:4010/v1
  run: pnpm test

- name: Stop aimock
  run: docker stop aimock
```
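
Note that `docker run -d` returns before the server is necessarily accepting connections, so a CI job may want a readiness gate between the start and test steps. One way to do it, polling the `/metrics` endpoint listed under Features (the loop itself is illustrative):

```bash
# Block until the mock responds, then let the test step proceed.
until curl -sf http://localhost:4010/metrics > /dev/null; do
  sleep 0.2
done
```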

### Cross-language testing

The Docker image runs as a standalone HTTP server — any language that speaks HTTP can use it. Python, Go, Rust, Ruby, Java, anything.

```bash
docker run -d -p 4010:4010 ghcr.io/copilotkit/aimock llmock -f /fixtures

# Python
client = openai.OpenAI(base_url="http://localhost:4010/v1", api_key="mock")

# Go
client := openai.NewClient(option.WithBaseURL("http://localhost:4010/v1"))

# Rust
let client = Client::new().with_base_url("http://localhost:4010/v1");
```

## Features

- **[Multi-provider support](https://llmock.copilotkit.dev/compatible-providers.html)** — [OpenAI Chat Completions](https://llmock.copilotkit.dev/chat-completions.html), [Responses API](https://llmock.copilotkit.dev/responses-api.html), [Anthropic Claude](https://llmock.copilotkit.dev/claude-messages.html), [Google Gemini](https://llmock.copilotkit.dev/gemini.html), [AWS Bedrock](https://llmock.copilotkit.dev/aws-bedrock.html), [Azure OpenAI](https://llmock.copilotkit.dev/azure-openai.html), [Vertex AI](https://llmock.copilotkit.dev/vertex-ai.html), [Ollama](https://llmock.copilotkit.dev/ollama.html), [Cohere](https://llmock.copilotkit.dev/cohere.html)
- **[MCPMock](https://llmock.copilotkit.dev/mcp-mock.html)** — Mock MCP server with tools, resources, prompts, and session management
- **[A2AMock](https://llmock.copilotkit.dev/a2a-mock.html)** — Mock A2A protocol server with agent cards, message routing, and streaming
- **[VectorMock](https://llmock.copilotkit.dev/vector-mock.html)** — Mock vector database with Pinecone, Qdrant, and ChromaDB endpoints
- **[Services](https://llmock.copilotkit.dev/services.html)** — Built-in search (Tavily), rerank (Cohere), and moderation (OpenAI) mocks
- **[Chaos testing](https://llmock.copilotkit.dev/chaos-testing.html)** — Probabilistic failure injection: 500 errors, malformed JSON, mid-stream disconnects
- **[Prometheus metrics](https://llmock.copilotkit.dev/metrics.html)** — Request counts, latencies, and fixture match rates at `/metrics`
- **[Embeddings API](https://llmock.copilotkit.dev/embeddings.html)** — OpenAI-compatible embedding responses with configurable dimensions
- **[Structured output / JSON mode](https://llmock.copilotkit.dev/structured-output.html)** — `response_format`, `json_schema`, and function calling
- **[Sequential responses](https://llmock.copilotkit.dev/sequential-responses.html)** — Stateful multi-turn fixtures that return different responses on each call
- **[Streaming physics](https://llmock.copilotkit.dev/streaming-physics.html)** — Configurable `ttft`, `tps`, and `jitter` for realistic timing
- **[WebSocket APIs](https://llmock.copilotkit.dev/websocket.html)** — OpenAI Responses WS, Realtime API, and Gemini Live
- **[Error injection](https://llmock.copilotkit.dev/error-injection.html)** — One-shot errors, rate limiting, and provider-specific error formats
- **[Request journal](https://llmock.copilotkit.dev/docs.html)** — Record, inspect, and assert on every request
- **[Fixture validation](https://llmock.copilotkit.dev/fixtures.html)** — Schema validation at load time with `--validate-on-load`
- **CLI with hot-reload** — Standalone server with `--watch` for live fixture editing
- **[Docker + Helm](https://llmock.copilotkit.dev/docker.html)** — Container image and Helm chart for CI/CD pipelines
- **Record-and-replay** — VCR-style proxy-on-miss records real API responses as fixtures for deterministic replay
- **[Drift detection](https://llmock.copilotkit.dev/drift-detection.html)** — Daily CI runs against real APIs to catch response format changes
- **Claude Code integration** — `/write-fixtures` skill teaches your AI assistant how to write fixtures correctly
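
Most of these features are driven from the same fixture files. As a purely hypothetical illustration of how the knobs above might compose in one fixture (these field names are guesses for readability, not the real schema; see the fixtures doc for the actual format):

```typescript
// Hypothetical fixture shape for illustration only; consult the fixtures
// documentation for the real schema. `ttft`, `tps`, and `jitter` are the
// streaming-physics knobs named in the feature list above.
const exampleFixture = {
  match: { model: "gpt-4o", prompt: /weather/ },
  response: "It is sunny in Berlin today.",
  stream: { ttft: 200, tps: 40, jitter: 0.1 },
};
```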

## aimock CLI (Full-Stack Mock)

For projects that need more than LLM mocking, the `aimock` CLI reads a JSON config file and serves all mock services on one port:

```bash
aimock --config aimock.json --port 4010
```

See the [aimock documentation](https://llmock.copilotkit.dev/aimock-cli.html) for config file format and Docker usage.
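
The config format itself is documented there; purely to convey the idea of one file enumerating several mock services, a hypothetical shape could look like this (none of these field names are confirmed):

```typescript
// Hypothetical aimock.json contents for illustration only;
// the real schema is in the aimock documentation.
const exampleAimockConfig = {
  llm: { fixtures: "./fixtures" },
  mcp: { tools: ["search_docs"] },
  vector: { provider: "pinecone" },
};
```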

## CLI Quick Reference

```bash
llmock [options]
```

@@ -97,6 +131,7 @@

| Option | Short | Default | Description |
| -------------------- | ----- | ------------ | ------------------------------------------- |
| `--config` | | | Config file for aimock CLI |
| `--port` | `-p` | `4010` | Port to listen on |
| `--host` | `-h` | `127.0.0.1` | Host to bind to |
| `--fixtures` | `-f` | `./fixtures` | Path to fixtures directory or file |
@@ -137,6 +172,19 @@ Full API reference, fixture format, E2E patterns, and provider-specific guides:

**[https://llmock.copilotkit.dev/docs.html](https://llmock.copilotkit.dev/docs.html)**

## llmock vs MSW

[MSW (Mock Service Worker)](https://mswjs.io/) patches `http`/`https`/`fetch` inside a single Node.js process. llmock runs a real HTTP server on a real port that any process can reach — child processes, microservices, agent workers, Docker containers. MSW can't intercept any of those; llmock can. For a detailed comparison including other tools, see the [full comparison on the docs site](https://llmock.copilotkit.dev/#comparison).

| Capability | llmock | MSW |
| -------------------------- | ---------------------------- | ---------------------- |
| Cross-process interception | **Yes** (real server) | No (in-process only) |
| LLM SSE streaming | **Built-in** (13+ providers) | Manual for each format |
| Fixture files (JSON) | **Yes** | No (code-only) |
| Record & replay | **Yes** | No |
| WebSocket APIs | **Yes** | No |
| Zero dependencies | **Yes** | No (~300KB) |
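
The cross-process row is the one that matters for E2E suites. A minimal sketch of what it buys you, reusing the Quick Start API (the worker script name is hypothetical):

```typescript
import { spawn } from "node:child_process";
import { LLMock } from "@copilotkit/llmock";

const mock = new LLMock({ port: 5555 });
mock.loadFixtureDir("./fixtures");
const url = await mock.start();

// A separate OS process inherits the env var and reaches the mock over
// real HTTP. No in-process patching, which is why this works where an
// interceptor like MSW cannot.
spawn("node", ["agent-worker.js"], {
  env: { ...process.env, OPENAI_BASE_URL: `${url}/v1` },
  stdio: "inherit",
});
```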

## Real-World Usage

[CopilotKit](https://github.com/CopilotKit/CopilotKit) uses llmock across its test suite to verify AI agent behavior across multiple LLM providers without hitting real APIs.