Colibri

Technically, an MCP orchestration runtime. Architecturally, a three-axis control plane. Philosophically, a system of legitimized agentic activity.

What is Colibri?

Colibri is a documentation-first TypeScript MCP runtime that unites task orchestration, audit trails, and cryptographic proof generation in one stdio server. Its Phase 0 core is listed below.

Phase 0 Core — What You Build First

14-tool MCP surface (stdio, shipped) — 5 β Task · 4 ζ Audit · 2 η Proof · 1 ε (skill_list) · 2 System (server_ping, server_health). R74.5 planned 19 tools; Wave H reconciled that plan against the shipped list (see ADR-004 R75 amendment). server_info, server_shutdown, task_transition, task_depends_on, and audit_session_end remain unimplemented; thought_record_list was added although it was not among the original 19.
Two-phase startup (init then ready) with 4 runtime modes: FULL, READONLY, TEST, MINIMAL
Single-writer SQLite (data/colibri.db) with WAL mode, via better-sqlite3
5-stage α middleware chain — tool-lock → schema validate → audit enter → dispatch → audit exit
8-state β FSM task pipeline — INIT → GATHER → ANALYZE → PLAN → APPLY → VERIFY → DONE (+ CANCELLED), enforced at the middleware layer
Hash-chained ζ thought trail — every decision recorded and verifiable via audit_verify_chain
Merkle η proof store — cryptographic proof of execution, sealed via merkle_finalize / merkle_root
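
The five-stage α chain above can be pictured as one function that every tool call passes through. The following is a minimal sketch under stated assumptions, not Colibri's actual middleware: ToolCall, ChainDeps, and runChain are hypothetical names, the validators are plain predicates standing in for schema validation, and the audit array stands in for the ζ trail.

```typescript
// Hypothetical types: none of these names come from the Colibri source.
type ToolCall = { tool: string; args: unknown };
type Handler = (args: unknown) => unknown;

interface ChainDeps {
  locked: Set<string>; // tools disabled in the current runtime mode
  validators: Map<string, (args: unknown) => boolean>; // stand-in for schema validation
  handlers: Map<string, Handler>;
  audit: string[]; // stand-in for the audit trail
}

function runChain(call: ToolCall, deps: ChainDeps): unknown {
  // Stage 1: tool-lock
  if (deps.locked.has(call.tool)) {
    throw new Error(`tool locked: ${call.tool}`);
  }
  // Stage 2: schema validate
  const validate = deps.validators.get(call.tool);
  if (!validate || !validate(call.args)) {
    throw new Error(`invalid arguments for ${call.tool}`);
  }
  // Stage 3: audit enter
  deps.audit.push(`enter:${call.tool}`);
  try {
    // Stage 4: dispatch
    const handler = deps.handlers.get(call.tool);
    if (!handler) throw new Error(`unknown tool: ${call.tool}`);
    return handler(call.args);
  } finally {
    // Stage 5: audit exit, recorded on success and on failure alike
    deps.audit.push(`exit:${call.tool}`);
  }
}
```

Because the lock and validation stages sit before audit enter, a rejected call leaves no audit entry in this sketch; whether the real chain records rejections is not stated here.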

Full Vision — Advanced Components (not in Phase 0)

δ Multi-model router (Phase 1.5; Phase 0 stubs shipped R75 Wave I per ADR-005 §Decision) — Phase 0: constant scoring (always Claude) + single-member fallback chain, library-only. Phase 1.5: weighted multi-model scoring + N-member fallback + circuit breaker
κ Deterministic rule engine (Phase 1–2) — Chevrotain-parsed formal DSL for decision making
λ Reputation model (Phase 1–2) — agent and action credibility tracking
θ BFT consensus (Phase 3+) — Byzantine fault-tolerant agreement for critical decisions
π Governance layer (Phase 3+) — institutional rules and policy enforcement
ξ Identity fabric (Phase 3+) — cryptographic identity and authorization

Complexity Budget

Colibri has three layers of depth. You only need the first to start.

Layer 1 (Phase 0): Working Agent Runtime

Delivers: MCP server + task pipeline + audit trail + proof store
Build time: ~6 weeks (per phase estimate)
What it does: Execute tasks deterministically, audit every step, prove work happened
Enough for: Internal automation, task orchestration, agent-as-a-service

Layer 2 (Phase 1–2): Intelligent Agent Runtime

Adds: Multi-model routing + rule engine + reputation scoring + fallback chains
What it does: Route work to the right model, enforce rules, track agent credibility
Enough for: Production AI workloads, policy-driven automation, cost/quality optimization

Layer 3 (Phase 3–8): Enterprise Multi-Party Runtime

Adds: BFT consensus + governance + institutional rules + identity fabric + cryptographic proof aggregation
What it does: Byzantine fault tolerance, multi-party decision-making, institutional accountability
Enough for: Regulated industries, multi-agent systems, institutional compliance

Start with Layer 1. Add layers as your needs grow.

Three-Axis Architecture

Colibri operates on three independent but intertwined axes:

Execution of work — Tasks flow through a formal pipeline. Each task state (INIT, GATHER, ANALYZE, PLAN, APPLY, VERIFY, DONE) is executed agentically with full audit instrumentation. Output is deterministic, verifiable, and reproducible.
Management of intelligence — A weighted model router (δ) selects among candidate models by scoring each on quality, cost, latency, and load, with a fallback chain for graceful degradation. Phase 0 runs Claude-only via library-only stubs (shipped R75 Wave I per ADR-005 §Decision): constant scoring plus a single-member fallback chain. Phase 1.5 activates real multi-model scoring and a weighted fallback chain.
Legitimacy of action — Every action must be explainable, verifiable, and institutionally permissible. This axis comprises decision trails (what was decided and why), Merkle proofs (cryptographic proof of work), deterministic rules (formal constraints), reputation (credibility of agents), consensus (agreement when needed), governance (policy enforcement), and identity (who did it).

Together, these axes form a control plane for agentic work — not just a backend that produces results, but a system that ensures results are produced correctly, intelligently, and legitimately.
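
To make the intelligence axis concrete, here is a minimal sketch of weighted scoring with a fallback chain. It is an illustration under assumptions, not the δ router's code: the Candidate and Weights shapes, the 0..1 normalization, the linear scoring formula, and the function names are all hypothetical.

```typescript
// Hypothetical shapes; the real router's types and weights are not shown in this README.
interface Candidate {
  name: string;
  quality: number; // 0..1, higher is better
  cost: number; // 0..1, higher is worse
  latency: number; // 0..1, higher is worse
  load: number; // 0..1, higher is worse
  healthy: boolean;
}

interface Weights {
  quality: number;
  cost: number;
  latency: number;
  load: number;
}

// Assumed formula: reward quality, penalize cost, latency, and load.
function score(c: Candidate, w: Weights): number {
  return w.quality * c.quality - w.cost * c.cost - w.latency * c.latency - w.load * c.load;
}

// Healthy candidates ordered best-first: the head is the primary choice,
// the rest form the fallback chain.
function rank(candidates: Candidate[], w: Weights): Candidate[] {
  return candidates.filter((c) => c.healthy).sort((a, b) => score(b, w) - score(a, w));
}

// Try each candidate in order until one succeeds; rethrow the last failure.
function callWithFallback<T>(chain: Candidate[], invoke: (c: Candidate) => T): T {
  let lastError: unknown = new Error("empty fallback chain");
  for (const candidate of chain) {
    try {
      return invoke(candidate);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

With a single candidate and constant scores, rank returns a one-element chain, which corresponds to the Phase 0 behavior described above (constant scoring, single-member fallback).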

The Philosophy

Colibri is an attempt to build an agentic system where not just the result matters, but the right to the result.

Work must be done (execution axis). Intelligence must be correctly chosen (intelligence axis). And the action itself must be legitimized (legitimacy axis) — recorded in an immutable trail, proven through cryptographic hashing, verified against deterministic rules, passed through reputation checks, and when necessary confirmed by consensus and governance mechanics.

In this sense, Colibri is not "another AI assistant" but an attempt to build an environment in which agentic activity gains:

Memory — decisions recorded and retrievable
Provability — cryptographic evidence of work and reasoning
Accountability — every action traceable to its agent and decision context
Constitutional constraints — enforceable rules and governance

Current State

Phase 0 is 100% complete on non-deferred tasks (28/28 shipped as of R75 Wave I — 2026-04-18). P0.5.1/P0.5.2 shipped as δ Model Router library stubs per ADR-005 §Decision (PR #149 scoring, PR #150 fallback). Full multi-model routing lands in Phase 1.5.

Of the 15 Greek-letter concepts:

8 ship code at Phase 0 granularity (colibri_code: partial): α System Core · β Task Pipeline · γ Server Lifecycle · δ Model Router (library-only stubs) · ε Skill Registry · ζ Decision Trail · η Proof Store · ν Integrations
7 remain spec-only for later phases (colibri_code: none): θ Consensus · ι State Fork · κ Rule Engine · λ Reputation · μ Integrity Monitor · ξ Identity · π Governance

src/server.ts, data/colibri.db (created at runtime in WAL mode), and a Jest suite with 1084 passing tests are all live at the last mainline commit.

Documentation corpus (R75 post-Wave-I):

~700 markdown files across 12 top-level directories, each assigned a zone in CLAUDE.md §9.2 (CANON · MIRROR · HERITAGE · SCRATCH · VENDOR)
docs/ reorganized around the World Schema tree (0-mutate/ through 5-time/)
49 language-agnostic algorithm extractions in docs/reference/extractions/
63 implementation tasks across 8 phases (Phase 0 = 28; all 28 shipped — 2 as library-only δ stubs per ADR-005)
19 locked protocol specifications (docs/spec/)
14 new conceptual-glue documents linking the 15 Greek-letter concepts

The Phase 0 specification is complete; the implementation is effectively complete; Phase 1 planning is the next round's scope.

Minimum Viable Start

Phase 0 delivers a working agent runtime with a 14-tool stdio MCP surface:

MCP server (src/server.ts, shipped P0.2.1) accepting tool calls over stdio
β Task pipeline — 5 tools (task_create, task_list, task_get, task_update with FSM-routing, task_next_actions) enforcing the INIT → GATHER → ANALYZE → PLAN → APPLY → VERIFY → DONE FSM at the middleware layer
SQLite backend (data/colibri.db, created at runtime in WAL mode; schema shipped P0.2.2) via better-sqlite3
ζ/η Audit + Proof — 6 tools (audit_session_start, thought_record, thought_record_list, audit_verify_chain, merkle_finalize, merkle_root) recording and sealing every decision
ε Skill listing — skill_list for discovery; capability index shipped Wave H (P0.6.3)
System surface — server_ping, server_health (2 live; server_info / server_shutdown were planned but are not yet implemented)
Deterministic execution — same inputs always yield same outputs; same-inputs-same-Merkle-root is the legitimacy guarantee
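
The β pipeline in the list above is a finite-state machine over the seven pipeline states plus CANCELLED. The sketch below shows one way such a transition check could look. The state names come from this README; the transition rules (advance exactly one step, or cancel from any non-terminal state) are an assumption, and canTransition and transition are hypothetical helpers, not the shipped task_update logic.

```typescript
// State names are taken from this README; the transition rules are assumed.
const STATES = [
  "INIT", "GATHER", "ANALYZE", "PLAN", "APPLY", "VERIFY", "DONE", "CANCELLED",
] as const;
type TaskState = (typeof STATES)[number];

// The forward path of the pipeline, in order.
const PIPELINE: readonly TaskState[] = [
  "INIT", "GATHER", "ANALYZE", "PLAN", "APPLY", "VERIFY", "DONE",
];

function isTerminal(state: TaskState): boolean {
  return state === "DONE" || state === "CANCELLED";
}

// Assumed rule: advance exactly one step along the pipeline,
// or cancel from any non-terminal state.
function canTransition(from: TaskState, to: TaskState): boolean {
  if (isTerminal(from)) return false;
  if (to === "CANCELLED") return true;
  return PIPELINE.indexOf(to) === PIPELINE.indexOf(from) + 1;
}

// Returns the new state or throws, so an illegal move never reaches storage.
function transition(from: TaskState, to: TaskState): TaskState {
  if (!canTransition(from, to)) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}
```

Rejecting the move before any write is what "enforced at the middleware layer" would amount to in this picture: a tool handler only ever sees transitions that already passed the check.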

This is enough to orchestrate agent work deterministically, audit every decision, prove work was done correctly, and reproduce any execution from the audit trail.

No consensus, no governance, no multi-model routing, no agent spawning — just solid, verifiable task execution. Intelligence and legitimacy extensions layer in across later phases.
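
The "verifiable" part rests on the hash-chained ζ trail. As a rough illustration of the idea behind audit_verify_chain, the sketch below links each record to its predecessor by hash and reports the first broken link. The record shape, the SHA-256 choice, the genesis value, and the function names are assumptions made for this example, not the shipped schema or algorithm.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; the real trail schema is not shown in this README.
interface ThoughtRecord {
  content: string;
  prevHash: string; // hash of the previous record, or GENESIS for the first
  hash: string; // SHA-256 over prevHash, a separator, and content
}

// Assumed placeholder for "no predecessor".
const GENESIS = "0".repeat(64);

function digest(prevHash: string, content: string): string {
  return createHash("sha256").update(prevHash).update("\n").update(content).digest("hex");
}

// Returns a new chain with one record appended; the input is left untouched.
function append(chain: ThoughtRecord[], content: string): ThoughtRecord[] {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  return [...chain, { content, prevHash, hash: digest(prevHash, content) }];
}

// Returns the index of the first broken link, or -1 if the chain is intact.
function verifyChain(chain: ThoughtRecord[]): number {
  let expectedPrev = GENESIS;
  return chain.findIndex((record) => {
    const intact =
      record.prevHash === expectedPrev &&
      record.hash === digest(record.prevHash, record.content);
    expectedPrev = record.hash;
    return !intact;
  });
}
```

Editing any earlier record changes its recomputed hash, so verification fails at that record without needing a separate copy of the trail to compare against.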

Start Here

CLAUDE.md — Executor rules for Claude and any other AI coding client (four-tier agent hierarchy, worktree rule, writeback protocol)
Colibri System Vision — Canonical vision (single source of truth)
World Schema — The organizational spine: how every concept relates
Task Breakdown — 63 tasks, 8 phases, dependency graph

Documentation

All docs are in docs/, organized by the World Schema tree — the layers a mutation passes through:

Layer	Content
0-mutate	The foundational idea: every interaction is a state mutation
1-transport	MCP + JSON-RPC + 14 tools — how mutations enter
2-plugin	The server: boot, modes, database, middleware
3-world	Runtime model: physics (laws), social (agents), execution (pipeline)
4-additions	ν Integrations: Git, Obsidian, Claude API
5-time	Session → round → task → roadmap
spec	19 locked protocol specifications (s01–s19)
decisions	Architectural Decision Records (ADR-001–006)
agents	Agent contracts: sigma, pm, executor, writeback
guides	Quick-start, implementation tasks, skill authoring
reference	Glossary, Phase 0 tools, heritage extractions

Stack

Shipped at Phase 0:

TypeScript 5.3+ (ESM, NodeNext) — MCP server, middleware, domains
@modelcontextprotocol/sdk — MCP protocol implementation (stdio transport live; streamable HTTP client shipped via ν MCP bridge)
Zod 3.23 — schema validation and type safety (v4 was planned but v3.23 ships)
better-sqlite3 — single-writer SQLite with WAL mode, via a migration runner at src/db/
merkletreejs — η Proof Store Merkle tree construction (shipped Wave E)
gray-matter — ε skill-registry frontmatter parsing (shipped Wave C)
Jest (ESM via --experimental-vm-modules) — 1084 tests passing at the last mainline commit (R75 Wave I close a22dd23e)
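
To show what a Merkle root over recorded steps buys, and why identical inputs yield an identical root, here is a hand-rolled sketch using only node:crypto. It deliberately does not use the merkletreejs API, and it makes its own assumptions: SHA-256 leaves, hex concatenation for parents, and promotion of a lone trailing node. The η Proof Store may differ on every one of these points.

```typescript
import { createHash } from "node:crypto";

function sha256(data: string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Hand-rolled for illustration; this is not the merkletreejs API.
function merkleRoot(leaves: string[]): string {
  let level = leaves.map(sha256);
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // slice yields one or two nodes; a lone trailing node
      // is promoted to the next level unchanged (assumption).
      const pair = level.slice(i, i + 2);
      next.push(pair.length === 2 ? sha256(pair.join("")) : pair.join(""));
    }
    level = next;
  }
  const [root] = level;
  if (root === undefined) {
    throw new Error("cannot compute a root over zero leaves");
  }
  return root;
}
```

The function has no hidden state, clock, or randomness, so two runs over the same ordered leaves always agree, while changing, adding, or reordering any leaf changes the root.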

Target for later phases (not yet implemented):

Chevrotain — κ rule-engine DSL parser generator (Phase 1)

Repository layout

├── .agents/                  ← Agent-ops corpus (CANON for skills/)
│   ├── skills/               ← 23 canonical colibri-* skill definitions
│   ├── spawns/               ← HERITAGE — Sigma round traces, read-only
│   └── swarms/               ← HERITAGE — donor swarm templates, read-only
├── .claude/
│   └── skills/               ← MIRROR (drifting) — do not edit by hand
├── .github/                  ← Issue/PR templates + docs-integrity CI
├── .worktrees/               ← SCRATCH — per-task feature worktrees
├── docs/                     ← SINGLE ACTIVE CANON (World Schema tree)
│   ├── colibri-system.md     ← Canonical vision
│   ├── world-schema.md       ← Organizational spine (v3)
│   ├── 0-mutate/             ← Foundational idea + mutation lifecycle
│   ├── 1-transport/          ← MCP + 14 tools
│   ├── 2-plugin/             ← Server: boot, modes, database, middleware
│   ├── 3-world/              ← Runtime model (physics/social/execution)
│   ├── 4-additions/          ← ν Integrations
│   ├── 5-time/               ← Session → round → task → roadmap
│   ├── spec/                 ← 19 locked protocol specifications
│   ├── architecture/decisions/← ADRs (ADR-001–006)
│   ├── agents/               ← Agent contracts
│   ├── reference/extractions/← 49 language-agnostic extractions
│   └── guides/implementation/← Phase 0 task prompts + breakdown
├── data/
│   └── ams.db                ← HERITAGE — AMS donor task store, kept
│                                through Phase 0 bootstrap only.
│                                Target: data/colibri.db (P0.2.2)
├── src/                      ← TypeScript runtime (shipped R75 Wave A+)
│   ├── server.ts             ← MCP entry point (P0.2.1)
│   ├── db/                   ← better-sqlite3 + migration runner (P0.2.2)
│   └── domains/              ← β tasks, δ router (stubs), ε skills, ζ trail, η proof, ν integrations
├── src/__tests__/            ← Jest ESM test suite (1084 tests; P0.1.2)
└── AGENTS.md · CLAUDE.md · README.md · CONTRIBUTING.md · SECURITY.md · CODE_OF_CONDUCT.md

Zone model: every top-level folder declares a zone (CANON · HERITAGE · MIRROR · SCRATCH · VENDOR). See CLAUDE.md §9.2 for the full 12-folder manifest.

One canon: docs/ is the single active documentation surface. Everything in .agents/spawns/, .agents/swarms/, projects/, and data/ is HERITAGE — describing pre-R53 donors (AMS, CogniMesh, Phoenix), not Colibri Phase 0.

License

Apache-2.0 WITH Commons-Clause
