LastEld/AMS


Colibri

Technically — MCP orchestration runtime. Architecturally — three-axis control plane. Philosophically — system of legitimized agentic activity.

What is Colibri?

Colibri is a documentation-first TypeScript MCP runtime that unites task orchestration, audit trails, and cryptographic proof generation in one stdio server. Its Phase 0 core comprises:

Phase 0 Core — What You Build First

  • 14-tool MCP surface (stdio, shipped) — 5 β Task · 4 ζ Audit · 2 η Proof · 1 ε (skill_list) · 2 System (server_ping, server_health). R74.5 planned 19 tools; Wave H reconciled the shipped list (see ADR-004 R75 amendment). server_info, server_shutdown, task_transition, task_depends_on, and audit_session_end remain unimplemented; thought_record_list was added although it was not among the original 19.
  • Two-phase startup (init then ready) with 4 runtime modes: FULL, READONLY, TEST, MINIMAL
  • Single-writer SQLite (data/colibri.db) with WAL mode, via better-sqlite3
  • 5-stage α middleware chain — tool-lock → schema validate → audit enter → dispatch → audit exit
  • 8-state β FSM task pipeline: INIT → GATHER → ANALYZE → PLAN → APPLY → VERIFY → DONE (+ CANCELLED), enforced at the middleware layer
  • Hash-chained ζ thought trail — every decision recorded and verifiable via audit_verify_chain
  • Merkle η proof store — cryptographic proof of execution, sealed via merkle_finalize / merkle_root
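The five α middleware stages listed above can be pictured as a fixed sequence wrapped around the tool handler. The following is a minimal sketch under stated assumptions, not Colibri's actual code: `ToolCall`, `runChain`, the in-memory `auditLog`, and the `handlers` map are hypothetical names, and schema validation is reduced to a lookup check rather than a real Zod schema.

```typescript
// Hypothetical sketch of a 5-stage chain: tool-lock → schema validate →
// audit enter → dispatch → audit exit. None of these names come from Colibri.
type ToolCall = { tool: string; args: Record<string, unknown> };
type Handler = (args: Record<string, unknown>) => unknown;

const auditLog: string[] = [];
const locked = new Set<string>(); // tools currently executing

const handlers: Record<string, Handler | undefined> = {
  // Stand-in for a real tool such as server_ping.
  server_ping: () => ({ ok: true }),
};

function runChain(call: ToolCall): unknown {
  // Stage 1: tool-lock. Refuse re-entrant calls to the same tool.
  if (locked.has(call.tool)) throw new Error(`tool busy: ${call.tool}`);
  locked.add(call.tool);
  try {
    // Stage 2: schema validate (a trivial stand-in for a real schema check).
    const handler = handlers[call.tool];
    if (!handler) throw new Error(`unknown tool: ${call.tool}`);
    // Stage 3: audit enter.
    auditLog.push(`enter:${call.tool}`);
    try {
      // Stage 4: dispatch to the tool handler.
      return handler(call.args);
    } finally {
      // Stage 5: audit exit, recorded even if the handler throws.
      auditLog.push(`exit:${call.tool}`);
    }
  } finally {
    // The lock is always released, whether the call succeeded or failed.
    locked.delete(call.tool);
  }
}

const result = runChain({ tool: "server_ping", args: {} });
console.log(result, auditLog);
```

In this sketch a call that fails validation never reaches the audit stages, so the log only contains calls that were actually dispatched. Whether Colibri audits rejected calls is not stated here.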

Full Vision — Advanced Components (not in Phase 0)

  • δ Multi-model router (Phase 1.5; Phase 0 stubs shipped R75 Wave I per ADR-005 §Decision) — Phase 0: constant scoring (always Claude) + single-member fallback chain, library-only. Phase 1.5: weighted multi-model scoring + N-member fallback + circuit breaker
  • κ Deterministic rule engine (Phase 1–2) — Chevrotain-parsed formal DSL for decision making
  • λ Reputation model (Phase 1–2) — agent and action credibility tracking
  • μ BFT consensus (Phase 3+) — Byzantine fault-tolerant agreement for critical decisions
  • ξ Governance layer (Phase 3+) — institutional rules and policy enforcement
  • θ Identity fabric (Phase 3+) — cryptographic identity and authorization

Complexity Budget

Colibri has three layers of depth. You only need the first to start.

Layer 1 (Phase 0): Working Agent Runtime

  • Delivers: MCP server + task pipeline + audit trail + proof store
  • Build time: ~6 weeks (per phase estimate)
  • What it does: Execute tasks deterministically, audit every step, prove work happened
  • Enough for: Internal automation, task orchestration, agent-as-a-service

Layer 2 (Phase 1–2): Intelligent Agent Runtime

  • Adds: Multi-model routing + rule engine + reputation scoring + fallback chains
  • What it does: Route work to the right model, enforce rules, track agent credibility
  • Enough for: Production AI workloads, policy-driven automation, cost/quality optimization

Layer 3 (Phase 3–8): Enterprise Multi-Party Runtime

  • Adds: BFT consensus + governance + institutional rules + identity fabric + cryptographic proof aggregation
  • What it does: Byzantine fault tolerance, multi-party decision-making, institutional accountability
  • Enough for: Regulated industries, multi-agent systems, institutional compliance

Start with Layer 1. Add layers as your needs grow.

Three-Axis Architecture

Colibri operates on three independent but intertwined axes:

  1. Execution of work — Tasks flow through a formal pipeline. Each task state (INIT, GATHER, ANALYZE, PLAN, APPLY, VERIFY, DONE) is executed agentically with full audit instrumentation. Output is deterministic, verifiable, and reproducible.

  2. Management of intelligence — A weighted model router (δ) selects among candidate models by scoring each on quality, cost, latency, and load, with a fallback chain for graceful degradation. Phase 0 runs Claude-only via library-only stubs (shipped R75 Wave I per ADR-005 §Decision): constant scoring + single-member fallback. Phase 1.5 activates real multi-model scoring and a weighted fallback chain.

  3. Legitimacy of action — Every action must be explainable, verifiable, and institutionally permissible. This axis comprises decision trails (what was decided and why), Merkle proofs (cryptographic proof of work), deterministic rules (formal constraints), reputation (credibility of agents), consensus (agreement when needed), governance (policy enforcement), and identity (who did it).
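To make the intelligence axis concrete, here is one way a weighted router with a fallback chain could look. This is an illustrative sketch only: the weight values, the 0..1 normalization, the `Candidate` shape, and the names `score` and `route` are assumptions, not taken from ADR-005 or the shipped δ stubs.

```typescript
// Hypothetical weighted scoring; weights, metric ranges, and model names
// are illustrative assumptions, not values from Colibri.
type Metrics = { quality: number; cost: number; latency: number; load: number }; // each normalized to 0..1
type Candidate = { name: string; metrics: Metrics; available: boolean };

const weights: Metrics = { quality: 0.5, cost: 0.2, latency: 0.2, load: 0.1 };

// Higher quality raises the score; cost, latency, and load lower it.
function score(m: Metrics): number {
  return (
    weights.quality * m.quality +
    weights.cost * (1 - m.cost) +
    weights.latency * (1 - m.latency) +
    weights.load * (1 - m.load)
  );
}

// Fallback chain: order candidates best-first, then take the first available one.
function route(candidates: Candidate[]): string {
  const chain = [...candidates].sort((a, b) => score(b.metrics) - score(a.metrics));
  const pick = chain.find((c) => c.available);
  if (!pick) throw new Error("no model available");
  return pick.name;
}

const modelA: Candidate = {
  name: "model-a",
  metrics: { quality: 0.95, cost: 0.5, latency: 0.4, load: 0.2 },
  available: false, // best score, but currently unavailable
};
const modelB: Candidate = {
  name: "model-b",
  metrics: { quality: 0.7, cost: 0.3, latency: 0.3, load: 0.1 },
  available: true,
};

console.log(route([modelA, modelB])); // falls back to the lower-scored but available model
```

With a single candidate the chain degenerates to the Phase 0 behavior described above: one member, always selected when available.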

Together, these axes form a control plane for agentic work — not just a backend that produces results, but a system that ensures results are produced correctly, intelligently, and legitimately.

The Philosophy

Colibri is an attempt to build an agentic system in which what matters is not just the result, but the right to the result.

Work must be done (execution axis). Intelligence must be correctly chosen (intelligence axis). And the action itself must be legitimized (legitimacy axis) — recorded in an immutable trail, proven through cryptographic hashing, verified against deterministic rules, passed through reputation checks, and when necessary confirmed by consensus and governance mechanics.

In this sense, Colibri is not "another AI assistant" but an environment in which agentic activity gains:

  • Memory — decisions recorded and retrievable
  • Provability — cryptographic evidence of work and reasoning
  • Accountability — every action traceable to its agent and decision context
  • Constitutional constraints — enforceable rules and governance

Current State

Phase 0 is 100% complete on non-deferred tasks (28/28 shipped as of R75 Wave I — 2026-04-18). P0.5.1/P0.5.2 shipped as δ Model Router library stubs per ADR-005 §Decision (PR #149 scoring, PR #150 fallback). Full multi-model routing lands in Phase 1.5.

Of the 15 Greek-letter concepts:

  • 8 ship code at Phase 0 granularity (colibri_code: partial): α System Core · β Task Pipeline · γ Server Lifecycle · δ Model Router (library-only stubs) · ε Skill Registry · ζ Decision Trail · η Proof Store · ν Integrations
  • 7 remain spec-only for later phases (colibri_code: none): θ Consensus · ι State Fork · κ Rule Engine · λ Reputation · μ Integrity Monitor · ξ Identity · π Governance

src/server.ts, data/colibri.db (created at runtime in WAL mode), and a Jest suite with 1084 passing tests are all live at the last mainline commit.

Documentation corpus (R75 post-Wave-I):

  • ~700 markdown files across 12 top-level directories (CLAUDE.md §9.2 — CANON · MIRROR · HERITAGE · SCRATCH · VENDOR)
  • docs/ reorganized around the World Schema tree (0-mutate/ through 5-time/)
  • 49 language-agnostic algorithm extractions in docs/reference/extractions/
  • 63 implementation tasks across 8 phases (Phase 0 = 28; all 28 shipped — 2 as library-only δ stubs per ADR-005)
  • 19 locked protocol specifications (docs/spec/)
  • 14 new conceptual-glue documents linking the 15 Greek-letter concepts

The Phase 0 specification is complete; the implementation is effectively complete; Phase 1 planning is the next round's scope.

Minimum Viable Start

Phase 0 delivers a working agent runtime with a 14-tool stdio MCP surface:

  • MCP server (src/server.ts, shipped P0.2.1) accepting tool calls over stdio
  • β Task pipeline — 5 tools (task_create, task_list, task_get, task_update with FSM-routing, task_next_actions) enforcing the INIT → GATHER → ANALYZE → PLAN → APPLY → VERIFY → DONE FSM at the middleware layer
  • SQLite backend (data/colibri.db, created at runtime in WAL mode; schema shipped P0.2.2) via better-sqlite3
  • ζ/η Audit + Proof — 6 tools (audit_session_start, thought_record, thought_record_list, audit_verify_chain, merkle_finalize, merkle_root) recording and sealing every decision
  • ε Skill listing: skill_list for discovery; capability index shipped Wave H (P0.6.3)
  • System surface: server_ping, server_health (2 live; server_info / server_shutdown were planned but not yet implemented)
  • Deterministic execution — same inputs always yield same outputs; same-inputs-same-Merkle-root is the legitimacy guarantee
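The β FSM named in the list above can be sketched as a small transition check. The linear state order comes from the text; everything else is an assumption made for illustration, in particular the rule that CANCELLED is reachable from any non-terminal state, and the names `ORDER`, `State`, and `canTransition`.

```typescript
// Hypothetical transition check for the β task FSM. The linear order is from
// the README; allowing CANCELLED from any non-terminal state is an assumption.
const ORDER = ["INIT", "GATHER", "ANALYZE", "PLAN", "APPLY", "VERIFY", "DONE"] as const;
type State = (typeof ORDER)[number] | "CANCELLED";

function canTransition(from: State, to: State): boolean {
  // DONE and CANCELLED are treated as terminal: nothing leaves them.
  if (from === "DONE" || from === "CANCELLED") return false;
  // Assumed: a live task may be cancelled at any point.
  if (to === "CANCELLED") return true;
  // Otherwise only the immediately following pipeline state is legal.
  const i = ORDER.indexOf(from);
  return ORDER[i + 1] === to;
}

// A guard like this would sit in front of task_update in the middleware layer.
function assertTransition(from: State, to: State): void {
  if (!canTransition(from, to)) {
    throw new Error(`illegal transition: ${from} → ${to}`);
  }
}

assertTransition("INIT", "GATHER"); // passes silently
console.log(canTransition("INIT", "PLAN")); // skipping states is rejected
```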

This is enough to orchestrate agent work deterministically, audit every decision, prove work was done correctly, and reproduce any execution from the audit trail.
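The idea behind a hash-chained trail, and the kind of check a tool like audit_verify_chain performs, can be shown in a few lines. This is a sketch under assumptions: the `Thought` record shape, the all-zero genesis hash, and the way the previous hash and content are combined are hypothetical and do not reflect the actual thought_record schema.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; the real thought_record schema is not shown here.
type Thought = { content: string; prevHash: string; hash: string };

// Assumed starting value for the first record's prevHash.
const GENESIS = "0".repeat(64);

// Each record's hash covers the previous hash plus its own content.
function hashOf(prevHash: string, content: string): string {
  return createHash("sha256").update(prevHash).update("\n").update(content).digest("hex");
}

// Append returns a new array so earlier snapshots are never mutated.
function append(chain: Thought[], content: string): Thought[] {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  return [...chain, { content, prevHash, hash: hashOf(prevHash, content) }];
}

// Walk the chain from the start and recompute every link.
function verifyChain(chain: Thought[]): boolean {
  let prev = GENESIS;
  for (const t of chain) {
    if (t.prevHash !== prev || t.hash !== hashOf(prev, t.content)) return false;
    prev = t.hash;
  }
  return true;
}

let trail: Thought[] = [];
trail = append(trail, "decided to split the task");
trail = append(trail, "chose plan B after review");
console.log(verifyChain(trail));
```

Because each hash depends on every record before it, editing any earlier entry invalidates all later links, which is what makes tampering detectable.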

No consensus, no governance, no multi-model routing, no agent spawning — just solid, verifiable task execution. Intelligence and legitimacy extensions layer in across later phases.

Start Here

  • CLAUDE.md — Executor rules for Claude and any other AI coding client (four-tier agent hierarchy, worktree rule, writeback protocol)
  • Colibri System Vision — Canonical vision (single source of truth)
  • World Schema — The organizational spine: how every concept relates
  • Task Breakdown — 63 tasks, 8 phases, dependency graph

Documentation

All docs are in docs/, organized by the World Schema tree — the layers a mutation passes through:

| Layer | Content |
| --- | --- |
| 0-mutate | The foundational idea: every interaction is a state mutation |
| 1-transport | MCP + JSON-RPC + 14 tools — how mutations enter |
| 2-plugin | The server: boot, modes, database, middleware |
| 3-world | Runtime model: physics (laws), social (agents), execution (pipeline) |
| 4-additions | ν Integrations: Git, Obsidian, Claude API |
| 5-time | Session → round → task → roadmap |
| spec | 19 locked protocol specifications (s01–s19) |
| decisions | Architectural Decision Records (ADR-001–006) |
| agents | Agent contracts: sigma, pm, executor, writeback |
| guides | Quick-start, implementation tasks, skill authoring |
| reference | Glossary, Phase 0 tools, heritage extractions |

Stack

Shipped at Phase 0:

  • TypeScript 5.3+ (ESM, NodeNext) — MCP server, middleware, domains
  • @modelcontextprotocol/sdk — MCP protocol implementation (stdio transport live; streamable HTTP client shipped via ν MCP bridge)
  • Zod 3.23 — schema validation and type safety (v4 was planned but v3.23 ships)
  • better-sqlite3 — single-writer SQLite with WAL mode, via a migration runner at src/db/
  • merkletreejs — η Proof Store Merkle tree construction (shipped Wave E)
  • gray-matter — ε skill-registry frontmatter parsing (shipped Wave C)
  • Jest (ESM via --experimental-vm-modules) — 1084 tests passing at the last mainline commit (R75 Wave I close a22dd23e)
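The η Proof Store builds its trees with merkletreejs. To illustrate the "same inputs, same Merkle root" property without depending on that library, the sketch below computes a root using only node:crypto. It is not the merkletreejs API and not Colibri's implementation: the `merkleRoot` function, the SHA-256 choice, and the rule of duplicating the last node on odd-sized levels are all assumptions.

```typescript
import { createHash } from "node:crypto";

const sha256 = (data: string): string =>
  createHash("sha256").update(data).digest("hex");

// Hypothetical helper: hash the leaves, then pair-and-hash upward to one root.
function merkleRoot(leaves: string[]): string {
  if (leaves.length === 0) return sha256(""); // assumed convention for an empty tree
  let level = leaves.map(sha256);
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      // Assumed: on an odd-sized level the last node is paired with itself.
      const right = level[i + 1] ?? left;
      next.push(sha256(left + right));
    }
    level = next;
  }
  return level[0];
}

const run1 = merkleRoot(["INIT", "GATHER", "ANALYZE"]);
const run2 = merkleRoot(["INIT", "GATHER", "ANALYZE"]);
const run3 = merkleRoot(["INIT", "GATHER", "PLAN"]);
console.log(run1 === run2, run1 === run3);
```

Two runs over identical leaves agree on the root, while changing any single leaf changes it, so comparing roots is enough to compare whole executions.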

Target for later phases (not yet implemented):

  • Chevrotain — κ rule-engine DSL parser generator (Phase 1)

Repository layout

├── .agents/                  ← Agent-ops corpus (CANON for skills/)
│   ├── skills/               ← 23 canonical colibri-* skill definitions
│   ├── spawns/               ← HERITAGE — Sigma round traces, read-only
│   └── swarms/               ← HERITAGE — donor swarm templates, read-only
├── .claude/
│   └── skills/               ← MIRROR (drifting) — do not edit by hand
├── .github/                  ← Issue/PR templates + docs-integrity CI
├── .worktrees/               ← SCRATCH — per-task feature worktrees
├── docs/                     ← SINGLE ACTIVE CANON (World Schema tree)
│   ├── colibri-system.md     ← Canonical vision
│   ├── world-schema.md       ← Organizational spine (v3)
│   ├── 0-mutate/             ← Foundational idea + mutation lifecycle
│   ├── 1-transport/          ← MCP + 14 tools
│   ├── 2-plugin/             ← Server: boot, modes, database, middleware
│   ├── 3-world/              ← Runtime model (physics/social/execution)
│   ├── 4-additions/          ← ν Integrations
│   ├── 5-time/               ← Session → round → task → roadmap
│   ├── spec/                 ← 19 locked protocol specifications
│   ├── architecture/decisions/← ADRs (ADR-001–006)
│   ├── agents/               ← Agent contracts
│   ├── reference/extractions/← 49 language-agnostic extractions
│   └── guides/implementation/← Phase 0 task prompts + breakdown
├── data/
│   └── ams.db                ← HERITAGE — AMS donor task store, kept
│                                through Phase 0 bootstrap only.
│                                Target: data/colibri.db (P0.2.2)
├── src/                      ← TypeScript runtime (shipped R75 Wave A+)
│   ├── server.ts             ← MCP entry point (P0.2.1)
│   ├── db/                   ← better-sqlite3 + migration runner (P0.2.2)
│   └── domains/              ← β tasks, δ router (stubs), ε skills, ζ trail, η proof, ν integrations
├── src/__tests__/            ← Jest ESM test suite (1084 tests; P0.1.2)
├── AGENTS.md · CLAUDE.md · README.md · CONTRIBUTING.md · SECURITY.md · CODE_OF_CONDUCT.md

Zone model: every top-level folder declares a zone (CANON · HERITAGE · MIRROR · SCRATCH · VENDOR). See CLAUDE.md §9.2 for the full 12-folder manifest.

One canon: docs/ is the single active documentation surface. Everything in .agents/spawns/, .agents/swarms/, projects/, and data/ is HERITAGE — describing pre-R53 donors (AMS, CogniMesh, Phoenix), not Colibri Phase 0.

License

Apache-2.0 WITH Commons-Clause