CyDo: A Multi-Agent Orchestration System

Goal: mostly-autonomous software development while maintaining rigorous quality control

Facets

  • Recursive decomposition of tasks
  • Automated quality control using role-based agents
  • Web interface as primary mode of interaction
  • Merge train for CI

Elements

  • Built-in issue tracking: tasks
    • One task is:
      • A description of an actionable issue:
        • A plan generated by an agent that should be executed (or split into sub-tasks)
        • A bug report from the user or an agent which should be investigated
        • A vague or open-ended request from the user which must be expanded into a concrete plan
        • Maybe just the initial prompt of a conversation with the user
      • An agent session which will handle this issue
        • All sessions are tied to a task
        • Tasks start out pending (no session) until they are dequeued and given to an agent
      • Metadata:
        • Link to a parent issue
        • Role of the agent who will handle this issue
        • Optional existing session to resume, and optionally from where (session forking)
        • Dependencies
        • Approvals
          • Operator
          • Stewards
        • Link to execution session
        • "Committable" flag - whether the work should be saved as an individual commit, or is part of a larger commit (parent issue)
          • When "committable" is true, execution runs in a new worktree
        • Status is implicitly derived from dependencies / approvals / execution session status
  • Heavily parallel:
    • All work happens in worktrees
    • Merge train - all commits must pass CI
  • CI can be slow, so it is handled by the system (optimized with a merge train) - not executed by agents directly
  • Stewards:
    • Stateful agents which concern themselves with one aspect of the project
    • Stewards review all plans and patches
    • Any steward can reject any plan or patch
    • Examples:
      • Steward of Vision (does this align with the project's overall vision?)
      • Steward of Documentation (does this include necessary documentation updates?)
      • Steward of Testing (does this include necessary tests?)
      • Steward of Security (does this introduce any potential security issues?)
      • Steward of Quality (does this plan use any known antipatterns?)
    • Stewards are notified of landed changes, and maintain their own internal documentation and tooling of what they need to care about, especially regressions in that area
    • Stewards can interrogate the agent which generated the artifact (plan/patch)
    • For plans, stewards either approve or reject with a rationale
    • For patches, stewards may approve, reject, or approve with a list of issues to be addressed in the future
  • Encourage splitting up work:
    • After an agent reads an issue description, the very first thing they must do is decide if it should be split up into multiple issues
  • Wrap Claude Code in headless mode (instead of driving it through e.g. tmux, as Gas Town does)
    • Claude runs sandboxed with --dangerously-skip-permissions (no interactive permission prompts)
  • Provide a custom set of tools instead of the built-in Claude Code ones
  • Hierarchy:
    1. Workspace (sandbox boundary)
    2. Project (has .git)
    3. Branch (has merge train)
    4. Task (runs in parallel)
  • User interaction is via a web interface
    • Task tree - how tasks have been recursively split up
    • Agent log - view agent logs for each task
    • Work queue - for roles that do work synchronously (stewards), show their work queue
    • Pausing/stopping - operator can pause (next tool call will block) or halt (kill process) an agent
    • Live steering - allow injecting user messages into the agent session in real time
    • Interrogation - finished sessions can be forked to create an interactive session
  • "Full-auto" (a.k.a. AFK / overnight) mode (human approval of plans is skipped)
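
The task element above says that status is implicitly derived from dependencies, approvals, and the execution session. A minimal sketch of that derivation follows. All type names, field names, status labels, and the order in which the checks are applied are assumptions made for illustration; the design only names the concepts. Dependencies are assumed to be acyclic.

```typescript
// Hypothetical model of a task; names are illustrative, not CyDo's schema.
type SessionState = "none" | "running" | "completed" | "failed";
type ApprovalState = "pending" | "approved" | "rejected";
type Status =
  | "blocked" | "awaiting-approval" | "pending"
  | "running" | "done" | "failed" | "rejected";

interface Approval { by: string; state: ApprovalState } // "operator" or a steward name

interface Task {
  id: number;
  parentId: number | null;
  role: string;
  description: string;
  committable: boolean;   // true: saved as its own commit, runs in a new worktree
  dependencies: number[]; // ids of tasks that must be done first
  approvals: Approval[];
  session: SessionState;  // "none" until the task is dequeued
}

// Status is never stored; it is computed from the other fields.
// Assumed precedence: rejection, then session state, then dependencies,
// then outstanding approvals.
function deriveStatus(task: Task, all: Record<number, Task>): Status {
  if (task.approvals.some((a) => a.state === "rejected")) return "rejected";
  if (task.session === "running") return "running";
  if (task.session === "completed") return "done";
  if (task.session === "failed") return "failed";

  // No session yet: either something is still outstanding or the task is ready.
  const depsDone = task.dependencies.every((id) => {
    const dep = all[id];
    return dep !== undefined && deriveStatus(dep, all) === "done";
  });
  if (!depsDone) return "blocked";
  if (task.approvals.some((a) => a.state === "pending")) return "awaiting-approval";
  return "pending";
}
```

Computing the status on demand means there is no stored status field that can drift out of sync with the approvals or the session.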

Technology stack

  • Backend: D + ae (~/work/ae)
  • Frontend: TypeScript + Preact

Phases

1. Web UI

Wrap Claude Code CLI in a web UI with real-time streaming.

Deliverable: web UI with same capabilities as the first-party CLI.

  • Claude Code CLI wrapped in stream-json protocol
  • Web UI renders sessions with real-time streaming via WebSocket
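
To relay a streamed session to the browser, the backend has to split the CLI's output into individual events first. The sketch below assumes the stream-json output is newline-delimited JSON (one object per line) and that chunks can arrive split at arbitrary byte positions. `LineDecoder` is a hypothetical helper name, written in TypeScript for illustration even though the backend is in D.

```typescript
// Incrementally decodes newline-delimited JSON from arbitrarily split chunks.
class LineDecoder {
  private buffer = "";

  // Feed one chunk; returns every complete event it finished.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or ""); keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    const events: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed !== "") events.push(JSON.parse(trimmed));
    }
    return events;
  }
}
```

Each decoded event could then be forwarded over the WebSocket as its own message, so the UI never has to deal with partial JSON.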

2. Multi-session

Multiple concurrent agent sessions managed from a single interface.

Deliverable: sidebar for selecting and interacting with all running sessions, with live steering (inject user messages into running sessions).

  • Sidebar for selecting and interacting with multiple sessions
  • Live steering: inject user messages into running sessions

3. Persistence

All state persisted so no data is lost on backend restart.

Deliverable: backend restart preserves all sessions and their full message logs; UI can display previous sessions immediately on reconnect.

  • SQLite data model for tasks (tid, session ID, description, type, parent, status)
  • Session history loaded from Claude's JSONL files

4. Workspaces and sandboxing

Project discovery and bwrap-based sandbox isolation for agent sessions.

Deliverable: agents run inside bwrap sandboxes with configurable filesystem access; projects are auto-discovered from workspace roots.

  • Workspace configuration with per-workspace sandbox overrides
  • bwrap isolation with read-only / read-write path control
  • Config hot-reload on file change
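
The read-only / read-write path control above maps naturally onto bwrap's bind options. The following is a rough sketch of how a sandbox configuration might be turned into a bwrap argument list. The `SandboxConfig` shape, the override semantics (an override replaces a list wholesale), and the choice of extra flags are assumptions; only `--ro-bind`, `--bind`, `--proc`, `--dev`, `--unshare-pid`, and `--die-with-parent` are taken from bwrap itself.

```typescript
// Hypothetical configuration shape; the real config format is not specified here.
interface SandboxConfig {
  readOnly: string[];
  readWrite: string[];
}

// Per-workspace override: a list given in the override replaces the default list.
function applyOverride(base: SandboxConfig, override: Partial<SandboxConfig>): SandboxConfig {
  return {
    readOnly: override.readOnly !== undefined ? override.readOnly : base.readOnly,
    readWrite: override.readWrite !== undefined ? override.readWrite : base.readWrite,
  };
}

// Builds the argv that follows "bwrap". Each path is mounted at the same
// location inside the sandbox as outside.
function bwrapArgs(cfg: SandboxConfig, command: string[]): string[] {
  const args = ["--die-with-parent", "--unshare-pid", "--proc", "/proc", "--dev", "/dev"];
  for (const path of cfg.readOnly) args.push("--ro-bind", path, path);
  // Read-write binds come last so that a writable path nested inside a
  // read-only one is mounted on top of it.
  for (const path of cfg.readWrite) args.push("--bind", path, path);
  return args.concat(command);
}
```

Because bwrap applies mount operations in the order given, the ordering of the two loops is part of the policy, not just a style choice.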

5. Custom tools

MCP server delivering custom tools to Claude Code sessions.

Deliverable: agents use CyDo's Task tool via MCP to create child sessions visible in the UI task tree, with promise-based result return.

  • Task tool (sub-task creation with result await)
  • Agents use Claude Code's built-in tools for everything else
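
The promise-based result return can be pictured as a small broker that hands the parent a promise when it creates a sub-task and settles that promise when the child finishes. This is an illustrative sketch only: `TaskBroker` and its methods are hypothetical names, results are simplified to strings, and the MCP transport is left out entirely.

```typescript
// Hypothetical in-memory broker between a parent's Task tool call and the
// child session that eventually produces the result.
class TaskBroker {
  private nextId = 1;
  private resolvers: Record<number, (result: string) => void> = {};
  private descriptions: Record<number, string> = {};

  // Called when an agent invokes the Task tool. The returned promise stays
  // pending until the child reports back via complete().
  createChild(description: string): { id: number; result: Promise<string> } {
    const id = this.nextId++;
    this.descriptions[id] = description;
    const result = new Promise<string>((resolve) => {
      this.resolvers[id] = resolve;
    });
    return { id, result };
  }

  // Called when the child session ends. Returns false for unknown or
  // already-completed ids so a duplicate report is ignored.
  complete(id: number, result: string): boolean {
    const resolve = this.resolvers[id];
    if (resolve === undefined) return false;
    delete this.resolvers[id];
    resolve(result);
    return true;
  }

  pendingCount(): number {
    return Object.keys(this.resolvers).length;
  }
}
```

The tool call made by the parent would simply await `result`, which keeps the parent's session blocked on that call until the child is done.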

6. Task type system

YAML-driven task type definitions controlling agent behavior, capabilities, and flow control.

Deliverable: task types configure model, tools, prompt, and sub-task permissions declaratively; a simulator and dot generator validate the design.

  • YAML-defined types with model_class, read_only, output_type, prompt_template
  • creatable_tasks enforcement (parent controls which sub-task types child can create)
  • Prompt template rendering with {{task_description}} substitution
  • Simulator and Graphviz dot generator for design validation
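
As a sketch of the two mechanical pieces above, the snippet below shows one way `creatable_tasks` enforcement and `{{task_description}}` substitution could work once the YAML has been parsed into objects. It assumes that `creatable_tasks` is a list of type names on the creating task's type; the `TaskType` interface mirrors the field names listed above, but the value types are guesses.

```typescript
// Assumed in-memory form of one parsed YAML task type definition.
interface TaskType {
  model_class: string;
  read_only: boolean;
  output_type: string;
  prompt_template: string;
  creatable_tasks: string[];
}

// Substitutes every occurrence of the placeholder. split/join is used instead
// of String.replace so that "$" sequences in the description are inserted
// literally rather than being treated as replacement patterns.
function renderPrompt(template: string, taskDescription: string): string {
  return template.split("{{task_description}}").join(taskDescription);
}

// A task of type parentType may only spawn the types its definition lists.
function canCreate(types: Record<string, TaskType>, parentType: string, childType: string): boolean {
  const parent = types[parentType];
  return parent !== undefined && parent.creatable_tasks.indexOf(childType) !== -1;
}
```

A request to create a sub-task of a type that is not listed would be refused before any session is spawned.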

7. Worktrees

Tasks with worktree: true (implement, spike) run in their own git worktree.

Deliverable: an implement sub-task produces a commit in an isolated worktree; the parent can adopt the result without conflicts in the main tree.

  • Create git worktrees on task spawn, pass working directory to agent session
  • Include worktree path in sub-task result for parent to adopt changes
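
Creating the worktree on task spawn comes down to one `git worktree add` invocation. The sketch below only builds the argument list and the values the parent would need afterwards. The branch naming scheme and the directory layout under `.cydo/worktrees` are invented for this example and are not part of the design.

```typescript
interface WorktreePlan {
  path: string;   // passed to the agent session as its working directory
  branch: string; // branch that will carry the sub-task's commit
  argv: string[]; // arguments for the "git" executable
}

// Plans a new worktree for a task, branching from baseRef.
// Naming and layout are hypothetical.
function planWorktree(repoDir: string, taskId: number, baseRef: string): WorktreePlan {
  const branch = `cydo/task-${taskId}`;
  const path = `${repoDir}/.cydo/worktrees/task-${taskId}`;
  return {
    path,
    branch,
    // git -C <repo> worktree add -b <new-branch> <path> <commit-ish>
    argv: ["-C", repoDir, "worktree", "add", "-b", branch, path, baseRef],
  };
}
```

Returning `path` and `branch` alongside the command lets the sub-task result carry them back to the parent, which can then adopt the commit from that branch.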

8. Continuations

When a task completes, the system automatically spawns a successor task based on the type's continuation definitions.

Deliverable: the full plan → triage → implement/decompose → review chain runs end-to-end without manual intervention. A full-auto toggle skips operator approval on continuation gates (stewards still review).

  • On task exit, look up chosen continuation and spawn successor
  • keep_context: fork the session (reuse existing JSONL fork logic) so successor inherits conversation history
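
The lookup-and-spawn step could look roughly like the sketch below. It assumes that each task type maps named continuations to a successor type plus a `keep_context` flag, and that a finished task reports which continuation it chose. The field `task_type`, the continuation names, and the `SpawnRequest` shape are hypothetical.

```typescript
interface Continuation {
  task_type: string;     // successor type to spawn (hypothetical field name)
  keep_context: boolean; // true: successor forks the finished session
}

// Continuation definitions per task type, keyed by continuation name.
type ContinuationTable = Record<string, Record<string, Continuation>>;

interface FinishedTask {
  id: number;
  type: string;
  sessionId: string;
  chosen: string; // name of the continuation the agent selected on exit
  output: string; // the task's result, used as the successor's description
}

interface SpawnRequest {
  type: string;
  predecessorId: number;
  description: string;
  forkFromSession: string | null; // session to fork, or null for a fresh one
}

// Returns what to spawn next, or null when the chain ends here.
function planSuccessor(table: ContinuationTable, done: FinishedTask): SpawnRequest | null {
  const forType = table[done.type];
  const cont = forType !== undefined ? forType[done.chosen] : undefined;
  if (cont === undefined) return null;
  return {
    type: cont.task_type,
    predecessorId: done.id,
    description: done.output,
    forkFromSession: cont.keep_context ? done.sessionId : null,
  };
}
```

An approval gate, where one is required, would sit between computing the `SpawnRequest` and actually spawning the successor.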

9. Inter-task communication

Session forking for user-initiated interrogation already works (JSONL truncation). Richer communication between tasks in the tree is needed.

Deliverable: agents can ask questions up the task tree (child → parent → user) and parents can re-engage completed children for follow-up.

  • Parent wakes a completed child to ask follow-up questions
  • Child asks parent for clarification of its prompt
  • Clarification requests bubble up the task tree, ultimately reaching the user if no ancestor can answer
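
The bubbling behaviour in the last bullet can be sketched as a walk from the asking task towards the root. In this simplified, synchronous model each ancestor either answers or declines, and the user is the final fallback. `TreeTask`, `tryAnswer`, and `askUser` are hypothetical names, and a real implementation would be asynchronous because answering means waking an agent session.

```typescript
interface TreeTask {
  id: number;
  parentId: number | null;
  // Returns an answer, or null if this task cannot answer the question.
  tryAnswer: (question: string) => string | null;
}

interface Answer {
  answeredBy: number | "user";
  text: string;
}

// Asks each ancestor in turn, starting with the direct parent.
function askUpTree(
  tasks: Record<number, TreeTask>,
  fromId: number,
  question: string,
  askUser: (question: string) => string,
): Answer {
  let current: number | null = tasks[fromId].parentId;
  while (current !== null) {
    const ancestor: TreeTask = tasks[current];
    const text = ancestor.tryAnswer(question);
    if (text !== null) return { answeredBy: current, text };
    current = ancestor.parentId;
  }
  // No ancestor could answer: the request reaches the user.
  return { answeredBy: "user", text: askUser(question) };
}
```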

10. Stewards

Stateful review agents that gate continuation spawning via approval.

Deliverable: every plan and patch is reviewed by stewards before landing. Rejections feed back to the originating agent, which can rework and retry.

  • Approval gates on continuations invoke all stewards in parallel (reviews are read-only)
  • Rejection modeled as a retryable tool call — agent reworks and resubmits
  • Knowledge bases loaded from knowledge_base path into steward sessions
  • Steward upkeep: notified of landed changes, maintain internal docs. Upkeep tasks are serial (one at a time per steward); may require a task queue.
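
The first two bullets, parallel review and rejection as a retryable call, might fit together as in the sketch below. It is a simplified model: stewards are plain async functions, artifacts are strings, and the retry limit is an added assumption, since the design does not say how many rework rounds are allowed.

```typescript
interface Verdict {
  steward: string;
  approved: boolean;
  rationale: string;
}

interface Steward {
  name: string;
  review: (artifact: string) => Promise<Verdict>;
}

interface GateResult {
  approved: boolean;
  rejections: Verdict[];
}

// Reviews are read-only, so all stewards can run concurrently.
// Any single rejection blocks the artifact.
async function reviewGate(stewards: Steward[], artifact: string): Promise<GateResult> {
  const verdicts = await Promise.all(stewards.map((s) => s.review(artifact)));
  const rejections = verdicts.filter((v) => !v.approved);
  return { approved: rejections.length === 0, rejections };
}

// The originating agent is modeled as `produce`: given the previous round's
// rejections, it returns a reworked artifact. Returns null if the artifact is
// still rejected after maxAttempts rounds (the limit is an assumption).
async function submitWithRework(
  stewards: Steward[],
  produce: (feedback: Verdict[]) => Promise<string>,
  maxAttempts: number,
): Promise<string | null> {
  let feedback: Verdict[] = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const artifact = await produce(feedback);
    const result = await reviewGate(stewards, artifact);
    if (result.approved) return artifact;
    feedback = result.rejections;
  }
  return null;
}
```

Passing only the rejections back keeps the rework prompt focused on what actually blocked the artifact.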

11. Merge train

Worktree commits land via a merge queue that runs CI and handles conflicts.

Deliverable: agents produce commits; the system lands them via a serialized queue with CI validation, without agents running CI directly.

  • Investigate whether implementable within the task system (as task types and continuations) or requires a dedicated facility
  • Rebase and retry on conflict or CI failure
  • May require a task queue as prerequisite
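
Since the shape of this phase is still open, the following is only one possible reading of "serialized queue with CI validation": candidates are processed one at a time, each is rebased onto the current head and tested, and a failed attempt is retried a bounded number of times. `TrainOps` is a hypothetical interface that hides git and CI. It assumes `rebase` may involve an agent resolving conflicts, which is what makes a retry worthwhile.

```typescript
interface Candidate {
  taskId: number;
  commit: string;
}

// Hypothetical side-effecting operations supplied by the system.
interface TrainOps {
  // Returns the rebased commit, or null if the rebase hit a conflict.
  rebase(commit: string, onto: string): Promise<string | null>;
  runCi(commit: string): Promise<boolean>;
}

interface TrainResult {
  head: string;
  landed: number[];
  rejected: number[];
}

// Processes the queue strictly in order. The head only advances to commits
// that passed CI on top of the previous head.
async function runMergeTrain(
  queue: Candidate[],
  head: string,
  ops: TrainOps,
  maxAttempts: number,
): Promise<TrainResult> {
  const result: TrainResult = { head, landed: [], rejected: [] };
  for (const candidate of queue) {
    let landed = false;
    for (let attempt = 0; attempt < maxAttempts && !landed; attempt++) {
      const rebased = await ops.rebase(candidate.commit, result.head);
      if (rebased === null) continue; // conflict: try again
      if (await ops.runCi(rebased)) {
        result.head = rebased;
        landed = true;
      }
    }
    (landed ? result.landed : result.rejected).push(candidate.taskId);
  }
  return result;
}
```

This version runs CI once per candidate and does no speculative batching, so it trades throughput for simplicity.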