SSOT: Ilchul Agent OS — MCP TaskStream control plane MVP #236

@devkade

SSOT status

This issue is the active SSOT for the Ilchul Agent OS rewrite.

Ilchul is being reset around a Rust-first, DDD-style MCP TaskStream kernel/control plane.

Previous roadmap issues such as #114, #167, #190, #195, and #201 are no longer active implementation owners. They may be used as design references, but they do not define the organizing spine for this rewrite.

Core identity:

Ilchul = product/repo brand
Rust core = TaskStream authority kernel
MCP = control-plane adapter
Pi/TS = facade/runtime adapter surface
Filesystem = artifacts/projections, not canonical authority

MVP boundary:

one human goal
→ one TaskStream
→ one Run
→ one RuntimeSession
→ evidence
→ verification
→ kernel-owned CompletionDecision

Non-goals for MVP:

multi-runtime orchestration
parallel worker DAGs
repair loop convergence
GitHub PR automation
Discord/Hermes presentation
destructive cleanup automation

Summary

Implement the first Ilchul Agent OS MVP as a single-session governed coding TaskStream over MCP.

main submits a human goal and queries status/evidence, while the Rust TaskStream kernel owns stream state, event/evidence records, runtime dispatch, verification, and the final completion decision.

Runtime output is never completion authority. Runtime output may be a claim or evidence candidate; the kernel decides completion from accepted evidence and verification.
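The authority split above can be sketched in Rust as follows. This is a minimal illustration, not the kernel's actual API: `RuntimeClaim`, `Verdict`, `decide`, and all field names are assumptions introduced here; only the type names `EvidenceRecord`, `VerificationResult`, and `CompletionDecision` come from this issue.

```rust
/// Outcome the kernel may record. Hypothetical name.
#[derive(Debug, Clone, PartialEq)]
pub enum Verdict {
    Completed,
    Failed,
}

/// What a runtime reports about itself. It is a claim, never a decision.
#[derive(Debug, Clone)]
pub struct RuntimeClaim {
    pub says_done: bool,
}

#[derive(Debug, Clone)]
pub struct EvidenceRecord {
    pub id: String,
    pub accepted: bool,
}

#[derive(Debug, Clone)]
pub struct VerificationResult {
    pub id: String,
    pub passed: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CompletionDecision {
    pub status: Verdict,
    pub evidence_ids: Vec<String>,
    pub verification_id: String,
    pub decided_by: &'static str,
}

/// Kernel-side decision. The runtime claim is deliberately ignored:
/// only accepted evidence and the verification result count.
/// Returns `None` when there is no accepted evidence to justify a decision.
pub fn decide(
    claim: &RuntimeClaim,
    evidence: &[EvidenceRecord],
    verification: &VerificationResult,
) -> Option<CompletionDecision> {
    let _ = claim; // input only, never authority
    let evidence_ids: Vec<String> = evidence
        .iter()
        .filter(|e| e.accepted)
        .map(|e| e.id.clone())
        .collect();
    if evidence_ids.is_empty() {
        return None;
    }
    let status = if verification.passed {
        Verdict::Completed
    } else {
        Verdict::Failed
    };
    Some(CompletionDecision {
        status,
        evidence_ids,
        verification_id: verification.id.clone(),
        decided_by: "ilchul",
    })
}
```

In this sketch a runtime that claims success while verification fails still yields a `Failed` decision, and a run with no accepted evidence yields no decision at all.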

Active child issue queue

Historical reference issues

These prior issues are design references only. They are not active implementation parents for #236.

MVP proof shape

Use a small repo coding task with a clear failing test:

Human goal:
  Fix the implementation so the intentionally failing test passes.

Expected flow:

MCP goal submission
→ TaskStream created
→ Run created
→ one Pi RuntimeSession started
→ run contract dispatched
→ implementation changed
→ test evidence collected
→ verification gate evaluated
→ completion decided by kernel
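One way to make the expected flow mechanically checkable is a linear state machine that rejects skipped steps. The sketch below is an assumption for illustration: the `StreamState` enum, its variant names, and `advance` are not defined by this issue; they only mirror the nine steps listed above.

```rust
/// Hypothetical lifecycle states, one per step of the expected flow.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StreamState {
    GoalSubmitted,
    StreamCreated,
    RunCreated,
    SessionStarted,
    ContractDispatched,
    ImplementationChanged,
    EvidenceCollected,
    VerificationEvaluated,
    CompletionDecided,
}

impl StreamState {
    /// The single legal successor of each state; the final state has none.
    pub fn next(self) -> Option<StreamState> {
        use StreamState::*;
        match self {
            GoalSubmitted => Some(StreamCreated),
            StreamCreated => Some(RunCreated),
            RunCreated => Some(SessionStarted),
            SessionStarted => Some(ContractDispatched),
            ContractDispatched => Some(ImplementationChanged),
            ImplementationChanged => Some(EvidenceCollected),
            EvidenceCollected => Some(VerificationEvaluated),
            VerificationEvaluated => Some(CompletionDecided),
            CompletionDecided => None,
        }
    }
}

/// Accepts a transition only if `target` is the direct successor of `current`.
pub fn advance(current: StreamState, target: StreamState) -> Result<StreamState, String> {
    match current.next() {
        Some(next) if next == target => Ok(next),
        _ => Err(format!("illegal transition: {:?} -> {:?}", current, target)),
    }
}
```

A real kernel would likely need failure and cancellation branches; the strictly linear shape here is only meant to show that completion cannot be reached without passing through evidence and verification.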

Naming decision

  • Workspace folder names should stay generic and structural: core, store, artifacts, mcp, runtime-pi, cli, testkit.
  • Rust package/crate names should use the repository/product namespace: ilchul-core, ilchul-store, ilchul-artifacts, ilchul-mcp, ilchul-runtime-pi, ilchul-cli, ilchul-testkit.
  • Domain type names should remain generic: TaskStream, Run, RuntimeSession, EvidenceRecord, VerificationResult, CompletionDecision, RuntimeAdapter, ArtifactStore, EventStore.
  • Avoid proper-noun leakage into reusable domain concepts unless the name is clearly packaging/product namespace rather than domain semantics.

Core / adapter boundary decision

The #236 rewrite must keep a hard core/adapter boundary. The Rust core is the only place where TaskStream authority lives; adapters are replaceable projections and integrations.

Core owns:

  • TaskStream, Run, RuntimeSession, TurnRef, Event, EvidenceRecord, VerificationResult, CompletionDecision domain models.
  • Contract versioning and immutability rules.
  • Stream/run lifecycle state machines and invariants.
  • Canonical storage writes and transaction boundaries.
  • Append-only accountability events.
  • Evidence envelope validation and artifact reference validation.
  • Verification result acceptance rules.
  • Completion authority: runtime output is input; only core can emit accepted completion decisions.

Core must not own:

  • Pi/Node UI behavior.
  • tmux/process quirks except through ports.
  • GitHub issue/PR lifecycle automation.
  • Runtime-specific prompt syntax beyond abstract contracts.
  • Filesystem artifact formatting beyond artifact refs/inbox rules.
  • Discord/Hermes/main presentation concerns.

Adapters own:

  • MCP tool/resource transport over the core API.
  • SQLite implementation of the storage port.
  • Filesystem artifact inbox/projection implementation.
  • Pi runtime/session adapter and prompt projection.
  • Future Codex/Claude/Hermes adapters.
  • CLI/TS compatibility facades.
  • GitHub/Discord/operator-facing presentation layers.

Architecture style rule:

  • Do not enforce Clean Architecture dogmatically. Use it only where it clarifies dependency direction.
  • Prefer domain-driven development: model the ubiquitous language and invariants first, then place ports/adapters around the domain.
  • The primary boundary is not domain/application/infrastructure purity; it is whether TaskStream authority, state transitions, evidence, verification, and completion semantics remain in the Rust core.
  • Module boundaries should follow domain concepts before framework or transport concerns.

Dependency rule:

core domain model -> no adapter imports
core services -> domain + explicit ports only
adapters -> implement ports around the core
facades -> call core services through adapters
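The dependency rule can be illustrated with a port trait declared on the core side and implemented on the adapter side. This is a sketch under assumptions: `RunContract`, `RuntimeOutput`, `dispatch_run`, and `FakePiAdapter` are hypothetical names, and the method set on `RuntimeAdapter` is invented for the example. In a real workspace the two halves would live in separate crates such as ilchul-core and ilchul-runtime-pi.

```rust
// --- core side: domain types and the port; nothing here names an adapter ---

pub struct RunContract {
    pub goal: String,
}

pub struct RuntimeOutput {
    pub summary: String,
}

/// Port owned by the core. Adapters implement it; the core never imports them.
pub trait RuntimeAdapter {
    fn name(&self) -> &str;
    fn dispatch(&mut self, contract: &RunContract) -> RuntimeOutput;
}

/// Core service: depends on the domain types and the port only.
pub fn dispatch_run(adapter: &mut dyn RuntimeAdapter, contract: &RunContract) -> String {
    let output = adapter.dispatch(contract);
    format!("[{}] {}", adapter.name(), output.summary)
}

// --- adapter side: implements the port around the core ---

/// Stand-in for a Pi runtime adapter; it does not start a real session.
pub struct FakePiAdapter {
    pub dispatch_count: usize,
}

impl RuntimeAdapter for FakePiAdapter {
    fn name(&self) -> &str {
        "pi"
    }

    fn dispatch(&mut self, contract: &RunContract) -> RuntimeOutput {
        self.dispatch_count += 1;
        RuntimeOutput {
            summary: format!("accepted goal: {}", contract.goal),
        }
    }
}
```

Because `dispatch_run` takes `&mut dyn RuntimeAdapter`, swapping in a Codex adapter would require no change on the core side.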

Acceptance rule: a PR that introduces Pi/GitHub/Discord/CLI transport concerns or runtime-specific prompt mechanics into the Rust core authority model fails the #236 architecture boundary. A PR does not fail merely because it is not textbook Clean Architecture.

Implementation language decision

  • Rust is the primary implementation language for the #236 kernel-first MVP.
  • The canonical TaskStream kernel, storage authority, event/evidence/verification/completion model, and MCP control-plane server should be designed Rust-first.
  • TypeScript remains acceptable for Pi extension facades, compatibility shims, UI/command surfaces, and adapter glue where Pi/Node integration requires it.
  • The architecture must avoid letting the TS/Pi facade become the canonical authority; facades call into the Rust kernel/control-plane boundary rather than owning stream state or completion decisions.

Storage / source-of-truth decision

MVP 1 uses a hybrid storage model, with a stricter authority boundary than a plain filesystem workflow:

Canonical authority = kernel-owned SQLite/store
Filesystem = artifacts, runtime logs, transcripts, diffs, test output, and human-readable projections
Events = append-only accountability history, not raw transcript storage

Required boundaries:

  • Kernel-owned typed records are canonical for current/queryable state: TaskStream, TaskStreamContract, Run, RuntimeSession, EvidenceRecord, VerificationResult, and CompletionDecision.
  • Append-only events are canonical for history/accountability: lifecycle transitions, contract finalization, runtime dispatch/output observation, evidence registration, verification, and completion decisions.
  • Filesystem artifacts are referenced by ID/path/digest/summary; they are not completion authority by themselves.
  • Runtime workers may write only to controlled artifact inboxes or sandboxed workspaces. They must not directly mutate canonical stream/run/evidence/verification/completion state.
  • Raw transcripts, long stdout/stderr, diffs, test logs, screenshots, and generated reports stay as artifacts; events carry bounded summaries, hashes/digests, artifact refs, actor/source, and decision facts.
  • completion.decided must be emitted by Ilchul/kernel authority and must reference the evidence and verification result IDs that justify the decision.
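The split between bounded events and raw artifacts might look like the sketch below. Everything here is an assumption for illustration: the 200-character limit, the `ArtifactRef` and `Event` field names, and the helper functions are not specified by this issue, and the digest is taken as an already-computed string rather than hashed in the example.

```rust
/// Hypothetical cap on inline summary length, counted in characters.
pub const MAX_SUMMARY_CHARS: usize = 200;

/// Pointer to raw output stored on the filesystem.
#[derive(Debug, Clone, PartialEq)]
pub struct ArtifactRef {
    pub id: String,
    pub path: String,
    pub digest: String,
}

/// Accountability event: a bounded summary plus artifact refs, never raw logs.
#[derive(Debug, Clone)]
pub struct Event {
    pub name: String,
    pub actor: String,
    pub summary: String,
    pub artifact_refs: Vec<ArtifactRef>,
}

/// Truncates on a character boundary and marks the cut with "...".
pub fn bounded_summary(raw: &str) -> String {
    let mut summary: String = raw.chars().take(MAX_SUMMARY_CHARS).collect();
    if raw.chars().count() > MAX_SUMMARY_CHARS {
        summary.push_str("...");
    }
    summary
}

/// Builds a runtime.output_observed event. The full output is assumed to have
/// been written to `stored_as` already; only its summary is kept inline.
pub fn observe_output(actor: &str, raw_output: &str, stored_as: ArtifactRef) -> Event {
    Event {
        name: "runtime.output_observed".to_string(),
        actor: actor.to_string(),
        summary: bounded_summary(raw_output),
        artifact_refs: vec![stored_as],
    }
}
```

With this shape, a multi-megabyte test log costs the event store a few hundred bytes, while the digest in the artifact ref still lets an auditor confirm which file the summary describes.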

Required artifact surface

The MVP must write enough artifacts for mechanical verification:

stream.json
  TaskStream identity, goal, contract/status, selected runtime, completion state

events.jsonl
  compact accountability timeline, not full turn/raw transcript

run/session records
  Run id, runtime adapter, session id, dispatch contract, exit/runtime state

evidence records
  normalized evidence used for completion, including test command, exit code, changed files, before/after status where available

verification output
  raw stdout/stderr/logs stored as artifact refs, not inline event payloads

completion_decision
  completed/failed, rationale, evidence_ids, decided_by=ilchul

Required event policy

Events must not record every token, every turn, or full raw runtime logs inline.

Event log principle:

Event log = compact accountability timeline
Artifacts = raw logs, transcripts, diffs, stdout/stderr, test outputs
Evidence = normalized facts used for completion

Turn and event are not 1:1:

one turn may produce zero events
many turns may be summarized into one event
critical runtime output may become one bounded event plus artifact refs

Use runtime.output_observed events carrying bounded progress summaries instead of a mandatory per-turn turn_observed event.
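A possible projection from turns to events is sketched below. The classification into `Noise`, `Progress`, and `Critical` turns is an assumption made for this example, as are the `Turn` and `OutputObserved` types and the 120-character cap; the issue only requires that the mapping is not 1:1.

```rust
/// Hypothetical classification of a runtime turn.
pub enum TurnKind {
    Noise,    // produces no event
    Progress, // folded with neighbouring progress turns
    Critical, // gets its own bounded event
}

pub struct Turn {
    pub kind: TurnKind,
    pub text: String,
}

/// Payload of a runtime.output_observed event in this sketch.
#[derive(Debug, PartialEq)]
pub struct OutputObserved {
    pub summary: String,
    pub turns_covered: usize,
}

/// Emits one summary event for any pending progress turns, then resets the count.
fn flush(events: &mut Vec<OutputObserved>, pending: &mut usize) {
    if *pending > 0 {
        events.push(OutputObserved {
            summary: format!("{} progress turns", *pending),
            turns_covered: *pending,
        });
        *pending = 0;
    }
}

/// Projects turns into events: zero events for noise, one event per run of
/// progress turns, and one bounded event per critical turn.
pub fn project_turns(turns: &[Turn]) -> Vec<OutputObserved> {
    let mut events = Vec::new();
    let mut pending = 0usize;
    for turn in turns {
        match turn.kind {
            TurnKind::Noise => {}
            TurnKind::Progress => pending += 1,
            TurnKind::Critical => {
                flush(&mut events, &mut pending);
                events.push(OutputObserved {
                    summary: turn.text.chars().take(120).collect(),
                    turns_covered: 1,
                });
            }
        }
    }
    flush(&mut events, &mut pending);
    events
}
```

In a fuller version the critical event would also carry artifact refs to the raw output rather than relying on the truncated text alone.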

Minimum required event set:

stream.created
stream.contract_finalized
run.created
runtime.session_started
runtime.dispatched
runtime.output_observed
evidence.recorded
verification.started
verification.finished
completion.decided

A mechanically successful run must include at least these events:

stream.created
run.created
runtime.dispatched
evidence.recorded
verification.finished
completion.decided

completion.decided must reference evidence_ids and include decided_by=ilchul.
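The mechanical check could be as simple as the sketch below. It assumes events.jsonl has already been parsed into `EventRecord` values; that struct, its fields, and `check_timeline` are hypothetical names for illustration, and JSON parsing is left out to keep the example dependency-free.

```rust
/// The mechanical success minimum listed above.
pub const REQUIRED_EVENTS: [&str; 6] = [
    "stream.created",
    "run.created",
    "runtime.dispatched",
    "evidence.recorded",
    "verification.finished",
    "completion.decided",
];

/// One parsed line of events.jsonl, reduced to the fields this check needs.
pub struct EventRecord {
    pub name: String,
    pub evidence_ids: Vec<String>,
    pub decided_by: Option<String>,
}

impl EventRecord {
    /// Convenience constructor for events without completion fields.
    pub fn named(name: &str) -> Self {
        EventRecord {
            name: name.to_string(),
            evidence_ids: Vec::new(),
            decided_by: None,
        }
    }
}

/// Returns every violation found; an empty list means the timeline passes.
pub fn check_timeline(events: &[EventRecord]) -> Vec<String> {
    let mut problems = Vec::new();
    for required in REQUIRED_EVENTS {
        if !events.iter().any(|e| e.name == required) {
            problems.push(format!("missing event: {}", required));
        }
    }
    for e in events.iter().filter(|e| e.name == "completion.decided") {
        if e.evidence_ids.is_empty() {
            problems.push("completion.decided has no evidence_ids".to_string());
        }
        if e.decided_by.as_deref() != Some("ilchul") {
            problems.push("completion.decided is not decided_by=ilchul".to_string());
        }
    }
    problems
}
```

Returning the full list of problems instead of stopping at the first one keeps the check useful as a test fixture, since one failing run then reports everything that is missing.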

Adapter boundary

pi and Codex are both first-class reference adapter targets so the core contract does not become dependent on one runtime vocabulary.

However, each MVP execution uses only one adapter/session at a time:

one TaskStream
one Run
one RuntimeSession
one selected adapter: pi OR Codex

The same TaskStreamContract should be projectable to both paths.

Core must avoid adapter-specific leakage:

  • no tmux-specific fields in generic TaskStream core
  • no Codex-specific approval/sandbox fields in generic TaskStream core
  • runtime-specific details belong in adapter projection/session records
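A sketch of projecting one contract to either adapter while keeping runtime-specific fields out of the core type follows. The field names, the `project` function, and the placeholder values for the tmux session and sandbox mode are assumptions for illustration only.

```rust
/// Generic contract: no tmux, approval, or sandbox fields.
#[derive(Debug, Clone)]
pub struct TaskStreamContract {
    pub goal: String,
    pub verification_command: String,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AdapterKind {
    Pi,
    Codex,
}

/// Runtime-specific details live in the adapter's session record.
#[derive(Debug, Clone, PartialEq)]
pub enum SessionDetails {
    Pi { tmux_session: String },
    Codex { sandbox_mode: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct DispatchProjection {
    pub adapter: AdapterKind,
    pub prompt: String,
    pub details: SessionDetails,
}

/// Projects the same contract onto the selected adapter. The prompt is
/// derived from the contract alone; only `details` differs per adapter.
pub fn project(contract: &TaskStreamContract, adapter: AdapterKind) -> DispatchProjection {
    let prompt = format!(
        "Goal: {}\nVerify with: {}",
        contract.goal, contract.verification_command
    );
    let details = match adapter {
        // Placeholder values; a real adapter would allocate these itself.
        AdapterKind::Pi => SessionDetails::Pi {
            tmux_session: "ilchul-run".to_string(),
        },
        AdapterKind::Codex => SessionDetails::Codex {
            sandbox_mode: "workspace-write".to_string(),
        },
    };
    DispatchProjection { adapter, prompt, details }
}
```

A test that projects the same contract both ways and compares the prompts is one cheap guard against a runtime vocabulary leaking into the core.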

TaskStream / RunState boundary

TaskStream must not accidentally replace or duplicate the existing runtime spine.

Use this relationship unless later design explicitly changes it:

TaskStream = Agent OS-facing purpose/governance contract or projection
Run / RunState = runtime operational execution state
RuntimeSession = adapter/substrate execution session
Event = accountability/runtime transition record
Evidence = normalized completion basis
Artifact = raw logs/diffs/stdout/stderr/transcripts/test outputs

The first implementation should prefer a thin projection/contract relationship over a second competing source of truth.

Event compatibility boundary

The event names in this issue define the Agent OS-level accountability timeline. They should remain compatible with #186's runtime event taxonomy.

Acceptable implementation approaches:

  • shared EventStore shape for StreamEvent and RuntimeEvent;
  • StreamEvent records that reference lower-level RuntimeEvent ids;
  • a projection layer that maps runtime events into TaskStream accountability events.

Avoid creating an isolated event vocabulary that cannot replay or audit against the runtime event model.
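The second and third approaches can be combined in a small projection that emits accountability events carrying the ids of the runtime events they summarize. The `RuntimeEventKind` variants below are invented for this example and are not the #186 taxonomy; `StreamEvent` is reduced to a name and a list of referenced ids.

```rust
/// Hypothetical low-level runtime events; NOT the #186 taxonomy.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RuntimeEventKind {
    SessionSpawned,
    ContractSent,
    OutputChunk,
    Heartbeat,
}

#[derive(Debug, Clone, Copy)]
pub struct RuntimeEvent {
    pub id: u64,
    pub kind: RuntimeEventKind,
}

/// Accountability event that points back at the runtime events it covers,
/// so the timeline can be audited against the runtime log.
#[derive(Debug, Clone, PartialEq)]
pub struct StreamEvent {
    pub name: &'static str,
    pub runtime_event_ids: Vec<u64>,
}

/// Maps runtime events to stream events. Heartbeats produce nothing, and
/// output chunks are merged into the preceding runtime.output_observed event.
pub fn project_runtime_events(runtime_events: &[RuntimeEvent]) -> Vec<StreamEvent> {
    let mut out: Vec<StreamEvent> = Vec::new();
    for ev in runtime_events {
        let name = match ev.kind {
            RuntimeEventKind::SessionSpawned => "runtime.session_started",
            RuntimeEventKind::ContractSent => "runtime.dispatched",
            RuntimeEventKind::OutputChunk => "runtime.output_observed",
            RuntimeEventKind::Heartbeat => continue,
        };
        let merge = name == "runtime.output_observed"
            && out.last().map_or(false, |last| last.name == name);
        if merge {
            if let Some(last) = out.last_mut() {
                last.runtime_event_ids.push(ev.id);
            }
        } else {
            out.push(StreamEvent {
                name,
                runtime_event_ids: vec![ev.id],
            });
        }
    }
    out
}
```

Keeping the referenced ids on each stream event is what allows a later replay to confirm that every accountability record is backed by runtime events.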

Non-goals

Out of scope for MVP 1:

  • multi-runtime handoff
  • multi-runtime parallel execution
  • N-session orchestration
  • long-lived ambient agent behavior
  • robot real-time control
  • automatic durable memory promotion
  • full GitHub PR lifecycle automation
  • raw turn/token transcript persistence as default event behavior
  • destructive cleanup of tmux/worktree/session state

Session cleanup must follow #201. MVP 1 may retain runtime session artifacts rather than trying to clean them destructively.

Acceptance criteria

  • Main can submit a human goal to Ilchul through MCP.
  • Ilchul creates one TaskStream from the human goal.
  • Ilchul finalizes a TaskStreamContract.
  • Ilchul creates one Run inside the TaskStream.
  • Ilchul starts one pi or Codex RuntimeSession.
  • Ilchul dispatches the Run contract to the selected runtime adapter.
  • Runtime output is observed or summarized as bounded progress.
  • Ilchul records normalized evidence from the coding fixture.
  • Ilchul runs a verification gate based on test evidence.
  • Ilchul writes stream.json.
  • Ilchul writes events.jsonl with the minimum required events.
  • Ilchul writes run/session records.
  • Ilchul writes evidence records.
  • Ilchul writes verification output as artifact refs.
  • Ilchul writes a final report or completion_decision.
  • completion_decision.status is completed or failed.
  • completion_decision.evidence_ids is non-empty.
  • completion_decision.decided_by is ilchul.
  • A mechanical check can verify required events and completion fields without subjective interpretation.
  • TaskStream is documented or implemented as an Agent OS-facing layer over/alongside the runtime spine rather than an uncoordinated replacement for RunState.
  • Event records are compatible with or explicitly mapped to the runtime event taxonomy from #186 (Design: define runtime event taxonomy, replay, and recovery semantics).
  • MVP behavior does not require destructive session cleanup; cleanup semantics are deferred to #201 (Design: define safe cleanup semantics for leftover issue worker tmux sessions).

Verification

Implement targeted tests or fixtures that prove:

  • a small failing test fixture can be driven through one TaskStream/Run/session;
  • required artifacts are written;
  • required minimum events exist in events.jsonl;
  • raw runtime/test output is referenced as artifacts rather than stored inline in events;
  • validation of completion.decided fails if the required evidence references or decided_by=ilchul are missing;
  • the generic TaskStream core does not require pi- or Codex-specific fields;
  • Stream/Run/Event/Evidence artifacts can be mechanically checked against this issue's minimum criteria.

Notes

Core phrase to preserve:

Turn은 실행의 리듬이고, Event는 책임의 기록이다. ("A Turn is the rhythm of execution; an Event is the record of accountability.")

Labels: enhancement (New feature or request), meta-ssot (Single source of truth meta issue)