This document defines the architecture boundary for issue #170. It is a design contract, not an implementation: current Kapi workflow state, RunContract projection, .ilchul storage, and command behavior remain unchanged until scoped implementation PRs land.
- Connect existing `WorkflowState` and RunContract harness concepts to a future `RunState` execution runtime.
- Define boundaries for objective evaluation, policy selection, task graph execution, worker runtime state, evidence, integration/repair, and learning data.
- Keep generic RunContract core free of GitHub, PR, Discord, Ragna, and kapi-agent semantics.
- Separate completion authority, runtime readiness authority, and advisory evaluation authority.
- Define an event model for replay, recovery, audit, and learning.
- No broad `kapi -> ilchul` rename.
- No legacy `.kapi` deletion, command rename, or hidden storage cleanup.
- No runtime plugin framework or dynamic module loading authority.
- No hard-blocking score authority in this slice.
- No GitHub/PR/Ragna/kapi-agent meanings in generic runtime core.

```text
WorkflowState + WorkflowDefinition
  -> RunContract Core
  -> Objective Engine
  -> Policy Selector
  -> Workflow Engine
  -> DAG Runtime
  -> Worker Runtime
  -> Evidence / Evaluation
  -> Integration / Repair
  -> Reward Ledger / Policy Update
```

| Layer | Owns | Does not own |
|---|---|---|
| `WorkflowState` | Current lifecycle, required artifacts, validation rules, mode-specific source of truth. | DAG claims, worker leases, learning policy, external PR/merge decisions. |
| RunContract Core | Generic projection of goal, evidence expectations, done criteria, artifacts, completion criteria, quality and steering hints. | Durable contract.json authority, GitHub semantics, scheduling, cleanup. |
| Objective Engine | Evaluation intent, metrics, anti-Goodhart constraints, evaluator choice, score/verdict rationale. | Completion by itself or policy changes without recorded selection. |
| Policy Selector | Explicit execution strategy choice and rationale. | Silent behavior changes from reward data or heuristics. |
| Workflow Engine | Mode-specific transitions and artifact obligations. | Worker liveness as the only completion proof. |
| DAG Runtime | Task ids, dependencies, readiness, attempts, claims, leases, and evidence gates. | External merge/tracker authority or destructive cleanup. |
| Worker Runtime | Substrate readiness, heartbeat, retention lifecycle, and owned runtime handles. | User-owned worktrees/branches or uncertain stale handles. |
| Evidence / Evaluation | Durable proof refs, command outputs, artifact refs, score outputs, reviewer/evaluator records. | Narrative-only proof or stale evidence acceptance. |
| Integration / Repair | Explicit merge, conflict, repair, retry, and supersession records. | Hidden source-branch or tracker mutation. |
| Reward Ledger / Policy Update | Cross-run learning observations and policy-hint data. | Changing selected policy unless PolicySelection records the decision. |

```typescript
interface RunState {
  schemaVersion: number;
  runId: string;
  goal: string;
  status: RunStatus;
  workflow: WorkflowState;
  runContract: RunContractView;
  objective: ObjectiveFunction;
  policySelection: PolicySelection;
  selectedPolicy: ExecutionPolicy;
  taskGraph: TaskGraph;
  workers: WorkerRuntimeState[];
  claims: TaskClaim[];
  leases: WorkerLease[];
  evidence: EvidenceRef[];
  evaluations: EvaluationResult[];
  integration?: IntegrationState;
  learning?: LearningState;
  events: RuntimeEvent[];
}
```

`RunState` is the future runtime envelope. It references existing workflow truth instead of replacing it: `workflow` remains authoritative for mode-specific lifecycle and artifact obligations, while `runContract` is the generic projection supervisors inspect.
WorkflowState remains authoritative for current workflow lifecycle and validation. A run reports workflow completion only when the workflow contract's required artifacts, evidence, and verifier rules pass. Scheduler status can explain execution progress, but it cannot complete the workflow by itself.
RunContract remains a projection/contract boundary. It exposes goal, constraints, evidence expectations, completion criteria, quality dimensions, and steering hints. It must not persist a competing durable source of truth or embed adapter-specific authority such as GitHub review freshness.
ObjectiveFunction records evaluation intent: target, metrics, anti-Goodhart checks, evaluator choice, and optional human override policy. Objective outputs are advisory by default. They can recommend attention, retries, repair, or policy candidates, but they do not complete tasks or block unrelated workflow progress unless a later design explicitly grants that authority.
PolicySelection records the selected policy id, considered alternatives, rationale, objective refs, reward-ledger refs, timestamp, and selector (`default`, `human`, `supervisor`, or `simulator`). Reward data may inform this record, but it must not silently alter worker counts, scheduler policy, verification depth, or repair behavior without a new selection event.
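
The selection rule can be sketched as an append-only history from which the active policy is derived. This is a minimal sketch, not the planned schema: the field names, the `DEFAULT_POLICY_ID` fallback, and the `recordSelection` and `activePolicyId` helpers are all hypothetical.

```typescript
// Hypothetical shapes for illustration; field names are assumptions, not the final schema.
type Selector = "default" | "human" | "supervisor" | "simulator";

interface PolicySelection {
  policyId: string;
  alternatives: string[];     // considered but not chosen
  rationale: string;
  objectiveRefs: string[];
  rewardLedgerRefs: string[]; // reward data that informed the choice
  timestamp: string;          // ISO 8601
  selector: Selector;
}

const DEFAULT_POLICY_ID = "default-policy"; // assumed fallback id

// Append a selection record; the history array is never edited in place.
function recordSelection(
  history: readonly PolicySelection[],
  selection: PolicySelection,
): PolicySelection[] {
  if (selection.rationale.trim() === "") {
    throw new Error("PolicySelection requires a rationale");
  }
  if (selection.alternatives.indexOf(selection.policyId) !== -1) {
    throw new Error("selected policy must not also be listed as a rejected alternative");
  }
  return [...history, selection];
}

// The active policy is derived only from selection records, never from reward data directly.
function activePolicyId(history: readonly PolicySelection[]): string {
  const last = history[history.length - 1];
  return last ? last.policyId : DEFAULT_POLICY_ID;
}
```

Deriving the active policy from the history keeps every behavior change traceable to one record, which is the property the paragraph above asks for.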
TaskGraph owns execution decomposition: task ids, dependencies, ready set, attempts, claims, and evidence gates. A task cannot become ready until dependencies are completed, cannot be claimed unless ready, and cannot complete without a valid claim plus evidence refs. This is execution readiness authority, not workflow completion authority.
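
The three readiness rules above can be sketched as guarded state transitions. This is a minimal sketch under assumptions: the `Task` shape, the status names, and the `refreshReady`, `claim`, and `complete` helpers are hypothetical, and leases and attempts are left out.

```typescript
// Hypothetical task model; statuses and field names are assumptions for illustration.
type TaskStatus = "pending" | "ready" | "claimed" | "completed";

interface Task {
  id: string;
  dependsOn: string[];
  status: TaskStatus;
  claimedBy?: string;
  evidenceRefs: string[];
}

type TaskGraph = { [taskId: string]: Task | undefined };

// Promote pending tasks whose dependencies are all completed.
function refreshReady(graph: TaskGraph): void {
  for (const id of Object.keys(graph)) {
    const task = graph[id];
    if (!task || task.status !== "pending") continue;
    const depsDone = task.dependsOn.every((dep) => graph[dep]?.status === "completed");
    if (depsDone) task.status = "ready";
  }
}

// A task can be claimed only while it is ready.
function claim(graph: TaskGraph, taskId: string, workerId: string): void {
  const task = graph[taskId];
  if (!task || task.status !== "ready") {
    throw new Error(`task ${taskId} is not ready to claim`);
  }
  task.status = "claimed";
  task.claimedBy = workerId;
}

// Completion needs the claiming worker plus at least one evidence ref.
function complete(graph: TaskGraph, taskId: string, workerId: string, evidenceRefs: string[]): void {
  const task = graph[taskId];
  if (!task || task.status !== "claimed" || task.claimedBy !== workerId) {
    throw new Error(`task ${taskId} has no valid claim by ${workerId}`);
  }
  if (evidenceRefs.length === 0) {
    throw new Error(`task ${taskId} cannot complete without evidence`);
  }
  task.evidenceRefs = evidenceRefs;
  task.status = "completed";
  refreshReady(graph); // dependents may now become ready
}
```

Nothing in this sketch touches workflow status: a fully completed graph still leaves workflow completion to workflow validation.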
Worker runtime state owns substrate readiness and retention. It should use the worker lifecycle from `docs/ilchul-runtime-config.md`: `active`, `completed-retained`, `safe-to-close`, `stale-registry`, `cleanup-released`, and `closed`.
Readiness signals such as tmux marker, process liveness, heartbeat, or prompt-dispatch status prove only substrate state. They do not prove task completion without task evidence, and they do not prove workflow completion without workflow validation.
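
The separation between substrate state, task completion, and workflow completion can be sketched as three answers computed from three different inputs. The lifecycle names come from the text above; the signal field names, the `assess` helper, and the rule that all four signals must hold are assumptions made for illustration.

```typescript
// Lifecycle names come from the text above; the remaining names are hypothetical.
type WorkerLifecycle =
  | "active" | "completed-retained" | "safe-to-close"
  | "stale-registry" | "cleanup-released" | "closed";

interface ReadinessSignals {
  tmuxMarker: boolean;
  processAlive: boolean;
  heartbeatFresh: boolean;
  promptDispatched: boolean;
}

interface WorkerObservation {
  workerId: string;
  lifecycle: WorkerLifecycle;
  signals: ReadinessSignals;
}

interface ProofView {
  substrateReady: boolean;   // answered by worker signals only
  taskComplete: boolean;     // answered by task evidence only
  workflowComplete: boolean; // answered by workflow validation only
}

function assess(
  obs: WorkerObservation,
  taskEvidenceRefs: string[],
  workflowValidationPassed: boolean,
): ProofView {
  const s = obs.signals;
  return {
    // Assumption: a worker counts as ready only when active and all signals hold.
    substrateReady:
      obs.lifecycle === "active" &&
      s.tmuxMarker && s.processAlive && s.heartbeatFresh && s.promptDispatched,
    // Simplification: a fuller check would also require a valid claim.
    taskComplete: taskEvidenceRefs.length > 0,
    workflowComplete: workflowValidationPassed,
  };
}
```

A healthy worker with no evidence yields a ready substrate and nothing else, while a closed worker can still belong to a completed task.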
Evidence refs point to inspectable artifacts, command outputs, reviewer records, or runtime events. Evaluation results reference objective metrics, evidence inputs, verdict (`pass`, `warn`, `fail`, or `inconclusive`), optional score, and rationale. Evaluation verdicts answer what a supervisor should inspect next. They are not completion authority unless a workflow contract explicitly requires that evaluator result as evidence.
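
A minimal sketch of an advisory evaluation record follows. The verdict values are from the text above; the field names, the `Advice` values, the verdict-to-advice mapping, and the rule that only a `pass` verdict from a contract-required evaluator counts as evidence are assumptions.

```typescript
// Verdict values come from the text above; other names are hypothetical.
type Verdict = "pass" | "warn" | "fail" | "inconclusive";

interface EvaluationResult {
  evaluatorId: string;
  metricRefs: string[];
  evidenceRefs: string[];
  verdict: Verdict;
  score?: number;
  rationale: string;
}

type Advice = "none" | "review" | "repair" | "gather-more-evidence";

// Map a verdict to a suggested next inspection step. The mapping is illustrative.
function advise(result: EvaluationResult): Advice {
  switch (result.verdict) {
    case "pass": return "none";
    case "warn": return "review";
    case "fail": return "repair";
    case "inconclusive": return "gather-more-evidence";
  }
}

// An evaluation counts toward completion only when the workflow contract
// names that evaluator as required evidence (assumed here to need a pass).
function countsAsCompletionEvidence(
  result: EvaluationResult,
  requiredEvaluatorIds: string[],
): boolean {
  return requiredEvaluatorIds.indexOf(result.evaluatorId) !== -1 && result.verdict === "pass";
}
```

With an empty `requiredEvaluatorIds` list, no verdict can contribute to completion, which matches the advisory-by-default stance.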
RewardLedger stores cross-run learning data under the future .ilchul/learning/ surface. Entries should preserve objective, selected policy, observed outcome, evaluation refs, and human override rationale. Learning may propose policy hints; runtime behavior must still be selected and recorded through PolicySelection.
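
The ledger's role can be sketched as an append-only store that only produces hints. The entry fields mirror the list above; the `RewardLedger` class shape, the binary `outcome`, and the success-rate hint are hypothetical, and storage under `.ilchul/learning/` is not modeled.

```typescript
// Hypothetical in-memory ledger; persistence is out of scope for this sketch.
interface RewardEntry {
  objectiveId: string;
  selectedPolicyId: string;
  outcome: "success" | "failure"; // assumed binary outcome
  evaluationRefs: string[];
  humanOverrideRationale?: string;
}

interface PolicyHint {
  policyId: string;
  successRate: number;
  samples: number;
}

interface Tally { wins: number; samples: number; }

class RewardLedger {
  private readonly entries: RewardEntry[] = [];

  // Append only; there is no update or delete.
  append(entry: RewardEntry): void {
    this.entries.push({ ...entry, evaluationRefs: entry.evaluationRefs.slice() });
  }

  // Return a copy so callers cannot edit the ledger.
  read(): readonly RewardEntry[] {
    return this.entries.slice();
  }

  // Hints are observations. Nothing here selects or changes a policy.
  proposeHints(objectiveId: string): PolicyHint[] {
    const tally: { [policyId: string]: Tally | undefined } = {};
    for (const e of this.entries) {
      if (e.objectiveId !== objectiveId) continue;
      const t = tally[e.selectedPolicyId] ?? { wins: 0, samples: 0 };
      t.samples += 1;
      if (e.outcome === "success") t.wins += 1;
      tally[e.selectedPolicyId] = t;
    }
    const hints: PolicyHint[] = [];
    for (const policyId of Object.keys(tally)) {
      const t = tally[policyId];
      if (t) hints.push({ policyId, successRate: t.wins / t.samples, samples: t.samples });
    }
    return hints;
  }
}
```

A selector would read these hints and, if it acts on them, record the decision through PolicySelection.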
| Authority | Source | Can do | Cannot do |
|---|---|---|---|
| Completion authority | WorkflowState, workflow validation, required evidence, verifier/human gates. | Mark workflow tasks/runs complete when contract obligations pass. | Treat score, worker liveness, or narrative claims as completion. |
| Runtime readiness authority | TaskGraph, claims, leases, worker heartbeat/readiness, retention state. | Decide which tasks are ready, claimed, in progress, stale, or safe to inspect. | Override workflow completion or close user-owned runtime handles. |
| Advisory evaluation authority | ObjectiveFunction, EvaluationResult, quality dimensions, RewardLedger. | Recommend policy, repair, retry, review, or human inspection. | Hard-block, auto-merge, silently mutate policy, or close trackers. |
| External adapter authority | Adapter-specific supervisor operations. | Interpret generic state for GitHub/PR/Discord/tool surfaces when explicitly invoked. | Leak adapter meanings into core RunContract or runtime schemas. |
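
The table can also be read as an allow-list. A minimal sketch, assuming hypothetical action labels (`mark-complete`, `recommend`, and so on) that condense each "Can do" cell into one name:

```typescript
// Authority names follow the table above; action names are hypothetical labels.
type Authority =
  | "completion"
  | "runtime-readiness"
  | "advisory-evaluation"
  | "external-adapter";

type Action =
  | "mark-complete"
  | "decide-task-readiness"
  | "recommend"
  | "interpret-for-adapter"
  | "hard-block"
  | "close-user-handle";

// Each authority may perform only the actions listed for it.
const CAN_DO: { [A in Authority]: readonly Action[] } = {
  "completion": ["mark-complete"],
  "runtime-readiness": ["decide-task-readiness"],
  "advisory-evaluation": ["recommend"],
  "external-adapter": ["interpret-for-adapter"],
};

function isAllowed(authority: Authority, action: Action): boolean {
  return CAN_DO[authority].indexOf(action) !== -1;
}
```

Actions such as `hard-block` appear in no list, so every authority is denied them in this slice.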
Runtime events are append-only and replayable. Initial event names should stay semantic and generic:

```typescript
type RuntimeEventType =
  | "run.created" | "contract.projected" | "objective.recorded" | "policy.selected"
  | "task.ready" | "task.claimed" | "task.heartbeat" | "task.evidence_attached"
  | "task.completed" | "task.failed" | "worker.readiness_observed"
  | "worker.retention_changed" | "evaluation.recorded"
  | "integration.repair_requested" | "integration.completed" | "reward.recorded";
```

Event rules:
- Corrections use superseding events rather than in-place deletion.
- Every event includes `runId`, timestamp, actor, schema version, and an idempotency key.
- Replay rebuilds runtime projections from events plus existing workflow state; malformed or missing critical events fail closed.
- Recovery may classify unknown worker handles as `stale-registry`, but it must not delete or close them without explicit safe-cleanup ownership checks.
- Learning events record observations only; policy changes require a separate `policy.selected` event.
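
The append, supersede, and replay rules above can be sketched as follows. This is a sketch under assumptions: the `supersedes` and `taskId` fields, the `appendEvent` and `replay` helpers, and the task-status projection are hypothetical, and workflow state is not consulted here.

```typescript
// Hypothetical event envelope; field names beyond those listed above are assumptions.
interface RuntimeEvent {
  type: string;
  runId: string;
  timestamp: string;
  actor: string;
  schemaVersion: number;
  idempotencyKey: string;
  supersedes?: string; // idempotency key of the event this one corrects
  taskId?: string;
}

const TASK_STATUS: { [eventType: string]: string | undefined } = {
  "task.ready": "ready",
  "task.claimed": "claimed",
  "task.completed": "completed",
  "task.failed": "failed",
};

// Append returns a new log; a repeated idempotency key is ignored.
function appendEvent(log: readonly RuntimeEvent[], event: RuntimeEvent): RuntimeEvent[] {
  const seen = log.some((e) => e.idempotencyKey === event.idempotencyKey);
  return seen ? log.slice() : [...log, event];
}

// Replay rebuilds a task-status projection and throws on malformed events (fail closed).
function replay(log: readonly RuntimeEvent[]): { [taskId: string]: string } {
  const superseded: { [key: string]: boolean } = {};
  for (const e of log) {
    if (e.supersedes) superseded[e.supersedes] = true;
  }
  const tasks: { [taskId: string]: string } = {};
  for (const e of log) {
    if (!e.runId || !e.actor || !e.timestamp || !e.idempotencyKey) {
      throw new Error(`malformed event of type ${e.type}`);
    }
    if (superseded[e.idempotencyKey]) continue; // corrected by another event
    const status = TASK_STATUS[e.type];
    if (status === undefined) continue; // not a task status event in this sketch
    if (!e.taskId) throw new Error(`${e.type} is missing taskId`);
    tasks[e.taskId] = status;
  }
  return tasks;
}
```

The corrected event stays in the log for audit; replay simply skips it in favor of its superseding event.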
Integration and repair state records how candidate output is accepted, rejected, retried, or superseded. It should link task evidence, evaluation results, conflicts, merge decisions, and repair tasks without hidden branch mutation. External PR/merge semantics stay in adapters and supervisor operations.
- Architecture defines `RunState`, `ObjectiveFunction`, `PolicySelection`, `TaskGraph`, `WorkerRuntimeState`, `EvidenceRef`, `EvaluationResult`, and `RewardLedger` boundaries.
- Existing `WorkflowState` and RunContract responsibilities remain explicit.
- Completion authority, runtime readiness authority, and advisory evaluation authority are separated.
- Event model supports replay, recovery, audit, and learning.
- Worker retention lifecycle from #148/#169 is referenced without adding cleanup behavior.
- Add `RunState` and runtime event TypeScript types without changing persistence behavior.
- Add read-only RunState projection from existing workflow state and RunContract views.
- Add TaskGraph readiness/claim/lease domain logic with unit tests.
- Add worker heartbeat and retention projection aligned with `docs/ilchul-runtime-config.md`.
- Add ObjectiveFunction/EvaluationResult records as advisory-only data.
- Add RewardLedger append/read APIs that cannot affect selected policy without `PolicySelection`.
- Add external adapter views only after generic runtime state is stable and covered by tests.