RunContract-centered learning runtime boundaries

This document defines the architecture boundary for issue #170. It is a design contract, not an implementation: current Kapi workflow state, RunContract projection, .ilchul storage, and command behavior remain unchanged until scoped implementation PRs land.

Goals

Connect existing WorkflowState and RunContract harness concepts to a future RunState execution runtime.
Define boundaries for objective evaluation, policy selection, task graph execution, worker runtime state, evidence, integration/repair, and learning data.
Keep generic RunContract core free of GitHub, PR, Discord, Ragna, and kapi-agent semantics.
Separate completion authority, runtime readiness authority, and advisory evaluation authority.
Define an event model for replay, recovery, audit, and learning.

Non-goals

No broad kapi -> ilchul rename.
No legacy .kapi deletion, command rename, or hidden storage cleanup.
No runtime plugin framework or dynamic module loading authority.
No hard-blocking score authority in this slice.
No GitHub/PR/Ragna/kapi-agent meanings in generic runtime core.

Layer map

WorkflowState + WorkflowDefinition
  -> RunContract Core
  -> Objective Engine
  -> Policy Selector
  -> Workflow Engine
  -> DAG Runtime
  -> Worker Runtime
  -> Evidence / Evaluation
  -> Integration / Repair
  -> Reward Ledger / Policy Update

| Layer | Owns | Does not own |
| --- | --- | --- |
| `WorkflowState` | Current lifecycle, required artifacts, validation rules, mode-specific source of truth. | DAG claims, worker leases, learning policy, external PR/merge decisions. |
| RunContract Core | Generic projection of goal, evidence expectations, done criteria, artifacts, completion criteria, quality and steering hints. | Durable `contract.json` authority, GitHub semantics, scheduling, cleanup. |
| Objective Engine | Evaluation intent, metrics, anti-Goodhart constraints, evaluator choice, score/verdict rationale. | Completion by itself or policy changes without recorded selection. |
| Policy Selector | Explicit execution strategy choice and rationale. | Silent behavior changes from reward data or heuristics. |
| Workflow Engine | Mode-specific transitions and artifact obligations. | Worker liveness as the only completion proof. |
| DAG Runtime | Task ids, dependencies, readiness, attempts, claims, leases, and evidence gates. | External merge/tracker authority or destructive cleanup. |
| Worker Runtime | Substrate readiness, heartbeat, retention lifecycle, and owned runtime handles. | User-owned worktrees/branches or uncertain stale handles. |
| Evidence / Evaluation | Durable proof refs, command outputs, artifact refs, score outputs, reviewer/evaluator records. | Narrative-only proof or stale evidence acceptance. |
| Integration / Repair | Explicit merge, conflict, repair, retry, and supersession records. | Hidden source-branch or tracker mutation. |
| Reward Ledger / Policy Update | Cross-run learning observations and policy-hint data. | Changing selected policy unless `PolicySelection` records the decision. |

Core state candidate

interface RunState {
  schemaVersion: number;
  runId: string;
  goal: string;
  status: RunStatus;
  workflow: WorkflowState;
  runContract: RunContractView;
  objective: ObjectiveFunction;
  policySelection: PolicySelection;
  selectedPolicy: ExecutionPolicy;
  taskGraph: TaskGraph;
  workers: WorkerRuntimeState[];
  claims: TaskClaim[];
  leases: WorkerLease[];
  evidence: EvidenceRef[];
  evaluations: EvaluationResult[];
  integration?: IntegrationState;
  learning?: LearningState;
  events: RuntimeEvent[];
}

RunState is the future runtime envelope. It references existing workflow truth instead of replacing it: workflow remains authoritative for mode-specific lifecycle and artifact obligations, while runContract is the generic projection supervisors inspect.

Model boundaries

`WorkflowState`

WorkflowState remains authoritative for current workflow lifecycle and validation. A run reports workflow completion only when the workflow contract's required artifacts, evidence, and verifier rules pass. Scheduler status can explain execution progress, but it cannot complete the workflow by itself.

`RunContractView`

RunContract remains a projection/contract boundary. It exposes goal, constraints, evidence expectations, completion criteria, quality dimensions, and steering hints. It must not persist a competing durable source of truth or embed adapter-specific authority such as GitHub review freshness.

`ObjectiveFunction`

ObjectiveFunction records evaluation intent: target, metrics, anti-Goodhart checks, evaluator choice, and optional human override policy. Objective outputs are advisory by default. They can recommend attention, retries, repair, or policy candidates, but they do not complete tasks or block unrelated workflow progress unless a later design explicitly grants that authority.

`PolicySelection`

PolicySelection records the selected policy id, considered alternatives, rationale, objective refs, reward-ledger refs, timestamp, and selector (default, human, supervisor, or simulator). Reward data may inform this record, but it must not silently alter worker counts, scheduler policy, verification depth, or repair behavior without a new selection event.
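The rule that reward data informs but never silently changes policy can be sketched as an append-only selection history. This is a minimal sketch under assumptions: the field names, `RewardHint`, `currentPolicyId`, and `adoptHint` are hypothetical and only mirror the prose above; they are not an existing Kapi or ilchul API.

```typescript
// Hypothetical shapes derived from the prose above; not part of the current codebase.
type Selector = "default" | "human" | "supervisor" | "simulator";

interface PolicySelectionRecord {
  policyId: string;
  consideredAlternatives: string[];
  rationale: string;
  objectiveRefs: string[];
  rewardLedgerRefs: string[];
  timestamp: string; // ISO 8601
  selector: Selector;
}

// Assumed shape of a suggestion derived from reward data.
interface RewardHint {
  suggestedPolicyId: string;
  ledgerRef: string;
}

// The policy in effect is whatever the most recent selection record names.
function currentPolicyId(history: readonly PolicySelectionRecord[]): string | undefined {
  const last = history[history.length - 1];
  return last ? last.policyId : undefined;
}

// A hint changes nothing by existing. Adopting it appends a new selection record
// that names who selected it and cites the ledger entry that informed it.
function adoptHint(
  history: readonly PolicySelectionRecord[],
  hint: RewardHint,
  selector: Selector,
  rationale: string,
  now: Date,
): PolicySelectionRecord[] {
  const previous = currentPolicyId(history);
  const record: PolicySelectionRecord = {
    policyId: hint.suggestedPolicyId,
    consideredAlternatives: previous === undefined ? [] : [previous],
    rationale,
    objectiveRefs: [],
    rewardLedgerRefs: [hint.ledgerRef],
    timestamp: now.toISOString(),
    selector,
  };
  return [...history, record]; // the input history is left untouched
}
```

In this sketch, the only way the effective policy moves is through a new record, so every change carries a selector and a rationale that can be audited later.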

`TaskGraph`

TaskGraph owns execution decomposition: task ids, dependencies, ready set, attempts, claims, and evidence gates. A task cannot become ready until dependencies are completed, cannot be claimed unless ready, and cannot complete without a valid claim plus evidence refs. This is execution readiness authority, not workflow completion authority.
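The three gating rules above (ready only after dependencies complete, claim only when ready, complete only with a valid claim plus evidence) can be sketched as a small in-memory model. This is an illustrative sketch, not the planned implementation: `TaskNode`, `TaskStatus`, `TaskGraphSketch`, `isReady`, `claimTask`, and `completeTask` are hypothetical names, and leases and attempts are left out for brevity.

```typescript
// Hypothetical shapes for illustration; not part of the current codebase.
type TaskStatus = "pending" | "claimed" | "completed";

interface TaskNode {
  id: string;
  dependsOn: string[];
  status: TaskStatus;
  claimedBy?: string;
  evidenceRefs: string[];
}

type TaskGraphSketch = Map<string, TaskNode>;

// A task is ready only while pending and once every dependency is completed.
function isReady(graph: TaskGraphSketch, taskId: string): boolean {
  const task = graph.get(taskId);
  if (!task || task.status !== "pending") return false;
  return task.dependsOn.every((dep) => graph.get(dep)?.status === "completed");
}

// A task can be claimed only when it is ready.
function claimTask(graph: TaskGraphSketch, taskId: string, workerId: string): boolean {
  const task = graph.get(taskId);
  if (!task || !isReady(graph, taskId)) return false;
  task.status = "claimed";
  task.claimedBy = workerId;
  return true;
}

// Completion requires a claim held by the same worker plus at least one evidence ref.
function completeTask(
  graph: TaskGraphSketch,
  taskId: string,
  workerId: string,
  evidenceRefs: string[],
): boolean {
  const task = graph.get(taskId);
  if (!task || task.status !== "claimed" || task.claimedBy !== workerId) return false;
  if (evidenceRefs.length === 0) return false;
  task.evidenceRefs = [...evidenceRefs];
  task.status = "completed";
  return true;
}

// Two-task example graph: "test" depends on "build".
const graph: TaskGraphSketch = new Map<string, TaskNode>([
  ["build", { id: "build", dependsOn: [], status: "pending", evidenceRefs: [] }],
  ["test", { id: "test", dependsOn: ["build"], status: "pending", evidenceRefs: [] }],
]);
```

Note that `completeTask` only marks a graph node as completed. Nothing in this sketch touches workflow completion, which stays with workflow validation.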

`WorkerRuntimeState`

Worker runtime state owns substrate readiness and retention. It should use the worker lifecycle from docs/ilchul-runtime-config.md: active, completed-retained, safe-to-close, stale-registry, cleanup-released, and closed.

Readiness signals such as tmux marker, process liveness, heartbeat, or prompt-dispatch status prove only substrate state. They do not prove task completion without task evidence, and they do not prove workflow completion without workflow validation.

`EvidenceRef` and `EvaluationResult`

Evidence refs point to inspectable artifacts, command outputs, reviewer records, or runtime events. Evaluation results record the objective metrics and evidence inputs they used, a verdict (pass, warn, fail, or inconclusive), an optional score, and a rationale. Evaluation verdicts answer what a supervisor should inspect next. They are not completion authority unless a workflow contract explicitly requires that evaluator result as evidence.
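A sketch of how these records and the advisory-by-default rule might look in types. All names here (`EvidenceRefSketch`, `EvaluationResultSketch`, `needsInspection`, `blocksCompletion`, `requiredEvaluatorIds`) are hypothetical. The sketch also assumes that when a workflow contract requires an evaluator, only a `pass` verdict satisfies it; the document does not specify that detail.

```typescript
// Hypothetical shapes for illustration; field names are assumptions.
type Verdict = "pass" | "warn" | "fail" | "inconclusive";

interface EvidenceRefSketch {
  id: string;
  kind: "artifact" | "command-output" | "reviewer-record" | "runtime-event";
  uri: string;
}

interface EvaluationResultSketch {
  evaluatorId: string;
  metricRefs: string[];
  evidenceInputs: string[]; // ids of EvidenceRefSketch entries
  verdict: Verdict;
  score?: number;
  rationale: string;
}

// Advisory use: any non-pass verdict suggests the supervisor should look closer.
function needsInspection(result: EvaluationResultSketch): boolean {
  return result.verdict !== "pass";
}

// An evaluation gates completion only when the workflow contract names its
// evaluator as required evidence. Assumption: a required evaluator must pass.
function blocksCompletion(
  result: EvaluationResultSketch,
  requiredEvaluatorIds: ReadonlySet<string>,
): boolean {
  return requiredEvaluatorIds.has(result.evaluatorId) && result.verdict !== "pass";
}
```

Under these assumptions, a failing score from an evaluator the contract does not require still surfaces through `needsInspection`, but it never blocks completion.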

`RewardLedger`

RewardLedger stores cross-run learning data under the future .ilchul/learning/ surface. Entries should preserve objective, selected policy, observed outcome, evaluation refs, and human override rationale. Learning may propose policy hints; runtime behavior must still be selected and recorded through PolicySelection.
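The split between recording observations and proposing hints can be sketched as an append-only ledger whose only output is a suggestion. This is a minimal sketch: `RewardEntry`, `RewardLedgerSketch`, and `proposePolicyHint` are hypothetical names, the success-count heuristic is a placeholder rather than a proposed learning method, and persistence under .ilchul/learning/ is omitted.

```typescript
// Hypothetical entry shape based on the fields named in the prose above.
interface RewardEntry {
  runId: string;
  objectiveId: string;
  selectedPolicyId: string;
  observedOutcome: "success" | "failure"; // assumed outcome vocabulary
  evaluationRefs: string[];
  humanOverrideRationale?: string;
}

class RewardLedgerSketch {
  private readonly entries: RewardEntry[] = [];

  // Append-only: entries are copied in and never edited or removed.
  append(entry: RewardEntry): void {
    this.entries.push({ ...entry, evaluationRefs: [...entry.evaluationRefs] });
  }

  read(): readonly RewardEntry[] {
    return this.entries;
  }

  // Advisory only: returns the policy id with the most recorded successes for
  // an objective, or undefined when there is no data. It selects nothing.
  proposePolicyHint(objectiveId: string): string | undefined {
    const successes = new Map<string, number>();
    for (const entry of this.entries) {
      if (entry.objectiveId === objectiveId && entry.observedOutcome === "success") {
        successes.set(entry.selectedPolicyId, (successes.get(entry.selectedPolicyId) ?? 0) + 1);
      }
    }
    let best: string | undefined;
    let bestCount = 0;
    successes.forEach((count, policyId) => {
      if (count > bestCount) {
        best = policyId;
        bestCount = count;
      }
    });
    return best;
  }
}
```

The ledger exposes no method that writes to runtime state, so a hint can only take effect if something else records it through PolicySelection.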

Authority separation

| Authority | Source | Can do | Cannot do |
| --- | --- | --- | --- |
| Completion authority | `WorkflowState`, workflow validation, required evidence, verifier/human gates. | Mark workflow tasks/runs complete when contract obligations pass. | Treat score, worker liveness, or narrative claims as completion. |
| Runtime readiness authority | `TaskGraph`, claims, leases, worker heartbeat/readiness, retention state. | Decide which tasks are ready, claimed, in progress, stale, or safe to inspect. | Override workflow completion or close user-owned runtime handles. |
| Advisory evaluation authority | `ObjectiveFunction`, `EvaluationResult`, quality dimensions, RewardLedger. | Recommend policy, repair, retry, review, or human inspection. | Hard-block, auto-merge, silently mutate policy, or close trackers. |
| External adapter authority | Adapter-specific supervisor operations. | Interpret generic state for GitHub/PR/Discord/tool surfaces when explicitly invoked. | Leak adapter meanings into core RunContract or runtime schemas. |
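One way to keep the completion row honest in code is to make the completion check accept advisory and runtime signals but never read them. The sketch below assumes hypothetical names (`WorkflowGateInputs`, `NonAuthoritativeSignals`, `canCompleteWorkflow`) and reduces workflow validation to three booleans for illustration.

```typescript
// Inputs owned by completion authority (hypothetical, simplified to booleans).
interface WorkflowGateInputs {
  requiredArtifactsPresent: boolean;
  requiredEvidencePresent: boolean;
  verifierPassed: boolean;
}

// Signals owned by other authorities. They may be shown to a supervisor,
// but they are not evidence of completion.
interface NonAuthoritativeSignals {
  advisoryScore?: number; // advisory evaluation authority
  workerAlive?: boolean; // runtime readiness authority
  narrativeClaim?: string; // for example, a worker reporting "done"
}

// Completion depends only on the workflow gates. The signals parameter is
// accepted and deliberately ignored to make the boundary explicit.
function canCompleteWorkflow(
  gates: WorkflowGateInputs,
  _signals: NonAuthoritativeSignals = {},
): boolean {
  return gates.requiredArtifactsPresent && gates.requiredEvidencePresent && gates.verifierPassed;
}
```

With this shape, a high score or a live worker cannot flip the result, and a low score cannot block a run whose contract obligations pass.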

Event model

Runtime events are append-only and replayable. Initial event names should stay semantic and generic:

type RuntimeEventType =
  | "run.created" | "contract.projected" | "objective.recorded" | "policy.selected"
  | "task.ready" | "task.claimed" | "task.heartbeat" | "task.evidence_attached"
  | "task.completed" | "task.failed" | "worker.readiness_observed"
  | "worker.retention_changed" | "evaluation.recorded"
  | "integration.repair_requested" | "integration.completed" | "reward.recorded";

Event rules:

Corrections use superseding events rather than in-place deletion.
Every event includes runId, timestamp, actor, schema version, and an idempotency key.
Replay rebuilds runtime projections from events plus existing workflow state; malformed or missing critical events fail closed.
Recovery may classify unknown worker handles as stale-registry, but it must not delete or close them without explicit safe-cleanup ownership checks.
Learning events record observations only; policy changes require a separate policy.selected event.
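The event rules above can be sketched as an append-only log with idempotent appends and a replay that fails closed. This is an illustrative sketch under assumptions: `RuntimeEventSketch`, `EventLogSketch`, `RunProjection`, and `replay` are hypothetical names, the `supersedes` field (pointing at the idempotency key of the corrected event) is one possible encoding of superseding events, and the projection is reduced to two fields.

```typescript
// Hypothetical event envelope carrying the fields the rules require.
interface RuntimeEventSketch {
  type: string;
  runId: string;
  timestamp: string;
  actor: string;
  schemaVersion: number;
  idempotencyKey: string;
  supersedes?: string; // assumed: idempotency key of the event being corrected
  payload?: Record<string, unknown>;
}

class EventLogSketch {
  private readonly events: RuntimeEventSketch[] = [];
  private readonly seen = new Set<string>();

  // Append-only and idempotent: a repeated idempotency key is ignored.
  append(event: RuntimeEventSketch): boolean {
    if (this.seen.has(event.idempotencyKey)) return false;
    this.seen.add(event.idempotencyKey);
    this.events.push(event);
    return true;
  }

  all(): readonly RuntimeEventSketch[] {
    return this.events;
  }
}

interface RunProjection {
  runId: string;
  selectedPolicyId?: string;
  rewardObservations: number;
}

// Rebuilds a projection from events. Throws (fails closed) when a critical
// event is missing or malformed instead of guessing.
function replay(events: readonly RuntimeEventSketch[]): RunProjection {
  const superseded = new Set<string>();
  for (const e of events) {
    if (e.supersedes !== undefined) superseded.add(e.supersedes);
  }
  const live = events.filter((e) => !superseded.has(e.idempotencyKey));
  const first = live[0];
  if (!first || first.type !== "run.created") {
    throw new Error("replay failed closed: missing run.created");
  }
  const projection: RunProjection = { runId: first.runId, rewardObservations: 0 };
  for (const e of live) {
    if (e.runId !== projection.runId) {
      throw new Error("replay failed closed: event from another run");
    }
    if (e.type === "policy.selected") {
      const policyId = e.payload?.policyId;
      if (typeof policyId !== "string") {
        throw new Error("replay failed closed: malformed policy.selected");
      }
      projection.selectedPolicyId = policyId;
    } else if (e.type === "reward.recorded") {
      projection.rewardObservations += 1; // observation only, never a policy change
    }
  }
  return projection;
}
```

In this sketch a `reward.recorded` event can mention any policy in its payload without effect, because only `policy.selected` events write `selectedPolicyId`.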

Integration and repair boundary

Integration and repair state records how candidate output is accepted, rejected, retried, or superseded. It should link task evidence, evaluation results, conflicts, merge decisions, and repair tasks without hidden branch mutation. External PR/merge semantics stay in adapters and supervisor operations.

Design verification checklist

Architecture defines RunState, ObjectiveFunction, PolicySelection, TaskGraph, WorkerRuntimeState, EvidenceRef, EvaluationResult, and RewardLedger boundaries.
Existing WorkflowState and RunContract responsibilities remain explicit.
Completion authority, runtime readiness authority, and advisory evaluation authority are separated.
Event model supports replay, recovery, audit, and learning.
Worker retention lifecycle from #148/#169 is referenced without adding cleanup behavior.

Follow-up implementation slices

Add RunState and runtime event TypeScript types without changing persistence behavior.
Add read-only RunState projection from existing workflow state and RunContract views.
Add TaskGraph readiness/claim/lease domain logic with unit tests.
Add worker heartbeat and retention projection aligned with docs/ilchul-runtime-config.md.
Add ObjectiveFunction/EvaluationResult records as advisory-only data.
Add RewardLedger append/read APIs that cannot affect selected policy without PolicySelection.
Add external adapter views only after generic runtime state is stable and covered by tests.
