RunContract-centered learning runtime boundaries

This document defines the architecture boundary for issue #170. It is a design contract, not an implementation: current Kapi workflow state, RunContract projection, .ilchul storage, and command behavior remain unchanged until scoped implementation PRs land.

Goals

  • Connect existing WorkflowState and RunContract harness concepts to a future RunState execution runtime.
  • Define boundaries for objective evaluation, policy selection, task graph execution, worker runtime state, evidence, integration/repair, and learning data.
  • Keep generic RunContract core free of GitHub, PR, Discord, Ragna, and kapi-agent semantics.
  • Separate completion authority, runtime readiness authority, and advisory evaluation authority.
  • Define an event model for replay, recovery, audit, and learning.

Non-goals

  • No broad kapi -> ilchul rename.
  • No legacy .kapi deletion, command rename, or hidden storage cleanup.
  • No runtime plugin framework or dynamic module loading authority.
  • No hard-blocking score authority in this slice.
  • No GitHub/PR/Ragna/kapi-agent meanings in generic runtime core.

Layer map

WorkflowState + WorkflowDefinition
  -> RunContract Core
  -> Objective Engine
  -> Policy Selector
  -> Workflow Engine
  -> DAG Runtime
  -> Worker Runtime
  -> Evidence / Evaluation
  -> Integration / Repair
  -> Reward Ledger / Policy Update

| Layer | Owns | Does not own |
| --- | --- | --- |
| WorkflowState | Current lifecycle, required artifacts, validation rules, mode-specific source of truth. | DAG claims, worker leases, learning policy, external PR/merge decisions. |
| RunContract Core | Generic projection of goal, evidence expectations, done criteria, artifacts, completion criteria, quality and steering hints. | Durable contract.json authority, GitHub semantics, scheduling, cleanup. |
| Objective Engine | Evaluation intent, metrics, anti-Goodhart constraints, evaluator choice, score/verdict rationale. | Completion by itself or policy changes without recorded selection. |
| Policy Selector | Explicit execution strategy choice and rationale. | Silent behavior changes from reward data or heuristics. |
| Workflow Engine | Mode-specific transitions and artifact obligations. | Worker liveness as the only completion proof. |
| DAG Runtime | Task ids, dependencies, readiness, attempts, claims, leases, and evidence gates. | External merge/tracker authority or destructive cleanup. |
| Worker Runtime | Substrate readiness, heartbeat, retention lifecycle, and owned runtime handles. | User-owned worktrees/branches or uncertain stale handles. |
| Evidence / Evaluation | Durable proof refs, command outputs, artifact refs, score outputs, reviewer/evaluator records. | Narrative-only proof or stale evidence acceptance. |
| Integration / Repair | Explicit merge, conflict, repair, retry, and supersession records. | Hidden source-branch or tracker mutation. |
| Reward Ledger / Policy Update | Cross-run learning observations and policy-hint data. | Changing selected policy unless PolicySelection records the decision. |

Core state candidate

interface RunState {
  schemaVersion: number;
  runId: string;
  goal: string;
  status: RunStatus;
  workflow: WorkflowState;
  runContract: RunContractView;
  objective: ObjectiveFunction;
  policySelection: PolicySelection;
  selectedPolicy: ExecutionPolicy;
  taskGraph: TaskGraph;
  workers: WorkerRuntimeState[];
  claims: TaskClaim[];
  leases: WorkerLease[];
  evidence: EvidenceRef[];
  evaluations: EvaluationResult[];
  integration?: IntegrationState;
  learning?: LearningState;
  events: RuntimeEvent[];
}

RunState is the future runtime envelope. It references existing workflow truth instead of replacing it: workflow remains authoritative for mode-specific lifecycle and artifact obligations, while runContract is the generic projection supervisors inspect.

Model boundaries

WorkflowState

WorkflowState remains authoritative for current workflow lifecycle and validation. A run reports workflow completion only when the workflow contract's required artifacts, evidence, and verifier rules pass. Scheduler status can explain execution progress, but it cannot complete the workflow by itself.

RunContractView

RunContract remains a projection/contract boundary. It exposes goal, constraints, evidence expectations, completion criteria, quality dimensions, and steering hints. It must not persist a competing durable source of truth or embed adapter-specific authority such as GitHub review freshness.

ObjectiveFunction

ObjectiveFunction records evaluation intent: target, metrics, anti-Goodhart checks, evaluator choice, and optional human override policy. Objective outputs are advisory by default. They can recommend attention, retries, repair, or policy candidates, but they do not complete tasks or block unrelated workflow progress unless a later design explicitly grants that authority.

PolicySelection

PolicySelection records the selected policy id, considered alternatives, rationale, objective refs, reward-ledger refs, timestamp, and selector (default, human, supervisor, or simulator). Reward data may inform this record, but it must not silently alter worker counts, scheduler policy, verification depth, or repair behavior without a new selection event.
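
One way to picture this rule is a sketch in which the effective policy is derived only from recorded selections. The field names mirror the prose above, but the exact shapes, the `PolicyHint` type, and the `effectivePolicyId` and `selectFromHint` helpers are hypothetical names introduced for illustration, not part of any existing Kapi API.

```typescript
// Hypothetical shapes: field names follow the prose above, not a shipped schema.
type Selector = "default" | "human" | "supervisor" | "simulator";

interface PolicySelection {
  policyId: string;
  consideredAlternatives: string[];
  rationale: string;
  objectiveRefs: string[];
  rewardLedgerRefs: string[];
  timestamp: string; // assumed to be an ISO-8601 string
  selector: Selector;
}

// Assumed shape for a learning-derived suggestion; it carries no authority.
interface PolicyHint {
  policyId: string;
  reason: string;
  rewardLedgerRefs: string[];
}

// The effective policy comes from the latest recorded selection and nothing else.
function effectivePolicyId(selections: readonly PolicySelection[]): string {
  if (selections.length === 0) {
    throw new Error("no PolicySelection recorded");
  }
  return selections[selections.length - 1].policyId;
}

// Adopting a hint means creating a new selection record that names its selector.
function selectFromHint(
  hint: PolicyHint,
  current: PolicySelection,
  selector: Selector,
  timestamp: string,
): PolicySelection {
  return {
    policyId: hint.policyId,
    consideredAlternatives: [current.policyId],
    rationale: hint.reason,
    objectiveRefs: current.objectiveRefs.slice(),
    rewardLedgerRefs: hint.rewardLedgerRefs.slice(),
    timestamp,
    selector,
  };
}
```

Because `effectivePolicyId` never reads hints, a hint that exists but has not been turned into a selection leaves worker counts, scheduler policy, and repair behavior untouched in this sketch.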

TaskGraph

TaskGraph owns execution decomposition: task ids, dependencies, ready set, attempts, claims, and evidence gates. A task cannot become ready until dependencies are completed, cannot be claimed unless ready, and cannot complete without a valid claim plus evidence refs. This is execution readiness authority, not workflow completion authority.
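
The three gates described above (ready only after dependencies complete, claim only when ready, complete only with a valid claim plus evidence) can be sketched as a small state machine. The `TaskNode` shape, the status names, and the helpers `refreshReady`, `claimTask`, and `completeTask` are assumptions made for this example; the design does not fix them.

```typescript
// Hypothetical shapes; the design names TaskGraph and TaskClaim but not their fields.
type TaskStatus = "pending" | "ready" | "claimed" | "completed";

interface TaskNode {
  id: string;
  dependsOn: string[];
  status: TaskStatus;
  claimedBy?: string;
  evidenceRefs: string[];
}

type TaskGraph = Record<string, TaskNode>;

function mustGet(graph: TaskGraph, id: string): TaskNode {
  const task = graph[id];
  if (!task) throw new Error(`unknown task: ${id}`);
  return task;
}

// Promote pending tasks whose dependencies are all completed; returns promoted ids.
function refreshReady(graph: TaskGraph): string[] {
  const promoted: string[] = [];
  for (const id of Object.keys(graph)) {
    const task = graph[id];
    if (task.status !== "pending") continue;
    if (task.dependsOn.every((dep) => mustGet(graph, dep).status === "completed")) {
      task.status = "ready";
      promoted.push(id);
    }
  }
  return promoted;
}

// A task can be claimed only while it is ready.
function claimTask(graph: TaskGraph, taskId: string, workerId: string): void {
  const task = mustGet(graph, taskId);
  if (task.status !== "ready") throw new Error(`task ${taskId} is not ready`);
  task.status = "claimed";
  task.claimedBy = workerId;
}

// Completion needs the claim holder and at least one evidence ref.
function completeTask(graph: TaskGraph, taskId: string, workerId: string, evidenceRefs: string[]): void {
  const task = mustGet(graph, taskId);
  if (task.status !== "claimed" || task.claimedBy !== workerId) {
    throw new Error(`task ${taskId} has no valid claim by ${workerId}`);
  }
  if (evidenceRefs.length === 0) throw new Error(`task ${taskId} needs evidence refs`);
  task.evidenceRefs = evidenceRefs.slice();
  task.status = "completed";
}

const graph: TaskGraph = {
  build: { id: "build", dependsOn: [], status: "pending", evidenceRefs: [] },
  test: { id: "test", dependsOn: ["build"], status: "pending", evidenceRefs: [] },
};
refreshReady(graph); // only "build" is promoted
claimTask(graph, "build", "worker-1");
completeTask(graph, "build", "worker-1", ["evidence:build-log"]);
refreshReady(graph); // "test" is promoted now that "build" is completed
```

Nothing in this sketch touches workflow completion: a fully completed graph would still need workflow validation before the run could be reported complete.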

WorkerRuntimeState

WorkerRuntimeState owns substrate readiness and retention. It should use the worker lifecycle states from docs/ilchul-runtime-config.md: active, completed-retained, safe-to-close, stale-registry, cleanup-released, and closed.

Readiness signals such as tmux marker, process liveness, heartbeat, or prompt-dispatch status prove only substrate state. They do not prove task completion without task evidence, and they do not prove workflow completion without workflow validation.
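
The separation of proof levels can be illustrated with a small classifier in which each level reads only its own inputs. The lifecycle names come from the text above; the `ReadinessSignals`, `WorkerSnapshot`, and `ProofInputs` shapes, the 30-second heartbeat threshold, and the choice to require every signal for readiness are assumptions made for this sketch.

```typescript
// Lifecycle names are taken from the text above; every other shape is hypothetical.
type WorkerLifecycle =
  | "active" | "completed-retained" | "safe-to-close"
  | "stale-registry" | "cleanup-released" | "closed";

interface ReadinessSignals {
  tmuxMarker: boolean;
  processAlive: boolean;
  heartbeatAgeMs: number;
  promptDispatched: boolean;
}

interface WorkerSnapshot {
  workerId: string;
  lifecycle: WorkerLifecycle;
  signals: ReadinessSignals;
}

interface ProofInputs {
  worker: WorkerSnapshot;
  taskEvidenceRefs: string[];
  workflowValidationPassed: boolean;
}

interface ProofLevels {
  substrateReady: boolean;
  taskCompletionProven: boolean;
  workflowCompletionProven: boolean;
}

const MAX_HEARTBEAT_AGE_MS = 30000; // assumed threshold, not specified by the design

// Each level is computed from its own inputs; signals never feed the completion levels.
function classifyProof(input: ProofInputs): ProofLevels {
  const s = input.worker.signals;
  return {
    substrateReady:
      s.tmuxMarker && s.processAlive && s.promptDispatched &&
      s.heartbeatAgeMs <= MAX_HEARTBEAT_AGE_MS,
    taskCompletionProven: input.taskEvidenceRefs.length > 0,
    workflowCompletionProven: input.workflowValidationPassed,
  };
}
```

A worker that is alive and heartbeating but has attached no evidence would report `substrateReady: true` with both completion levels `false`.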

EvidenceRef and EvaluationResult

Evidence refs point to inspectable artifacts, command outputs, reviewer records, or runtime events. Evaluation results record the objective metrics and evidence inputs they reference, a verdict (pass, warn, fail, or inconclusive), an optional score, and a rationale. Evaluation verdicts answer what a supervisor should inspect next. They are not completion authority unless a workflow contract explicitly requires that evaluator result as evidence.
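
As a sketch of the advisory role described above, a verdict can map to a recommendation, while a separate check decides whether a result is admissible as required evidence. The verdict names come from the text; the `EvaluationResult` field names, the `Recommendation` values, and the `recommend` and `isRequiredEvidence` helpers are hypothetical.

```typescript
// Verdict names come from the text above; field names and helpers are assumptions.
type Verdict = "pass" | "warn" | "fail" | "inconclusive";

interface EvaluationResult {
  evaluatorId: string;
  metricRefs: string[];
  evidenceRefs: string[];
  verdict: Verdict;
  score?: number;
  rationale: string;
}

// Hypothetical advisory outputs: they suggest what to look at next, nothing more.
type Recommendation = "none" | "review" | "repair-or-retry" | "human-inspection";

const NEXT_STEP: Record<Verdict, Recommendation> = {
  pass: "none",
  warn: "review",
  fail: "repair-or-retry",
  inconclusive: "human-inspection",
};

function recommend(result: EvaluationResult): Recommendation {
  return NEXT_STEP[result.verdict];
}

// A result counts as required evidence only when the workflow contract names its evaluator.
// Whether the verdict satisfies the contract stays a workflow validation decision.
function isRequiredEvidence(
  result: EvaluationResult,
  requiredEvaluatorIds: readonly string[],
): boolean {
  return requiredEvaluatorIds.indexOf(result.evaluatorId) !== -1;
}
```

Note that `recommend` returns data only; in this sketch no verdict, including `fail`, completes or blocks anything by itself.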

RewardLedger

RewardLedger stores cross-run learning data under the future .ilchul/learning/ surface. Entries should preserve objective, selected policy, observed outcome, evaluation refs, and human override rationale. Learning may propose policy hints; runtime behavior must still be selected and recorded through PolicySelection.
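
A minimal in-memory sketch of the ledger might look like the following. It assumes a numeric `observedOutcome` where higher is better, and the names `RewardEntry`, `PolicyHintCandidate`, `appendReward`, and `proposeHint` are hypothetical; persistence under .ilchul/learning/ is deliberately left out.

```typescript
// Hypothetical entry shape based on the fields listed above.
interface RewardEntry {
  runId: string;
  objectiveId: string;
  selectedPolicyId: string;
  observedOutcome: number; // assumption: a numeric outcome where higher is better
  evaluationRefs: string[];
  humanOverrideRationale?: string;
}

// A hint is plain data; adopting it is left to a recorded PolicySelection.
interface PolicyHintCandidate {
  policyId: string;
  meanOutcome: number;
  samples: number;
}

// Append-only: returns a new ledger and leaves the input untouched.
function appendReward(ledger: readonly RewardEntry[], entry: RewardEntry): RewardEntry[] {
  return ledger.concat([entry]);
}

// Propose the policy with the best mean outcome for one objective, if any data exists.
function proposeHint(
  ledger: readonly RewardEntry[],
  objectiveId: string,
): PolicyHintCandidate | undefined {
  const totals: Record<string, { sum: number; samples: number }> = {};
  for (const entry of ledger) {
    if (entry.objectiveId !== objectiveId) continue;
    const t = totals[entry.selectedPolicyId] || { sum: 0, samples: 0 };
    totals[entry.selectedPolicyId] = {
      sum: t.sum + entry.observedOutcome,
      samples: t.samples + 1,
    };
  }
  let best: PolicyHintCandidate | undefined;
  for (const policyId of Object.keys(totals)) {
    const t = totals[policyId];
    const mean = t.sum / t.samples;
    if (best === undefined || mean > best.meanOutcome) {
      best = { policyId, meanOutcome: mean, samples: t.samples };
    }
  }
  return best;
}
```

A mean over a handful of samples is a weak signal, which is one reason the sketch returns the sample count alongside the hint rather than acting on it.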

Authority separation

| Authority | Source | Can do | Cannot do |
| --- | --- | --- | --- |
| Completion authority | WorkflowState, workflow validation, required evidence, verifier/human gates. | Mark workflow tasks/runs complete when contract obligations pass. | Treat score, worker liveness, or narrative claims as completion. |
| Runtime readiness authority | TaskGraph, claims, leases, worker heartbeat/readiness, retention state. | Decide which tasks are ready, claimed, in progress, stale, or safe to inspect. | Override workflow completion or close user-owned runtime handles. |
| Advisory evaluation authority | ObjectiveFunction, EvaluationResult, quality dimensions, RewardLedger. | Recommend policy, repair, retry, review, or human inspection. | Hard-block, auto-merge, silently mutate policy, or close trackers. |
| External adapter authority | Adapter-specific supervisor operations. | Interpret generic state for GitHub/PR/Discord/tool surfaces when explicitly invoked. | Leak adapter meanings into core RunContract or runtime schemas. |

Event model

Runtime events are append-only and replayable. Initial event names should stay semantic and generic:

type RuntimeEventType =
  | "run.created" | "contract.projected" | "objective.recorded" | "policy.selected"
  | "task.ready" | "task.claimed" | "task.heartbeat" | "task.evidence_attached"
  | "task.completed" | "task.failed" | "worker.readiness_observed"
  | "worker.retention_changed" | "evaluation.recorded"
  | "integration.repair_requested" | "integration.completed" | "reward.recorded";

Event rules:

  1. Corrections use superseding events rather than in-place deletion.
  2. Every event includes runId, timestamp, actor, schema version, and an idempotency key.
  3. Replay rebuilds runtime projections from events plus existing workflow state; malformed or missing critical events fail closed.
  4. Recovery may classify unknown worker handles as stale-registry, but it must not delete or close them without explicit safe-cleanup ownership checks.
  5. Learning events record observations only; policy changes require a separate policy.selected event.
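
Rules 1 to 3 can be sketched for the task events alone. The envelope below carries the fields rule 2 requires; `taskId`, the `supersedes` field, the `REQUIRED_PREVIOUS` ordering table, and the helpers `appendEvent` and `replayTaskStatus` are assumptions for illustration, and the real replay would also read existing workflow state.

```typescript
// Hypothetical envelope: required fields follow rule 2; taskId and supersedes are assumptions.
type TaskEventType = "task.ready" | "task.claimed" | "task.completed" | "task.failed";

interface TaskEvent {
  type: TaskEventType;
  runId: string;
  timestamp: string;
  actor: string;
  schemaVersion: number;
  idempotencyKey: string;
  taskId: string;
  supersedes?: string; // idempotency key of the event being corrected
}

// Append-only: a repeated idempotency key is ignored rather than duplicated.
function appendEvent(log: readonly TaskEvent[], event: TaskEvent): TaskEvent[] {
  const duplicate = log.some((e) => e.idempotencyKey === event.idempotencyKey);
  return duplicate ? log.slice() : log.concat([event]);
}

// Status each event type requires before it may apply (undefined means no prior status).
const REQUIRED_PREVIOUS: Record<TaskEventType, string | undefined> = {
  "task.ready": undefined,
  "task.claimed": "ready",
  "task.completed": "claimed",
  "task.failed": "claimed",
};

// Rebuild per-task status from the log; any malformed or out-of-order input throws.
function replayTaskStatus(log: readonly TaskEvent[], runId: string): Record<string, string> {
  const known: Record<string, boolean> = {};
  const superseded: Record<string, boolean> = {};
  for (const e of log) {
    if (e.runId !== runId || !e.timestamp || !e.actor || !e.idempotencyKey) {
      throw new Error("malformed event: failing closed");
    }
    if (e.supersedes !== undefined) {
      if (!known[e.supersedes]) throw new Error(`supersedes unknown event: ${e.supersedes}`);
      superseded[e.supersedes] = true;
    }
    known[e.idempotencyKey] = true;
  }
  const status: Record<string, string> = {};
  for (const e of log) {
    if (superseded[e.idempotencyKey]) continue; // corrected events stay in the log but are skipped
    const previous: string | undefined = status[e.taskId];
    if (previous !== REQUIRED_PREVIOUS[e.type]) {
      throw new Error(`missing or out-of-order event before ${e.type} for ${e.taskId}`);
    }
    status[e.taskId] = e.type.slice("task.".length);
  }
  return status;
}
```

In this sketch a mistaken `task.failed` is corrected by appending a `task.completed` event whose `supersedes` points at it; the failed event is never deleted, which keeps the log usable for audit.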

Integration and repair boundary

Integration and repair state records how candidate output is accepted, rejected, retried, or superseded. It should link task evidence, evaluation results, conflicts, merge decisions, and repair tasks without hidden branch mutation. External PR/merge semantics stay in adapters and supervisor operations.

Design verification checklist

  • Architecture defines RunState, ObjectiveFunction, PolicySelection, TaskGraph, WorkerRuntimeState, EvidenceRef, EvaluationResult, and RewardLedger boundaries.
  • Existing WorkflowState and RunContract responsibilities remain explicit.
  • Completion authority, runtime readiness authority, and advisory evaluation authority are separated.
  • Event model supports replay, recovery, audit, and learning.
  • Worker retention lifecycle from #148/#169 is referenced without adding cleanup behavior.

Follow-up implementation slices

  1. Add RunState and runtime event TypeScript types without changing persistence behavior.
  2. Add read-only RunState projection from existing workflow state and RunContract views.
  3. Add TaskGraph readiness/claim/lease domain logic with unit tests.
  4. Add worker heartbeat and retention projection aligned with docs/ilchul-runtime-config.md.
  5. Add ObjectiveFunction/EvaluationResult records as advisory-only data.
  6. Add RewardLedger append/read APIs that cannot affect selected policy without PolicySelection.
  7. Add external adapter views only after generic runtime state is stable and covered by tests.