devkade · May 18, 2026
diff --git a/README.md b/README.md
@@ -303,6 +303,7 @@ Kapi uses Pi extension surfaces as thin safety rails rather than a separate orch
 - `docs/ilchul-naming-policy.md` — product naming, compatibility, and active `.ilchul` storage policy.
 - `docs/ilchul-runtime-config.md` — design contract for `.ilchul/` runtime layout, adapter config defaults, worker retention states, and safe cleanup boundaries.
 - `docs/learning-runtime-boundaries.md` — design contract for future `RunState`, objective, policy selection, task graph, worker runtime, evidence/evaluation, integration/repair, and reward-ledger boundaries.
+- `docs/learning-runtime-verification-matrix.md` — verification matrix covering schema, events, DAG, claims, workers, policy, reward, integration, retention, and storage readiness; it must be satisfied before any learning-runtime default or readiness claim.
 - `docs/ralph-live-qa.md` — operator live QA checklist for proving `/kapi-ralph` start, planning, approval, build, evidence, closeout, and resume behavior in a real Pi/Kapi runtime.
 - `skills/kapi-workflow/SKILL.md` — active-workflow behavior reminders for agents.
 - `prompts/` — Kapi prompt resources exposed to Pi.

diff --git a/docs/learning-runtime-verification-matrix.md b/docs/learning-runtime-verification-matrix.md
@@ -0,0 +1,118 @@
+# Learning runtime verification matrix
+
+Issue: #191
+Parent: #167
+
+This matrix defines the evidence required before Ilchul claims that the objective-driven learning parallel runtime is implementation-ready, or before it changes runtime defaults. It is a design contract, not a change to CI policy.
+
+## Scope and rule
+
+The runtime is considered ready only when every MVP invariant maps to at least one unit, integration, E2E smoke, or failure-mode check. Design-only issues can close with documented acceptance evidence, but runtime implementation issues must point at executable tests, fixtures, or smoke records.
+
+Verification layers:
+
+1. **Unit** — pure domain functions, schema parsers, state/event reducers, deterministic policy math.
+2. **Integration** — service or adapter seams with fake stores/substrates; no real agent authority required.
+3. **E2E smoke** — one local-substitute or fake-worker run that exercises the full runtime path and records evidence refs.
+4. **Failure mode** — fail-closed behavior for malformed state, stale claims, missing evidence, unsafe storage, and conflicts.
+
+## Unit test matrix
+
+| Runtime area | Required unit coverage | Existing or target surface |
+|---|---|---|
+| Runtime schema validation | `RuntimeState.schemaVersion`, enum fields, optional nested records, artifact/evidence refs, and fail-closed rejection of unknown newer versions. | `test/runtime-state.test.ts` |
+| Runtime event replay | event envelope validation, monotonic `seq`, exact-duplicate idempotency, conflicting-duplicate rejection, and the sealed-run mutation boundary. | `test/runtime-events.test.ts` |
+| Phase and preset contracts | phase preset schemas, graph-execution boundaries, thin phase outputs, and side-effect request separation. | `test/phase-preset.test.ts`, `test/graph-execution-components.test.ts` |
+| TaskGraph readiness | duplicate ids, missing dependencies, cycles, topological order, explicit ready transition, downstream block after dependency failure, and repair supersession. | `test/task-graph.test.ts` |
+| Claim and lease ownership | token creation, duplicate active ownership rejection, lease renewal, release, expiry, explicit recovery, and completion only with a matching, unexpired claim. | `test/task-graph.test.ts` |
+| Worker execution state | readiness nonce, claimed-task dispatch, duplicate dispatch rejection, heartbeat refresh, stale projection, structured report capture, and evidence-gated completion. | `test/task-graph.test.ts` |
+| Policy simulation determinism | stable policy ids, deterministic scores for identical inputs, invalid objective rejection, exploration caps, blocked policy ids, and human override trails. | `test/policy-selector.test.ts` |
+| Objective/evaluation guardrails | explicit success/failure/repair criteria, metric direction, non-finite score rejection, anti-Goodhart flags, and advisory-only quality output. | `test/objective.test.ts`, `test/quality-probe-matrix.test.ts` |
+| RewardRecord and PolicyHint | prediction-vs-actual records, penalty taxonomy, calibration refs, advisory `PolicyHint`, and no silent policy mutation. | policy/reward unit fixture target |
+| IntegrationCandidate and repair | candidate refs, dry-run/conflict state, superseded tasks, repair budget, and repair evidence requirements. | post-MVP target from #195/#190 |
+| Retention and safe close | worker statuses `completed-retained`, `safe-to-close`, `stale-registry`, `cleanup-released`, and `closed`; no destructive cleanup from validation alone. | runtime/worker state tests |
+| `.ilchul` compatibility | active storage root, no unsafe mutation of legacy `.kapi`, artifact root validation, and compatibility diagnostics. | storage/config tests |
+
+## Integration test matrix
+
+| Integration seam | Required coverage | Minimum fixture |
+|---|---|---|
+| RunState + EventStore | committing a transition writes durable intent before any side effect executes, and replay reconstructs the current snapshot. | temp `.ilchul` workspace with snapshot + events fixture |
+| RunOrchestrator + GateEngine + Verifier | HardInvariantGate, PhasePresetGate, and RunObjectiveGate deny mutation on failure and record blocker evidence. | fake verifier returning pass/block/repair/human-decision |
+| TaskGraph + worker substrate | two independent claimed tasks dispatch through a fake substrate and update worker/task state from reports only. | two fake workers with readiness nonces and evidence refs |
+| Adapter contract | Codex, Pi, and Claude Code compatibility assumptions stay behind the `AgentAdapter` / `ExecutionSubstrate` contract. | fake adapter matrix; no real agent required |
+| Evidence extraction | reports/logs/test output/diff/artifact refs produce bounded `EvidenceRef` values; missing refs deny completion. | synthetic report bundle with one valid and one missing ref |
+| Policy/evaluation/reward | the selected policy emits a prediction id, evaluation records the actual result, and the reward records the delta; no later policy change happens without a recorded `policy.selected` event. | policy-selector fixture plus reward-ledger fixture |
+| Integration dry-run | clean candidate, conflict candidate, and repair candidate remain refs until an explicit integration gate passes. | fake candidate refs and conflict matrix |
+| Retention lifecycle | terminal runs preserve retained worker inspection data and mark workers safe to close only through an explicit retention state. | fake worker registry with retained and stale handles |
+| Storage compatibility | `.ilchul` runtime files are read/written under validated roots; legacy `.kapi` is never deleted or silently migrated. | temp workspace containing both roots |
+
+## E2E smoke path
+
+A minimal runtime-readiness smoke must prove this path without relying on narrative agent claims:
+
+1. Start from an approved `RunObjective` with success criteria, failure criteria, repair criteria, and constraints.
+2. Select a deterministic policy and record `policy.selected` with prediction metadata.
+3. Create a concrete `TaskGraph` with at least two independent ready branches and one downstream join task.
+4. Claim both ready branches with lease tokens and dispatch them to fake or local-substitute workers.
+5. Record worker readiness nonces, heartbeats, and structured reports.
+6. Complete branch tasks only when matching unexpired claims and `EvidenceRef` values exist.
+7. Evaluate the objective, record `EvaluationResult`, and emit a `RewardRecord` / advisory `PolicyHint`.
+8. Produce an `IntegrationCandidate` or explicit post-MVP skip reason.
+9. Seal the run with snapshot, event replay check, retained worker state, and closeout evidence.
+
+Minimum E2E evidence bundle:
+
+- `state.json` or equivalent runtime snapshot;
+- `events.jsonl` with replayable event sequence;
+- worker report fixture(s) with evidence refs;
+- objective/evaluation artifact;
+- reward-ledger or explicit shallow-first reward fixture;
+- integration dry-run report or explicit post-MVP skip reason;
+- verification command output.
+
+## Failure-mode matrix
+
+| Failure mode | Required expected behavior |
+|---|---|
+| Unknown newer schema version | Reject/fail closed; no downgrade mutation. |
+| Malformed event or conflicting duplicate | Reject replay/commit; preserve prior snapshot. |
+| Missing dependency or cycle | Reject graph validation; no readiness projection. |
+| Failed dependency | Block downstream tasks unless an explicit repair supersedes it. |
+| Duplicate active claim | Reject second claim; preserve original owner and lease. |
+| Expired lease completion | Reject completion and require explicit recovery. |
+| Stale worker heartbeat | Mark worker unhealthy/stale without completing or deleting the task. |
+| Late duplicate worker report | Reject reports for non-`in_progress` tasks without mutating terminal state. |
+| Missing evidence refs | Reject task, phase, and run completion. |
+| Non-deterministic policy selector | Fail the test; identical inputs must produce the identical selected policy and score. |
+| Non-finite reward or evaluation values | Reject record serialization and policy scoring. |
+| Conflict during integration dry-run | Record blocked/repair state; no hidden merge or branch mutation. |
+| Unsafe artifact or storage root | Refuse read/write; produce diagnostic blocker. |
+| Retained or stale worker handle | Preserve inspectability; no cleanup unless explicit safe-close policy applies. |
+
+## Child-issue closeout evidence
+
+| Issue | Minimum evidence before close |
+|---:|---|
+| #185 | runtime schema tests, unknown-newer fail-closed test, successful and repair-required examples. |
+| #186 | event taxonomy tests, replay/idempotency/conflict tests, sealed mutation boundary. |
+| #188 | adapter/substrate matrix, fake adapter pass, compatibility docs for Codex/Pi/Claude Code. |
+| #194 | 5-task graph fixture, readiness reasons, dependency failure blocking, graph validation failures. |
+| #197 | duplicate claim race, expired lease, renewal/release, stale recovery, evidence-gated completion. |
+| #196 | two-worker dispatch fixture, heartbeat/stale projection, structured report capture, evidence-gated completion, stale/late report rejection. |
+| #191 | this matrix linked from README and tested for unit/integration/E2E/failure/fixtures/closeout coverage. |
+| #189 | RewardRecord, PredictionDelta, penalties, advisory PolicyHint, and no silent policy mutation. |
+| #187 | deterministic simulator features, exploration caps, blocked policies, override trail. |
+| #190 | repair/supersession semantics, budget limits, failure taxonomy, no hidden mutation. |
+| #195 | IntegrationCandidate refs, dry-run evidence, conflict fixture, repair-loop fixture. |
+
+## Readiness checklist
+
+- [x] Unit test matrix is defined.
+- [x] Integration test matrix is defined.
+- [x] E2E smoke path is defined.
+- [x] Failure-mode tests are defined.
+- [x] Required fixtures/artifacts are listed.
+- [x] Minimum evidence for closing each child issue is documented.
+
+Default/runtime-readiness claims remain blocked until the matrix has executable coverage or recorded smoke evidence for every MVP-critical row.
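The event-replay rows in the matrix above cover monotonic `seq`, exact-duplicate idempotency, conflicting-duplicate rejection, and the sealed-run boundary. A minimal reducer can illustrate them. This is only a sketch under stated assumptions, not the runtime's event module. Everything in it is hypothetical except the `policy.selected` name, which comes from the matrix. That includes the `RuntimeEvent` shape, the `run.started` / `run.sealed` / `task.ready` type names, and using JSON-string equality as the duplicate test.

```typescript
import * as assert from "node:assert/strict";

// Hypothetical event envelope; the real runtime schema may carry more fields.
interface RuntimeEvent {
  seq: number;
  type: string;
  payload: Record<string, unknown>;
}

interface ReplayResult {
  lastSeq: number;
  sealed: boolean;
  count: number;
}

// Exact-duplicate test (assumption): events are JSON-serializable and a
// re-delivered duplicate keeps the same key order.
function sameEvent(a: RuntimeEvent, b: RuntimeEvent): boolean {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Replays events in order and throws on the first violation, so a caller that
// keeps its previous snapshot on error never observes a partial replay.
function replay(events: readonly RuntimeEvent[]): ReplayResult {
  const bySeq = new Map<number, RuntimeEvent>();
  let lastSeq = 0;
  let sealed = false;
  for (const event of events) {
    if (!Number.isInteger(event.seq) || event.seq < 1) {
      throw new Error(`malformed event: seq ${event.seq}`);
    }
    const prior = bySeq.get(event.seq);
    if (prior !== undefined) {
      if (sameEvent(prior, event)) continue; // exact duplicate: idempotent no-op
      throw new Error(`conflicting duplicate at seq ${event.seq}`);
    }
    if (event.seq <= lastSeq) {
      throw new Error(`non-monotonic seq ${event.seq} after ${lastSeq}`);
    }
    if (sealed) {
      throw new Error(`mutation after run.sealed at seq ${event.seq}`);
    }
    bySeq.set(event.seq, event);
    lastSeq = event.seq;
    if (event.type === "run.sealed") sealed = true;
  }
  return { lastSeq, sealed, count: bySeq.size };
}

const base: RuntimeEvent[] = [
  { seq: 1, type: "run.started", payload: {} },
  { seq: 2, type: "policy.selected", payload: { policyId: "p1" } },
];

// Re-delivering seq 2 unchanged is a no-op.
assert.deepEqual(replay([...base, base[1]]), { lastSeq: 2, sealed: false, count: 2 });

// Same seq with a different payload is rejected.
assert.throws(
  () => replay([...base, { seq: 2, type: "policy.selected", payload: { policyId: "p2" } }]),
  /conflicting duplicate/,
);

// Nothing may follow the seal.
assert.throws(
  () => replay([...base, { seq: 3, type: "run.sealed", payload: {} }, { seq: 4, type: "task.ready", payload: {} }]),
  /after run.sealed/,
);
```

Because `replay` throws instead of returning partial state, the failure-mode row "Reject replay/commit; preserve prior snapshot" reduces to keeping the last good snapshot when the call fails.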
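A minimal sketch can illustrate the TaskGraph readiness rows (duplicate ids, missing dependencies, cycles, topological order, downstream block after failure). It uses Kahn's algorithm, with the frontier sorted by id so the order is deterministic. The `TaskNode` shape and the status names other than `in_progress` are hypothetical. Repair supersession is deliberately left out.

```typescript
import * as assert from "node:assert/strict";

// Hypothetical task shape; only `in_progress` appears in the matrix itself.
type TaskStatus = "pending" | "in_progress" | "completed" | "failed";

interface TaskNode {
  id: string;
  dependsOn: string[];
  status: TaskStatus;
}

// Rejects duplicate ids, missing dependencies, and cycles; returns a
// deterministic topological order (Kahn's algorithm, frontier sorted by id).
function validateGraph(tasks: readonly TaskNode[]): string[] {
  const ids = new Set<string>();
  for (const task of tasks) {
    if (ids.has(task.id)) throw new Error(`duplicate task id: ${task.id}`);
    ids.add(task.id);
  }
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const task of tasks) {
    indegree.set(task.id, task.dependsOn.length);
    for (const dep of task.dependsOn) {
      if (!ids.has(dep)) throw new Error(`missing dependency ${dep} for ${task.id}`);
      dependents.set(dep, [...(dependents.get(dep) ?? []), task.id]);
    }
  }
  const frontier = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id).sort();
  const order: string[] = [];
  while (frontier.length > 0) {
    const id = frontier.shift()!;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) {
        frontier.push(next);
        frontier.sort();
      }
    }
  }
  // Any task never reaching indegree 0 sits on a cycle.
  if (order.length !== tasks.length) throw new Error("dependency cycle detected");
  return order;
}

// Projects which pending tasks are ready and which are blocked. Blocking is
// transitive: walking in topological order classifies every dependency first.
function projectReadiness(tasks: readonly TaskNode[]): { ready: string[]; blocked: string[] } {
  const order = validateGraph(tasks); // invalid graphs get no projection
  const byId = new Map<string, TaskNode>(tasks.map((t) => [t.id, t]));
  const blocked = new Set<string>();
  const ready: string[] = [];
  for (const id of order) {
    const task = byId.get(id)!;
    if (task.status !== "pending") continue;
    const deps = task.dependsOn.map((d) => byId.get(d)!);
    if (deps.some((d) => d.status === "failed" || blocked.has(d.id))) blocked.add(id);
    else if (deps.every((d) => d.status === "completed")) ready.push(id);
  }
  return { ready, blocked: [...blocked] };
}

const graph: TaskNode[] = [
  { id: "a", dependsOn: [], status: "completed" },
  { id: "b", dependsOn: [], status: "failed" },
  { id: "c", dependsOn: [], status: "completed" },
  { id: "join", dependsOn: ["a", "b"], status: "pending" },
  { id: "report", dependsOn: ["join"], status: "pending" },
  { id: "after-c", dependsOn: ["c"], status: "pending" },
];

// `b` failed, so `join` and everything downstream of it is blocked.
assert.deepEqual(projectReadiness(graph), { ready: ["after-c"], blocked: ["join", "report"] });
```

Walking the graph in topological order is what makes "downstream block after dependency failure" transitive. `report` is blocked because `join` is blocked, even though `join` itself never failed.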
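The "Claim and lease ownership" row, together with the "Duplicate active claim" and "Expired lease completion" failure modes, can be illustrated with an in-memory claim table. It uses an injected clock so that expiry is testable. This is a hypothetical sketch. `ClaimTable`, its method names, and the `evidence://` ref strings are illustrative only. The real runtime presumably persists claims with run state rather than in a `Map`.

```typescript
import * as assert from "node:assert/strict";
import { randomUUID } from "node:crypto";

// Hypothetical claim record.
interface Claim {
  taskId: string;
  owner: string;
  token: string;
  expiresAt: number;
}

// In-memory claim table (illustrative); the clock is injected for tests.
class ClaimTable {
  private readonly claims = new Map<string, Claim>();
  private readonly now: () => number;
  private readonly leaseMs: number;

  constructor(now: () => number, leaseMs: number) {
    this.now = now;
    this.leaseMs = leaseMs;
  }

  claim(taskId: string, owner: string): Claim {
    const existing = this.claims.get(taskId);
    if (existing && existing.expiresAt > this.now()) {
      throw new Error(`task ${taskId} already claimed by ${existing.owner}`);
    }
    if (existing) {
      // An expired lease is never silently taken over; recovery is explicit.
      throw new Error(`task ${taskId} has an expired claim; call recover() first`);
    }
    const claim: Claim = { taskId, owner, token: randomUUID(), expiresAt: this.now() + this.leaseMs };
    this.claims.set(taskId, claim);
    return { ...claim }; // copy, so callers cannot mutate the stored lease
  }

  renew(taskId: string, token: string): void {
    this.active(taskId, token).expiresAt = this.now() + this.leaseMs;
  }

  release(taskId: string, token: string): void {
    this.active(taskId, token);
    this.claims.delete(taskId);
  }

  recover(taskId: string): void {
    const existing = this.claims.get(taskId);
    if (!existing) throw new Error(`task ${taskId} has no claim to recover`);
    if (existing.expiresAt > this.now()) throw new Error(`claim on ${taskId} is still active`);
    this.claims.delete(taskId);
  }

  // Completion needs the matching, unexpired token plus at least one evidence ref.
  complete(taskId: string, token: string, evidenceRefs: readonly string[]): void {
    this.active(taskId, token);
    if (evidenceRefs.length === 0) throw new Error(`missing evidence refs for ${taskId}`);
    this.claims.delete(taskId);
  }

  private active(taskId: string, token: string): Claim {
    const existing = this.claims.get(taskId);
    if (!existing || existing.token !== token) throw new Error(`no matching claim for ${taskId}`);
    if (existing.expiresAt <= this.now()) throw new Error(`lease expired for ${taskId}`);
    return existing;
  }
}

let clock = 0;
const table = new ClaimTable(() => clock, 1_000);
const first = table.claim("t1", "worker-a");
assert.throws(() => table.claim("t1", "worker-b"), /already claimed/);
clock = 900;
table.renew("t1", first.token); // lease now ends at 1_900
clock = 2_000;
assert.throws(() => table.complete("t1", first.token, ["evidence://report-1"]), /lease expired/);
table.recover("t1");
const second = table.claim("t1", "worker-b");
table.complete("t1", second.token, ["evidence://report-2"]);
```

Expired claims require an explicit `recover()` call. This keeps recovery an auditable step rather than a side effect of the next `claim()`, which matches the "require explicit recovery" wording in the failure-mode matrix.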
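The "Policy simulation determinism" row and the non-finite failure modes ask for identical inputs to yield an identical policy and score, with blocked ids excluded. A linear scorer with an explicit tie-break can illustrate this. The sketch rests on assumptions: the `PolicyCandidate` shape, the feature names, and linear weighting are hypothetical. The matrix does not specify how the real selector scores policies, and exploration caps and override trails are omitted.

```typescript
import * as assert from "node:assert/strict";

// Hypothetical candidate shape and feature names.
interface PolicyCandidate {
  id: string;
  features: Record<string, number>;
}

interface Selection {
  policyId: string;
  score: number;
}

// Deterministic selection: weights are summed in sorted key order, ties break
// on the lexicographically smallest id, and non-finite scores fail closed.
function selectPolicy(
  candidates: readonly PolicyCandidate[],
  weights: Record<string, number>,
  blocked: ReadonlySet<string> = new Set<string>(),
): Selection {
  const keys = Object.keys(weights).sort();
  let best: Selection | undefined;
  for (const candidate of candidates) {
    if (blocked.has(candidate.id)) continue;
    let score = 0;
    for (const key of keys) {
      score += weights[key] * (candidate.features[key] ?? 0);
    }
    if (!Number.isFinite(score)) throw new Error(`non-finite score for policy ${candidate.id}`);
    if (!best || score > best.score || (score === best.score && candidate.id < best.policyId)) {
      best = { policyId: candidate.id, score };
    }
  }
  if (!best) throw new Error("no eligible policy");
  return best;
}

const candidates: PolicyCandidate[] = [
  { id: "parallel-wide", features: { expectedGain: 0.6, risk: 0.4 } },
  { id: "serial-safe", features: { expectedGain: 0.5, risk: 0.1 } },
];
const weights = { expectedGain: 1, risk: -0.5 };

const chosen = selectPolicy(candidates, weights);
assert.equal(chosen.policyId, "serial-safe");
// Candidate order does not change the result.
assert.deepEqual(selectPolicy([...candidates].reverse(), weights), chosen);
```

The tie-break on id matters. Without it, two equally scored policies would be chosen by input order, and that is exactly the kind of hidden nondeterminism the matrix row is meant to catch.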
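The "Unsafe artifact or storage root" failure mode and the storage-compatibility seam require that runtime files resolve under the validated `.ilchul` root and that legacy `.kapi` is never touched. A lexical path guard built on Node's `path.resolve` / `path.relative` can illustrate this. The `resolveRuntimePath` name is hypothetical. The check is purely lexical: it does not follow symlinks, so a real guard would also need to compare `fs.realpath` results.

```typescript
import * as assert from "node:assert/strict";
import * as path from "node:path";

// Hypothetical guard: resolve a runtime-relative path under the active
// `.ilchul` root and refuse anything that escapes it (including `../.kapi`).
// Lexical only: symlinks inside the root are not resolved here.
function resolveRuntimePath(workspace: string, relativePath: string): string {
  const root = path.resolve(workspace, ".ilchul");
  const target = path.resolve(root, relativePath);
  const rel = path.relative(root, target);
  const escapes = rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel);
  // An empty `rel` means the root directory itself, which is not a file target.
  if (rel === "" || escapes) {
    throw new Error(`refusing unsafe runtime path: ${relativePath}`);
  }
  return target;
}

const statePath = resolveRuntimePath("/tmp/ws", "runs/r1/state.json");
assert.equal(statePath, path.resolve("/tmp/ws", ".ilchul", "runs/r1/state.json"));

// Legacy `.kapi` and absolute paths outside the root are refused, not migrated.
assert.throws(() => resolveRuntimePath("/tmp/ws", "../.kapi/state.json"), /unsafe runtime path/);
assert.throws(() => resolveRuntimePath("/tmp/ws", "/etc/passwd"), /unsafe runtime path/);
```

Comparing against `path.relative` instead of using a string-prefix check avoids the classic bug where a sibling directory such as `.ilchul-old` passes a naive `startsWith(root)` test.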
diff --git a/test/learning-runtime-verification-matrix.test.ts b/test/learning-runtime-verification-matrix.test.ts
@@ -0,0 +1,48 @@
+import * as assert from "node:assert/strict";
+import { readFile } from "node:fs/promises";
+import { test } from "node:test";
+
+const docPath = "docs/learning-runtime-verification-matrix.md";
+
+async function doc(): Promise<string> {
+  return readFile(docPath, "utf8");
+}
+
+test("learning runtime verification matrix covers required verification layers", async () => {
+  const text = await doc();
+
+  for (const heading of ["## Unit test matrix", "## Integration test matrix", "## E2E smoke path", "## Failure-mode matrix", "## Child-issue closeout evidence"]) {
+    assert.match(text, new RegExp(heading));
+  }
+
+  for (const requiredArea of [
+    "Runtime schema validation",
+    "Runtime event replay",
+    "TaskGraph readiness",
+    "Claim and lease ownership",
+    "Worker execution state",
+    "Policy simulation determinism",
+    "RewardRecord and PolicyHint",
+    "IntegrationCandidate and repair",
+    "Retention and safe close",
+    "`.ilchul` compatibility",
+  ]) {
+    assert.match(text, new RegExp(requiredArea.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")));
+  }
+});
+
+test("learning runtime verification matrix defines concrete smoke evidence and closeout gates", async () => {
+  const [readme, text] = await Promise.all([readFile("README.md", "utf8"), doc()]);
+
+  assert.match(readme, /docs\/learning-runtime-verification-matrix\.md/);
+
+  for (const artifact of ["state.json", "events.jsonl", "worker report", "objective/evaluation", "reward-ledger", "integration dry-run"]) {
+    assert.match(text, new RegExp(artifact.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")));
+  }
+
+  for (const issue of ["#185", "#186", "#188", "#194", "#197", "#196", "#191", "#189", "#187", "#190", "#195"]) {
+    assert.match(text, new RegExp(issue));
+  }
+
+  assert.match(text, /Default\/runtime-readiness claims remain blocked/);
+});
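The presence checks in this test match anywhere in the file, so a required area that survived only in prose would still pass. A stricter variant could scope each check to the table under its heading. The `firstColumn` helper below is a hypothetical sketch and is not part of this PR. It assumes GitHub-style pipe tables where every row starts with `|` and no first-column cell contains a literal `|`. In the real test it could be called as `firstColumn(await doc(), "## Unit test matrix")`.

```typescript
import * as assert from "node:assert/strict";

// Returns the first-column cells (header row excluded) of the first pipe table
// that appears under `heading`, stopping at the next `## ` section.
function firstColumn(markdown: string, heading: string): string[] {
  const lines = markdown.split(/\r?\n/);
  const start = lines.indexOf(heading);
  if (start === -1) throw new Error(`missing heading: ${heading}`);
  const cells: string[] = [];
  let inTable = false;
  for (const line of lines.slice(start + 1)) {
    if (line.startsWith("## ")) break; // next section
    if (!line.startsWith("|")) {
      if (inTable) break; // table ended
      continue; // blank lines or prose before the table
    }
    inTable = true;
    const first = line.split("|")[1]?.trim() ?? "";
    if (/^:?-+:?$/.test(first)) continue; // delimiter row such as |---| or |---:|
    cells.push(first);
  }
  return cells.slice(1); // drop the header row
}

const sample = [
  "## Failure-mode matrix",
  "",
  "| Failure mode | Required expected behavior |",
  "|---|---|",
  "| Duplicate active claim | Reject second claim; preserve original owner and lease. |",
  "| Expired lease completion | Reject completion and require explicit recovery. |",
  "",
  "## Next",
].join("\n");

assert.deepEqual(firstColumn(sample, "## Failure-mode matrix"), [
  "Duplicate active claim",
  "Expired lease completion",
]);
```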