Merged
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,20 @@ Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and

Types of changes: **Added**, **Changed**, **Deprecated**, **Removed**, **Fixed**, **Security**.

## [Unreleased]

### Added

- Recovery protocol (`reference/recovery.md`): formal process for closing gaps in specification, tests, or coverage after code already exists. Covers discovery triggers, gap classification (spec gap, mold gap, coverage erosion, contract gap), severity-based triage, the five-step recovery sequence (audit, spec patch, mold patch, recast, re-review), kanban integration, recurrence prevention, and health metrics.
- Gap assessment template (`harness/templates/gap-assessment-template.md`): structured template for documenting discovered gaps during recovery.
- Recovery acknowledgment in `MANIFESTO.md` Section VII (The Fracture Lines): recognizes that gaps are inevitable and defines recovery as the methodology applied in reverse.
- Recovery procedure in `harness/PLAYBOOK.md`: expanded Iteration and Feedback section with recovery sequence, exit criteria, human and agent responsibilities, and kanban treatment.
- Agent rules R11 (halt and report gaps) and R12 (recovery task behavior) in `harness/agent/AGENT_RULES.md`.
- Retroactive gap discovery failure mode in `reference/agent-operating-model.md`.
- Retroactive Gap Discovery section in `reference/workflow.md` feedback loops.
- Recovery terms added to `reference/glossary.md`: contract gap, coverage erosion, gap assessment, mold gap, recovery, spec gap.
- Recovery constraint and terminology added to `harness/.cursor/rules/codex-automata.mdc`.

## [0.1.0] - 2026-05-11

### Added
8 changes: 8 additions & 0 deletions MANIFESTO.md
@@ -174,6 +174,14 @@ There is also the question of scale. A solo developer building a weekend project

The deepest limitation is the cold start problem. Writing good specs requires domain knowledge, architectural taste, and the hard-won intuition that comes from years of building systems that failed. You cannot spec what you do not understand. Junior engineers cannot be dropped into the Spec Writing station and expected to produce sharp molds. They must first learn what good looks like, which means, paradoxically, they may need to write bad code, debug it, and internalize the failure modes before they can specify well. The methodology does not eliminate the need for experience. It concentrates experience where it matters most.

There is one more fracture line, subtler than the others because it lives inside the methodology itself. The pipeline assumes that specifications and tests are written before code. In practice, even disciplined teams discover gaps after the fact. A production incident reveals a failure mode that no one specified. A coverage audit exposes a module with tests that were disabled months ago and never restored. A new engineer walks through the codebase and finds behavior that exists in code but in no specification anywhere.

The instinct in these moments is to fix the immediate problem. Add a test for the failure mode. Re-enable the disabled tests. Move on. This instinct is dangerous because it inverts the pipeline. A test written to match existing code is not a mold. It is a tracing. It encodes whatever the code happens to do, correct or not, and calls it specified. The gap in the specification remains, invisible but load-bearing.

The correct response is the same response the methodology prescribes for forward work, applied in reverse. When a gap is discovered, trace it back to its root. If the specification is missing, write it. If the specification exists but the mold does not, derive the tests. If the mold existed but eroded, restore it from the specification, not from the code. Then verify or recast the implementation against the repaired mold. Recovery follows the pipeline. It simply enters at a different point.

This matters because recovery is not an exception. It is a permanent feature of real systems. No methodology eliminates gaps entirely. What a methodology can do is define how gaps are found, classified, and closed with the same discipline that governs forward work. A system that can only build forward is fragile. A system that can also recover is resilient. The mold must be inspected and maintained, not just built once and trusted forever.

None of these limitations invalidate the approach. They define its scope. Codex Automata is a methodology for engineering production systems in an era when implementation is cheap and specification is the binding constraint. Within that scope, it is precise.

---
17 changes: 16 additions & 1 deletion README.md
@@ -103,7 +103,7 @@ codex-automata/
| | |-- hooks.json Event-driven enforcement
| |-- .github/ GitHub CI and templates
| |-- agent/ Detailed agent operating rules
| |-- templates/ All project templates
| |-- templates/ All project templates (spec, test, ADR, contract, task, review, gap assessment)
| |-- docs/ Empty project docs directory
| |-- tests/ Empty project tests directory
| |-- tasks/ Empty agent tasks directory
@@ -116,6 +116,7 @@ codex-automata/
| |-- architecture.md Architecture patterns and guidance
| |-- kanban.md Flow-based project management
| |-- agent-operating-model.md How agents operate
| |-- recovery.md Recovery protocol for closing gaps
| |-- glossary.md Terminology reference
|
|-- examples/ WORKED EXAMPLES (read, don't copy)
@@ -143,6 +144,20 @@ Specification --> Tests --> Code

If the casting is defective, fix the mold. If the mold is wrong, fix the specification. Do not debug the implementation directly.

## When Gaps Are Discovered

Real projects discover gaps after code exists: a production incident exposes an unspecified failure mode, a review reveals missing tests, or a new team member finds a module with no contract tests. Codex Automata defines a formal recovery protocol for these situations.

Recovery follows the same pipeline as forward work, applied retroactively:

1. **Audit** the gap using `harness/templates/gap-assessment-template.md`.
2. **Patch the spec** from domain knowledge (not from the existing code).
3. **Patch the mold** by deriving tests from the specification.
4. **Recast** the implementation if the new tests fail.
5. **Re-review** the complete recovery unit.

Recovery tasks are first-class kanban work items, not invisible tech debt. See `reference/recovery.md` for the full protocol.

## How Agents Operate

Agents working in a Codex Automata project follow strict rules (enforced by `.cursor/rules/` and `AGENTS.md`):
1 change: 1 addition & 0 deletions ROADMAP.md
@@ -5,6 +5,7 @@ This document sketches planned evolution of the Codex Automata methodology harne
## Current

- **Version 0.1.0**: Baseline harness: specification doctrine, molds and casting metaphors across docs, bounded context guidance, agent task boundaries, interface contract discipline, Cursor integration (rules, skills, subagents, hooks), templates, examples, GitHub-oriented quality gate patterns.
- **Recovery protocol**: Formal process for closing gaps in specification, tests, or coverage after code exists. Includes gap classification taxonomy, severity-based triage, recovery sequence (audit, spec patch, mold patch, recast, re-review), kanban integration, gap assessment template, recurrence prevention, and health metrics. Wired into manifesto, playbook, agent rules, Cursor rules, and glossary.

## v0.2.0 (goals)

2 changes: 1 addition & 1 deletion harness/.cursor/hooks.json
@@ -4,7 +4,7 @@
"afterFileEdit": [
{
"type": "prompt",
"prompt": "A source file was just edited. Verify: (1) Does a specification exist for the module this file belongs to? (2) Do tests exist that cover the behavior being changed? If either is missing, remind the user that Codex Automata requires specifications and tests before implementation. Do not block the edit, just note the gap if one exists. Here is the edit context: $ARGUMENTS",
"prompt": "A source file was just edited. Verify: (1) Does a specification exist for the module this file belongs to? (2) Do tests exist that cover the behavior being changed? If either is missing, remind the user that Codex Automata requires specifications and tests before implementation. If a gap exists, recommend creating a gap assessment using templates/gap-assessment-template.md to document it for triage and recovery. Do not block the edit, but clearly surface the gap and the recommended next step. Here is the edit context: $ARGUMENTS",
"timeout": 15
}
]
6 changes: 4 additions & 2 deletions harness/.cursor/rules/codex-automata.mdc
@@ -16,15 +16,17 @@ This project follows the Codex Automata methodology: Documentation first, Tests
5. Prefer small, atomic commits traceable to specification sections.
6. Surface ambiguity instead of guessing.
7. Every task must map to a specification section and test case.
8. When you discover code without a corresponding specification or tests, halt and report the gap. Do not work around it. Recommend documenting it with `templates/gap-assessment-template.md` for triage and recovery.

## Workflow

- Locate the specification before writing code.
- Locate the test plan before implementing.
- If spec or tests are missing, stop and report before proceeding.
- If spec or tests are missing, stop and report before proceeding. Recommend a gap assessment for triage.
- During recovery tasks (closing gaps in existing code), derive specifications from domain knowledge and tests from specifications. Never derive specs or tests from the existing code.

## Terminology

Use these terms consistently: specification, mold, casting, bounded context, interface contract, quality gate, agent task, human review, flow.
Use these terms consistently: specification, mold, casting, bounded context, interface contract, quality gate, agent task, human review, flow, recovery, gap assessment, spec gap, mold gap, coverage erosion, contract gap.

See `agent/AGENT_RULES.md` for the complete operating manual.
70 changes: 70 additions & 0 deletions harness/PLAYBOOK.md
@@ -371,6 +371,75 @@ The pipeline is not strictly linear. Review can send work back to any earlier ph

The key constraint is that backward movement always starts at the specification. If code is wrong, do not debug the implementation. Fix the spec, fix the tests, recast.

---

## Recovery: Closing Gaps After the Fact

The forward pipeline assumes specs and tests exist before code. When you discover they do not, recovery applies the same pipeline retroactively. Recovery is not an exception or a side project. It is first-class work that flows through the same kanban stations as forward work.

For the full recovery protocol, classification taxonomy, triage guidance, and metrics, see the [Recovery Protocol](https://github.com/0xhackerfren/Codex-Automata/blob/main/reference/recovery.md) in the Codex Automata repository.

### When Recovery Applies

Recovery applies whenever you discover that code exists without the upstream artifacts the methodology requires:

- A module has no specification, or the specification is incomplete.
- A specification has no tests, or tests are too weak to constrain the implementation.
- Tests were deleted, disabled, or made flaky without remediation.
- A module boundary has no contract tests despite a defined interface contract.

These gaps surface through production incidents, review findings, coverage audits, team walkthroughs, dependency upgrades, security scans, or agent detection during routine tasks.
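
As an illustration only, the four conditions above can be read as a small decision procedure. The sketch below is not part of the harness: `ModuleState`, `GapClass`, and `classify_gaps` are hypothetical names, and the ordering (a spec gap is reported before any test-level gap, because tests cannot be derived from a missing specification) is an assumption based on the rule that recovery starts at the specification.

```python
from dataclasses import dataclass
from enum import Enum


class GapClass(Enum):
    SPEC_GAP = "spec gap"
    MOLD_GAP = "mold gap"
    COVERAGE_EROSION = "coverage erosion"
    CONTRACT_GAP = "contract gap"


@dataclass(frozen=True)
class ModuleState:
    """Hypothetical audit observations for one module."""
    spec_covers_behavior: bool      # a specification documents the behavior
    tests_constrain_behavior: bool  # tests exist and are strong enough
    tests_previously_existed: bool  # version control shows tests that were lost
    has_interface_contract: bool    # an interface contract is defined
    has_contract_tests: bool        # the boundary has contract tests


def classify_gaps(state: ModuleState) -> list[GapClass]:
    """Return the gap classes suggested by the observations (assumed ordering)."""
    gaps: list[GapClass] = []
    if not state.spec_covers_behavior:
        # Root cause first: without a spec there is nothing to derive tests from.
        gaps.append(GapClass.SPEC_GAP)
    elif not state.tests_constrain_behavior:
        # Distinguish tests that never existed from tests that eroded.
        if state.tests_previously_existed:
            gaps.append(GapClass.COVERAGE_EROSION)
        else:
            gaps.append(GapClass.MOLD_GAP)
    if state.has_interface_contract and not state.has_contract_tests:
        gaps.append(GapClass.CONTRACT_GAP)
    return gaps
```

A human still makes the final classification during the audit; a helper like this could at most pre-fill the assessment for review.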

### Recovery Sequence

Recovery mirrors the forward pipeline but starts from an existing codebase.

```text
Audit --> Spec Patch --> Mold Patch --> Recast (if needed) --> Re-review
```

**Step 1: Audit.** Document the gap using `templates/gap-assessment-template.md`. Identify the affected module, gap class, discovery trigger, severity, and current state. This is a human task; agents assist with evidence gathering.

**Step 2: Spec Patch.** If the specification is missing or incomplete, write the missing sections following Phase 2 rules. Derive the specification from domain knowledge and stakeholder intent, not from the existing code. The code may be accidentally correct or silently wrong.

**Step 3: Mold Patch.** Derive tests from the patched specification following Phase 3 rules. Tests must trace back to specification sections. For coverage erosion, compare against the original test intent via version control history before restoring.

**Step 4: Recast (if needed).** If the existing implementation passes the new tests, no recast is needed. If it fails, recast following Phase 4 rules. Agents receive the specification, updated tests, and interface contracts.

**Step 5: Re-review.** A human reviews the recovery as a unit: spec patch, mold patch, and any recast code. The review confirms spec accuracy, test traceability, implementation correctness, and that no new gaps were introduced.
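
The conditional parts of the sequence (Spec Patch only when the specification is missing or incomplete, Recast only when the implementation fails the patched tests) can be sketched as follows. This is a minimal illustration, not harness tooling; `Step` and `recovery_steps` are hypothetical names.

```python
from enum import Enum


class Step(Enum):
    AUDIT = "Audit"
    SPEC_PATCH = "Spec Patch"
    MOLD_PATCH = "Mold Patch"
    RECAST = "Recast"
    RE_REVIEW = "Re-review"


def recovery_steps(spec_needs_patch: bool, passes_patched_tests: bool) -> list[Step]:
    """List the steps a recovery card goes through, in order."""
    steps = [Step.AUDIT]                 # always: document the gap first
    if spec_needs_patch:
        steps.append(Step.SPEC_PATCH)    # only if the spec is missing or incomplete
    steps.append(Step.MOLD_PATCH)        # tests are derived or restored from the spec
    if not passes_patched_tests:
        steps.append(Step.RECAST)        # only if the implementation fails the new tests
    steps.append(Step.RE_REVIEW)         # always: a human reviews the recovery unit
    return steps


print(" --> ".join(s.value for s in recovery_steps(True, False)))
# Audit --> Spec Patch --> Mold Patch --> Recast --> Re-review
```

In practice `passes_patched_tests` is only known after the Mold Patch step has run, so a real workflow would decide on the recast at that point rather than up front.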

### Recovery Exit Criteria

- [ ] The gap assessment document is complete.
- [ ] The specification is updated and covers the previously missing behavior.
- [ ] Tests exist for every specified behavior and trace to specification sections.
- [ ] All tests pass.
- [ ] A human has reviewed and approved the recovery unit.
- [ ] The recurrence prevention section of the gap assessment is filled in.
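
If the exit criteria are copied into a recovery card or pull request description as a Markdown checklist, a small script could report which items are still open. The sketch below assumes the `- [ ]` / `- [x]` checkbox syntax used above; `unmet_criteria` is a hypothetical helper, not something the harness ships.

```python
import re

# Matches Markdown task-list items such as "- [ ] text" or "- [x] text".
CHECKBOX = re.compile(r"^\s*- \[( |x|X)\] (.+)$")


def unmet_criteria(markdown: str) -> list[str]:
    """Return the text of every unchecked checklist item, in document order."""
    unmet = []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            unmet.append(match.group(2).strip())
    return unmet
```

An empty result means every listed criterion is ticked; it does not replace the human approval that the criteria themselves require.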

### Human Responsibilities During Recovery

- Triage discovered gaps by severity and schedule them on the board.
- Write or approve specification patches. Specification authority remains human-owned.
- Review the complete recovery unit before closing the card.
- Fill in the recurrence prevention section: what process gap allowed this debt to accumulate?

### Agent Responsibilities During Recovery

- When a gap is discovered during routine work, halt and report it using the gap assessment template. Do not silently work around gaps.
- During recovery tasks, follow rules R1-R11 in the context of an existing codebase, as R12 in `agent/AGENT_RULES.md` requires.
- Assist with evidence gathering: scan for related gaps, check version control history, surface specification sections that reference the affected behavior.
- Derive tests from the specification, not from the existing code.
- If the specification appears incorrect (code behavior contradicts it and the code is believed correct), surface the conflict for human resolution. Do not update the specification unilaterally.

### Recovery on the Kanban Board

Recovery cards use a distinct card type or tag ("Recovery" or "Gap Remediation") and flow through the same stations as forward work. They count against the same WIP limits. If a critical recovery card displaces forward work, that tradeoff is visible on the board.

Batch related gaps within a single module into one recovery card. Create separate cards for separate modules to maintain bounded context independence.
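
The batching rule and the shared WIP limit can be sketched in a few lines. Everything here is illustrative: `batch_into_cards` and `within_wip_limit` are hypothetical helpers, and the assumption that a gap is recorded as a `(module, description)` pair is made up for the example.

```python
from collections import defaultdict


def batch_into_cards(gaps: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group gaps so each module gets exactly one recovery card."""
    cards: dict[str, list[str]] = defaultdict(list)
    for module, description in gaps:
        cards[module].append(description)
    return dict(cards)


def within_wip_limit(forward_cards: int, recovery_cards: int, wip_limit: int) -> bool:
    """Recovery cards count against the same limit as forward work."""
    return forward_cards + recovery_cards <= wip_limit


gaps = [
    ("billing", "refund path has no specification"),
    ("billing", "retry tests disabled"),
    ("auth", "no contract tests at token boundary"),
]
cards = batch_into_cards(gaps)
print(len(cards))  # 2 cards: one for billing, one for auth
```

Keeping one card per module preserves bounded context independence, while the combined count makes any displacement of forward work visible.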

---

## Quick Reference

| Phase | Primary Owner | Bottleneck? | Key Template |
@@ -382,3 +451,4 @@ The key constraint is that backward movement always starts at the specification.
| 4: Code Casting | Agent | No | `agent-task-template.md` |
| 5: Review | Human | **Yes** | `human-review-template.md` |
| 6: Deployment | Human + Agent | No | N/A |
| Recovery | Human + Agent | No | `gap-assessment-template.md` |
4 changes: 4 additions & 0 deletions harness/agent/AGENT_RULES.md
@@ -30,6 +30,10 @@ R9. Every agent task must map to a specification section and at least one test c

R10. Do not introduce external dependencies not specified in the architecture documents.

R11. When you discover code without a corresponding specification or tests, halt and report the gap. Do not silently work around it, do not write tests derived from the code, and do not treat unspecified behavior as intentional. Report the gap using the gap assessment template (`templates/gap-assessment-template.md`) so a human can triage and schedule recovery.

R12. During recovery tasks, follow rules R1-R11 in the context of an existing codebase. Derive specifications from domain knowledge and stakeholder intent, not from the current implementation. Derive tests from the specification, not from the code. If the specification and the code conflict, surface the conflict for human resolution.
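
One way to picture the "halt and report" behavior in R11 is a pre-task check that refuses to continue when upstream artifacts are absent. This is a rough sketch under stated assumptions: `GapDetected` and `require_upstream_artifacts` are hypothetical names, and file existence is only a crude proxy for "a specification and tests exist", since it says nothing about their completeness.

```python
from pathlib import Path


class GapDetected(Exception):
    """Raised so the task halts instead of working around a missing artifact."""


def require_upstream_artifacts(spec_path: Path, test_path: Path) -> None:
    """Halt (by raising) if the specification or the tests cannot be found."""
    missing = [
        label
        for label, path in (("specification", spec_path), ("tests", test_path))
        if not path.exists()
    ]
    if missing:
        raise GapDetected(
            "Missing " + " and ".join(missing)
            + "; halt and report via templates/gap-assessment-template.md"
        )
```

The exception carries the report rather than triggering a fix: triage and scheduling stay with a human, as the rule requires.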

## 3. Task Execution Protocol

- Step 1: Read the agent task definition and locate the specification reference.
99 changes: 99 additions & 0 deletions harness/templates/gap-assessment-template.md
@@ -0,0 +1,99 @@
# Gap Assessment: [Module Name]

<!--
[Use this template when a gap is discovered in specification, tests, or coverage for an existing module. Fill in each section to document the gap, plan recovery, and prevent recurrence. For the full recovery protocol, see https://github.com/0xhackerfren/Codex-Automata/blob/main/reference/recovery.md]
-->

## Metadata

| Field | Value |
|-------|-------|
| Module name | [ ] |
| Assessed by | [ ] |
| Date discovered | [YYYY-MM-DD] |
| Severity | Critical / Significant / Moderate / Low [pick one] |
| Status | Open / In Recovery / Closed [pick one] |

## Gap Classification

[Pick one. Definitions: spec gap = behavior in code but not in spec; mold gap = behavior in spec but no tests; coverage erosion = tests lost over time; contract gap = boundary lacks contract tests.]

- [ ] **Spec gap:** behavior exists in code but is not documented in any specification.
- [ ] **Mold gap:** behavior is documented in the specification but has no tests, or tests are too weak.
- [ ] **Coverage erosion:** tests existed but were deleted, disabled, or made flaky without remediation.
- [ ] **Contract gap:** a module boundary has no contract tests despite a defined interface contract.

## Discovery Trigger

[How was this gap found? Pick one or describe.]

- [ ] Production incident
- [ ] Review finding
- [ ] Coverage audit
- [ ] Team walkthrough
- [ ] Dependency upgrade
- [ ] Security scan
- [ ] Agent-detected during routine task
- [ ] Other: [ ]

[If incident-related, link the incident report or postmortem.]

## Current State

[What exists today? Describe the specification, tests, and code as they currently stand.]

- **Specification:** [complete / partial / missing; cite relevant spec document and sections if they exist]
- **Tests:** [present / partial / missing / disabled; cite test files if they exist]
- **Code:** [describe the behavior that exists without adequate upstream artifacts]
- **Contract tests:** [present / missing; cite the interface contract document]

## Required State

[What should exist according to the methodology? Be specific.]

- **Specification should cover:** [list the behaviors, edge cases, and failure modes that need to be specified]
- **Tests should cover:** [list the test cases that need to exist, traced to specification sections]
- **Implementation changes (if any):** [describe expected recast scope, or note "none expected" if the code is believed correct]

## Recovery Plan

[Specific tasks to close this gap. Each task should map to a recovery sequence step.]

1. **Spec patch:** [ ]
2. **Mold patch:** [ ]
3. **Recast (if needed):** [ ]
4. **Re-review:** [ ]

**Estimated effort:** [ ]

**Assigned to:** [ ]

## Recurrence Prevention

[What process gap allowed this debt to accumulate? What change prevents this class of gap from recurring?]

- **Root cause:** [ ]
- **Process change:** [ ]

<!--
Common root causes and remediation:
- Review checklist did not require coverage verification --> add the check to the review and PR templates.
- Contract tests were not required at the boundary --> update architecture and module boundary templates.
- Spec was reviewed by someone unfamiliar with the domain --> establish domain-owner review requirements.
- Tests were disabled to unblock a deadline --> require tracking issues with due dates for disabled tests.
- Module predates the methodology --> schedule systematic audit of pre-methodology modules.
- Emergency bypass of the forward pipeline --> ensure bypass policy includes mandatory recovery scheduling.
-->

## Resolution

[Fill in when the gap is closed.]

| Field | Value |
|-------|-------|
| Date closed | [YYYY-MM-DD] |
| Reviewed by | [ ] |
| Spec patch commit/PR | [ ] |
| Mold patch commit/PR | [ ] |
| Recast commit/PR | [N/A or link] |
| Review approval | [ ] |
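
A filled-in assessment could be linted before triage to confirm that the Severity and Status fields in the Metadata table were reduced to a single allowed value. The sketch below is an assumption about how such a check might look, not part of the template; `metadata_problems` is a hypothetical helper, and the allowed values are taken from the Metadata table above.

```python
import re

ALLOWED = {
    "Severity": {"Critical", "Significant", "Moderate", "Low"},
    "Status": {"Open", "In Recovery", "Closed"},
}

# Matches a two-column Markdown table row: | field | value |
ROW = re.compile(r"^\|\s*(?P<field>[^|]+?)\s*\|\s*(?P<value>[^|]*?)\s*\|\s*$")


def metadata_problems(markdown: str) -> list[str]:
    """Return one message per metadata field that is missing or not a single allowed value."""
    seen: dict[str, str] = {}
    for line in markdown.splitlines():
        match = ROW.match(line)
        if match:
            seen[match.group("field")] = match.group("value")
    problems = []
    for field, allowed in ALLOWED.items():
        value = seen.get(field)
        if value not in allowed:
            problems.append(f"{field}: expected one of {sorted(allowed)}, got {value!r}")
    return problems
```

An untouched template row such as `Critical / Significant / Moderate / Low [pick one]` is reported as a problem, which is the intended behavior: the placeholder has not been resolved.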