Skip to content

Enhancement: promote durable lessons onto enforced surfaces (close the capture→behavior gap) #866

@Neikan-BSN

Description

@Neikan-BSN

Summary

Compounding engineering only compounds if a captured lesson reliably becomes
changed behavior. Today the plugin mechanizes every step of the knowledge
pipeline except that last one:

  • Capture is mechanized — ce:compound writes a structured lesson to docs/solutions/.
  • Maintenance is mechanized — ce:compound-refresh keeps/updates/replaces/archives within docs/solutions/.
  • Retrieval is mechanized — learnings-researcher feeds past lessons into ce:plan / ce:review.
  • Promotion to enforced behavior is operator-judgment-only. Nothing decides whether a lesson is durable enough to stop being a doc you might re-read and start being a rule that fires whether or not anyone re-reads it.

The result is an asymmetry: the automation arc runs capture → maintain →
retrieve, then stops at the surface where lessons stay retrieval-dependent. A
lesson only changes behavior if an agent happens to retrieve it and chooses to
apply it. For lessons that should be non-optional, that is the wrong failure
mode.

The plugin already touches the memory surface in one direction: ce:compound's
Phase 0.5 auto-memory scan reads MEMORY.md as context for the writeup.
This proposal extends that adjacency the other way — let ce:compound also route
durable lessons out toward memory and other enforcement rungs.

The proposal: a promotion/escalation step

Add one step to the compound flow. After ce:compound writes the lesson, a
promotion triage judges how durably the behavior change must hold, then
routes the lesson to one rung of an enforcement ladder (the ladder below is
illustrative — the contribution is the triage step, not this exact ordering),
increasing in enforcement strength and decreasing in retrieval-dependence:

captured lesson (docs/solutions/)
        │  promotion triage: how durable must this behavior change be?
        ▼
  rung 1  memory / MEMORY.md pointer        — recalled, still optional
  rung 2  always-loaded rule (CLAUDE.md/AGENTS.md) — read every session
  rung 3  hook (pre/post tool-use)          — fires mechanically
  rung 4  automated/contract test           — fails the build if violated
  rung 5  skill                             — becomes a reusable capability

The default for most lessons stays "leave it in docs/solutions/" — promotion is
the exception, not the rule. The triage answers a single question: is this
lesson durable/recurrent enough that retrieval-dependence is an unacceptable
failure mode?
If yes, it picks the lowest rung that removes that risk.

The triage heuristic: recurrence is the strongest promote-signal

The hard part is deciding what to promote without promoting everything (which
would bloat the enforced surfaces and erode adherence). The signal we found most
reliable is recurrence — a lesson that has bitten more than once — with
blast-radius as a secondary signal. Recurrence is a better trigger than severity
or blast-radius alone because it is observed, not predicted: it distinguishes
"this will keep happening" from "this looked scary once."

This heuristic is earned by evidence, not asserted — see below.

Evidence: worked instances (both halves of the heuristic)

From running this loop in a personal multi-repo fleet (CE across several Active
Projects, prototype living in shared always-loaded agent guidance):

  1. Promote (loop closed). A recurring lesson — "registered / imported /
    tested" proves a tool's wiring EXISTS, not that it adds VALUE; don't
    rubber-stamp wired-up things when culling
    — kept biting across cull rounds.
    We promoted it to a memory + an always-loaded rule (rungs 1–2). The very next
    round, the agent's behavior changed without anyone re-reading the doc: a
    pass that had culled 0 tools on a load-bearing-only test re-vetted to 6 culls.
    Recurrence triggered promotion; promotion closed the loop.
  2. Withhold (loop correctly left open). A different lesson — a
    symlink-assumption bug in a harness-retirement script (it treated a path as a
    real directory when it was a symlink) — was captured but had not yet
    recurred
    . The recurrence heuristic correctly deferred promotion: it sits
    at rung 1 (memory) on a wait-and-see basis. If it recurs it escalates to an
    enforced rung (e.g. a test); if it never does, it never costs enforcement.
    This is the heuristic correctly withholding.

Together these show the trigger working in both directions — promoting what has
proven recurrent, withholding what has not.

A minimal forcing function (so promotion isn't silently skipped)

Promotion is only valuable if it actually happens and actually lands. The following additions
make that auditable without building a telemetry subsystem:

  • A minimal index of promoted lessons — enough fields to answer "what's
    promoted, to which rung, and when was it last checked": roughly captured-date,
    rung, status, last_reviewed.
    (Exact schema intentionally left open.)
  • A periodic effectiveness retrospective — a passive nudge to review whether
    each promoted change actually realized in behavior, and to escalate /
    de-escalate / cull the rung accordingly. This needs some forcing function;
    the right cadence depends on a project's lesson velocity, so it should be
    user-tunable rather than fixed. (Our fleet uses a hybrid "N-new-lessons OR
    M-days, whichever first" trigger as one data point — not a recommendation.)

This is a retrospective, not active monitoring — the loop is closed by a periodic
audit, not by watching every lesson continuously.

Where it fits (recommendation + design space)

We recommend a new phase in ce:compound, immediately after the lesson is
written
— context is freshest there, and the triage is a natural continuation
of capture. Alternatives in the design space:

  • A step inside ce:compound-refresh (promotion as a maintenance action
    alongside keep/update/replace/archive).
  • A dedicated sibling skill (ce:promote-learning) invoked after compound.

We lean toward the in-ce:compound phase because our own prototype lives as the
rung-2 outcome (an always-loaded rule), which is exactly "promote while the
lesson is fresh."

Prototype / prior art (ours)

We run this today as a written workflow definition ("promotion triage → one rung
of {memory · always-loaded rule · hook · contract test · skill}") in our fleet's
always-loaded agent guidance — i.e. the design itself sits at rung 2 of its own
ladder. It is definition-first; the index + retrospective mechanization is in
progress. We're offering the shape as a working spec, not asking the plugin to
adopt our file layout.

Relation to prior issues

Scope / non-goals

  • One node, not a subsystem. A triage step + a minimal index + a
    retrospective nudge. Explicitly not Closed-Loop Self-Improvement System #171's telemetry/scoring/auto-tuning.
  • Promotion is the exception. Most lessons stay in docs/solutions/; the
    default is not to promote.
  • The triage is a judgment node, performed by the agent/operator at compound
    time — not a fully automated classifier. Recurrence is the headline signal, not
    a hard rule.
  • No fixed cadence or schema mandated. The index fields and retrospective
    cadence are left to the implementation / user configuration.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions