You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Compounding engineering only compounds if a captured lesson reliably becomes changed behavior. Today the plugin mechanizes every step of the knowledge
pipeline except that last one:
Capture is mechanized — ce:compound writes a structured lesson to docs/solutions/.
Maintenance is mechanized — ce:compound-refresh keeps/updates/replaces/archives within docs/solutions/.
Retrieval is mechanized — learnings-researcher feeds past lessons into ce:plan / ce:review.
Promotion to enforced behavior is operator-judgment-only. Nothing decides whether a lesson is durable enough to stop being a doc you might re-read and start being a rule that fires whether or not anyone re-reads it.
The result is an asymmetry: the automation arc runs capture → maintain →
retrieve, then stops at the surface where lessons stay retrieval-dependent. A
lesson only changes behavior if an agent happens to retrieve it and chooses to
apply it. For lessons that should be non-optional, that is the wrong failure
mode.
The plugin already touches the memory surface in one direction: ce:compound's Phase 0.5 auto-memory scan reads MEMORY.md as context for the writeup.
This proposal extends that adjacency the other way — let ce:compound also route
durable lessons out toward memory and other enforcement rungs.
The proposal: a promotion/escalation step
Add one step to the compound flow. After ce:compound writes the lesson, a promotion triage judges how durably the behavior change must hold, then
routes the lesson to one rung of an enforcement ladder (the ladder below is
illustrative — the contribution is the triage step, not this exact ordering),
increasing in enforcement strength and decreasing in retrieval-dependence:
captured lesson (docs/solutions/)
│ promotion triage: how durable must this behavior change be?
▼
rung 1 memory / MEMORY.md pointer — recalled, still optional
rung 2 always-loaded rule (CLAUDE.md/AGENTS.md) — read every session
rung 3 hook (pre/post tool-use) — fires mechanically
rung 4 automated/contract test — fails the build if violated
rung 5 skill — becomes a reusable capability
The default for most lessons stays "leave it in docs/solutions/" — promotion is
the exception, not the rule. The triage answers a single question: is this
lesson durable/recurrent enough that retrieval-dependence is an unacceptable
failure mode? If yes, it picks the lowest rung that removes that risk.
The triage heuristic: recurrence is the strongest promote-signal
The hard part is deciding what to promote without promoting everything (which
would bloat the enforced surfaces and erode adherence). The signal we found most
reliable is recurrence — a lesson that has bitten more than once — with
blast-radius as a secondary signal. Recurrence is a better trigger than severity
or blast-radius alone because it is observed, not predicted: it distinguishes
"this will keep happening" from "this looked scary once."
This heuristic is earned by evidence, not asserted — see below.
Evidence: worked instances (both halves of the heuristic)
From running this loop in a personal multi-repo fleet (CE across several Active
Projects, prototype living in shared always-loaded agent guidance):
Promote (loop closed). A recurring lesson — "registered / imported /
tested" proves a tool's wiring EXISTS, not that it adds VALUE; don't
rubber-stamp wired-up things when culling — kept biting across cull rounds.
We promoted it to a memory + an always-loaded rule (rungs 1–2). The very next
round, the agent's behavior changed without anyone re-reading the doc: a
pass that had culled 0 tools on a load-bearing-only test re-vetted to 6 culls.
Recurrence triggered promotion; promotion closed the loop.
Withhold (loop correctly left open). A different lesson — a
symlink-assumption bug in a harness-retirement script (it treated a path as a
real directory when it was a symlink) — was captured but had not yet
recurred. The recurrence heuristic correctly deferred promotion: it sits
at rung 1 (memory) on a wait-and-see basis. If it recurs it escalates to an
enforced rung (e.g. a test); if it never does, it never costs enforcement.
This is the heuristic correctly withholding.
Together these show the trigger working in both directions — promoting what has
proven recurrent, withholding what has not.
A minimal forcing function (so promotion isn't silently skipped)
Promotion is only valuable if it actually happens and actually lands. The following additions
make that auditable without building a telemetry subsystem:
A minimal index of promoted lessons — enough fields to answer "what's
promoted, to which rung, and when was it last checked": roughly captured-date,
rung, status, last_reviewed. (Exact schema intentionally left open.)
A periodic effectiveness retrospective — a passive nudge to review whether
each promoted change actually realized in behavior, and to escalate /
de-escalate / cull the rung accordingly. This needs some forcing function;
the right cadence depends on a project's lesson velocity, so it should be
user-tunable rather than fixed. (Our fleet uses a hybrid "N-new-lessons OR
M-days, whichever first" trigger as one data point — not a recommendation.)
This is a retrospective, not active monitoring — the loop is closed by a periodic
audit, not by watching every lesson continuously.
Where it fits (recommendation + design space)
We recommend a new phase in ce:compound, immediately after the lesson is
written — context is freshest there, and the triage is a natural continuation
of capture. Alternatives in the design space:
A step inside ce:compound-refresh (promotion as a maintenance action
alongside keep/update/replace/archive).
A dedicated sibling skill (ce:promote-learning) invoked after compound.
We lean toward the in-ce:compound phase because our own prototype lives as the
rung-2 outcome (an always-loaded rule), which is exactly "promote while the
lesson is fresh."
Prototype / prior art (ours)
We run this today as a written workflow definition ("promotion triage → one rung
of {memory · always-loaded rule · hook · contract test · skill}") in our fleet's
always-loaded agent guidance — i.e. the design itself sits at rung 2 of its own
ladder. It is definition-first; the index + retrospective mechanization is in
progress. We're offering the shape as a working spec, not asking the plugin to
adopt our file layout.
Relation to prior issues
Closed-Loop Self-Improvement System #171 (Closed-Loop Self-Improvement, closed NOT_PLANNED) — Closed-Loop Self-Improvement System #171 proposed a
broad self-improving subsystem: auto-capture, agent scoring, execution
telemetry, prompt evolution, dashboards. This proposal is deliberately not
that. No telemetry, no scoring, no dashboards, no auto-tuning. More
importantly, every loop-closing mechanism in Closed-Loop Self-Improvement System #171 (auto-capture + retrieval +
review-agent prompt-tuning) leaves a lesson in the retrieval-dependent layer —
it makes capture cheaper and retrieval likelier, but a durable lesson still
only changes behavior if re-read. The promotion-to-enforcement rung is the one
thing Closed-Loop Self-Improvement System #171 did not cover, and it is a single node, not a subsystem — which is
also why it may be a more tractable scope than Closed-Loop Self-Improvement System #171 was.
One node, not a subsystem. A triage step + a minimal index + a
retrospective nudge. Explicitly not Closed-Loop Self-Improvement System #171's telemetry/scoring/auto-tuning.
Promotion is the exception. Most lessons stay in docs/solutions/; the
default is not to promote.
The triage is a judgment node, performed by the agent/operator at compound
time — not a fully automated classifier. Recurrence is the headline signal, not
a hard rule.
No fixed cadence or schema mandated. The index fields and retrospective
cadence are left to the implementation / user configuration.
Summary
Compounding engineering only compounds if a captured lesson reliably becomes
changed behavior. Today the plugin mechanizes every step of the knowledge
pipeline except that last one:
ce:compoundwrites a structured lesson todocs/solutions/.ce:compound-refreshkeeps/updates/replaces/archives withindocs/solutions/.learnings-researcherfeeds past lessons intoce:plan/ce:review.The result is an asymmetry: the automation arc runs capture → maintain →
retrieve, then stops at the surface where lessons stay retrieval-dependent. A
lesson only changes behavior if an agent happens to retrieve it and chooses to
apply it. For lessons that should be non-optional, that is the wrong failure
mode.
The plugin already touches the memory surface in one direction:
ce:compound'sPhase 0.5 auto-memory scan reads
MEMORY.mdas context for the writeup.This proposal extends that adjacency the other way — let
ce:compoundalso routedurable lessons out toward memory and other enforcement rungs.
The proposal: a promotion/escalation step
Add one step to the compound flow. After
ce:compoundwrites the lesson, apromotion triage judges how durably the behavior change must hold, then
routes the lesson to one rung of an enforcement ladder (the ladder below is
illustrative — the contribution is the triage step, not this exact ordering),
increasing in enforcement strength and decreasing in retrieval-dependence:
The default for most lessons stays "leave it in
docs/solutions/" — promotion isthe exception, not the rule. The triage answers a single question: is this
lesson durable/recurrent enough that retrieval-dependence is an unacceptable
failure mode? If yes, it picks the lowest rung that removes that risk.
The triage heuristic: recurrence is the strongest promote-signal
The hard part is deciding what to promote without promoting everything (which
would bloat the enforced surfaces and erode adherence). The signal we found most
reliable is recurrence — a lesson that has bitten more than once — with
blast-radius as a secondary signal. Recurrence is a better trigger than severity
or blast-radius alone because it is observed, not predicted: it distinguishes
"this will keep happening" from "this looked scary once."
This heuristic is earned by evidence, not asserted — see below.
Evidence: worked instances (both halves of the heuristic)
From running this loop in a personal multi-repo fleet (CE across several Active
Projects, prototype living in shared always-loaded agent guidance):
tested" proves a tool's wiring EXISTS, not that it adds VALUE; don't
rubber-stamp wired-up things when culling — kept biting across cull rounds.
We promoted it to a memory + an always-loaded rule (rungs 1–2). The very next
round, the agent's behavior changed without anyone re-reading the doc: a
pass that had culled 0 tools on a load-bearing-only test re-vetted to 6 culls.
Recurrence triggered promotion; promotion closed the loop.
symlink-assumption bug in a harness-retirement script (it treated a path as a
real directory when it was a symlink) — was captured but had not yet
recurred. The recurrence heuristic correctly deferred promotion: it sits
at rung 1 (memory) on a wait-and-see basis. If it recurs it escalates to an
enforced rung (e.g. a test); if it never does, it never costs enforcement.
This is the heuristic correctly withholding.
Together these show the trigger working in both directions — promoting what has
proven recurrent, withholding what has not.
A minimal forcing function (so promotion isn't silently skipped)
Promotion is only valuable if it actually happens and actually lands. The following additions
make that auditable without building a telemetry subsystem:
promoted, to which rung, and when was it last checked": roughly captured-date,
rung, status, last_reviewed. (Exact schema intentionally left open.)
each promoted change actually realized in behavior, and to escalate /
de-escalate / cull the rung accordingly. This needs some forcing function;
the right cadence depends on a project's lesson velocity, so it should be
user-tunable rather than fixed. (Our fleet uses a hybrid "N-new-lessons OR
M-days, whichever first" trigger as one data point — not a recommendation.)
This is a retrospective, not active monitoring — the loop is closed by a periodic
audit, not by watching every lesson continuously.
Where it fits (recommendation + design space)
We recommend a new phase in
ce:compound, immediately after the lesson iswritten — context is freshest there, and the triage is a natural continuation
of capture. Alternatives in the design space:
ce:compound-refresh(promotion as a maintenance actionalongside keep/update/replace/archive).
ce:promote-learning) invoked after compound.We lean toward the in-
ce:compoundphase because our own prototype lives as therung-2 outcome (an always-loaded rule), which is exactly "promote while the
lesson is fresh."
Prototype / prior art (ours)
We run this today as a written workflow definition ("promotion triage → one rung
of {memory · always-loaded rule · hook · contract test · skill}") in our fleet's
always-loaded agent guidance — i.e. the design itself sits at rung 2 of its own
ladder. It is definition-first; the index + retrospective mechanization is in
progress. We're offering the shape as a working spec, not asking the plugin to
adopt our file layout.
Relation to prior issues
NOT_PLANNED) — Closed-Loop Self-Improvement System #171 proposed abroad self-improving subsystem: auto-capture, agent scoring, execution
telemetry, prompt evolution, dashboards. This proposal is deliberately not
that. No telemetry, no scoring, no dashboards, no auto-tuning. More
importantly, every loop-closing mechanism in Closed-Loop Self-Improvement System #171 (auto-capture + retrieval +
review-agent prompt-tuning) leaves a lesson in the retrieval-dependent layer —
it makes capture cheaper and retrieval likelier, but a durable lesson still
only changes behavior if re-read. The promotion-to-enforcement rung is the one
thing Closed-Loop Self-Improvement System #171 did not cover, and it is a single node, not a subsystem — which is
also why it may be a more tractable scope than Closed-Loop Self-Improvement System #171 was.
scaffolding + agentic feedback) — both improve the
docs/solutions/layer(sharing it across repos; bootstrapping its schema). Orthogonal and
complementary: this proposal is about lessons graduating out of that layer
onto enforced surfaces, not about the layer itself.
Scope / non-goals
retrospective nudge. Explicitly not Closed-Loop Self-Improvement System #171's telemetry/scoring/auto-tuning.
docs/solutions/; thedefault is not to promote.
time — not a fully automated classifier. Recurrence is the headline signal, not
a hard rule.
cadence are left to the implementation / user configuration.