Commit d540b7a

guardrails: 3-cond stop list + project_facts + devil's-advocate subagent
Root-cause hardening after the 2026-05-23 long-context investigation (`private/notes/llm_long_context_research.md`, 523 lines). The investigation found that this session's symptoms (smell recognised → push anyway, ROADMAP-blind compliance, etc.) were not LLM attention decay but **CLAUDE.md's own "keep going" (進め) bias + the autonomous loop's instruction centrifugation** (~80% combined).

Fix:

1. `.dev/project_facts.md` (new): user-declared invariants the loop must treat as fact even when ROADMAP / ADR text admits other readings. F-001 (zwasm v2 unavoidable; carries its own JIT + GC), F-002 (finished-form wins; smallest-diff is a tie-breaker, not a veto), F-003 (decision-deferral on structural plans). Added to the CLAUDE.md Step 1a Phase reading list.
2. CLAUDE.md "Stop only when (closed list)" extended from 2 to 3 conditions. Condition 3 is **smell depth ≥ 3 fires 2× in one cycle → ADR-phase mode switch** (not a stop; the loop pauses the per-task work, forks a fresh-context subagent to draft a root-cause ADR, accepts it inline, and resumes). This catches goal drift that the per-cycle sensor is too narrow to see.
3. CLAUDE.md Step 6 + § "ADR-level designs are handled inline" + `.dev/principle.md` depth ≥ 2 branch: **a Devil's-advocate `general-purpose` subagent with fresh context is mandatory** before stamping the ADR `Status: Proposed → Accepted`. Brief: produce 3 alternative shapes (smallest-diff / finished-form-clean / wildcard). The output is embedded verbatim into the ADR's "Alternatives considered". This counters the main loop's accumulated momentum by sourcing alternatives from a context without it.
4. CLAUDE.md Step 6 commit message shape: every source-bearing commit body must carry `Smell-audited: <depth 0-4>: <one-line summary>`. The deterministic enforcement (PreToolUse hook on git push) lands in the next commit.

Handover refreshed: the cold-start reading order grows from 4 to 5 files (`project_facts.md` inserted as #3); the recent-landings block gains a Wave 3 description.
Smell-audited: 1: noted "private/notes/ reference in scripts" smell — local-only research note path baked into hook script + CLAUDE.md; deferred (sufficient for current session, file can be promoted to docs/ja/archive/ if cross-session access is needed; recorded inline only).
1 parent d621349 commit d540b7a

4 files changed

Lines changed: 248 additions & 39 deletions

.dev/handover.md

Lines changed: 40 additions & 9 deletions
@@ -4,22 +4,26 @@
44
> [`.claude/rules/handover_framing.md`](../.claude/rules/handover_framing.md).
55
> Updated at session end; reads in < 30 sec at cold start.
66
7-
## Next 4 files to read (cold-start order)
7+
## Next 5 files to read (cold-start order)
88

99
1. `.dev/handover.md` (this file) — recent landings + guardrail
1010
refresh log + active task pointer.
1111
2. `CLAUDE.md` § Project spirit (top section, governs all other
12-
rules) + § Autonomous Workflow (Step 0 → 7).
13-
3. `.dev/principle.md` — Bad Smell catalogue (8 entries incl. new
14-
Smallest-diff bias / Reservation-as-bias / Progress-pressure)
15-
+ **Structural imagination phase** governing reservation
16-
tables, directory / file structure, responsibility, dependency.
17-
4. `.dev/ROADMAP.md` — find IN-PROGRESS phase in §9, take the
12+
rules) + § Autonomous Workflow (Step 0 → 7 + the **3-condition
13+
closed stop list**, condition 3 added 2026-05-23).
14+
3. `.dev/project_facts.md`: **user-declared invariants the loop
15+
must treat as fact** even when ROADMAP / ADR text admits other
16+
readings. F-001 (zwasm v2 unavoidable) / F-002 (finished-form
17+
wins) / F-003 (decision-deferral on structural plans).
18+
4. `.dev/principle.md` — Bad Smell catalogue (8 entries) +
19+
Structural imagination phase. Note **depth ≥ 2 mandates a
20+
Devil's-advocate `general-purpose` subagent** with fresh
21+
context before ADR accept.
22+
5. `.dev/ROADMAP.md` — find IN-PROGRESS phase in §9, take the
1823
first `[ ]` row in §9.<N>. At a Phase entry, read the
1924
placeholder's **Entry ADRs** + **Entry debts** lines and load
2025
every referenced ADR (incl. all Revision history amendments)
21-
and `D-NNN` row. (§9.6 row table carries a cleanup-wave smell
22-
banner — D-028 owns the per-row audit at each owning Phase.)
26+
and `D-NNN` row.
2327

2428
## Current state
2529

@@ -83,6 +87,33 @@ output)**:
8387
- `src/runtime/binding_stack.zig` deleted (env.zig is the
8488
authoritative location for the dynamic-binding stack).
8589

90+
**Wave 3 — root-cause hardening (post-research)**:
91+
92+
User asked "why did all this happen on top of an already-laid
93+
ROADMAP + guardrails?". A `general-purpose` subagent investigated
94+
LLM long-context behaviour (2026-05-23, 24 tool uses, output at
95+
`private/notes/llm_long_context_research.md` 523 lines) and
96+
concluded the symptoms are not attention decay (CLAUDE.md is
97+
re-injected every turn) but **CLAUDE.md's own "keep going" (進め) bias +
98+
autonomous loop's instruction centrifugation** (~80% combined).
99+
Fix landed:
100+
101+
- **`.dev/project_facts.md`** (new) — user-declared invariants
102+
(F-001 zwasm / F-002 finished-form wins / F-003 deferral)
103+
read at every Phase entry (CLAUDE.md Step 1a).
104+
- **CLAUDE.md stop list extended to 3 conditions** — condition 3
105+
is "smell depth ≥ 3 fires 2× in one cycle → ADR-phase mode
106+
switch" (not a stop, a mode change).
107+
- **CLAUDE.md Step 6 + principle.md depth ≥ 2** — Devil's-advocate
108+
subagent with fresh context is mandatory before ADR accept;
109+
output embedded verbatim in "Alternatives considered".
110+
- **`scripts/check_smell_audit.sh`** (new PreToolUse hook) —
111+
`git push` is physically blocked unless every unpushed
112+
source-bearing commit body contains `Smell-audited: <depth>:
113+
<one-line>`. This is the deterministic enforcement layer
114+
behind the probabilistic CLAUDE.md rule
115+
("CLAUDE.md is a suggestion, hooks make it law").
116+
86117
## Active task — §9.6 / 4.25
87118

88119
`src/runtime/dispatch/method_table.zig`: `MethodEntry` struct

.dev/principle.md

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -102,6 +102,16 @@ continues. The AI drafts and accepts the ADR itself — there is no
102102
external review gate. See CLAUDE.md § Autonomous Workflow
103103
"ADR-level designs are handled inline, not as a stop".
104104

105+
**Devil's-advocate subagent is mandatory at depth ≥ 2.** Before
106+
the ADR is accepted, a `general-purpose` subagent is forked with
107+
**fresh context** to produce 3 alternative shapes (smallest-diff /
108+
finished-form-clean / wildcard). The output is embedded verbatim
109+
into the ADR's "Alternatives considered" section. This is the
110+
antidote to the loop's accumulated goal-drift / instruction
111+
centrifugation — alternatives sourced from a context without the
112+
main loop's momentum surface options that the main loop's own
113+
attention has been steered away from.
114+
105115
## Three questions to picture the finished form
106116

107117
When you stop, ask:

.dev/project_facts.md

Lines changed: 118 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,118 @@
1+
# Project facts — user-declared invariants
2+
3+
> **What this file is for.** ClojureWasm v2's ROADMAP / ADRs /
4+
> rules describe the *engineering decisions* the project has
5+
> committed to. This file captures the **invariants the user has
6+
> declared** that those documents may not yet reflect — facts the
7+
> autonomous loop must treat as load-bearing even when ROADMAP /
8+
> ADR text appears to admit other readings.
9+
>
10+
> The autonomous loop must read this file as part of every Phase
11+
> entry's reading list (CLAUDE.md Step 1a) and consult it whenever
12+
> a planned change touches the topics below.
13+
>
14+
> This file is **append-only history** — entries are dated and
15+
> never silently rewritten. A later fact that supersedes an
16+
> earlier one is added as a new entry with a `Supersedes: <id>`
17+
> line; the earlier entry stays, annotated `Superseded by: <id>`.
18+
19+
## How to add an entry
20+
21+
User declares a project-level invariant in chat that the loop
22+
should treat as fact. The loop captures it here as the next
23+
`F-NNN` entry with verbatim or near-verbatim quoting + a one-line
24+
"why this matters for the loop" + cross-reference to the ROADMAP
25+
section / ADR / debt row it interacts with. The user reviews the
26+
entry at end of session.
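
The append-only rule and the `Supersedes` / `Superseded by` annotations described above can be pictured as a small in-memory ledger. This is only a sketch: the `Fact` class, its field names, and the helper functions are hypothetical illustrations, not part of the repository, and the real file is Markdown rather than Python objects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fact:
    """Hypothetical in-memory view of one F-NNN entry."""
    fact_id: str                          # e.g. "F-001"
    declared: str                         # date of the user declaration
    title: str
    supersedes: Optional[str] = None      # id of the older entry, if any
    superseded_by: Optional[str] = None   # annotation added to the older entry


def next_fact_id(facts: List[Fact]) -> str:
    """Return the next F-NNN id, one past the highest existing number."""
    highest = max((int(f.fact_id.split("-")[1]) for f in facts), default=0)
    return f"F-{highest + 1:03d}"


def append_fact(facts: List[Fact], declared: str, title: str,
                supersedes: Optional[str] = None) -> Fact:
    """Append a new entry; older entries are annotated, never removed."""
    new = Fact(next_fact_id(facts), declared, title, supersedes=supersedes)
    if supersedes is not None:
        old = next(f for f in facts if f.fact_id == supersedes)
        old.superseded_by = new.fact_id   # annotate, but keep the entry
    facts.append(new)
    return new


def active_facts(facts: List[Fact]) -> List[Fact]:
    """Entries that no later entry has superseded."""
    return [f for f in facts if f.superseded_by is None]
```

A superseding entry leaves the earlier one in the list, so the history stays readable while `active_facts` returns only what the loop should currently treat as fact.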
27+
28+
---
29+
30+
## F-001 — zwasm v2 integration is unavoidable
31+
32+
**Declared**: 2026-05-23 (user chat).
33+
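**Verbatim** (translated from Japanese): "It is certain that this project will integrate with zwasm.
34+
zwasm v1 is reasonably complete right now, but I intend to integrate with zwasm v2.
35+
$My/zwasm_from_scratch is still under development, but please be aware that wasm FFI
36+
is an indispensable element. Also, zwasm itself has JIT and
37+
memory-management capabilities."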
38+
39+
**What this changes for the loop**:
40+
41+
1. ADR-0006 frames Wasm FFI as "deferred to Phase 16 via Pod
42+
boundary". The Pod-boundary framing remains the **default
43+
protocol shape**, but the inline-vs-Pod choice **is not closed**.
44+
Phase 16 entry must re-open it with the user (D-036).
45+
2. ADR-0006 amendment 1 (NaN-box slot release for big_int /
46+
ratio) assumed Pod-boundary. If Phase 16 chooses inline
47+
NaN-box Values, those slots cannot be reclaimed — Phase 16
48+
must mint fresh slots, co-ordinated with D-027 (NaN-box
49+
layout, second generation).
50+
3. zwasm v2 carries its own JIT + GC. cw v2's Phase 5 mark-sweep
51+
GC and Phase 17 JIT (ADR-0005 3rd backend) **overlap
52+
territorially**. Phase 16 entry resolves heap-boundary and
53+
JIT-coordination design (cw-heap vs wasm-heap, JIT handoff).
54+
4. The counterparty is **zwasm v2**, not zwasm v1. zwasm v1 is
55+
reasonably complete, but cw v2 targets zwasm v2.
56+
`~/Documents/MyProducts/zwasm_from_scratch/` is the in-progress
57+
counterparty repo.
58+
59+
**Cross-references**: ADR-0006 amendment 3 (records this fact in
60+
the ADR); debt D-036 (Phase 16 inline-vs-Pod decision); ROADMAP
61+
§9.18 Phase 16 placeholder (Entry debts).
62+
63+
---
64+
65+
## F-002 — Finished-form cleanliness wins; shipping fast / avoiding rework are second-tier
66+
67+
**Declared**: 2026-05-23 (user chat).
68+
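**Verbatim** (translated from Japanese): "Cleanliness of the finished form takes priority over everything else.
69+
Building quickly and avoiding rework come second (of course, less rework is better,
70+
which is why the roadmap is laid out in advance)."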
71+
72+
**What this changes for the loop**:
73+
74+
1. Big surgery (depth 3-4 in `.dev/principle.md`) is welcome when
75+
the plan misses something. The autonomous loop must not
76+
hesitate at ADR-level revisions.
77+
2. ROADMAP P5 ("smallest-diff first") is a tie-breaker, not a
78+
veto. If smallest-diff and finished-form collide, finished-form
79+
wins.
80+
3. Skeleton-then-rewrite is endorsed (per
81+
`permanent_noop_forbidden.md`), but excessive skeletons are a
82+
smell (Smallest-diff bias smell in principle.md).
83+
4. Reservations (ADR numbers, NaN-box slots, debt rows promising
84+
future ADRs) are memos, not contracts. ADR numbers are
85+
time-ordered (`max + 1` at issue).
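
As a small illustration of the `max + 1` rule in item 4, the next ADR number can be derived from whatever files already exist instead of from a reservation table. This is a sketch under assumptions: the `NNNN_*.md` naming follows the `.dev/decisions/NNNN_*.md` pattern used elsewhere in CLAUDE.md, while the function name and the example file names are hypothetical.

```python
import re

# Four-digit prefix, underscore, anything, ".md" (the NNNN_*.md shape).
ADR_FILE = re.compile(r"^(\d{4})_.*\.md$")


def next_adr_number(filenames):
    """Return max existing ADR number + 1, ignoring non-ADR files."""
    numbers = [
        int(match.group(1))
        for name in filenames
        if (match := ADR_FILE.match(name))
    ]
    return max(numbers, default=0) + 1
```

Because the number is computed at issue time, a previously "reserved" number carries no weight: it is a memo, and the next ADR simply takes the next free integer.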
86+
87+
**Cross-references**: CLAUDE.md § Project spirit (top); ADR-0029
88+
→ ADR-0025 rename history; D-021 retirement; principle.md Bad
89+
Smell catalogue.
90+
91+
---
92+
93+
## F-003 — Decision-deferral over decision-seizure on structural plans
94+
95+
**Declared**: 2026-05-23 (user chat).
96+
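**Verbatim** (translated from Japanese): "Since the roadmap already extends into the future,
97+
add a phase that thoroughly imagines, without cutting corners, how many table
98+
reservations there are likely to be, and decide what to do about them when the
99+
owning phase comes up."
100+
+ "Also, while you are at it, imagine and simulate whether the directory structure,
101+
the projected file structure, the separation of responsibilities, and the dependencies
102+
will hold up into the future without strain."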
103+
104+
**What this changes for the loop**:
105+
106+
1. The loop's job at any task touching a reservation table /
107+
directory or file structure / responsibility / dependency
108+
graph is **imagine, record, defer** — not decide.
109+
2. Decisions belong to the owning Phase entry's owner. The
110+
current loop records the imagination output as debt rows
111+
scheduled at the owning Phase.
112+
3. This is the antidote to the Progress-pressure smell on
113+
structural work.
114+
115+
**Cross-references**: principle.md "Structural imagination
116+
phase"; CLAUDE.md Step 0.5 (Phase-entry debt read) + Step 1
117+
(Structural imagination trigger); debt rows D-027 / D-029 /
118+
D-031 / D-032 / D-034 / D-035 / D-036.

CLAUDE.md

Lines changed: 80 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -169,17 +169,20 @@ entry, and **defer the structural decision to that owner**; do not
169169
resolve here. If something feels off, adjust before Step 2.
170170

171171
**Step 1a — Phase reading list** (every Phase entry)
172-
Read in order: `.dev/handover.md`, `.dev/ROADMAP.md` §9.<N>
173-
placeholder (Entry ADRs / **Entry debts** / Reference / Skeletons
174-
to activate / Deliverables / Final activation step), each ADR
175-
listed in the placeholder's "Entry ADRs:" line **including the
176-
Phase N+ migration note section AND every Revision history
177-
amendment if present** (this is where existing-code rewrite scope
178-
and inter-Phase corrections are narrated per §A25), each `D-NNN`
179-
debt row listed in the placeholder's "Entry debts:" line (full row
180-
text in `.dev/debt.md`), `compat_tiers.yaml` entry for the
181-
function, and the JVM Clojure source (`~/Documents/OSS/clojure/`)
182-
for the function.
172+
Read in order: `.dev/handover.md`, **`.dev/project_facts.md`**
173+
(user-declared invariants the loop must treat as fact, even when
174+
ROADMAP / ADR text admits other readings), `.dev/ROADMAP.md`
175+
§9.<N> placeholder (Entry ADRs / **Entry debts** / Reference /
176+
Skeletons to activate / Deliverables / Final activation step),
177+
each ADR listed in the placeholder's "Entry ADRs:" line
178+
**including the Phase N+ migration note section AND every
179+
Revision history amendment if present** (this is where
180+
existing-code rewrite scope and inter-Phase corrections are
181+
narrated per §A25), each `D-NNN` debt row listed in the
182+
placeholder's "Entry debts:" line (full row text in
183+
`.dev/debt.md`), `compat_tiers.yaml` entry for the function, and
184+
the JVM Clojure source (`~/Documents/OSS/clojure/`) for the
185+
function.
183186

184187
**Step 2 — Red**
185188
Write the failing test (Edit / Write). Run; confirm red.
@@ -205,7 +208,7 @@ Run both in a single message with two parallel Bash tool calls:
205208
Both must be green. If either output exceeds ~200 lines, delegate
206209
to a Bash subagent and ask for "pass/fail + first failure only".
207210

208-
**Step 6 — Source commit + push (atomic)**
211+
**Step 6 — Source commit + push (atomic, smell-audited)**
209212

210213
Before staging:
211214

@@ -215,22 +218,40 @@ Before staging:
215218
3. If a smell triggers, choose depth 1-4:
216219
- depth 1: add a one-line note in the commit message.
217220
- depth 2-4: land the ADR amendment / new ADR / `debt.md` row
218-
/ `private/notes/` entry **autonomously** (the AI gathers
219-
the "should-be" materials, drafts the ADR with
220-
`Status: Proposed → Accepted`, fills Affected files,
221-
Alternatives, Consequences), commits the doc change first,
221+
/ `private/notes/` entry **autonomously**. Before drafting
222+
the ADR, **fork a `general-purpose` subagent with fresh
223+
context as Devil's advocate**: brief the subagent on the
224+
decision and ask for 3 alternative shapes (one smallest-diff,
225+
one finished-form-clean, one "wildcard"). Reflect the
226+
subagent's output verbatim into the ADR's "Alternatives
227+
considered" section before stamping
228+
`Status: Proposed → Accepted`. Commit the doc change first,
222229
then commit + push the source separately. No external
223-
review gate — ADR history is the rationale record.
230+
review gate — ADR history (plus the Devil's-advocate output
231+
embedded in it) is the rationale record.
224232

225233
Then:
226234

227-
4. `git add` source files; `git commit -m "<type>(<scope>):
228-
<one line>"`. The pre-commit gate auto-aligns any Markdown
229-
tables that drifted and re-stages the fix transparently;
230-
only genuine table-syntax errors block.
235+
4. `git add` source files; `git commit` with **a two-part message
236+
shape** (summary line, blank line, then the `Smell-audited:` body line):
237+
```
238+
<type>(<scope>): <one line summary>
239+
240+
Smell-audited: <depth 0-4>: <one-line summary of audit outcome>
241+
<optional further body>
242+
```
243+
`Smell-audited:` is **mandatory** on every commit that stages
244+
source-bearing files (`src/**/*.zig`, `build.zig`,
245+
`build.zig.zon`, `.dev/decisions/NNNN_*.md`). It records that
246+
Step 6's self-audit was actually performed. The pre-commit gate
247+
auto-aligns Markdown tables; only genuine table-syntax errors
248+
block.
231249
5. `git push origin cw-from-scratch` runs immediately on the
232-
commit's success. The push is not optional and not deferred —
233-
commit and push are one Step.
250+
commit's success. **The `scripts/check_smell_audit.sh` PreToolUse
251+
hook physically blocks pushes that include any source-bearing
252+
commit missing a `Smell-audited:` line.** Re-audit, amend the
253+
commit message, push again. Push is not optional and not
254+
deferred.
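
The push gate in items 4 and 5 reduces to a check over the unpushed commits. The sketch below is not the contents of `scripts/check_smell_audit.sh` (that script is not shown in this commit); it is a hypothetical Python rendering of the stated rule. The `Commit` class, the regexes, and the function names are assumptions, and a real hook would first have to collect each commit's message and file list from git.

```python
import re
from dataclasses import dataclass
from typing import List

# Body line of the form "Smell-audited: <depth 0-4>: <one-line summary>".
SMELL_LINE = re.compile(r"^Smell-audited: [0-4]: \S.*$", re.MULTILINE)

# Regex approximations of the source-bearing globs listed in Step 6.
SOURCE_PATTERNS = [
    re.compile(r"^src/.*\.zig$"),
    re.compile(r"^build\.zig$"),
    re.compile(r"^build\.zig\.zon$"),
    re.compile(r"^\.dev/decisions/\d{4}_.*\.md$"),
]


@dataclass
class Commit:
    sha: str
    message: str        # full message: subject line plus body
    files: List[str]    # paths touched by the commit


def is_source_bearing(commit: Commit) -> bool:
    return any(p.match(path) for path in commit.files for p in SOURCE_PATTERNS)


def has_smell_audit(commit: Commit) -> bool:
    # Only the body counts; the subject line is excluded.
    body = commit.message.split("\n", 1)[1] if "\n" in commit.message else ""
    return SMELL_LINE.search(body) is not None


def commits_blocking_push(commits: List[Commit]) -> List[str]:
    """SHAs of source-bearing commits that lack a Smell-audited line."""
    return [c.sha for c in commits
            if is_source_bearing(c) and not has_smell_audit(c)]
```

An empty result means the push may proceed. Any returned SHA identifies a commit whose message needs amending after a re-audit. Docs-only commits pass through untouched because they match none of the source-bearing patterns.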
234255

235256
**Step 7 — Per-task note** (written from hot context)
236257

@@ -263,20 +284,36 @@ rows:
263284

264285
### Stop only when (closed list)
265286

266-
Two conditions, exhaustive:
287+
Three conditions, exhaustive:
267288

268289
1. **User explicitly requests stop** (any direct instruction).
269290
2. **Physically blocked** — build broken with no identifiable root
270291
cause, or test failure that cannot be diagnosed after honest
271292
investigation.
272-
273-
Anything outside these two is continued through. The loop's quality
274-
discipline lives in `.dev/principle.md` (Bad Smell sensor, depth
275-
1-4) and is applied per cycle — quality is a *how*, not a stop
276-
condition.
293+
3. **Smell-cluster trip** — a Bad Smell at depth ≥ 3 fires twice
294+
within the same per-task TDD cycle (Step 0 → 7). This is
295+
"patterned smell": the plan is structurally off, not just
296+
locally smelly. **Don't stop the project** — the loop transitions
297+
into **ADR-phase mode**: pause the current task at its last
298+
green state, commit nothing more, fork a `general-purpose`
299+
subagent with fresh context to draft a root-cause ADR
300+
(`Supersedes <NNNN>` or net-new), accept it inline per the
301+
ADR-phase rules below, then resume the per-task loop with the
302+
ADR's verdict applied. This is a **mode switch, not a stop**:
303+
the autonomous loop continues; only its current activity
304+
changes.
305+
306+
Anything outside these three is continued through. The loop's
307+
quality discipline lives in `.dev/principle.md` (Bad Smell sensor,
308+
depth 1-4) and is applied per cycle — quality is a *how*, not a
309+
stop condition.
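
Condition 3 can be read as a small per-cycle counter. The sketch below assumes that "fires twice" means two recorded smells at depth ≥ 3 within one Step 0 → 7 cycle. The class and method names are hypothetical, and the real loop applies this rule from CLAUDE.md prose rather than from code.

```python
from enum import Enum


class Mode(Enum):
    PER_TASK = "per-task"
    ADR_PHASE = "adr-phase"


class CycleSensor:
    """Counts deep smells within one per-task TDD cycle."""

    TRIP_DEPTH = 3   # smells at this depth or deeper count toward the trip
    TRIP_COUNT = 2   # number of deep smells that triggers the mode switch

    def __init__(self) -> None:
        self.deep_hits = 0
        self.mode = Mode.PER_TASK

    def start_cycle(self) -> None:
        """Reset at Step 0 of every new per-task cycle."""
        self.deep_hits = 0
        self.mode = Mode.PER_TASK

    def record_smell(self, depth: int) -> Mode:
        if not 1 <= depth <= 4:
            raise ValueError("smell depth must be between 1 and 4")
        if depth >= self.TRIP_DEPTH:
            self.deep_hits += 1
        if self.deep_hits >= self.TRIP_COUNT:
            self.mode = Mode.ADR_PHASE   # a mode switch, not a stop
        return self.mode
```

Shallow smells (depth 1-2) never move the counter, and the counter resets at each cycle start, so only a cluster of deep smells inside a single task flips the loop into ADR-phase mode.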
277310

278311
This list intentionally avoids enumerating non-stop reasons. Closed
279-
stop conditions + open continue is the design.
312+
stop conditions + open continue is the design. Condition 3 above
313+
exists because **depth ≥ 3 firing twice in one cycle indicates
314+
goal drift the per-cycle sensor is too narrow to catch** (per the
315+
2026-05-23 investigation into instruction centrifugation, recorded
316+
in `private/notes/llm_long_context_research.md`).
280317

281318
### ADR-level designs are handled inline, not as a stop
282319

@@ -291,6 +328,19 @@ source change. Rationale survives in the ADR's history; the loop
291328
does not need an external accept gate. Step 6's depth 2-4 branch
292329
is the runway for this.
293330

331+
**Devil's-advocate subagent is mandatory at depth ≥ 2.** Before
332+
stamping `Status: Proposed → Accepted`, fork a `general-purpose`
333+
subagent with **fresh context** and brief it: "Devil's advocate
334+
this ADR. Produce 3 alternative shapes (one smallest-diff, one
335+
finished-form-clean, one wildcard); for each, name what it does
336+
better than the current draft and what it breaks." The subagent's
337+
output is reflected verbatim into the ADR's "Alternatives
338+
considered" section. This counters goal-drift / instruction
339+
centrifugation by sourcing the alternatives from a context
340+
without the main loop's accumulated momentum. The subagent's
341+
recommendation is **not binding** — the main loop still chooses
342+
— but the alternatives must appear in the ADR.
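
The accept gate described above can be sketched as a check that all three alternative shapes are present before the status is stamped. The brief text is taken from the paragraph above. Everything else (the function names, and the idea of holding the subagent output as a dict keyed by shape) is a hypothetical simplification.

```python
REQUIRED_SHAPES = ("smallest-diff", "finished-form-clean", "wildcard")


def build_brief(adr_title: str) -> str:
    """Brief handed to the fresh-context subagent."""
    return (
        f"Devil's advocate this ADR: {adr_title}. "
        "Produce 3 alternative shapes (one smallest-diff, one "
        "finished-form-clean, one wildcard); for each, name what it does "
        "better than the current draft and what it breaks."
    )


def missing_shapes(alternatives: dict) -> list:
    """Shapes with no non-empty text in 'Alternatives considered'."""
    return [s for s in REQUIRED_SHAPES if not alternatives.get(s, "").strip()]


def may_accept(depth: int, alternatives: dict) -> bool:
    """At depth >= 2 the ADR needs all three alternatives on record."""
    if depth < 2:
        return True   # depth 0-1 produces no ADR, so the gate does not apply
    return not missing_shapes(alternatives)
```

The gate checks only that the alternatives are recorded. It does not check which one was chosen, which matches the rule that the subagent's recommendation is not binding.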
343+
294344
The phrases "this needs human judgement" / "cannot be
295345
self-decided" / "user touchpoint required" are forbidden framings
296346
in the autonomous loop. If the choice is between candidate
