docs(openspec): propose approval-policy-v2 by Aaronontheweb · Pull Request #940 · netclaw-dev/netclaw

Aaronontheweb · 2026-05-08T15:57:13Z

Summary

OpenSpec change proposing a breaking redesign of the persistent tool-approval store and the Slack/Discord approval prompt UX. No code changes in this PR: it contains only the proposal, design, delta specs, and tasks list. The /opsx-apply work follows in subsequent PRs.

What's in scope

Storage v2 — typed (verb, directory) ApprovalEntry replaces the v1 flat string list. v1 file quarantines to .v1.bak on first read; no automatic migration (we have no production users).
Safe-verb ∩ safe-space short-circuit — per-OS curated verb list (safe-verbs.linux.json, safe-verbs.windows.json) plus the agent's existing session_dir and optional project_dir form a three-position policy: auto-run when both axes match; prompt otherwise; hard-deny list unchanged. Reuses ToolAudienceProfileResolver and the symlink-segment guard from ScopedFileAccessPolicy.
ShellTool cwd default — falls back to project_dir if set, else session_dir. Today it inherits the daemon-process cwd, which is a footgun.
Bash control-flow refusal — ShellTokenizer returns no verb chains for for/while/do/done/etc. or for unbalanced quotes/brackets, and the approval gate offers only Once and Deny for such messy input. This stops the on-disk store from accumulating fragments like `done`, `for pid`, and `awk {print $2})`.
Five-button prompt — Once / This chat / Always here / Always anywhere / Deny, with danger styling on Always anywhere and Deny. Replaces the v1 Patterns + Directory Roots body sections with a single Approve in <cwd>? header + verb bullets. Resolution message collapses to one line.
CLI — netclaw approvals trust-verb <verb> writes a global wildcard (verb, null) entry; list and revoke label entries by scope (<verb> in <dir> / <verb> anywhere).
Agent guidance — load-bearing AGENTS.md instruction to call set_working_directory early, tool-description rewrite, shell failure-path hint pointing at set_working_directory. Three new eval cases (positive, negative, recovery) lock adoption in. Schedule-creation flow proactively suggests pre-approval for unattended verbs.
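The five buttons differ only in what they record. The Python sketch below shows one way that mapping could look. The project itself is .NET, and `ApprovalButton`, `apply_decision`, and the tuple-based grant sets are hypothetical names, not the actual API. The sketch covers the four outcomes: nothing is remembered, a session-scoped grant, a folder-scoped `(verb, cwd)` entry, or a global `(verb, None)` wildcard.

```python
from enum import Enum
from typing import Iterable, Optional, Set, Tuple

Grant = Tuple[str, Optional[str]]  # (verb, directory); None directory = anywhere


class ApprovalButton(Enum):
    ONCE = "Once"
    THIS_CHAT = "This chat"
    ALWAYS_HERE = "Always here"
    ALWAYS_ANYWHERE = "Always anywhere"
    DENY = "Deny"


def apply_decision(
    button: ApprovalButton,
    verbs: Iterable[str],
    cwd: Optional[str],
    session_grants: Set[Grant],
    persistent_grants: Set[Grant],
) -> bool:
    """Record the grant implied by `button`; return True if the call may run."""
    if button is ApprovalButton.DENY:
        return False
    if button is ApprovalButton.ONCE:
        return True  # run this one call, remember nothing
    if button is ApprovalButton.THIS_CHAT:
        # Assumption: session grants are also keyed by cwd and die with the chat.
        session_grants.update((verb, cwd) for verb in verbs)
        return True
    # Both "Always" buttons persist; only the directory half differs.
    directory = cwd if button is ApprovalButton.ALWAYS_HERE else None
    persistent_grants.update((verb, directory) for verb in verbs)
    return True
```

Keeping "Always here" and "Always anywhere" on one code path makes it explicit that the global wildcard is the same write with the directory dropped. That is why it gets danger styling.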

Phasing

The tasks.md plans two implementation PRs under this single change:

PR 1 — storage / matcher / cwd default / safe-verb policy / CLI. No prompt UI changes; channel adapters keep rendering today's body off the new typed data.
PR 2 — prompt redesign, resolution message, agent guidance, schedule-creation flow, evals.

Validation

$ openspec validate approval-policy-v2 --strict
Change 'approval-policy-v2' is valid

$ openspec status --change approval-policy-v2
Progress: 4/4 artifacts complete
[x] proposal  [x] design  [x] specs  [x] tasks

Test plan

Review proposal for accuracy on the "why" — does it capture the friction we observed in session D0AC6CKBK5K/1778238489.065719?
Review design.md decisions 1–10; flag any decision where the alternatives considered miss a path you'd prefer.
Review tool-approval-gates delta — confirm MODIFIED requirements preserve the v1 behavior we still want, and that REMOVED requirements are genuinely subsumed.
Review session-cwd delta — confirm cwd-default chain and failure-path hint match how you want the agent to recover.
Review netclaw-cli delta — confirm trust-verb is the only path to (verb, null) and that we're not regressing anything in list / revoke.
Review tasks.md acceptance gates — anything missing from the manual smoke list?

Aaronontheweb · 2026-05-09T02:43:36Z

Eval Suite Results — Approval Policy v2

Targeted N=5 runs against the local inference endpoint (openai-compatible @ llm.testlab.petabridge.net, model Qwen3.6-27B-UD-Q4_K_XL.gguf). All four cases under the new Approval Policy v2 category were exercised after fixing two pre-existing eval-infra bugs (see commits 914b8e9f, c38eb08c).

Baseline → Variant B (prompt rewrites only)

| Case | Baseline | Variant B | Notes |
|---|---|---|---|
| `approval_set_working_directory_positive` | 1/5 (0.20) | 4/5 (0.80) ✓ | Original prompt was ambiguous between "one-shot ls" and "sustained project work"; v2 spec actually says don't preempt for one-shots. Rewrote to make sustained-work signal explicit. |
| `approval_set_working_directory_negative` | 5/5 (1.00) ✓ | 5/5 (1.00) ✓ | Restraint behavior solid: model correctly does NOT preemptively declare project root for unrelated prompts. |
| `approval_recovery_hint` | 2/5 (0.40) | 1/5 (0.20) | Original multi-turn structure had T1 say "do not call any tools yet" then T2 say "now call the tool." Several failures showed the model stuck in T1's no-tools conditioning, replying "I will not call any tools." Rewrote as a single conversational prompt. Variant B's lower score reflects infra flake (see below), not the rewrite. |
| `approval_schedule_pre_approval` | 5/5 (1.00) ✓ | 1/5 (0.20) | Same prompt, no changes between runs. 5/5 → 1/5 is pure infra variance. |

Infrastructure observation

The local inference endpoint is intermittently producing stalled streams — [usage] out=3 tok_s=8 instead of normal out=80+ tok_s=27. That's the same "Dutchman" pattern (HTTP/streaming response opens, emits the usage event, then connection collapses before any meaningful content lands). Schedule's 5/5 → 1/5 swing on an unchanged prompt is a clean infra fingerprint, not a behavior change.

This affects eval reliability but is outside this PR's scope — it points at adding a streaming idle timeout on the daemon's OpenAI-compatible HTTP client so stalled streams surface as real errors instead of producing partial responses. Will file separately.

Eval-infra fixes shipped in this PR

Both were latent bugs, surfaced when investigating why the v2 cases scored 0% initially:

evals/run-evals.sh skill loading (914b8e9f): the script was copying feeds/skills/.system/files/<skill>/ to $EVAL_HOME/skills/.system/files/<skill>/ (an extra files/ segment), a layout the SkillScanner skips. The daemon was downloading from the live R2 manifest instead, so unpublished local skill changes (e.g. netclaw-operations v2.0.0 in this PR) were never tested. Fixed by mirroring the runtime layout (.system/<skill>/) and setting NETCLAW_SkillSync__DisableSystemSkillSync=true in the eval container.
Prompt rewrites (c38eb08c) — fixed the two prompt-design issues described in the table above.

What we know works

Three of four cases pass at the 0.80 threshold when the endpoint is healthy (positive @ Variant B = 4/5; negative = 5/5; schedule when healthy = 5/5).
Recovery is the open question. With the rewritten prompt and a healthy endpoint, expected pass rate is high but unverified — the runs we have are contaminated by stream stalls. Manual binary-swap validation will close this.

Acceptance gate status

Per tasks.md section 11 acceptance gates 65–68: manual binary-swap validation by Aaron remains the real go/no-go. The eval suite is now structurally honest (skills load from source, prompts test what they claim to test); pass rates depend on inference availability.

Breaking redesign of the persistent approval store and prompt UX:
- typed (verb, directory) ApprovalEntry replaces the v1 flat string list; v1 file quarantines to .v1.bak on first read (no migration)
- safe-verb ∩ safe-space short-circuit (per-OS verb list, audience-aware roots from ToolAudienceProfileResolver) auto-runs read-only inspection inside session_dir / project_dir
- ShellTool cwd defaults to project_dir → session_dir (today inherits daemon-process cwd)
- ShellTokenizer refuses pattern extraction on bash control-flow / unbalanced input so junk fragments never persist
- 5-button prompt (Once / This chat / Always here / Always anywhere / Deny) with danger styling on the destructive options; one-line resolution message
- netclaw approvals trust-verb <verb> CLI for unattended/scheduled grants
- AGENTS.md + tool description + failure-path hint coordinate to push the agent toward set_working_directory; eval cases (positive/negative/recovery/schedule pre-approval) lock the behavior in

Foundation for the approval-policy-v2 storage refactor. Adds:
- ApprovalEntry record (Verb required, Directory nullable for global wildcard)
- ToolApprovalEntryComparer.Equals(ApprovalEntry, ApprovalEntry) overload that delegates to the existing platform-correct string comparison

No behavior change: ToolApprovalStore still operates on the v1 string-based API and the existing test suite (274 tests) passes unchanged. The actual storage cutover, matcher refactor, and caller updates land in subsequent commits per openspec/changes/approval-policy-v2/tasks.md sections 1-6.
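For readers less familiar with the C# side, here is a rough Python analogue of the record and its comparer. It is an assumption-laden sketch, not the real type. `display_text` produces the "<verb> in <dir>" / "<verb> anywhere" labels used elsewhere in this PR. `entries_equal` stands in for the platform-correct comparison, and this sketch assumes "platform-correct" means case-insensitive on Windows and ordinal elsewhere.

```python
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApprovalEntry:
    verb: str                        # required, e.g. "git status"
    directory: Optional[str] = None  # None means global wildcard ("anywhere")

    @property
    def display_text(self) -> str:
        if self.directory is None:
            return f"{self.verb} anywhere"
        return f"{self.verb} in {self.directory}"


def _fold(value: Optional[str], case_insensitive: bool) -> Optional[str]:
    return value.casefold() if (case_insensitive and value is not None) else value


def entries_equal(a: ApprovalEntry, b: ApprovalEntry,
                  case_insensitive: Optional[bool] = None) -> bool:
    """Compare entries with platform-style case rules (assumed: Windows folds case)."""
    if case_insensitive is None:
        case_insensitive = sys.platform.startswith("win")
    return (_fold(a.verb, case_insensitive) == _fold(b.verb, case_insensitive)
            and _fold(a.directory, case_insensitive) == _fold(b.directory, case_insensitive))
```

A wildcard entry never equals a folder-scoped one, because `None` only matches `None`.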

Section 1 of the approval-policy-v2 OpenSpec change. Refactors ToolApprovalStore to a typed (verb, directory) ApprovalEntry model with a versioned on-disk schema, replacing the v1 flat string list.

What changed:
- ToolApprovalStore now serializes/deserializes ToolApprovalData with "version": 2 and List<ApprovalEntry> per (audience, tool).
- Two-step Load(): peek schema version via JsonDocument; quarantine legacy v1 files to tool-approvals.json.v1.bak; quarantine unparseable files to .invalid; in either case, return an empty store.
- AddApproval/RemoveApproval/RemoveAllForTool/Snapshot operate on ApprovalEntry. New GetApprovedEntries replaces GetApprovedPatterns.
- AddApproval normalizes the directory portion (trims trailing separators while preserving "/" and "C:\") so the on-disk file does not accumulate trailing-slash variants of the same logical entry.
- ToolApprovalEntryComparer gains NormalizeDirectory + Normalize(entry) helpers; Equals(ApprovalEntry, ApprovalEntry) normalizes both sides.

Caller updates required to compile:
- ToolApprovalActor: persistent writes wrap incoming verb strings as ApprovalEntry { Verb=pattern, Directory=null } (interim semantic preserved until section 2 lands the directory-aware matcher).
- ApprovalsListView/ApprovalsCommand: list output renders entries as "<verb> in <dir>" or "<verb> anywhere"; --json emits the typed ApprovalEntry shape and uses IndentedOmitNull so the CLI shape matches the file shape (nulls omitted).
- ApprovalsCommand.WarnIfQuarantined surfaces both .v1.bak and .invalid quarantine paths with distinct remediation guidance.
- ApprovalsManagerViewModel/Page: rendering uses entry.DisplayText.
- ToolAudienceProfilesDoctorCheck: drops the v1 stale-path-aware pattern detection (irrelevant under v2; v1 contents quarantine on first read).

Tests:
- ToolApprovalStoreTests rewritten for the v2 API, gaining coverage for v1 quarantine, malformed quarantine, fresh-write-after-quarantine, trailing-slash normalization, and idempotent add.
- ApprovalsCommand/ApprovalsManagerPage tests rewritten to use ApprovalEntry and the new "<verb> in <dir>" / "<verb> anywhere" rendering.
- Stale-pattern doctor test removed.

All 3348 tests pass; dotnet slopwatch analyze reports no new violations; file-header verification passes.
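The two-step load and the directory normalization can be sketched together in Python. The real code uses JsonDocument in C#. The `approvals` key and the overall file shape here are hypothetical, and the sketch treats any parseable file without `"version": 2` as legacy v1. What it shows is the control flow: parse failures go to `.invalid`, other non-v2 files go to `.v1.bak`, and both cases yield an empty store.

```python
import json
import re
from pathlib import Path
from typing import Optional

_DRIVE = re.compile(r"[A-Za-z]:")


def normalize_directory(directory: Optional[str]) -> Optional[str]:
    """Trim trailing separators but keep bare roots such as "/" and "C:\\"."""
    if directory is None:
        return None
    trimmed = directory.rstrip("/\\")
    if trimmed == "":
        return directory[:1]  # "/" stays "/"
    if _DRIVE.fullmatch(trimmed) and len(directory) > len(trimmed):
        return trimmed + directory[len(trimmed)]  # "C:\\" keeps one separator
    return trimmed


def empty_store() -> dict:
    return {"version": 2, "approvals": {}}  # hypothetical v2 shape


def load_store(path: Path) -> dict:
    """Parse, check the schema version, and quarantine anything that is not v2."""
    if not path.exists():
        return empty_store()
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        path.replace(path.with_name(path.name + ".invalid"))
        return empty_store()
    if not isinstance(data, dict) or data.get("version") != 2:
        # Parseable but not v2: treat as legacy v1 and set it aside.
        path.replace(path.with_name(path.name + ".v1.bak"))
        return empty_store()
    return data
```

Moving the file aside, rather than deleting it, keeps the old contents recoverable. It also means the next write starts from a clean v2 file.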

Section 2 of the approval-policy-v2 OpenSpec change. Refactors the approval matcher and gate to consume v2 typed ApprovalEntry records, plumbs the candidate cwd through the execution context, and deletes the v1 string-shape inspection logic.

Matcher contract changes:
- IToolApprovalMatcher.ExtractDirectoryRoots is removed; the v2 matcher has no concept of "directory roots extracted from arguments." The directory half of every (verb, directory) pair is the candidate's cwd from ToolExecutionContext.
- ExtractApprovalEntries is renamed to ExtractCandidateVerbs and now returns pure verb chains. The v1 fallback to normalized commands or bare directory roots is gone.
- IsApproved now takes (toolName, args, IReadOnlyList<ApprovalEntry>, cwd) and dispatches to ApprovalPatternMatching.MatchesShellApproval, which enforces verb equality + (directory null || cwd under directory) + no symlink segment.

Cwd plumbing:
- ToolExecutionContext gains a Cwd property that the session pipeline sets from candidate args / WorkingContext.ProjectDirectory / session_dir (sections 4 + 5 cover the resolution side).
- IToolApprovalService.GetUnapprovedPatternsAsync and RecordApprovalAsync take a cwd parameter; AkkaToolApprovalService threads it through the GetUnapprovedPatterns and RecordToolApproval actor messages.
- ToolApprovalContext: the ApprovalEntries field is renamed to CandidateVerbs; DirectoryRoots stays but is always populated empty by the gate (section 7's prompt redesign removes the field). SessionOutput, SessionOutputDto, ParentSessionApprovalBridge, PendingToolInteraction, and the protocol mapper are renamed consistently.

Shared symlink-segment guard:
- PathUtility.ContainsSymlinkSegment is hoisted from ScopedFileAccessPolicy so the matcher and the file-access policy share one implementation.

Tests:
- Configuration.Tests, Cli.Tests, Daemon.Tests, MemoryRetrievalPoC.Tests, Search.Tests, Security.Tests (397 incl. new matcher cases), and Actors.Tests (1483) all pass.
- ShellApprovalMatcherTests rewritten to assert the v2 (verb, cwd, entries) semantics: global wildcard matches anywhere; folder-scoped entries match when cwd is under the directory and require a concrete cwd; matching recurses into bash -c.
- ToolApprovalGateTests' v1 directory-roots assertions replaced with v2 candidate-verb assertions; DirectoryRoots is asserted empty.
- ToolApprovalActor's session HashSet now uses ToolApprovalEntryComparer.Comparer so session approvals follow the same platform-correct case rules as the persistent store.
- Test plumbing across the codebase passes cwd: null where the invocation isn't directory-anchored.

Slopwatch clean; file headers verified.
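The matching rule (verb equality, then directory null or cwd under directory, then no symlink segment) fits in a few lines. The Python sketch below is not ApprovalPatternMatching itself. It omits the platform-specific case rules and the bash -c recursion. It also assumes the symlink guard only matters for folder-scoped entries, since a global wildcard ignores the cwd. The `has_symlink_segment` parameter is injectable so the rule can be exercised without touching the filesystem.

```python
from dataclasses import dataclass
from pathlib import Path, PurePosixPath
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ApprovalEntry:
    verb: str
    directory: Optional[str] = None  # None = global wildcard


def contains_symlink_segment(path: str) -> bool:
    """True if the path itself or any ancestor is a symlink."""
    p = Path(path)
    return any(candidate.is_symlink() for candidate in (p, *p.parents))


def is_under(cwd: str, directory: str) -> bool:
    """Segment-wise prefix test, so /repo does not match /repo2."""
    dir_parts = PurePosixPath(directory).parts
    return PurePosixPath(cwd).parts[: len(dir_parts)] == dir_parts


def matches_shell_approval(
    verb: str,
    cwd: Optional[str],
    entries: Sequence[ApprovalEntry],
    has_symlink_segment: Callable[[str], bool] = contains_symlink_segment,
) -> bool:
    for entry in entries:
        if entry.verb != verb:
            continue
        if entry.directory is None:
            return True      # global wildcard matches anywhere
        if cwd is None:
            continue         # folder-scoped entries need a concrete cwd
        if is_under(cwd, entry.directory) and not has_symlink_segment(cwd):
            return True
    return False
```

Comparing path segments instead of raw string prefixes is what keeps a grant for `/home/u/repo` from leaking into `/home/u/repo2`.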

Section 3 of the approval-policy-v2 OpenSpec change. Adds a cheap structural scan to ShellTokenizer that detects bash control-flow keywords and unbalanced quotes/brackets, refuses verb-chain extraction in those cases, and plumbs an IsMessy flag through the gate and protocol so the section 7 prompt builder can show "complex command" hints and omit persistent-grant buttons.

Detection (ShellTokenizer.IsMessyCompoundCommand):
- Single-pass scan that tracks quote state and (), [], {} balance.
- Flags any unquoted standalone token equal to one of: for, while, do, done, then, fi, case, esac.
- Flags unbalanced quotes (open without close) and unbalanced brackets (close without open OR open without close).
- Cheap structural check only, with no semantic bash parsing. Heredocs, command substitution, and process substitution are not analyzed beyond bracket balance.

SplitCompoundCommand:
- Returns an empty list when IsMessyCompoundCommand returns true. The matcher's ExtractCandidateVerbs and ExtractPatterns therefore both return empty for messy commands, and ShellApprovalMatcher.IsApproved short-circuits to false (it cannot auto-approve what it cannot extract).

Gate / protocol plumbing:
- IToolApprovalMatcher gains IsMessy(toolName, args). It defaults to false for DefaultApprovalMatcher and FilePathApprovalMatcher; ShellApprovalMatcher delegates to ShellTokenizer.IsMessyCompoundCommand.
- ToolApprovalContext gains an IsMessy bool field.
- ToolInteractionRequest, SessionOutputDto (InteractionIsMessy), PendingToolInteraction, IParentApprovalBridge.RequestApprovalAsync, and ParentSessionApprovalBridge all carry the flag through.
- DispatchingToolExecutor short-circuits messy invocations to RequiresApproval regardless of empty CandidateVerbs, so the user always sees the prompt for messy input.

Trade-off accepted: a bare standalone `done`/`fi`/`esac` token at the end of a command (e.g. `git fetch && echo done`) is a false positive for the cheap heuristic. The user gets the "complex command" prompt (Once/Deny only) instead of the full 4-button row. If this bites real usage, the mitigation is a smarter detector that requires the keyword to appear in a syntactically meaningful position; for now the trade favors a clean approval store over coverage of edge bash idioms. One existing test (SplitCompound_preserves_quoted_operators) was updated accordingly to use a different sentinel word.

Tests:
- ShellTokenizerTests: positive cases (for/while/if/case/unbalanced quote/unbalanced bracket), negative cases (well-formed compounds, command substitution, brace expansion, trailing commands), and guards against keyword-substring false positives ("format", "fido"). SplitCompoundCommand returns empty for messy input and still splits well-formed compounds.
- ShellApprovalMatcherTests: IsMessy is true for control-flow and false for well-formed commands; IsApproved returns false for messy commands even when every conceivable verb is approved.

All 3367 tests pass; slopwatch clean; file headers verified.
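The detection rules above can be approximated with a single-pass scanner like the Python sketch below. It is an illustration of the described heuristic, not the C# ShellTokenizer. The separator set (`; & | < >` plus brackets) and the treatment of backslash escapes are assumptions. As in the commit, it deliberately reproduces the documented `echo done` false positive.

```python
CONTROL_FLOW_KEYWORDS = frozenset(
    {"for", "while", "do", "done", "then", "fi", "case", "esac"}
)
OPENERS = "([{"
CLOSER_TO_OPENER = {")": "(", "]": "[", "}": "{"}
SEPARATORS = set(";&|<>") | set(OPENERS) | set(CLOSER_TO_OPENER)


def is_messy_compound_command(command: str) -> bool:
    """Cheap structural scan: control-flow keywords or unbalanced quotes/brackets."""
    quote = None         # active quote character, or None
    escaped = False      # previous character was a backslash
    stack = []           # unmatched opening brackets
    word = []            # characters of the current unquoted word
    word_quoted = False  # current word contains quoted or escaped text

    def word_is_keyword() -> bool:
        return not word_quoted and "".join(word) in CONTROL_FLOW_KEYWORDS

    for ch in command:
        if escaped:
            escaped = False
            continue
        if quote is not None:
            if ch == quote:
                quote = None
            elif ch == "\\" and quote == '"':
                escaped = True
            continue
        if ch == "\\":
            escaped, word_quoted = True, True
            continue
        if ch in ("'", '"'):
            quote, word_quoted = ch, True
            continue
        if ch.isspace() or ch in SEPARATORS:
            if word_is_keyword():
                return True
            word.clear()
            word_quoted = False
            if ch in OPENERS:
                stack.append(ch)
            elif ch in CLOSER_TO_OPENER:
                if not stack or stack.pop() != CLOSER_TO_OPENER[ch]:
                    return True  # close without a matching open
            continue
        word.append(ch)

    # Unclosed quote, unclosed bracket, or a keyword as the final word.
    return quote is not None or bool(stack) or word_is_keyword()
```

Only whole words are compared against the keyword set, so `format` and `fido` pass. Quoted text never counts, so `echo "done"` also passes.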

Section 4 of the approval-policy-v2 OpenSpec change. Establishes a deterministic cwd resolution chain for shell invocations so the approval policy can reason about safe-space membership and the spawned process never inherits the daemon's cwd.

Resolution order (ToolExecutionContext.ResolveShellCwd):
1. Explicit args.WorkingDirectory when the agent provided one.
2. WorkingContext.ProjectDirectory when set via set_working_directory.
3. SessionDirectory (the per-session ~/.netclaw/sessions/<id>/ scratch).
4. null only when none is available.

Plumbing:
- ToolExecutionContext gains ProjectDirectory and ResolveShellCwd. The session pipeline populates ProjectDirectory at context-build time from _state.WorkingContext.ProjectDirectory.
- SessionToolExecutionPipeline.ExecuteToolsAsync / ExecuteSingleToolAsync / BuildToolExecutionContext gain a projectDirectory parameter; LlmSessionActor passes _state.WorkingContext.ProjectDirectory at every dispatch.
- ShellTool.ExecuteAsync uses context.ResolveShellCwd(args.WorkingDirectory) to set psi.WorkingDirectory. It never falls through to ProcessStartInfo's default of inheriting the daemon's cwd, a footgun the approval policy cannot reason about.
- DispatchingToolExecutor.AuthorizeCoreAsync calls the same resolver and writes context.Cwd before GetUnapprovedPatternsAsync, so the approval gate evaluates folder-scoped ApprovalEntry records against the same cwd the spawned process will run in.

Tests:
- Cwd_falls_back_to_project_directory_when_no_explicit_arg
- Cwd_falls_back_to_session_directory_when_project_directory_null
- Cwd_explicit_arg_overrides_project_and_session_directories
- Cwd_does_not_inherit_daemon_process_directory (asserts the spawned pwd output is the resolved session_dir, not Environment.CurrentDirectory)

All 3371 tests pass; slopwatch clean; file headers verified.
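The resolution chain amounts to "first known directory wins." Here it is as a Python sketch with a hypothetical function name. Treating an empty string the same as "not provided" is an assumption of the sketch.

```python
from typing import Optional


def resolve_shell_cwd(
    explicit_dir: Optional[str],
    project_dir: Optional[str],
    session_dir: Optional[str],
) -> Optional[str]:
    """Explicit arg, then project_dir, then session_dir; None only if all are missing.

    Callers should refuse to spawn rather than fall back to the daemon's cwd.
    """
    for candidate in (explicit_dir, project_dir, session_dir):
        if candidate:  # None and "" both count as "not provided"
            return candidate
    return None
```

The same footgun exists outside .NET. In Python, `subprocess.run(..., cwd=None)` runs the child in the parent's current directory, so a resolver like this only helps if the `None` case is handled explicitly.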

Section 5 of the approval-policy-v2 OpenSpec change. Adds the load-bearing friction-reduction layer: read-only verbs invoked inside declared safe spaces are auto-allowed without prompting, while every other combination still routes through the interactive approval gate.

Three-position policy:
- layer 1: ToolPathPolicy hard-deny (unchanged)
- layer 1.5: NEW safe-verb ∩ safe-space short-circuit (this commit)
- layer 2: interactive approval gate (unchanged)

A candidate (verb, cwd) short-circuits to Allow when ALL of the following hold:
- verb is on the curated SafeVerbList for the current OS
- cwd resolves under one of the audience-aware safe-space roots (Personal/Team: session_dir + project_dir; Public: session_dir)
- no segment of the cwd path is a filesystem symlink (reparse point)

Bundled lists (Netclaw.Configuration/SafeVerbs/safe-verbs.*.json, embedded as resources, with an additive user override at ~/.netclaw/config/safe-verbs.<os>.json):
- Linux/macOS: ls, find, grep, egrep, fgrep, rg, cat, head, tail, wc, sort, uniq, cut, tr, awk, sed -n, file, pwd, which, stat, tree, du, df, git status, git log, git diff, git show, git branch, git remote, git rev-parse, git ls-files, git blame.
- Windows: dir, type, more, where, findstr, Get-ChildItem, Get-Content, Select-String, Get-Item, Test-Path, Get-Location, Resolve-Path, plus the same git read subcommands.

Mutating verbs (git push, sed -i, awk -i inplace, rm, mv, etc.) are intentionally absent from both lists. sed is pinned to "sed -n" so the matcher refuses to short-circuit "sed -i". The verb-chain matcher means "awk" auto-allows but "awk -i inplace" hits the gate because ExtractVerbChain stops at the first flag.

Plumbing:
- New SafeVerbList (Configuration) with a platform-correct comparer.
- New SafeVerbLoader that reads the bundled JSON resource and merges the user override file additively. A malformed override silently falls back to bundled defaults (the doctor will surface the problem out of band; we do not refuse to start the daemon).
- New ScopedShellSafeVerbPolicy (Netclaw.Actors.Tools) mirroring ScopedFileAccessPolicy: takes (verb, cwd, context) and returns a short-circuit decision; reuses PathUtility.ContainsSymlinkSegment and the audience model.
- ToolAccessPolicy gains a SafeVerbList ctor parameter and runs the safe-verb check inline in CheckApprovalGate after the messy/Auto filters but before producing the approval-prompt context. The cwd it evaluates is resolved by ToolExecutionContext.ResolveShellCwd and written back to context.Cwd so the downstream gate and the spawned process agree on "where this runs."
- DispatchingToolExecutor's duplicate cwd resolution is removed; CheckApprovalGate now owns the write to context.Cwd.
- Program.cs constructs a SafeVerbList at startup and registers it alongside ToolAccessPolicy.
- NetclawPaths.SafeVerbsOverridePath returns the per-OS user file.

Tests (3388 → 3398 across the suite):
- SafeVerbLoaderTests: bundled defaults present per OS, user override extends additively, malformed override falls back, missing override ignored, platform-correct case rules.
- ScopedShellSafeVerbPolicyTests: all seven scenarios from the spec: safe verb + project_dir → allow; safe verb + session_dir → allow; safe verb + outside → prompt; mutating verb in safe space → prompt; Public audience cannot use project_dir as safe space; symlink segment in cwd breaks short-circuit; AllShortCircuit fails loud when any candidate is unsafe.

Slopwatch clean; file headers verified.
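The three conditions can be expressed as one predicate plus an "every candidate must qualify" wrapper. The Python sketch below uses hypothetical names and is not ScopedShellSafeVerbPolicy. It uses exact string comparison instead of the platform-correct comparer and takes the symlink check as an injected callable. It also assumes an empty candidate list never short-circuits, which keeps messy commands (which yield no verbs) on the prompt path.

```python
from pathlib import PurePosixPath
from typing import Callable, FrozenSet, Iterable, List, Optional


def is_under(cwd: str, root: str) -> bool:
    root_parts = PurePosixPath(root).parts
    return PurePosixPath(cwd).parts[: len(root_parts)] == root_parts


def safe_space_roots(audience: str, session_dir: Optional[str],
                     project_dir: Optional[str]) -> List[str]:
    roots = [session_dir]
    if audience in ("personal", "team"):  # Public never gets project_dir
        roots.append(project_dir)
    return [r for r in roots if r]


def short_circuits(verb: str, cwd: Optional[str], audience: str,
                   session_dir: Optional[str], project_dir: Optional[str],
                   safe_verbs: FrozenSet[str],
                   has_symlink_segment: Callable[[str], bool]) -> bool:
    if cwd is None or verb not in safe_verbs:
        return False
    if has_symlink_segment(cwd):
        return False
    return any(is_under(cwd, root)
               for root in safe_space_roots(audience, session_dir, project_dir))


def all_short_circuit(verbs: Iterable[str], **kwargs) -> bool:
    """Every candidate must qualify; an empty list (e.g. messy input) never does."""
    verbs = list(verbs)
    return bool(verbs) and all(short_circuits(v, **kwargs) for v in verbs)
```

Requiring all candidates to qualify means one unsafe verb in a chain like `ls && rm x` sends the whole command to the interactive gate.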

Section 6 of the approval-policy-v2 OpenSpec change. Replaces the section 1 interim revoke parser with a strict parser for the user-visible scope labels emitted by 'list', and adds the 'trust-verb' subcommand for pre-approving global wildcards from the CLI.

Revoke parser:
- Accepts only the two forms 'list' emits:
  '<verb> in <directory>' -> (verb, directory) entry
  '<verb> anywhere' -> (verb, null) global wildcard
- Anything else exits 1 with a clear message. Bare verb input is no longer silently treated as a global wildcard, so an operator typo cannot widen the intended scope.
- The TryParseRevokePattern helper is internal so tests can exercise the parser surface directly without the CLI shell.

trust-verb subcommand:
- 'netclaw approvals trust-verb <verb> [--audience <a>] [--tool <t>]'
- Default audience = personal, default tool = shell_execute.
- Writes a (verb, null) entry, the global-wildcard form, to tool-approvals.json. Idempotent: an existing entry exits zero with a "No changes" message; otherwise it prints "Trusted '<verb> anywhere' for <audience> / <tool>".
- This is the deliberate scriptable path the spec calls out for unattended/scheduled task pre-approval. Combined with section 5's safe-verb short-circuit, it covers two distinct user goals: short-circuit (read-only verbs in safe spaces, no persistence) versus trust-verb (any verb, anywhere, persisted).

Help text updated to document both new forms; the quarantine note from section 1 already covers the .v1.bak case.

Tests (Cli.Tests 620 -> 629):
- Revoke: folder-scoped form removes the entry with the matching directory; folder-scoped form does not match a global-wildcard entry; unrecognized pattern exits 1 with a clear message.
- trust-verb: adds global wildcard with default audience/tool; idempotent on repeated invocation; honors --audience/--tool; missing verb argument exits 1 with usage; unknown audience flag exits 1.
- Help output mentions the trust-verb subcommand.

TUI display already shows verb + directory via DisplayText (landed in section 1). The trust-verb-from-TUI affordance is deferred: the agent path is CLI-only and the CLI works for human operators too; revisit if friction surfaces.

All 3397 tests pass; slopwatch clean; file headers verified.
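A Python sketch of the strict parser and the idempotent trust-verb write follows. It is illustrative only. `try_parse_revoke_pattern` and `trust_verb` are hypothetical stand-ins for the internal C# helper and CLI handler. The in-memory dict replaces tool-approvals.json. Splitting on the first " in " is an assumption about how ambiguous directories would be handled.

```python
from typing import Dict, Optional, Set, Tuple

Entry = Tuple[str, Optional[str]]  # (verb, directory); None = anywhere


def try_parse_revoke_pattern(text: str) -> Optional[Entry]:
    """Accept only '<verb> in <directory>' or '<verb> anywhere'."""
    text = text.strip()
    verb, sep, directory = text.partition(" in ")
    if sep:
        if verb.strip() and directory.strip():
            return (verb.strip(), directory.strip())
        return None
    if text.endswith(" anywhere"):
        verb = text[: -len(" anywhere")].strip()
        return (verb, None) if verb else None
    return None  # a bare verb is rejected instead of widened to "anywhere"


def trust_verb(store: Dict[Tuple[str, str], Set[Entry]], verb: str,
               audience: str = "personal", tool: str = "shell_execute") -> str:
    entries = store.setdefault((audience, tool), set())
    if (verb, None) in entries:
        return "No changes"
    entries.add((verb, None))
    return f"Trusted '{verb} anywhere' for {audience} / {tool}"
```

Rejecting a bare `ls` forces the operator to type `ls anywhere` to touch a wildcard. That explicitness is the typo protection the commit describes.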

Section 7 of the approval-policy-v2 OpenSpec change. Replaces the v1 Slack approval prompt (4 buttons + Patterns/Directory Roots sections) with the v2 design: 5 buttons, danger styling on the elevated decisions, cwd in the header, verbs as bullets, and a single-line resolution message.

Five-button row (ApprovalOptionKeys):
- Once (primary): no persist
- This chat (default): session-scoped only
- Always here (default): persist (verb, cwd)
- Always anywhere (danger): persist (verb, null)
- Deny (danger): refuse this call

ApprovalOptionKeys gains ApproveEverywhere/ApproveEverywhereLabel ("Always anywhere") and renames the existing labels to the spec spelling: "Once" / "This chat" / "Always here" / "Deny". The wire keys are unchanged so persisted resolutions still decode.

ApprovalDecision and ParentApprovalDecision gain ApprovedEverywhere so the runtime can distinguish folder-scoped persistence from the global wildcard. LlmSessionActor maps the new button key, picks cwd-or-null based on which decision was chosen, and threads it through RecordApprovalAsync. ToolApprovalActor's persistent-write path now uses msg.Cwd directly (replacing the section 1 interim that always wrote null), so:
- Always here -> AddApproval(audience, tool, (verb, msg.Cwd))
- Always anywhere -> AddApproval(audience, tool, (verb, null))

Button-row gating by IsMessy / cwd-shallow:
- IsMessy -> only Once + Deny (no persistence possible)
- cwd shallow -> Always here omitted (This chat / Always anywhere still available; matches the tool-approval-gates "Shallow directory prevents Always here" scenario)
- otherwise -> all five buttons

Cwd-shallow check in ToolAccessPolicy: a path with fewer than two non-empty path segments under its root (e.g. /, /etc/, C:\) cannot host a folder-scoped grant. Always here fails closed so an operator cannot accidentally persist a too-shallow root.

Slack prompt body changes:
- Header (single verb): "Approve git status in /home/user/repos/foo?"
- Header (multi-verb): "Approve in /home/user/repos/foo?" + "• git fetch / • git rebase / • git status"
- Messy: "_complex command — only one-shot approval available_"

The Patterns and Directory Roots sections are gone; verb display flows from CandidateVerbs (the v2 matcher's pure verb-chain extraction) with a Patterns fallback for legacy callers.

Resolution message single-line format:
- Always here -> "Saved: <verbs> in <cwd>"
- Always anywhere -> "Saved: <verbs> anywhere"
- This chat -> "Saved for this chat: <verbs> in <cwd>"
- Once -> "Approved (no save)"
- Deny -> "Denied"

Tests (Actors.Tests 1497 -> 1507):
- New SlackApprovalBlockBuilderTests cover all the spec scenarios: single-verb header, multi-verb bulleted header, messy hint, five-button row with danger styling on Always anywhere + Deny, legacy Directory Roots / Patterns sections gone, and all five resolution-message branches (Always here / Always anywhere / This chat / Once / Deny).
- Existing DiscordApprovalPromptBuilderTests label expectations bumped to the new spelling ("Once" / "Always here").

All 3407 tests pass; slopwatch clean; file headers verified. Discord rendering is still on v1; section 8 mirrors this design over.
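The button-row gating and the shallow-cwd rule can be sketched as below. This is Python with hypothetical names, not the ToolAccessPolicy code. Two assumptions are worth flagging: a missing cwd is treated as shallow (fail closed), and a bare drive letter such as `C:` does not count as a segment. The style strings mirror the primary/default/danger labels listed above.

```python
import re
from typing import List, Optional

ALL_BUTTONS = ["Once", "This chat", "Always here", "Always anywhere", "Deny"]
BUTTON_STYLE = {"Once": "primary", "Always anywhere": "danger", "Deny": "danger"}


def is_shallow_cwd(cwd: Optional[str]) -> bool:
    """Fewer than two real path segments under the root, e.g. /, /etc/, C:\\."""
    if cwd is None:
        return True  # assumption: no cwd means no folder-scoped grant
    segments = [s for s in re.split(r"[\\/]+", cwd)
                if s and not re.fullmatch(r"[A-Za-z]:", s)]
    return len(segments) < 2


def button_row(is_messy: bool, cwd: Optional[str]) -> List[str]:
    if is_messy:
        return ["Once", "Deny"]  # nothing can be persisted for messy input
    if is_shallow_cwd(cwd):
        return [b for b in ALL_BUTTONS if b != "Always here"]
    return list(ALL_BUTTONS)


def button_style(label: str) -> str:
    return BUTTON_STYLE.get(label, "default")
```

Only "Always here" depends on the cwd's depth. "Always anywhere" stays available on shallow paths because it does not record a directory at all.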

Section 8 of the approval-policy-v2 OpenSpec change. Brings the Discord approval prompt to parity with the Slack v2 layout from section 7: same 5-button row, same danger styling rules, same header format, same single-line resolution message.

DiscordApprovalPromptBuilder changes:
- BuildButtonPrompt now renders the v2 header ("Approve git status in /home/user/repos/foo?" for single-verb, "Approve in /home/user/repos/foo?" + bulleted verbs for multi-verb) and surfaces the "complex command — only one-shot approval available" hint when IsMessy is true.
- BuildResolvedPromptText emits the single-line resolution form identical to Slack:
  Always here -> "Saved: <verbs> in <cwd>"
  Always anywhere -> "Saved: <verbs> anywhere"
  This chat -> "Saved for this chat: <verbs> in <cwd>"
  Once -> "Approved (no save)"
  Deny -> "Denied"
- GetButtonStyle applies DiscordButtonStyle.Danger to both ApproveEverywhere and Deny, mirroring Slack's danger pair.
- Verb display sources from CandidateVerbs (v2) with a Patterns fallback for legacy callers.
- GetDecisionLabel handles ApproveEverywhere alongside the existing keys.

No Discord-side response-handler changes required: the transport decodes button values and forwards selectedKey to the session actor, and LlmSessionActor's switch (updated in section 7) already routes ApproveEverywhere for both channels.

Tests (Actors.Tests 1507 -> 1514):
- Existing two BuildResolvedPromptText cases bumped to assert the v2 single-line form ("Approved (no save)" / "Denied") instead of the v1 "Decision: <label>" string.
- Seven new V2_ tests parallel to SlackApprovalBlockBuilderTests: single-verb header collapse, multi-verb generic header with bullets, messy-command hint with two-button row, five-button row with danger styling on Always anywhere and Deny, and the three persistent-resolution branches (Always here / Always anywhere / This chat).

All 3414 tests pass; slopwatch clean; file headers verified. Both Slack and Discord approval flows now run end-to-end on v2.
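Since Slack and Discord now share one header format and one resolution format, both can be pictured as two small pure functions. The Python sketch below uses hypothetical names and shows the string shapes only, not either builder's actual API. The commit does not specify how multiple verbs are joined in the resolution line, so the `", "` separator is an assumption.

```python
from typing import Sequence


def build_header(verbs: Sequence[str], cwd: str) -> str:
    if len(verbs) == 1:
        return f"Approve {verbs[0]} in {cwd}?"
    bullets = "\n".join(f"• {verb}" for verb in verbs)
    return f"Approve in {cwd}?\n{bullets}"


def resolution_message(decision: str, verbs: Sequence[str], cwd: str) -> str:
    joined = ", ".join(verbs)  # assumption: the join format is unspecified
    if decision == "Always here":
        return f"Saved: {joined} in {cwd}"
    if decision == "Always anywhere":
        return f"Saved: {joined} anywhere"
    if decision == "This chat":
        return f"Saved for this chat: {joined} in {cwd}"
    if decision == "Once":
        return "Approved (no save)"
    if decision == "Deny":
        return "Denied"
    raise ValueError(f"unknown decision: {decision!r}")
```

Keeping these transport-agnostic is what lets the Discord tests mirror the Slack tests case for case.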

…hint

Section 9 of the approval-policy-v2 OpenSpec change. Steers the agent toward declaring its project root early and gives it a self-correction path when a shell call is denied because its cwd is outside the safe spaces.

netclaw-operations SKILL.md (bumped to v2.0.0):
- Rewrote Approval Prompts around the v2 (verb, directory) model: three-layer gate (hard-deny / safe-verb short-circuit / interactive prompt), the five-button row and its scope semantics, when fewer buttons appear (messy / shallow cwd), and how set_working_directory affects prompt cadence.
- Added a "Pre-approving for unattended tasks (load-bearing)" section documenting the schedule-creation pre-approval flow. It replaces the v1 "run interactively first" pattern with the new 'netclaw approvals trust-verb <verb>' path; an agent dialogue example shows how to ask the user before pre-approving.
- Updated the Approval Requirements for Reminders/Webhooks section to point at trust-verb instead of interactive-first.
- Updated the inspecting/revoking section: list emits typed entries ('<verb> in <dir>' / '<verb> anywhere'); revoke accepts those forms verbatim; trust-verb is the deliberate scriptable path.
- Last-resort recovery now mentions both .v1.bak and .invalid quarantine paths.

Resources/AGENTS.md (Personal+Team identity file):
- New top-level "Declare Your Project Root Early (load-bearing)" section. Tells the agent its FIRST shell-related action MUST be set_working_directory when the task is project-scoped, with the consequence framing ("burns the user's attention and your token budget" if skipped). Includes a recovery rubric: when a shell denial surfaces a set_working_directory hint, read it and self-correct rather than re-prompting the user.
- AGENTS.public.md unchanged because set_working_directory is profile-managed away from Public.

set_working_directory tool description:
- Reframed from "set the project directory for this session" to "Declare your project root and expand your trusted scope." Spells out the safe-verb short-circuit consequence so the model sees *why* this tool matters for friction reduction. Removed the cd-style framing.
- Added a public ToolName constant so the failure-path hint logic can reference it without string duplication.

Failure-path hint (SessionToolExecutionPipeline.BuildSetWorkingDirectoryHint):
- Emits a one-line hint pointing at 'set_working_directory <cwd>' when:
  * tool is shell_execute
  * decision is Denied (not TimedOut, not hard-deny)
  * cwd is non-null
  * cwd is NOT inside SessionDirectory or ProjectDirectory
  * set_working_directory is exposed to the current audience
- LlmSessionActor pre-computes setWorkingDirectoryAvailable from the ToolAccessPolicy's IsToolExposed check and threads the bool into ExecuteToolsAsync; the pipeline appends the hint to the deny-result text the model sees on its next turn.
- Suppresses the hint for non-shell tools, timeouts, hard-deny refusals, cwd already inside a safe space, and audiences without the tool, so Public sessions don't see misleading "use set_working_directory" guidance.

Tests (Actors.Tests 1514 -> 1521):
- Seven hint-helper unit tests cover all the spec scenarios: emitted on cwd-outside denial; suppressed when tool unavailable; suppressed for TimedOut; suppressed for non-shell tools; suppressed when cwd is inside session_dir; suppressed when cwd is inside project_dir; suppressed when cwd is null.

All 3421 tests pass; slopwatch clean; file headers verified.
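The five emission conditions read naturally as a chain of guard clauses. Below is a Python sketch of that shape. It is not BuildSetWorkingDirectoryHint itself. The decision strings and the hint wording are hypothetical, because the commit specifies only that the hint points at `set_working_directory <cwd>`.

```python
from pathlib import PurePosixPath
from typing import Optional

SHELL_TOOL = "shell_execute"
SET_WORKING_DIRECTORY = "set_working_directory"


def _is_under(path: str, root: Optional[str]) -> bool:
    if not root:
        return False
    root_parts = PurePosixPath(root).parts
    return PurePosixPath(path).parts[: len(root_parts)] == root_parts


def build_set_working_directory_hint(
    tool_name: str,
    decision: str,  # hypothetical values: "Denied", "TimedOut", "HardDenied"
    cwd: Optional[str],
    session_dir: Optional[str],
    project_dir: Optional[str],
    tool_available: bool,
) -> Optional[str]:
    if tool_name != SHELL_TOOL or decision != "Denied":
        return None  # non-shell tools, timeouts, and hard-deny never get a hint
    if cwd is None or not tool_available:
        return None
    if _is_under(cwd, session_dir) or _is_under(cwd, project_dir):
        return None  # already in a safe space; declaring a root would not help
    # Hypothetical wording.
    return f"Hint: if {cwd} is your project root, call {SET_WORKING_DIRECTORY} {cwd}."
```

Each guard corresponds to one of the suppression cases exercised by the seven hint-helper tests.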

Cleanup pass on the approval-policy-v2 PR. Two related dead-code removals that were marked "to delete in section 7" but never trimmed.

Dead v1 directory-extraction helpers:
- IShellApprovalSemantics.ExtractDirectoryRoots (interface + impl).
- ShellApprovalSemanticsBase.TryCreateDirectoryApprovalRoot, ExtractDisplayDirectory, NormalizeDisplayDirectory, IsRelativeDisplayPath, EnsureTrailingSeparator, CountPathSegments, GetLastShellSeparatorIndex.
- PosixShellApprovalSemantics.ExtractDisplayDirectory and EnsureTrailingSeparator overrides.
- ShellTokenizer.ExtractDirectoryRoots and MinDirectoryScopeDepth.
- DirectoryApprovalRoot record (file deleted).
- ShellTokenizerTests.ExtractDirectoryRoots_* test methods, plus the AbsoluteRootCases / RelativeRootCases / WindowsAbsoluteDirectoryRootCases TheoryData properties that fed them.

These were the v1 "extract directory roots from path arguments" path. v2 derives the directory exclusively from ToolExecutionContext.Cwd, so nothing in production calls them anymore.

DirectoryRoots field plumbing:
- ToolApprovalContext, ToolInteractionRequest (SessionOutput), SessionOutputDto.InteractionDirectoryRoots, the mapper round-trip, PendingToolInteraction, IParentApprovalBridge.RequestApprovalAsync, ParentSessionApprovalBridge, the SubAgentActor caller, the pipeline emit site, and the TUI rendering in ChatViewModel/ChatPage.
- All carriers always passed [], per the spec's "REMOVED Requirement: Directory root extraction via IToolApprovalMatcher" and the section 7 prompt redesign, which moved cwd into the prompt header.

Tests updated:
- DaemonClientMappingTests no longer round-trips DirectoryRoots.
- ParentSessionApprovalBridgeTests passes a real verb chain instead of the synthetic "/tmp/work/logs/" placeholder it was carrying.
- ToolApprovalGateTests drops the Assert.Empty(DirectoryRoots) calls that only existed to document the empty-after-cutover state.
- ChatPage approval prompt rendering updated to the v2 button labels ("Once / This chat / Always here / Always anywhere / Deny").

3411 tests pass (10 fewer than before because the ExtractDirectoryRoots_* test methods were removed; nothing else changed). Slopwatch clean; file headers verified.

Followups from the simplification review pass.

ApprovalEntry now owns its display + parse round-trip:
- Format: ApprovalEntry.FormatScope() emits "<verb> in <dir>" or "<verb> anywhere".
- Parse: ApprovalEntry.TryParseScope(input, out entry, out error) is the inverse, accepting only the two user-visible forms.
- Both helpers replace duplicated implementations in ApprovalsCommand.FormatEntryForList, ApprovalsCommand.TryParseRevokePattern, and ApprovalDisplayItem.DisplayText, leaving one source of truth for the round-trip.

Hot path: the actor's per-message file read.
- ToolApprovalActor.GetUnapprovedPatterns now snapshots the persisted approvals once per message rather than re-reading and re-parsing tool-approvals.json per candidate verb. For a compound shell command with N verbs, that's N file reads → 1.

Hot path: per-verb cwd / safe-roots / symlink work.
- ScopedShellSafeVerbPolicy.AllShortCircuit hoists Path.GetFullPath, ResolveSafeSpaceRoots, and ContainsSymlinkSegment out of the per-verb loop. The cwd doesn't change between verbs in the same invocation, so a 4-verb compound now does 1 path-normalize + 1 symlink scan instead of 4. ShortCircuitsApproval becomes a thin wrapper that forwards to AllShortCircuit.

Wire ApprovalOptionKeys.IsDangerStyled into both channel builders instead of inlining the same `Deny or ApproveEverywhere` switch arm in two files.

Consolidate WorkingDirectory/Command extraction in ShellApprovalMatcher to call ToolArgumentHelper.GetString. The helper already handles the PascalCase ↔ camelCase round-trip via key normalization, so the inline two-key TryGetValue duplication was needless and slightly inconsistent with the rest of the codebase.

3411 tests pass; slopwatch clean; file headers verified.
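The FormatScope/TryParseScope pairing can be sketched as follows. This is an illustrative Python approximation rather than the PR's ApprovalEntry: it assumes a verb is a single whitespace-free token, models "anywhere" as directory=None (matching the global wildcard (verb, null) entry), and returns an (entry, error) tuple in place of the out parameters. The method names are hypothetical snake_case stand-ins.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def _scope_error(text: str) -> str:
    return f"expected '<verb> in <dir>' or '<verb> anywhere', got {text!r}"


@dataclass(frozen=True)
class ApprovalEntry:
    verb: str
    directory: Optional[str]  # None = global wildcard ("anywhere")

    def format_scope(self) -> str:
        """Render the user-visible scope label."""
        if self.directory is None:
            return f"{self.verb} anywhere"
        return f"{self.verb} in {self.directory}"

    @staticmethod
    def try_parse_scope(text: str) -> Tuple[Optional["ApprovalEntry"], Optional[str]]:
        """Inverse of format_scope; accepts only the two user-visible forms."""
        # Assumption: the verb is the first whitespace-free token.
        verb, _, rest = text.strip().partition(" ")
        if not verb or not rest:
            return None, _scope_error(text)
        if rest == "anywhere":
            return ApprovalEntry(verb, None), None
        # Everything after "in " is the directory, so paths may contain spaces.
        if rest.startswith("in ") and rest[3:].strip():
            return ApprovalEntry(verb, rest[3:].strip()), None
        return None, _scope_error(text)
```

Because parsing accepts only the two forms that formatting emits, anything else (for example a raw v1-style pattern such as `git status`) is rejected with an error rather than guessed at.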

…approval

Section 10 of the approval-policy-v2 OpenSpec change. Adds a new "Approval Policy v2" eval category covering the four behavioral guardrails introduced in sections 5 + 9:
- approval_set_working_directory_positive: a project-scoped prompt mentions a repo path. Asserts the agent calls set_working_directory before any shell tool call into that tree (order check: SWD line < first shell_execute line).
- approval_set_working_directory_negative: unrelated prompts ("what's 2+2?", "explain a hash table"). Asserts the agent does NOT preemptively call set_working_directory just because AGENTS.md mentions it.
- approval_recovery_hint (multi-turn): T1 plants the cwd-outside-safe-spaces denial hint into the conversation; T2 asserts the agent self-corrects by calling set_working_directory rather than re-prompting the user. Scripting an actual denial inside the eval container would require a preconfigured project_dir mismatch we don't have plumbing for; feeding in a message shaped like the hint exercises the same self-correction path.
- approval_schedule_pre_approval: the user asks to schedule a daily reminder using the freshdesk verb. Asserts the agent calls `netclaw approvals trust-verb freshdesk` via shell_execute as part of schedule setup.

Task 10.1 cross-checked: the "Pre-approving for unattended tasks (load-bearing)" section in netclaw-operations SKILL.md (added in section 9) covers the agent-driven trust-verb flow with example dialogue. No additional skill text needed.

Task 10.6 (run the suite, document baseline pass rate) is deferred to local execution: the suite needs the NETCLAW_EVAL_PROVIDER_* env plus the Docker daemon container, which only Aaron has set up. Listed in acceptance gates.

`bash -n evals/run-evals.sh` parses cleanly.
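The positive and negative set_working_directory cases boil down to scans over the captured tool calls. This Python sketch assumes, hypothetically, that the eval captures one tool call per line and that a substring match on the tool name is enough; the real harness may capture and compare transcripts differently, and the helper names are invented for illustration.

```python
from typing import List, Optional


def first_index(lines: List[str], needle: str) -> Optional[int]:
    """Index of the first line mentioning needle, or None."""
    for i, line in enumerate(lines):
        if needle in line:
            return i
    return None


def swd_precedes_shell(lines: List[str]) -> bool:
    """Positive case: set_working_directory must appear, and before any shell_execute."""
    swd = first_index(lines, "set_working_directory")
    shell = first_index(lines, "shell_execute")
    if swd is None:
        return False
    # Assumption: a run with no shell call at all still satisfies "before".
    return shell is None or swd < shell


def never_calls_swd(lines: List[str]) -> bool:
    """Negative case: no preemptive set_working_directory call at all."""
    return first_index(lines, "set_working_directory") is None
```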

Folds the change's delta specs into the main specs and archives the change to openspec/changes/archive/2026-05-08-approval-policy-v2/.
- tool-approval-gates: rewrites shell pattern matching, persistent approval storage, and directory-root approvals around the v2 ApprovalEntry model; adds requirements for the safe-verb short-circuit, five-button prompt, single-line resolution, and bash control-flow refusal.
- session-cwd: adds shell tool cwd defaults, the failure-path hint, and the safe-space expansion contract that set_working_directory now carries; modifies the set_working_directory tool requirement to reflect the new framing. Also fixes a pre-existing structural defect (the spec was authored with the delta '## ADDED Requirements' heading instead of '## Purpose' + '## Requirements').
- netclaw-cli: replaces the "Operator CLI for persistent tool approvals" requirement with the v2 version (scope-labeled list, strict revoke parser, trust-verb subcommand, .v1.bak quarantine note).
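The safe-verb ∩ safe-space short-circuit, with the per-invocation work hoisted out of the verb loop as described for ScopedShellSafeVerbPolicy.AllShortCircuit, might look roughly like this. It is a hypothetical Python sketch: safe_verbs stands in for the per-OS curated list, safe_roots for session_dir plus the optional project_dir, and the symlink guard is approximated by comparing os.path.realpath with os.path.abspath instead of reproducing ContainsSymlinkSegment. Hard-deny handling is out of scope.

```python
import os
from typing import Iterable, List


def _inside(path: str, root: str) -> bool:
    """Component-wise containment on absolute paths (so /a/bc is not inside /a/b)."""
    return os.path.commonpath([path, root]) == root


def all_short_circuit(verbs: List[str], cwd: str, safe_verbs: Iterable[str],
                      safe_roots: List[str]) -> bool:
    """True only if every verb is a safe verb AND cwd is inside a safe space."""
    if not verbs:
        return False  # no verb chain (messy input): never auto-run
    # Hoisted: cwd and roots are identical for every verb in one invocation,
    # so normalize and symlink-check them once instead of once per verb.
    full_cwd = os.path.abspath(cwd)
    roots = [os.path.abspath(r) for r in safe_roots]
    if os.path.realpath(full_cwd) != full_cwd:
        return False  # approximate symlink-segment guard: prompt instead
    if not any(_inside(full_cwd, root) for root in roots):
        return False
    allowed = set(safe_verbs)
    return all(verb in allowed for verb in verbs)
```

An empty verb list never short-circuits, which lines up with the control-flow refusal: messy input produces no verb chain, so it falls through to the interactive prompt.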

Two bugs together meant evals never tested in-repo skill changes:
1. The skill scanner expects '<skills>/.system/<skill-name>/SKILL.md', but the eval script copied to '.system/files/<skill-name>/SKILL.md' (matching the repo's feeds/ layout, not the runtime layout). The local copies were silently invisible.
2. The daemon then synced from the live R2 feed, which ships the last released set of skills. So evals always exercised whatever was published, not the source tree.

Result: the v2 'netclaw-operations' SKILL.md bumped in this PR was a no-op for evals. The model in the container saw the older 1.x copy from R2 and missed the new approval/trust-verb guidance entirely.

Fix:
- Copy '.../files/<skill>/' → '$EVAL_HOME/skills/.system/<skill>/'.
- Set 'NETCLAW_SkillSync__DisableSystemSkillSync=true' in the eval container so the daemon doesn't fetch and overwrite from the live feed.

Confirmed via re-run: skill_load("netclaw-operations") now succeeds inside the eval container (previously: "Skill not found"). The new v2 approval cases ('approval_set_working_directory_positive', 'approval_schedule_pre_approval') visibly improve once the model can see the bumped skill content.
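The corrected layout mapping can be sketched in Python. install_system_skills and eval_container_env are hypothetical helpers (the real fix lives in the eval script); feed_files_dir is whatever directory holds the per-skill folders under files/, and the environment variable name is the one given above.

```python
import shutil
from pathlib import Path
from typing import Dict, List


def install_system_skills(feed_files_dir: Path, eval_home: Path) -> List[Path]:
    """Copy each <feed_files_dir>/<skill>/ to <eval_home>/skills/.system/<skill>/."""
    target_root = eval_home / "skills" / ".system"
    target_root.mkdir(parents=True, exist_ok=True)
    installed = []
    for skill_dir in sorted(p for p in feed_files_dir.iterdir() if p.is_dir()):
        dest = target_root / skill_dir.name  # runtime layout: no 'files/' segment
        shutil.copytree(skill_dir, dest, dirs_exist_ok=True)
        installed.append(dest / "SKILL.md")
    return installed


def eval_container_env() -> Dict[str, str]:
    """Keep the daemon from re-syncing (and overwriting) skills from the live feed."""
    return {"NETCLAW_SkillSync__DisableSystemSkillSync": "true"}
```

Copying each skill folder directly under .system/, rather than the files/ folder itself, is what lets the scanner's '<skills>/.system/<skill-name>/SKILL.md' lookup find the in-repo copies.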

Two cases had genuine eval-design problems independent of the v2 implementation, surfaced once the N=5 baselines stabilized.

approval_set_working_directory_positive
Old prompt: 'I'm working on the Netclaw repository at /tmp. List the files in that directory and tell me what's there.' This is ambiguous between sustained project work (which the v2 spec says SHOULD pre-declare) and a one-shot directory listing (which the spec explicitly says should NOT pre-declare). The model going straight to shell was arguably a correct read of the prompt, not a guidance failure. The new prompt makes the sustained-work signal explicit ('debugging session... multiple shell commands across the tree').

approval_recovery_hint
The old structure was multi-turn: T1 fed a denial message and instructed 'do not call any tools yet'; T2 said 'now call the tool.' Several failing runs showed the model getting stuck in T1's no-tools conditioning and refusing T2 ('I will not call any tools.'). That tests prompt-flip resilience, not recovery-hint comprehension. Rewrote it as a single conversational prompt that delivers the denial hint and asks 'how should I unblock this?', which is what a real recovery turn looks like.

Side note for the PR: full N=5 baselines on the local provider show the inference endpoint is intermittently flaky (Dutchman-style stream stalls: 'out=3 tok_s=8' instead of the normal 'out=80 tok_s=27'), which produces eval variance unrelated to either the v2 implementation or these prompts. Aaron will validate via binary-swap before merging.

Aaronontheweb mentioned this pull request May 8, 2026

fix: add pipeline init gate to wrong-requester approval tests #941 (Merged, 2 tasks)

Aaronontheweb force-pushed the openspec/approval-policy-v2 branch 2 times, most recently from 81a306f to 574774a on May 8, 2026 19:31

Aaronontheweb added 17 commits May 9, 2026 03:18

Aaronontheweb force-pushed the openspec/approval-policy-v2 branch from c38eb08 to 1c96848


docs(openspec): propose approval-policy-v2 #940

Aaronontheweb wants to merge 17 commits into netclaw-dev:dev from Aaronontheweb:openspec/approval-policy-v2


Aaronontheweb commented May 8, 2026

Summary

What's in scope

Phasing

Validation

Test plan


Aaronontheweb commented May 9, 2026

Eval Suite Results — Approval Policy v2

Baseline → Variant B (prompt rewrites only)

Infrastructure observation

Eval-infra fixes shipped in this PR

What we know works

Acceptance gate status
