[luv-319] fix: enforce Stop hook on Cursor Agent CLI + cut 0.0.10-beta.6 (#318)

NiveditJain · claude · web-flow · commit bbbdc8d04527 · 2026-05-08T15:19:16.000-07:00
* [luv-319] fix: enforce Stop hook on Cursor Agent CLI (followup_message + SubagentStop parity)

Cursor's `stop` hook ignores the flat `{permission: "deny"}` shape; that shape is honored on tool events only. The only force-retry channel for Stop is `{followup_message}` on stdout (exit 0), per https://cursor.com/docs/hooks. The instruct branch has used this shape correctly since #245; the deny path needed the same treatment, mirroring Copilot's #299 fix. Without this, the 5 require-*-before-stop builtins were observation-only on Cursor: the deny was logged, but the agent stopped cleanly. User repro: session 1b510ad4-906c-4f30-9467-ff2e6c581cce at /home/nivedit/dev-purge.

Also subscribes to `subagentStop` (CURSOR_HOOK_EVENT_TYPES + CURSOR_EVENT_MAP) and widens both deny and instruct branches to match it, for parity with the Copilot SubagentStop widening from #299.

Cloud Agents caveat: Cursor Cloud Agent VMs do NOT run stop/subagentStop hooks at all, so this fix only covers local Cursor sessions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: cut 0.0.10-beta.6 release in CHANGELOG

Promote the six entries accumulated under `## Unreleased` to a versioned heading `## 0.0.10-beta.6 — 2026-05-08`. Add a fresh `## Unreleased` heading at the top for the next development cycle. package.json was already at 0.0.10-beta.6 (pre-bumped), so no version edit is needed here. The CHANGELOG cut completes the release-prep handshake.

Entries promoted:
- Cursor Stop hook enforcement fix (this PR)
- 5 scripts/translate-docs fixes from #305, #306, #307, #312, #313

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
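The heart of the fix is branch ordering: on Cursor, a Stop or SubagentStop denial has to be emitted as `{followup_message}` before the generic flat-shape deny is reached. The sketch below illustrates that selection under stated assumptions. `CursorHookResponse` and `buildCursorDenyResponse` are hypothetical names, not the actual code in `src/hooks/policy-evaluator.ts`, and the `MANDATORY ACTION REQUIRED` prefix is borrowed from the text the e2e helper asserts on.

```typescript
// Hypothetical sketch of the Cursor deny-branch ordering; not the real
// policy-evaluator.ts code. Names and message text are illustrative.
type CanonicalEvent = "PreToolUse" | "PostToolUse" | "Stop" | "SubagentStop";

type CursorHookResponse =
  | { followup_message: string }
  | { permission: "deny"; user_message: string; agent_message: string };

// Cursor's stop / subagentStop hooks only force another turn via
// {followup_message} on stdout (exit 0); the flat deny shape is honored
// on tool events only.
const CURSOR_STOP_EVENTS: ReadonlySet<CanonicalEvent> = new Set<CanonicalEvent>([
  "Stop",
  "SubagentStop",
]);

export function buildCursorDenyResponse(
  eventType: CanonicalEvent,
  reason: string,
): CursorHookResponse {
  if (CURSOR_STOP_EVENTS.has(eventType)) {
    // Checked first, ahead of the flat-shape return used for tool events.
    return { followup_message: `MANDATORY ACTION REQUIRED: ${reason}` };
  }
  // Tool events keep the flat {permission: "deny", ...} shape.
  return { permission: "deny", user_message: reason, agent_message: reason };
}

const stop = buildCursorDenyResponse("Stop", "commit your changes");
console.log(JSON.stringify(stop));
```

Putting the Stop/SubagentStop check first matters because, without it, every event falls through to the flat-shape return, and Cursor silently drops that shape on Stop.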
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,7 +2,10 @@
 
 ## Unreleased
 
+## 0.0.10-beta.6 — 2026-05-08
+
 ### Fixes
+- Make `require-*-before-stop` policies actually enforce on Cursor Agent CLI (and add `SubagentStop` parity). Verified empirically: a `stop` hook emitting Cursor's `{permission: "deny", user_message, agent_message}` flat shape is silently ignored — that shape is honored on tool events only, and on Stop the agent stops cleanly without retry. Per https://cursor.com/docs/hooks the only force-retry channel for `stop` / `subagentStop` is `{followup_message: "<text>"}` on stdout (exit 0), with the text auto-submitted as the next user message (capped at `loop_limit`, default 5). New `cli === "cursor" && eventType in {Stop, SubagentStop}` arm inside the Cursor deny branch in `src/hooks/policy-evaluator.ts` emits that shape ahead of the existing flat-shape return for tool events, mirroring the Cursor Stop instruct branch already at `:336` (which had used `{followup_message}` correctly since #245), the Copilot Stop branch added in #299 at `:279`, and the Gemini AfterAgent branch at `:188`. Without this arm, all 5 `require-*-before-stop` builtins (commit / push / PR / no-conflicts / CI-green) were observation-only on Cursor — exact same failure mode as Copilot pre-#299. Also adds `subagentStop` to `CURSOR_HOOK_EVENT_TYPES` + `CURSOR_EVENT_MAP` so **custom** policies subscribing to `SubagentStop` are reachable from Cursor subagent boundaries (Cursor's `subagentStop` is a sibling of `stop`, same payload + response contract); the instruct branch at `:336` is widened to match both events for parity. The 5 `require-*-before-stop` builtins still match `Stop` only by design — they are session-completion gates (commit / push / PR / conflicts / CI), not subagent-return gates — so the SubagentStop widening does not change builtin behavior. Caveat: Cursor Cloud Agent VMs do NOT run `stop` / `subagentStop` hooks at all (forum-confirmed at <https://forum.cursor.com/t/cursor-cloud-agents-do-not-run-afteragentresponse-or-stop-hooks/159929>) — this fix only covers local Cursor sessions; failproofai cannot enforce Stop policies in Cloud Agent runs. New unit tests pin the Cursor Stop and SubagentStop deny / instruct response shapes (4 tests across `policy-evaluator.test.ts`); new e2e regression in `cursor-integration.e2e.test.ts` confirms `require-commit-before-stop` against a dirty real-git fixture round-trips to the `{followup_message}` shape; existing data-driven `writeHookEntries` and `removeHooksFromFile` tests in `integrations.test.ts` auto-extend to `subagentStop` via iteration over `CURSOR_HOOK_EVENT_TYPES`. Updated `CLAUDE.md` Cursor section with a verified Stop block semantics table and the Cloud Agents caveat.
 - `scripts/translate-docs/mdx-translator.ts`: new `stripStrayTrailingFence` helper, wired into both `translateMdxPage` and `translateReadme`, drops a stray trailing ` ``` ` line that streamed Sonnet runs of long pages sometimes append. The unmatched fence opens a code block that consumes everything to EOF — including the wrapping `</div>` for RTL READMEs — and surfaces in Mintlify as `Failed to parse page content at path i18n/README.he.md: Expected a closing tag for <div> (6:1-6:16)`. Empirically observed on run 25542951106 (post-streaming-switch #307): `docs/i18n/README.he.md` and `docs/i18n/README.tr.md` both ended with 31 fence-line markers (one stray) instead of the canonical 30; a subsequent rebase against main found `docs/i18n/README.ar.md` regenerated by the auto-translate workflow (#312) with the same bug. The helper detects the odd-count case and removes only the last unmatched fence, preserving every balanced pair before it. Also strips the stray trailing fence from all three affected files in this commit so Mintlify can deploy without a re-translate. Six-case unit test covers balanced-unchanged, no-fence-unchanged, stray-trailing-after-balanced-pair, lone-fence, embedded-non-fence-mid-line, and language-tagged pairs (#313).
 - `scripts/translate-docs/translator.ts`: switch `translateContent` from `anthropic.messages.create(...)` to `anthropic.messages.stream(...).finalMessage()` so large Tier-1 (Sonnet) translations don't hit AWS Bedrock's 300 s synchronous `InvokeModel` ceiling. The LiteLLM proxy at `models.aikin.club` routes `claude-sonnet-4-6` weighted 1:1 across `anthropic/claude-sonnet-4-6` and `bedrock/us.anthropic.claude-sonnet-4-6`; under translate-docs load (4 jobs × 4 in-flight = 16 concurrent) any request that lands on Bedrock and runs >300 s is severed by Bedrock and surfaces to the SDK as `APIConnectionError ("Connection error.")` — exactly the symptom that survived #306 (SDK retry bump) and the platform-side `request_timeout: 300 → 600` lift in `exospherehost/platform#345`. Two consecutive matrix runs post-platform-fix ([25540656053](https://github.com/exospherehost/failproofai/actions/runs/25540656053), [25541614351](https://github.com/exospherehost/failproofai/actions/runs/25541614351)) showed the same deterministic failure cohort: the 4 largest pages (`built-in-policies`, `architecture`, `configuration`, `custom-policies`, plus `README`) failing at ~317 s for in-flight slot 1/2 and ~367 s for slots 3/4 — both below the new 600 s ceiling, so the wall isn't ours. `messages.stream(...).finalMessage()` returns the same `Message` shape so the function's public return type is unchanged; Bedrock falls back to `InvokeModelWithResponseStream` (no 300 s wall) and Anthropic-direct supports streaming for the full 10-minute non-streaming budget. SDK `maxRetries: 5`, per-job `MAX_CONCURRENT: 4`, and the platform `request_timeout: 600 s` ceiling all stay as the correct safety bounds; the actual unblock was on the client-side request shape (#307).
 - `scripts/translate-docs`: bump SDK `maxRetries` from the Anthropic default of 2 to 5 in `translator.ts:getClient` and raise per-job `MAX_CONCURRENT` from 2 to 4 in `cli.ts`, both now env-overridable via `TRANSLATE_MAX_RETRIES` and `TRANSLATE_MAX_CONCURRENT`. The LiteLLM proxy behind `ANTHROPIC_BASE_URL` has been horizontally scaled, so the previous cap of 2 (set in #300 to dodge the gateway's connection-drop cliff at ~2 in flight) now leaves capacity on the floor. The errors that *do* still surface are no longer load-induced — they are per-request transient failures (cold replicas, LB hashing landing on an unhealthy pod, idle-socket TCP resets) where the SDK's default 2-retry budget runs out before the LB can route a retry to a healthy replica, and `Anthropic.APIConnectionError ("Connection error.")` bubbles up. Empirically observed: a `--languages zh --force` re-run (Tier-1 Sonnet, 5 uncached MDX pages) returned 2 successes and 2 `Connection error.` lines under the prior 2/2 setting. Bumping to 5 retries (≈0.5+1+2+4+8 ≈ 15 s of jittered backoff per request, 6 connection attempts total per page) absorbs the transient failures; bumping concurrency to 4 takes back the throughput the prior cap forfeited. CI matrix `max-parallel: 4` is unchanged — the new global ceiling of 4×4 = 16 in flight is still well below the failure-mode threshold of 28 from #305 (16/28 ≈ 57%) even before accounting for the scale-out, so no workflow change is needed (#306).
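The `stripStrayTrailingFence` behavior described in the #313 entry (when the fence count is odd, drop only the last fence line) can be sketched as follows. This is an illustrative reimplementation, not the code in `scripts/translate-docs/mdx-translator.ts`. It assumes a fence line is one that starts with up to three spaces followed by three backticks, which is the CommonMark opening rule and is also what keeps backticks in the middle of a line from being counted.

```typescript
// Illustrative sketch of stray-trailing-fence removal; not the actual helper.
// Assumption: a fence line starts with 0-3 spaces, then three backticks
// (an optional language tag may follow), so mid-line backticks are ignored.
const FENCE_LINE = /^ {0,3}`{3}/;

export function stripStrayTrailingFence(text: string): string {
  const lines = text.split("\n");
  const fenceLines: number[] = [];
  lines.forEach((line, i) => {
    if (FENCE_LINE.test(line)) fenceLines.push(i);
  });
  // An even count means every fence is paired: leave the text untouched.
  if (fenceLines.length % 2 === 0) return text;
  // Odd count: drop only the last fence line, keeping earlier balanced pairs.
  lines.splice(fenceLines[fenceLines.length - 1], 1);
  return lines.join("\n");
}
```

Because pairing is sequential, the last fence is the unmatched one whenever the count is odd. That covers the trailing-stray case the entry describes; a page with a genuinely unbalanced fence earlier on would also lose its last fence under this sketch.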
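The retry arithmetic in the #306 entry assumes an exponential schedule that starts at 0.5 s and doubles on each retry. The sketch below reproduces that nominal budget and shows one way the `TRANSLATE_MAX_RETRIES` / `TRANSLATE_MAX_CONCURRENT` overrides could be read. `readPositiveInt` and `nominalBackoffSeconds` are hypothetical helpers, not code from `translator.ts` or `cli.ts`, and the model ignores the SDK's jitter and any delay cap.

```typescript
// Hypothetical helpers illustrating the #306 settings; not translator.ts code.
export function readPositiveInt(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number,
): number {
  const raw = env[name];
  if (raw === undefined) return fallback;
  const n = Number.parseInt(raw, 10);
  // Ignore malformed or non-positive overrides rather than failing the run.
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Nominal (un-jittered) exponential backoff: 0.5 s, 1 s, 2 s, ...
export function nominalBackoffSeconds(retries: number, initial = 0.5): number[] {
  return Array.from({ length: retries }, (_, k) => initial * 2 ** k);
}

const env = { TRANSLATE_MAX_RETRIES: "5" };
const maxRetries = readPositiveInt(env, "TRANSLATE_MAX_RETRIES", 2);
const maxConcurrent = readPositiveInt(env, "TRANSLATE_MAX_CONCURRENT", 4);
const delays = nominalBackoffSeconds(maxRetries);
const totalSeconds = delays.reduce((a, b) => a + b, 0);
// One initial attempt plus maxRetries retries per request.
console.log({ delays, totalSeconds, attempts: maxRetries + 1, maxConcurrent });
```

With five retries the nominal delays are 0.5, 1, 2, 4 and 8 s, for a total of 15.5 s across 6 attempts. That is the "≈ 15 s" figure in the entry, before jitter.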
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -114,6 +114,33 @@ which writes a portable `npx -y failproofai --hook ... --cli cursor` command.
 Same self-reference caveat applies — do **not** install the standard `npx`
 form from inside this repo.
 
+**Stop block semantics** (verified against cursor-agent docs as of 2026-05-08
+and live behavior):
+
+| Channel                                              | Effect                                                                                          |
+|------------------------------------------------------|-------------------------------------------------------------------------------------------------|
+| `{followup_message: "<text>"}` JSON stdout (exit 0)  | ✅ Forces another turn — text becomes next user message; capped at `loop_limit` (default 5)     |
+| `{permission: "deny", …}` JSON stdout (exit 0)       | ❌ Honored on tool events only — Stop falls through and agent stops cleanly                     |
+| Exit 2 + stderr (Claude convention)                  | ❌ Surfaced as warning but does NOT trigger retry                                                |
+
+`policy-evaluator.ts` has a `cli === "cursor" && eventType in {Stop, SubagentStop}`
+branch, placed ahead of the generic Cursor flat-shape deny, that emits the
+`{followup_message}` shape, so the 5 `require-*-before-stop` builtins
+actually enforce on Cursor. Same shape applies to SubagentStop (Cursor's
+`subagentStop` is a sibling of `stop`, same payload + response contract);
+we subscribe to it for parity with Copilot so custom policies subscribing
+to SubagentStop also enforce on Cursor subagent boundaries. The 5
+`require-*-before-stop` builtins still match `Stop` only by design —
+session-completion gates, not subagent-return gates.
+
+**Cloud Agents caveat:** Cursor Cloud Agent VMs do NOT run `stop` /
+`subagentStop` hooks (or `afterAgentResponse`) — confirmed via Cursor
+forum: <https://forum.cursor.com/t/cursor-cloud-agents-do-not-run-afteragentresponse-or-stop-hooks/159929>.
+This means failproofai cannot enforce Stop policies in Cursor Cloud Agent
+runs; the fix above only covers local Cursor sessions.
+
+Ref: <https://cursor.com/docs/hooks>
+
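To make the `loop_limit` cap concrete, here is a toy model of the three channels in the Stop block semantics table. It assumes each `{followup_message}` consumes one unit of `loop_limit` (default 5). `followupFrom` and `forcedTurns` are hypothetical names: the model encodes the documented contract and does not reflect how Cursor or failproofai implements it.

```typescript
// Toy model of the documented Cursor stop-hook contract; NOT Cursor's code.
// Assumption: each followup_message consumes one unit of loop_limit (default 5).
interface HookResult {
  exitCode: number;
  stdout: string;
}

// Returns the follow-up text Cursor would auto-submit, or null if the agent stops.
export function followupFrom(result: HookResult): string | null {
  if (result.exitCode !== 0) return null; // exit 2 + stderr: warning only, no retry
  try {
    const parsed: unknown = JSON.parse(result.stdout);
    if (parsed !== null && typeof parsed === "object" && "followup_message" in parsed) {
      const msg = (parsed as { followup_message: unknown }).followup_message;
      return typeof msg === "string" ? msg : null;
    }
  } catch {
    // Non-JSON stdout carries no retry instruction.
  }
  return null; // e.g. {permission: "deny"}: ignored on stop
}

// Counts how many extra turns a hook forces before the agent finally stops.
export function forcedTurns(hook: () => HookResult, loopLimit = 5): number {
  let turns = 0;
  while (turns < loopLimit && followupFrom(hook()) !== null) turns++;
  return turns;
}
```

Under this model, a policy that keeps denying forces at most five extra turns. A hook that emits the flat deny or exits 2 forces none, which is the observation-only failure the fix addresses.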
 ### OpenCode hooks (`.opencode/`)
 
 This repo also ships a project-scope OpenCode (sst/opencode) plugin
diff --git a/__tests__/e2e/helpers/hook-runner.ts b/__tests__/e2e/helpers/hook-runner.ts
@@ -140,6 +140,17 @@ export function assertCursorStopInstruct(result: HookRunResult): void {
   expect(result.parsed?.followup_message).toMatch(/^Instruction from failproofai:/);
 }
 
+export function assertCursorStopBlock(result: HookRunResult): void {
+  // Cursor's stop / subagentStop hooks honor `{followup_message}` on stdout
+  // (exit 0) — auto-submitted as next user message, capped at loop_limit
+  // (default 5). The flat `{permission: "deny"}` shape is ignored on Stop.
+  // Mirrors assertCopilotStopBlock and assertGeminiStopBlock.
+  // Ref: https://cursor.com/docs/hooks
+  expect(result.exitCode).toBe(0);
+  expect(typeof result.parsed?.followup_message).toBe("string");
+  expect(result.parsed?.followup_message).toMatch(/MANDATORY ACTION REQUIRED/);
+}
+
 /**
  * Pi emits a flat `{permission, reason}` JSON shape — the pi-extension shim
  * parses this and translates `permission === "deny"` into a `{block, reason}`
diff --git a/__tests__/e2e/helpers/payloads.ts b/__tests__/e2e/helpers/payloads.ts
@@ -290,6 +290,14 @@ export const CursorPayloads = {
       hook_event_name: "stop",
     };
   },
+  subagentStop(cwd: string): Record<string, unknown> {
+    return {
+      conversation_id: CURSOR_CONVERSATION_ID,
+      transcript_path: TRANSCRIPT_PATH,
+      workspace_roots: [cwd],
+      hook_event_name: "subagentStop",
+    };
+  },
 };
 
 /**
diff --git a/__tests__/e2e/hooks/cursor-integration.e2e.test.ts b/__tests__/e2e/hooks/cursor-integration.e2e.test.ts
@@ -15,6 +15,7 @@ import {
   runHook,
   assertAllow,
   assertCursorDeny,
+  assertCursorStopBlock,
 } from "../helpers/hook-runner";
 import { CursorPayloads } from "../helpers/payloads";
 
@@ -181,6 +182,72 @@ describe("E2E: Cursor integration — hook protocol", () => {
       env.cleanup();
     }
   });
+
+  // Stop hook on Cursor honors `{followup_message}` JSON, NOT `{permission:
+  // "deny"}` (which is a tool-event-only shape) and NOT exit-2+stderr (which
+  // Cursor surfaces as a warning but ignores for retry). Per
+  // https://cursor.com/docs/hooks the followup_message text is auto-submitted
+  // as the next user message, capped at `loop_limit` (default 5). The
+  // `cli === "cursor" && eventType in {Stop, SubagentStop}` branch in
+  // policy-evaluator.ts emits this shape; without it the 5
+  // require-*-before-stop builtins were observation-only on Cursor.
+  it("stop deny emits {followup_message} JSON (Cursor stop force-retry shape)", () => {
+    const env = createCursorEnv();
+    try {
+      writeConfig(env.cwd, ["require-commit-before-stop"]);
+      // require-commit-before-stop denies when the cwd has uncommitted changes.
+      // env.cwd has no .git/ at all, so the policy short-circuits to allow —
+      // we need to plant uncommitted state. Same pattern as the Copilot Stop
+      // e2e in copilot-integration.e2e.test.ts.
+      execSync(
+        "git init -q && git config user.email t@t && git config user.name t && touch tracked && git add tracked && git commit -q -m initial && echo dirty > tracked",
+        { cwd: env.cwd, env: { ...process.env, GIT_TERMINAL_PROMPT: "0" } },
+      );
+      const result = runHook(
+        "stop",
+        CursorPayloads.stop(env.cwd),
+        { homeDir: env.home, cli: "cursor" },
+      );
+      assertCursorStopBlock(result);
+      // The flat {permission: "deny"} shape MUST NOT leak through on Stop —
+      // Cursor would ignore it and the agent would stop cleanly.
+      expect(result.parsed?.permission).toBeUndefined();
+    } finally {
+      env.cleanup();
+    }
+  });
+
+  // SubagentStop is a sibling of stop with the same payload + response
+  // contract per Cursor docs; we subscribe to it (CURSOR_HOOK_EVENT_TYPES,
+  // CURSOR_EVENT_MAP) so custom policies matching SubagentStop are reachable
+  // from Cursor subagent boundaries — parity with Copilot's #299 widening.
+  // Builtin require-*-before-stop policies still match Stop only by design,
+  // so instead of exercising a deny, this test enables no policies and
+  // asserts that the hook still fires, allows, and is canonicalized correctly
+  // (the event lands in the activity store as SubagentStop, not as an
+  // unknown camelCase form).
+  it("subagentStop canonicalizes to SubagentStop and reaches the activity store", () => {
+    const env = createCursorEnv();
+    try {
+      writeConfig(env.cwd, []);
+      const result = runHook(
+        "subagentStop",
+        CursorPayloads.subagentStop(env.cwd),
+        { homeDir: env.home, cli: "cursor" },
+      );
+      assertAllow(result);
+
+      const activityPath = resolve(env.home, ".failproofai", "cache", "hook-activity", "current.jsonl");
+      expect(existsSync(activityPath)).toBe(true);
+      const lines = readFileSync(activityPath, "utf-8").trim().split("\n").filter(Boolean);
+      const last = JSON.parse(lines[lines.length - 1]) as Record<string, unknown>;
+      expect(last.integration).toBe("cursor");
+      expect(last.eventType).toBe("SubagentStop");
+      expect(last.cwd).toBe(env.cwd);
+    } finally {
+      env.cleanup();
+    }
+  });
 });
 
 describe("E2E: Cursor integration — install/uninstall", () => {
@@ -203,6 +270,9 @@ describe("E2E: Cursor integration — install/uninstall", () => {
       expect(hooks.sessionEnd).toBeDefined();
       expect(hooks.beforeSubmitPrompt).toBeDefined();
       expect(hooks.stop).toBeDefined();
+      // subagentStop subscribed since #NEW for Copilot-parity: custom policies
+      // matching SubagentStop are reachable on Cursor subagent boundaries.
+      expect(hooks.subagentStop).toBeDefined();
       // PascalCase keys should not be present.
       expect(hooks.PreToolUse).toBeUndefined();
       // Flat array — each entry IS the hook, no `{hooks: [...]}` matcher wrapper.
diff --git a/__tests__/hooks/integrations.test.ts b/__tests__/hooks/integrations.test.ts
@@ -392,6 +392,9 @@ describe("Cursor Agent integration", () => {
     expect(cursor.eventTypes).toContain("sessionStart");
     expect(cursor.eventTypes).toContain("sessionEnd");
     expect(cursor.eventTypes).toContain("stop");
+    // subagentStop subscribed for parity with Copilot — custom policies
+    // matching SubagentStop are reachable on Cursor subagent boundaries.
+    expect(cursor.eventTypes).toContain("subagentStop");
   });
 
   it("buildHookEntry uses Claude-shaped {command,timeout} with --cli cursor", () => {
@@ -480,6 +483,7 @@ describe("CURSOR_EVENT_MAP", () => {
     expect(CURSOR_EVENT_MAP.sessionStart).toBe("SessionStart");
     expect(CURSOR_EVENT_MAP.sessionEnd).toBe("SessionEnd");
     expect(CURSOR_EVENT_MAP.stop).toBe("Stop");
+    expect(CURSOR_EVENT_MAP.subagentStop).toBe("SubagentStop");
   });
 
   it("CURSOR_EVENT_MAP keys exactly match CURSOR_HOOK_EVENT_TYPES", () => {
diff --git a/__tests__/hooks/policy-evaluator.test.ts b/__tests__/hooks/policy-evaluator.test.ts
diff --git a/src/hooks/policy-evaluator.ts b/src/hooks/policy-evaluator.ts
diff --git a/src/hooks/types.ts b/src/hooks/types.ts
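The `subagentStop` subscription relies on two structures staying in sync: the camelCase event list Cursor is subscribed to, and the map that canonicalizes each name to the PascalCase event that policies match on. The `CURSOR_EVENT_MAP keys exactly match CURSOR_HOOK_EVENT_TYPES` test pins this. The sketch below shows that invariant and the canonicalization step the `subagentStop` e2e test observes in the activity store. It includes only the mappings asserted in this diff's tests, and `canonicalize` is a hypothetical helper, not necessarily how `src/hooks/types.ts` is written.

```typescript
// Partial, illustrative subset: only mappings asserted in this diff's tests.
// The real CURSOR_HOOK_EVENT_TYPES / CURSOR_EVENT_MAP contain more events.
const CURSOR_HOOK_EVENT_TYPES = ["sessionStart", "sessionEnd", "stop", "subagentStop"] as const;
type CursorHookEvent = (typeof CURSOR_HOOK_EVENT_TYPES)[number];

// Typing the map as Record<CursorHookEvent, string> makes the compiler reject
// a subscribed event with no canonical name (the keys-match invariant).
const CURSOR_EVENT_MAP: Record<CursorHookEvent, string> = {
  sessionStart: "SessionStart",
  sessionEnd: "SessionEnd",
  stop: "Stop",
  subagentStop: "SubagentStop",
};

// Hypothetical helper: canonicalize an incoming Cursor hook_event_name.
export function canonicalize(hookEventName: string): string | undefined {
  return (CURSOR_HOOK_EVENT_TYPES as readonly string[]).includes(hookEventName)
    ? CURSOR_EVENT_MAP[hookEventName as CursorHookEvent]
    : undefined;
}
```

Here `canonicalize("subagentStop")` yields `"SubagentStop"`, the `eventType` the e2e test expects in the activity store. A PascalCase or unsubscribed name yields `undefined`.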