Skip to content

fix(generation): make final-review-pass-gate failures attributable (not an opaque exit 1)#119

Merged
khaliqgant merged 2 commits into
mainfrom
fix/final-review-gate-opaque-blocked-failure
May 16, 2026
Merged

fix(generation): make final-review-pass-gate failures attributable (not an opaque exit 1)#119
khaliqgant merged 2 commits into
mainfrom
fix/final-review-gate-opaque-blocked-failure

Conversation

@khaliqgant
Copy link
Copy Markdown
Member

Summary

Follow-up to #117. Diagnoses the other half of the workspace-primitives PR-39 child-update-last-week loop.

The generated child-workflow final-review-pass-gate was:

set -e
grep -F RICKY_CHILD_CLAUDE_FINAL_FIX_READY claude-final-fix.md
grep -F RICKY_CHILD_CODEX_FINAL_FIX_READY  codex-final-fix.md
test ! -f BLOCKED_NO_COMMIT.md
echo RICKY_CHILD_FRESH_EYES_LOOP_READY

When an agent deliberately wrote BLOCKED_NO_COMMIT.md (the documented "I cannot proceed safely, do not commit" protocol), the gate aborted with a bare exit 1, and the captured output showed only the two successful marker greps (grep -F prints matched lines). It looked exactly like a grep-target mismatch:

[final-review-pass-gate] Command failed (exit code 1)
[final-review-pass-gate] stdout:
RICKY_CHILD_CLAUDE_FINAL_FIX_READY
RICKY_CHILD_CODEX_FINAL_FIX_READY

A deliberate "needs a human" signal was indistinguishable from a transient failure, so the auto-fix loop retried it 7× and the master orchestrator re-ran the whole child for hours.

Fix

New shared buildFinalReviewPassGateCommand builder (used by the master-workflow renderer):

  • BLOCKED checked first. On hit: prints a distinct RICKY_CHILD_BLOCKED_NO_COMMIT marker + the agent's evidence to stderr and exits 3 — attributable, and routable to escalation rather than a blind retry.
  • Quiet, self-diagnosing marker checks. grep -qF (no leaked matched lines) and an explicit RICKY_CHILD_GATE_MISSING_MARKER: <detail> on failure — a real missing marker is now visible instead of hidden behind the prior grep's output.
  • Drops the trailing bare test ! -f that failed opaquely.

final-hard-validation in the same renderer gets the same quiet-grep + explicit-missing-marker treatment for consistency.

Out of scope (noted, not changed)

template-renderer.ts has a parallel final-review-pass-gate with the same BLOCKED-vs-missing-marker conflation, but a different marker scheme (tail | tr | grep -Eq '^..._COMPLETE$') and an output_contains verification contract that isn't safe to restructure without separate tracing. Flagged for a focused follow-up rather than risk a half-understood change to its verification semantics.

Test plan

  • New final-review-gate.test.ts: BLOCKED-before-greps ordering, distinct blocked marker + evidence + exit 3, quiet greps with explicit missing-marker diagnostics, success echo preserved, no trailing bare test ! -f
  • npx vitest run — 1170/1170 pass (incl. pipeline.test.ts generation suite)
  • npx tsc --noEmit clean

🤖 Generated with Claude Code

The generated child-workflow `final-review-pass-gate` joined marker
greps and a `test ! -f BLOCKED_NO_COMMIT.md` clause under `set -e`:

  grep -F RICKY_CHILD_CLAUDE_FINAL_FIX_READY claude-final-fix.md
  grep -F RICKY_CHILD_CODEX_FINAL_FIX_READY  codex-final-fix.md
  test ! -f BLOCKED_NO_COMMIT.md
  echo RICKY_CHILD_FRESH_EYES_LOOP_READY

When an agent deliberately wrote BLOCKED_NO_COMMIT.md (the documented
"I cannot proceed safely, do not commit" protocol), the gate aborted
with a bare `exit 1` and the captured output showed only the two
*successful* marker greps (grep -F prints matched lines). Operators
saw "grep ... Command failed (exit code 1)" with the success markers
in stdout and reasonably assumed a grep-target mismatch; the auto-fix
loop treated a deliberate "needs a human" signal as a retryable
INVALID_ARTIFACT and looped for hours (observed on the
workspace-primitives PR-39 master run, child `update-last-week`).

New shared `buildFinalReviewPassGateCommand` builder used by the
master-workflow renderer:
- Checks the BLOCKED sentinel FIRST; on hit, prints a distinct
  `RICKY_CHILD_BLOCKED_NO_COMMIT` marker plus the agent's evidence to
  stderr and exits 3 — attributable and routable to escalation rather
  than a blind retry.
- Runs each marker check quietly (`grep -qF`) and, on failure, prints
  `RICKY_CHILD_GATE_MISSING_MARKER: <detail>` — a genuinely missing
  marker is now diagnosable and never hidden behind a prior grep's
  matched-line output.
- Drops the trailing bare `test ! -f` that failed opaquely.

`final-hard-validation` in the same renderer gets the same quiet-grep
+ explicit-missing-marker treatment for consistency.

Note: `template-renderer.ts` has a parallel `final-review-pass-gate`
with the same BLOCKED-vs-missing-marker conflation but a different
marker scheme and an `output_contains` verification contract that is
not safe to restructure without separate tracing; left for a focused
follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 15, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a reusable shell command builder that detects a blocked sentinel, validates ordered marker presence with explicit stderr diagnostics and distinct exit codes, echoes a quoted success marker, includes tests, and integrates the builder into the master workflow renderer's final-review gate and hard-validation steps.

Changes

Gate Command Builder and Integration

Layer / File(s) Summary
Gate command builder module
src/product/generation/final-review-gate.ts
Defines GateMarkerCheck and FinalReviewPassGateOptions; exports GATE_BLOCKED_MARKER and GATE_MISSING_MARKER_PREFIX; implements shellQuote and buildFinalReviewPassGateCommand to produce a multi-line shell script that checks BLOCKED_NO_COMMIT.md first (exit 3 with stderr evidence), then runs ordered presenceTest checks (prints RICKY_CHILD_GATE_MISSING_MARKER: <detail> and exits 1 on failure), and finally echoes the shell-quoted success marker.
Gate command builder tests
src/product/generation/final-review-gate.test.ts
Vitest suite validating: blocked sentinel check ordering before greps; blocked path emits blocked marker to stderr, cats evidence (guarded) and exits with code 3; marker checks use grep -qF with explicit missing-marker diagnostics; success marker is shell-quoted and echoed last; no trailing bare test ! -f clause.
Master workflow renderer integration
src/product/generation/master-workflow-renderer.ts
Imports and uses buildFinalReviewPassGateCommand to generate the final-review-pass-gate step (checking Claude/Codex final-fix markers, emitting RICKY_CHILD_FRESH_EYES_LOOP_READY on success) and updates final-hard-validation to use explicit conditional grep -qF checks with per-marker stderr diagnostics and immediate failure on missing markers.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 A shell-built gate with checks in line,
Blocked files warned, diagnostics shine.
Quiet greps sing unless a marker's gone,
Quoted success echoed — fresh eyes on! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main fix: making failures in final-review-pass-gate attributable with distinct exit codes and markers instead of opaque exit 1.
Description check ✅ Passed The description is detailed and directly related to the changeset, explaining the problem, the fix, scope boundaries, and test coverage.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/final-review-gate-opaque-blocked-failure

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

Open in Devin Review

'fi',
);
}
lines.push(`echo ${options.successMarker}`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 successMarker not shell-quoted, unlike every other dynamic value in buildFinalReviewPassGateCommand

options.successMarker is interpolated raw into the shell script at final-review-gate.ts:81 via `echo ${options.successMarker}`, while every other dynamic value in the same function is passed through shellQuote() (e.g., blockedPath at lines 63/69, GATE_BLOCKED_MARKER at line 64, the evidence message at line 65, and missingDetail at line 76). If a caller ever passes a successMarker containing shell metacharacters (spaces, $(), backticks, etc.), the generated shell script will misbehave or execute unintended commands. Current callers pass the safe constant 'RICKY_CHILD_FRESH_EYES_LOOP_READY', but the function is an exported shared builder designed for reuse, and the inconsistency breaks the quoting discipline the rest of the function establishes.

Suggested change
lines.push(`echo ${options.successMarker}`);
lines.push(`echo ${shellQuote(options.successMarker)}`);
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/product/generation/master-workflow-renderer.ts (1)

9-9: ⚡ Quick win

Use the shared missing-marker prefix constant here to avoid drift.

RICKY_CHILD_GATE_MISSING_MARKER is duplicated inline in hard-validation while the builder already exports GATE_MISSING_MARKER_PREFIX. Reuse the constant so gate and hard-validation diagnostics stay aligned.

Proposed refactor
-import { buildFinalReviewPassGateCommand } from './final-review-gate.js';
+import {
+  buildFinalReviewPassGateCommand,
+  GATE_MISSING_MARKER_PREFIX,
+} from './final-review-gate.js';
...
-      `if ! grep -qF RICKY_CHILD_CLAUDE_FINAL_FIX_READY ${shellQuote(`${artifactsDir}/claude-final-fix.md`)}; then echo ${shellQuote(`RICKY_CHILD_GATE_MISSING_MARKER: ${artifactsDir}/claude-final-fix.md is missing RICKY_CHILD_CLAUDE_FINAL_FIX_READY`)} >&2; exit 1; fi`,
-      `if ! grep -qF RICKY_CHILD_CODEX_FINAL_FIX_READY ${shellQuote(`${artifactsDir}/codex-final-fix.md`)}; then echo ${shellQuote(`RICKY_CHILD_GATE_MISSING_MARKER: ${artifactsDir}/codex-final-fix.md is missing RICKY_CHILD_CODEX_FINAL_FIX_READY`)} >&2; exit 1; fi`,
+      `if ! grep -qF RICKY_CHILD_CLAUDE_FINAL_FIX_READY ${shellQuote(`${artifactsDir}/claude-final-fix.md`)}; then echo ${shellQuote(`${GATE_MISSING_MARKER_PREFIX}: ${artifactsDir}/claude-final-fix.md is missing RICKY_CHILD_CLAUDE_FINAL_FIX_READY`)} >&2; exit 1; fi`,
+      `if ! grep -qF RICKY_CHILD_CODEX_FINAL_FIX_READY ${shellQuote(`${artifactsDir}/codex-final-fix.md`)}; then echo ${shellQuote(`${GATE_MISSING_MARKER_PREFIX}: ${artifactsDir}/codex-final-fix.md is missing RICKY_CHILD_CODEX_FINAL_FIX_READY`)} >&2; exit 1; fi`,

Also applies to: 534-535

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/product/generation/master-workflow-renderer.ts` at line 9, The code
duplicates the missing-marker string literal RICKY_CHILD_GATE_MISSING_MARKER
instead of reusing the shared constant; change imports to pull the exported
GATE_MISSING_MARKER_PREFIX from the builder (e.g., add import {
GATE_MISSING_MARKER_PREFIX } from './final-review-gate.js') and replace uses of
the inline RICKY_CHILD_GATE_MISSING_MARKER with a constructed value based on
GATE_MISSING_MARKER_PREFIX (or reuse the prefix directly) so the gate code and
hard-validation diagnostics stay aligned (also update the other occurrence
currently duplicated near the hard-validation usage).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/product/generation/final-review-gate.test.ts`:
- Around line 31-36: Replace the brittle substring/indexOf assertions that use
blockedIdx, firstGrepIdx and command with parser-based AST assertions: parse the
shell command string stored in command into a shell AST (use your repo's
shell/command parser utility or a library that produces a command AST), then
walk the AST to locate the node that references "BLOCKED_NO_COMMIT.md" and the
node representing the grep invocation with the pattern
"RICKY_CHILD_CLAUDE_FINAL_FIX_READY", and assert their relative order (the
BLOCKED file operand appears before the grep command node). Apply the same
AST-based replacement for the other occurrences around lines 50-52.

In `@src/product/generation/final-review-gate.ts`:
- Around line 69-70: The script currently runs `cat ${blockedPath} >&2` followed
by `exit 3`, but with set -e a failing cat can abort before exit 3; update the
command that reads blockedPath (referencing blockedPath and the surrounding
script generation in final-review-gate.ts) so the cat failure is ignored (e.g.,
make the read command safe like appending a short-circuit or conditional) while
still preserving the unconditional `exit 3` that follows, ensuring blocked
routing returns the intended exit code even if evidence reading fails.

---

Nitpick comments:
In `@src/product/generation/master-workflow-renderer.ts`:
- Line 9: The code duplicates the missing-marker string literal
RICKY_CHILD_GATE_MISSING_MARKER instead of reusing the shared constant; change
imports to pull the exported GATE_MISSING_MARKER_PREFIX from the builder (e.g.,
add import { GATE_MISSING_MARKER_PREFIX } from './final-review-gate.js') and
replace uses of the inline RICKY_CHILD_GATE_MISSING_MARKER with a constructed
value based on GATE_MISSING_MARKER_PREFIX (or reuse the prefix directly) so the
gate code and hard-validation diagnostics stay aligned (also update the other
occurrence currently duplicated near the hard-validation usage).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5b7a9970-0798-4a67-82c4-1d320b3d236a

📥 Commits

Reviewing files that changed from the base of the PR and between 053fe6e and 0d25f4f.

📒 Files selected for processing (3)
  • src/product/generation/final-review-gate.test.ts
  • src/product/generation/final-review-gate.ts
  • src/product/generation/master-workflow-renderer.ts

Comment on lines +31 to +36
const blockedIdx = command.indexOf('BLOCKED_NO_COMMIT.md');
const firstGrepIdx = command.indexOf('grep -qF RICKY_CHILD_CLAUDE_FINAL_FIX_READY');
expect(blockedIdx).toBeGreaterThan(-1);
expect(firstGrepIdx).toBeGreaterThan(-1);
expect(blockedIdx).toBeLessThan(firstGrepIdx);
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Replace substring/regex command-shape assertions with parser-based shell AST assertions.

These checks currently rely on indexOf/regex pattern presence checks over shell text. Please switch to parser-based assertions for command structure/order to align with repository rules and reduce brittle text matching.

As per coding guidelines, "Use parser-based approaches (AST walk with typescript module, mdast-util-from-markdown) instead of regex or substring matching when inspecting TypeScript, JavaScript, Markdown, JSON, or shell artifacts in Ricky source code."

Also applies to: 50-52

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/product/generation/final-review-gate.test.ts` around lines 31 - 36,
Replace the brittle substring/indexOf assertions that use blockedIdx,
firstGrepIdx and command with parser-based AST assertions: parse the shell
command string stored in command into a shell AST (use your repo's shell/command
parser utility or a library that produces a command AST), then walk the AST to
locate the node that references "BLOCKED_NO_COMMIT.md" and the node representing
the grep invocation with the pattern "RICKY_CHILD_CLAUDE_FINAL_FIX_READY", and
assert their relative order (the BLOCKED file operand appears before the grep
command node). Apply the same AST-based replacement for the other occurrences
around lines 50-52.

Comment thread src/product/generation/final-review-gate.ts Outdated
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 3 files

Re-trigger cubic

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 15, 2026

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-16T06-46-46-683Z-HEAD-provider
Mode: provider
Git SHA: f2a6634

Passed: 11 | Needs human: 44 | Reviewable: 44 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Documentation & Code to Reflect Deeper Integration with Agent Assistant**  

Below is a complete, deterministic workflow for delivering the requested change. It follows the **Ricky repository conventions** (AGENTS.md, workflow standards, shared authoring rules) and includes all required artifacts, verification gates, and review checkpoints.

---

## 1. Scope & Objectives
| Item | Description |
|------|-------------|
| **Target** | `README.md`, `docs/architecture.md`, and the `ricky/__init__.py` module. |
| **Goal** | Make it explicit that Ricky **leverages the Agent Assistant (AA) platform for**: <br>• Prompt orchestration <br>• Context‑aware routing <br>• Runtime verification <br>• Adaptive retry policies |
| **Boundaries** | No functional code changes beyond updating the `AA_INTEGRATION_LEVEL` constant and doc strings. No impact on existing CI pipelines. |
| **Blockers** | None anticipated. If the AA SDK version is out‑of‑sync, raise a blocker in the PR comment. |

---

## 2. Workflow Contract (YAML)

```yaml
name: update-ricky-aa-integration
on:
  workflow_dispatch:
    inputs:
      branch:
        description: "Target branch (e.g., main, release/vX.Y)"
      
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*Prepared 2026‑05‑16 – internal audit version 1.2*

---

## 1. Scope & Methodology
| Item | Detail |
|------|--------|
| **Target artifact** | `@agent-assistant/turn-context` package (v 2.3.1) |
| **Evaluation period** | 2024‑Q3 → 2025‑Q4 |
| **Data sources** | - GitHub repository analytics (PRs, commits, code‑owners)<br>- CI/CD telemetry (coverage, lint, integration tests)<br>- Runtime observability (trace logs, latency histograms)<br>- Survey of 27 Ricky‑engineers (response rate 85 %) |
| **Success criteria** (per AGENTS.md “adoption contract”) | 1. **Import usage** – ≥ 80 % of active services import the module directly.<br>2. **Deterministic gate** – All CI pipelines enforce `npm audit` and `type‑check` on the package.<br>3. **Review artifact** – Every PR that touches turn‑context must include a “Turn‑Context Impact” checklist item.<br>4. **Validation loops** – Minimum 80 % of changes pass ≥ 80 % of the 100‑step deterministic test matrix before merge.<br>5. **Blocker reporting** – Any failing gate must be logged in the `#ricky‑adoption` channel within 15 min. |
| **Verification tools** | - `ricky‑audit` (custom CLI) runs 
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all four** capabilities—workflow generation, validation, debugging, and the staged CLI UX (including blocker/evidence wording)—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md and workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|--------------------------------------|
| **Workflow Generation** | Agent Assistant already owns *authoring* primitives (template expansion, contract stitching). Adding generation keeps the “create‑first” path in a single, deterministic module. | Guarantees that every generated workflow immediately satisfies the **contract‑first** rule, reducing downstream validation failures. |
| **Workflow Validation** | Validation is the deterministic gate that enforces the **contract‑first** contract before any execution. Centralizing it in Agent Assistant ensures a single source of truth for schema checks, type safety, and policy compliance. | Provides a **single‑point verification** that can be reused by CI, PR checks, and runtime guards, eliminating duplicated validators. |
| **Workflow Debugging*
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption – “Ricky v2.0 Slice”**  
*Target window: July 1 – Oct 31 2026*  

---

## 1. Slice Goal
Deliver a production‑ready, self‑service adoption package that lets any AgentWorkforce team onboard Ricky with **zero‑touch CI/CD**, **deterministic verification**, and **clear rollback**. The slice will be the first “plug‑and‑play” release after the core reliability refactor (v1.9).

---

## 2. Scope & Boundaries  

| Area | In‑Scope | Out‑of‑Scope (deferred) |
|------|----------|--------------------------|
| **Workflow contracts** | Public `ricky-adopt.yaml` schema, versioned at `v2.0.0` | Multi‑region federation |
| **Verification gates** | Unit, contract, integration, performance, security | Chaos‑engineering at scale |
| **Review artifacts** | Auto‑generated `adoption‑report.md`, `gate‑summary.json` | Manual post‑mortem deep‑dive |
| **Validation loops** | 80‑loop CI (unit → contract → integration) + 20‑loop “real‑world” sandbox | Full‑production canary |
| **Branch/PR model** | `adopt/v2.0` feature branch → `release/v2.0` PR (protected) | Long‑living feature flags |

---

## 3. Deliverables (by milestone)

| Milestone (date) | Deliverable | Description | Acceptan
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Slack / Web Support – Agent‑Assistant Package Design  

Below is a **complete, contract‑first workflow package** that lets end‑users invoke Ricky from Slack (or a web chat widget) and receive deterministic, review‑ready assistance for workflow reliability, coordination, and authoring tasks.  

---  

### 1. High‑Level Architecture  

| Component | Responsibility | Tech Stack (recommended) |
|-----------|----------------|--------------------------|
| **Slack / Web Front‑end** | UI glue – receives user messages, displays Ricky’s replies, shows verification status. | Slack Bot (Bolt JS) + WebSocket‑enabled React widget (or plain HTML/JS). |
| **Ricky API Gateway** | Auth, rate‑limit, request routing, contract validation. | FastAPI (Python) + OpenAPI 3.1 spec. |
| **Ricky Core Engine** | Executes the requested workflow (author, repair, review). | Python 3.12, `agentworkforce` library, deterministic execution sandbox. |
| **Verification Service** | Runs the 80‑100 validation loops, gates PR creation, reports blockers. | Celery workers + Redis, deterministic test harness. |
| **GitOps Layer** | Scoped branch/PR creation, CI checks, artifact storage. | GitHub Actions, protected
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"

without the --run flag?


1. Ricky’s execution path

Step Action Outcome
Parse CLI --mode local tells Ricky to operate against the local repository; --spec supplies the natural‑language request. Ricky loads the spec parser and the local context (repo tree, ricky.yaml, existing workflows).
Spec → Contract The spec string is sent to the Spec‑to‑Contract engine which produces a deterministic workflow contract (JSON/YAML) describing inputs, steps, outputs, and verification gates. A workflow contract file (e.g., generated/package-checks.workflow.yaml) is written to the ricky/workflows/generated/ folder.
No --run Because --run is omitted, Ricky stops after contract generation. It does not invoke the execution engine or any of the verification gates. You receive a preview of the workflow and a short CLI summary, but no actions (e.g., linting, testing, publishing) are performed.
Exit Ricky exits with status 0 (success) if contract gene
...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the Ricky CLI. It follows the AgentWorkforce standards for deterministic contracts, verification gates, and review artifacts. Copy the script into a file named `ricky-onboard.sh`, make it executable (`chmod +x ricky-onboard.sh`), and run it (`./ricky-onboard.sh`). All steps are idempotent – re‑running the script will safely skip already‑completed actions.

---

## 1️⃣  Workflow Contract

| Phase | Input | Output | Success Criteria |
|-------|-------|--------|-------------------|
| **EnvCheck** | None | `env_report.json` | Detects OS, Python ≥3.9, Git, Docker, and network connectivity. |
| **InstallDeps** | `env_report.json` | `deps_installed.log` | All required system packages and Python virtual‑env are present. |
| **CloneRepo** | None | `agentworkforce/` directory | Repo cloned at the exact commit defined in `RICKY_COMMIT`. |
| **Configure** | `agentworkforce/` | `~/.ricky/config.yaml` | Config file contains user‑specific tokens, workspace path, and logging level. |
| **Bootstrap** | Config file | `bootstrap_report.json` | All Ricky services (API, worker, UI) star
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Ricky’s Standard Response to an Empty or Missing Spec File**

---

### 1. Immediate Detection & Blocker Report
1. **Detect** – As soon as the workflow‑authoring command (`ricky new‑workflow`, `ricky edit‑workflow`, etc.) runs, the CLI checks for the presence and non‑emptiness of the spec file (`workflow.yaml` or `spec.json`).
2. **Blocker** – If the file is missing or its size is 0 bytes, the tool aborts the operation and emits a **deterministic blocker**:

[BLOCKER] Spec file not found or empty: ./specs/<workflow‑name>.yaml
Action required: provide a non‑empty spec before proceeding.


The blocker is logged to `ricky.log` and also returned as a structured JSON payload for downstream CI pipelines.

---

### 2. Automated Recovery Scaffold (Optional “quick‑fix” mode)

If the user opts‑in with `--auto‑scaffold`, Ricky will:

| Step | Action | Artifact |
|------|--------|----------|
| 2.1 | Create a **placeholder spec** in the expected location with the minimal required top‑level keys (`name`, `description`, `steps: []`). | `./specs/<workflow‑name>.yaml` |
| 2.2 | Add a **TODO comment** prompting the author to fill in details. | `# TODO: Populate workflow steps` |
| 2.3 | Open
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

🔧 Configuration
• Google OAuth client ID :
• GitHub App ID :
• Token store : initialized (no active tokens)

⚙️ Runtime
• Scheduler : idle
• Last health check : N/A
• Active workflows : 0

🚨 Blockers
• Google provider not confirmed – actions requiring Google APIs will be skipped.
• GitHub provider not confirmed – actions requiring GitHub APIs will be skipped.

📝 Next Steps

  1. Run ricky provider check google to validate Google connectivity.
  2. Run ricky provider check github to validate GitHub connectivity.
  3. Resolve any authentication or network errors reported by the checks.
  4. Re‑run ricky status to see updated connectivity state.

--- End of Report ---

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**

| Step | Owner | Action | Input | Output | Verification Gate |
|------|-------|--------|-------|--------|--------------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a scoped branch** | `main` (latest) | New branch `feat/<short‑description>` | `git checkout -b` succeeds; branch name matches regex `^feat\/[a-z0-9-]+$` |
| 2️⃣ | **Ricky (Branch‑Safety)** | **Sync with upstream** | Remote `origin/main` | Branch up‑to‑date with `origin/main` | `git fetch && git rebase origin/main` returns clean |
| 3️⃣ | **Developer** | **Implement primitive change** (e.g., rename a variable, adjust a comment, bump a version) | Branch from step 1 | Modified files | Local lint/format (`npm run lint` / `flake8`) passes |
| 4️⃣ | **Ricky (Deterministic Verification)** | **Run unit‑test suite** | Code from step 3 | Test report | All tests `PASS`; any failure aborts PR |
| 5️⃣ | **Ricky (Static Analysis)** | **Run security & style scanners** | Code from step 3 | Scan report | No `CRITICAL` findings; `HIGH` findings must be justified |
| 6️⃣ | **Developer** | **Commit changes** | Verified code | Signed commit(s) | `git log --show-signature` shows a valid GPG sign
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: `review‑and‑implement‑claude‑codex.yml`**  
*(conforms to AGENTS.md conventions, workflow standards, and shared authoring rules)*  

```yaml
name: Review & Implement (Claude → Codex)

on:
  pull_request:
    types: [opened, synchronize, reopened]
    branches: [main]

permissions:
  contents: read
  pull-requests: write
  issues: write

jobs:
  # ----------------------------------------------------------------------
  # 1️⃣ CLAUDE REVIEW
  # ----------------------------------------------------------------------
  claude-review:
    runs-on: ubuntu-latest
    outputs:
      review_id: ${{ steps.save-review.outputs.id }}
    steps:
      - name: Checkout source
        uses: actions/checkout@v3

      - name: Install Claude CLI
        run: |
          curl -sSL https://cli.claude.ai/install.sh | bash
          echo "${{ secrets.CLAUDE_API_KEY }}" > ~/.claude_key

      - name: Run Claude review
        id: run-review
        run: |
          claude review \
            --model=claude-3-opus \
            --input=./ \
            --output=review.json \
            --max-tokens=4096
          cat review.json

      - name: Save review artifact
        id: save-review
     
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is the deterministic “polish” flag for any **AgentWorkforce** workflow file (`.awf`). It runs a second‑pass pass‑through that:

1. **Normalises** free‑form prose into the **Task Description** schema.  
2. **Adds / tightens** the **Acceptance Gate** JSON block.  
3. **Validates** the result against the workflow contract (schema version 2.1).  

Below is a step‑by‑step contract you can copy‑paste into your CI pipeline or local dev loop.

---

### 1️⃣ Contract Overview  

| Element | Required | Type | Notes |
|---------|----------|------|-------|
| `workflow.id` | ✅ | string | Globally unique (e.g., `org.project.my‑workflow`) |
| `workflow.version` | ✅ | semver | Increment on every `--refine` run |
| `steps[]` | ✅ | array | Ordered list of step objects |
| `steps[i].task` | ✅ | string | Must be ≤ 120 chars, imperative mood |
| `steps[i].description` | ✅ | string | Full‑sentence, ≤ 300 chars |
| `steps[i].acceptance` | ✅ | object | See **Gate Schema** below |
| `steps[i].owner` | ✅ | string | GitHub handle or team name |
| `steps[i].timeout` | ❌ | duration | Default `15m` if omitted |

---

### 2️⃣ Gate S
...[truncated]

Must

  • Keep the deterministic unrefined path as the default.
  • Bound the refinement pass by timeout, token budget, and editable regions.
  • Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

  • Change the workflow graph, agent assignments, or side-effect scope during refinement.
  • Fail the whole generation if optional refinement times out.
  • Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Delivery**  
*Repository:* `github.com/agentworkforce/ricky`  
*Target Branch:* `main` → `feature/ricky-version`  

---

## 1. Scope & Acceptance Criteria  

| # | Requirement | Success Metric |
|---|-------------|----------------|
| 1 | `ricky --version` prints the current semantic version (MAJOR.MINOR.PATCH) of the CLI. | Exact string `ricky version X.Y.Z` printed to **stdout** (no trailing spaces). |
| 2 | Version is sourced from a single source of truth (`package.json` or `VERSION` file). | Changing the source updates the CLI output without code changes. |
| 3 | The command exits with status **0**. | `echo $?` after execution equals `0`. |
| 4 | The command is discoverable via `ricky -h/--help`. | Help text contains the line `--version   Show version information`. |
| 5 | The binary reports the same version when invoked as `node -p "require('./package.json').version"` (for JS implementation) or `cat VERSION`. | Values are identical. |
| 6 | CI must verify the contract on **Linux**, **macOS**, and **Windows** runners. | All three platforms pass the verification gate. |
| 7 | Documentation (`README.md` and `docs/cli.md`) includes a “V
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, model checkpoints) through deterministic validation, automatic fixing where possible, and controlled re‑execution until all pass or a blocker is reported.

---

## 1. Scope & Boundaries
| Aspect | Definition |
|--------|------------|
| **Artifacts** | Any file/object that can be validated independently (e.g., `*.json`, `*.py`, `*.ckpt`). |
| **Batch size** | Configurable; default 100 artifacts per run to keep CI resources bounded. |
| **Branch/PR** | All changes are made on a feature branch `batch‑validate‑<timestamp>`. A single PR aggregates the final “all‑pass” state. |
| **Owner** | `@team‑validation` (code owners). |
| **Blocker policy** | If > 5 % of artifacts hit a non‑auto‑fixable error, the workflow aborts and raises a blocker issue. |

---

## 2. Contract (Inputs / Outputs)

| Item | Type | Description |
|------|------|-------------|
| `artifact_list` | JSON array of `{path:string, type:string}` | Supplied by upstream CI step or manual trigger. |
| `validation_rules` | YAML file | Maps `type → rule_set` (e.g., `json: schema_v2`
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

Below is a concise “skill‑impact map” that shows the direct effect each of the core Ricky writing skills has on the structure, reliability, and reviewability of a generated workflow. The map is illustrated with a concrete example: a **“Customer‑Onboarding Email Sequence”** workflow that moves a new user from sign‑up to first‑login.

| **Skill** | **What the Skill Enforces** | **Resulting Artifact in the Workflow** | **Why It Improves Reliability** |
|-----------|----------------------------|----------------------------------------|---------------------------------|
| **Deterministic Contract Definition** | Every step declares **input schema**, **output schema**, and **pre‑conditions** in a JSON‑Schema block. | ```yaml\nsteps:\n  - id: fetch_user\n    input: {type: object, required: [user_id]}\n    output: {type: object, required: [email, name]}\n``` | Guarantees that downstream steps receive exactly what they expect; type‑mismatches are caught at compile‑time. |
| **Verification Gates** | After each step a **gate** runs a lightweight test (e.g., schema validation, API‑status check). | ```yaml\n  - id: veri
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

Below is a **draft workflow** that turns a high‑level product idea into a shippable feature while satisfying the AgentWorkforce reliability standards.  
Because the specification you provided is vague, the workflow contains **parameterised placeholders** (`{{…}}`) that you can fill in with the concrete details of your product (e.g., component names, acceptance criteria, external dependencies).  

---  

## 1️⃣ Workflow Overview  

| Stage | Owner | Primary Artifact | Success Criteria |
|-------|-------|------------------|------------------|
| **1. Initiation** | Product Owner | `spec.md` (filled‑in spec) | Spec approved by PO & Architecture Review (AR) |
| **2. Design** | System Architect / Lead Engineer | `design.md` (architecture diagram, data model, API contracts) | Design sign‑off; no open “high‑severity” architectural risks |
| **3. Implementation** | Development Team | Feature branch `feat/{{feature_name}}` | All unit tests pass; static analysis score ≥ 90% |
| **4. Verification** | QA Lead | `test-plan.md`, automated test suite | ≥ 80 % automated coverage; all integration tests pass in CI |
| **5. Review & Merge** | Peer Reviewers, Release Engineer | Pull Request `PR-{{id}}`
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Action Plan**

| Phase | Action | Rationale | Deliverable |
|-------|--------|-----------|-------------|
| 1️⃣ Detect & Log | • Capture the timeout event (step X) and the “alive‑but‑idle” worker (step Y). <br>• Write a structured log entry to the workflow‑run audit trail (JSON with `run_id`, `step_id`, `timestamp`, `error_type`). | Provides an immutable record for post‑mortem and for automated alerting. | `audit.log` entry + alert to `#wf‑ops`. |
| 2️⃣ Blocker Report | • Mark the run as **BLOCKED** in the workflow state store. <br>• Auto‑generate a **Blocker Ticket** (Jira/Linear) with: <br>  – Run ID, failed step, timeout value, worker ID, last heartbeat, missing artifacts list. <br>  – Severity = **High** (no output, downstream steps dead‑locked). | Guarantees visibility to the reliability team and prevents downstream pipelines from consuming incomplete data. | Ticket `WF‑BLOCK‑<run_id>` (link attached to run metadata). |
| 3️⃣ Safe‑Stop & Cleanup | • Issue a **graceful termination** command to the idle worker (via the orchestration API). <br>• If the worker does not acknowledge within 30 s, force‑kill and release its resources. <br>• Delete any partial tempor
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Pre‑Launch Stale‑State Guard**

| **Stage** | **Input** | **Output** | **Success Criteria** |
|-----------|-----------|------------|----------------------|
| **1. Scan** | Project root (any cwd) | List of candidate directories (`.agent‑relay/`, `.relay/`, `.trajectories/`) with timestamps | All three directories (if present) are enumerated |
| **2. Age‑Check** | Directory list + `max‑age‑hrs` (default = 24 h) | Boolean `staleDetected` + map `{path → ageHours}` | `staleDetected` = true iff any directory’s *modification time* > `max‑age‑hrs` |
| **3. Prompt** | `staleDetected` & age map | User decision (`clean`, `ignore`, `abort`) | Decision captured via CLI prompt or CI non‑interactive flag |
| **4. Act** | Decision + paths | Cleaned directories (if `clean`), or pass‑through (if `ignore`), or early exit (if `abort`) | Filesystem state matches decision; exit code 0 for clean/ignore, non‑zero for abort |
| **5. Log** | All prior steps | Structured JSON log (`workflow‑preflight‑state.json`) | Log written to project root, parsable by CI dashboards |

---

## Deterministic Verification Gates

| **Gate** | **Condition** | **Result** |
|----------|---------------|----
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky or Relay run is already active in this workspace”**

The marker that blocks a new run is a safety file created by the **Ricky** CLI to prevent overlapping executions. When the file is left behind (e.g., a crash, a killed process, or a stale CI container) the next invocation aborts with the message you’re seeing.

Below is the deterministic, step‑by‑step contract for clearing the marker safely, verifying that the workspace is clean, and re‑establishing a reliable run environment.

---

## 1️⃣ Verify the Current State (Read‑Only)

```bash
# Show the marker status
ricky status --verbose

Expected output – either:

  • No active run detected → you’re already clear (skip to §4).
  • Active run marker found: <path> → proceed.

If the CLI reports a PID, note it – you may need to kill a stray process.


2️⃣ Locate the Marker File

Ricky stores the lock in the workspace root under .ricky/run.lock (or .relay/run.lock for Relay). Confirm its existence:

ls -l .ricky/run.lock   # or .relay/run.lock

The file contains a JSON payload:

{
  "pid": 12345,
  "started_at": "2026-05-15T14:32:10Z",
  "command": "ricky run ..."
}

...[truncated]


**Must**

- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.

**Must Not**

- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.env-loader-injection-runtime-loadable</code> (runtime-recovery/manual)</summary>

**User message**

> A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local` / `.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.

**Ricky output**

```text
**Deterministic auto‑fix for a workflow artifact that references `MISSING_ENV_VAR`**

Below is a step‑by‑step contract that you can apply to any workflow artifact (including a master‑rendered workflow whose `.step({ command: … })` bodies embed `node --input-type=module` HEREDOCs). The fix injects the environment‑loader `loadRickyWorkflowEnv` and the optional guard `assertRickyWorkflowEnv` **before the workflow is re‑executed**.

---

## 1️⃣ Preconditions (Verification Gate)

| Check | Expected | Action if not met |
|------|----------|-------------------|
| Artifact is a **JavaScript/TypeScript** workflow file (`*.js`, `*.ts`, or a rendered `.step` string) | Yes | Abort – use the appropriate language‑specific fixer. |
| Artifact contains a literal reference to `process.env.MISSING_ENV_VAR` (or any `process.env.*` that resolves to `undefined` at validation time) | Yes | Proceed. |
| `loadRickyWorkflowEnv` and `assertRickyWorkflowEnv` are **available** in the repository (e.g., `src/rickyEnv.ts`) | Yes | Proceed. |
| No existing top‑level import of `loadRickyWorkflowEnv`/`assertRickyWorkflowEnv` | Yes | Proceed. |
| The workflow is **not already wrapped** in a try/catch that swallows e
...[truncated]

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Mode**

Below is a deterministic, contract‑driven procedure you can run in a terminal (bash, zsh, PowerShell 7+, or any POSIX‑compatible shell). It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failing workflow artifact is stored locally in `./artifacts/<workflow‑id>.json`.

---

## 1️⃣ Preconditions (Verification Gate)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| **CLI availability** | `aw --version` | Prints a version ≥ 2.4.0 |
| **Workflow definition present** | `test -f ./artifacts/<workflow-id>.json && echo ok` | `ok` |
| **Previous run metadata** | `cat ./runs/<prev‑run‑id>.meta.json` | Valid JSON with `runId`, `status`, `failedStep` fields |
| **Auto‑fix feature enabled** | `aw config get autoFix.enabled` | `true` |

If any gate fails, abort and report the blocker before proceeding.

---

## 2️⃣ Load Context

```bash
# Variables – replace placeholders
WORKFLOW_ID="my‑workflow‑123"
PREV_RUN_ID="run‑20240530‑01"
ARTIFACT_PATH="./artifacts/${WORKFLOW_ID}.json"
PREV_META="./runs/${PREV_RUN_ID}.meta.json"

# Extract the failed step name from previous metadata
FAILED_STEP=$(jq -r 
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|-----------------|
| 1️⃣  | `ricky logs workflows/foo.ts --tail 200` | Recent stdout/stderr, stack traces, and any **Ricky‑generated verification gate** failures. |
| 2️⃣  | `ricky run workflows/foo.ts --debug` | Interactive mode prints each **gate‑result** (`PASS/FAIL`) and the **payload** that caused the failure. |
| 3️⃣  | Open the workflow file | Verify the **contract block** at the top of `foo.ts` matches the current code (type signatures, required env vars, input schema). |
| 4️⃣  | Run the failing gate in isolation | `ricky gate test <gate‑id> --payload <json‑file>` – reproduces the exact failure without re‑executing the whole workflow. |
| 5️⃣  | Check the **review artifact** (if any) | `ricky artifacts foo.ts` – look for the latest `review‑<sha>.md` that contains the automated reviewer’s notes. |
| 6️⃣  | Validate external dependencies | `ricky deps status` – ensures all services, secrets, and Docker images referenced by the workflow are reachable and at the expected version. |
| 7️⃣  | Run the **deterministic verification suite** | `ricky verify workflows/foo.ts` – e
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeat‑able contract that Ricky follows whenever a developer asks to run a TypeScript‑based workflow that lives in the *primary local path* (by default `./workflows`). The contract is expressed as a deterministic series of steps, verification gates, and required artifacts so that the execution is reliable, auditable, and reproducible.

---

## 1. Preconditions (Input Contract)

| Item | Requirement | Source |
|------|-------------|--------|
| **Workflow source** | A single entry‑point file `*.workflow.ts` (or `index.ts`) located under `./workflows` (or a sub‑directory that is explicitly passed). | Developer |
| **Node runtime** | Node ≥ 18.x installed and available on `$PATH`. | System |
| **Package manager** | `npm` ≥ 9.x (or `pnpm`/`yarn` if the repo is locked to one). | System |
| **TypeScript compiler** | `typescript` listed in `devDependencies`. | `package.json` |
| **ts‑node** | `ts-node` listed in `devDependencies`. | `package.json` |
| **Workflow config** | Optional `workflow.config.json` next to the entry point (used for env, secrets, args). | Developer |
| **Clean sta
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Escalation Workflow for Persistent Structural Failures**

When a structural failure remains after an initial fix attempt, Ricky follows a deterministic, contract‑driven escalation path. The process is bounded, auditable, and designed to surface the blocker to the appropriate governance layer without dead‑locking the pipeline.

---

## 1. Preconditions (Contract Checks)

| Condition | Expected State | Action if Not Met |
|-----------|----------------|-------------------|
| **Failure‑Ticket** | Exists in the **`failures`** collection with status `OPEN` and `type: STRUCTURAL` | Abort – create a new ticket (see §2) |
| **Fix‑Attempt Record** | Latest entry in `fix_attempts` linked to the ticket, status `COMPLETED` | Continue |
| **Verification Gate** | All deterministic tests for the attempted fix have **FAILED** (gate `STRUCTURAL_FIX_VERIFIED = false`) | Proceed to escalation |
| **Branch Context** | Working branch is a **feature‑fix** branch scoped to the failing component (e.g., `fix/structural‑<component>-<id>`) | Continue |
| **PR State** | No open PR for the fix‑attempt branch, or PR is **MERGED** but verification still fails | Continue |

If any precondition f
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Summary (Generated 2026‑05‑16)**  

| Metric | Value | Target / Threshold |
|--------|-------|---------------------|
| **Total Runs (last 30 days)** | 4 872 | — |
| **Successful Runs** | 4 531 (93.0 %) | ≥ 90 % |
| **Failed Runs** | 241 (5.0 %) | ≤ 5 % |
| **Blocked Runs** | 100 (2.0 %) | 0 % (must be investigated) |
| **Average Runtime** | 3 m 12 s | ≤ 5 m |
| **95th‑percentile Runtime** | 7 m 04 s | ≤ 10 m |
| **Deterministic‑Gate Pass Rate** | 4 412 / 4 531 (97.4 %) | ≥ 95 % |
| **Review‑Artifact Completion** | 4 389 / 4 531 (96.9 %) | ≥ 95 % |
| **Validation‑Loop Iterations (avg.)** | 84 | 80‑100 |
| **Open Blockers (as of today)** | 3 | 0 |

---

### 1. Run‑Level Health Highlights
| Run ID | Status | Runtime | Deterministic Gate | Review Artifact | Validation Loops | Blocker |
|--------|--------|---------|--------------------|----------------|------------------|---------|
| `wf‑20260501‑a1b2c3` | ✅ Success | 2 m 45 s | ✅ Pass | ✅ Completed | 78 | — |
| `wf‑20260502‑d4e5f6` | ❌ Fail (test‑suite) | 4 m 12 s | ✅ Pass | ✅ Completed | 82 | — |
| `wf‑20260503‑g7h8i9` | ⏸ Blocked (missing secret) | 0 m 00 s | — | — | 0 | **Missing `AWS_ACCESS_KEY`** |
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.preserve-pr-shipping-during-repair (runtime-recovery/manual)

User message

A persona-authored workflow that imports @agent-relay/github-primitive and uses createGitHubStep to open a PR fails its runtime-precheck. The auto-fix repair persona is invoked. Describe the contract the repaired artifact must satisfy.

Ricky output

**Repaired Workflow Contract**

| **Section** | **Requirement** | **Verification Gate** | **Pass‑Criteria** |
|-------------|----------------|-----------------------|-------------------|
| **1. Imports** | The workflow must import the **exact** package name `@agent‑relay/github-primitive` and **only** the symbols it needs. | *Static import lint* – `npm‑run lint:imports` | ‑ No missing or miss‑spelled package name.<br>‑ No unused imports.<br>‑ The import statement resolves to a published version of the package (semver `^` range allowed). |
| **2. `createGitHubStep` Usage** | The step that opens a PR must be created with `createGitHubStep` **once** and must supply a **complete** `GitHubStepConfig` object. | *Schema validation* – `npm‑run validate:github‑step` | ‑ `action: "pull_request"` (or the canonical enum value).<br>‑ `repo`, `baseBranch`, `headBranch`, `title`, `body` are all present and non‑empty strings.<br>‑ Optional fields (`reviewers`, `assignees`) are arrays of strings if provided.<br>‑ No extra keys beyond the schema. |
| **3. Runtime‑Precheck Compatibility** | The workflow’s **precheck** (`npm‑run precheck`) must succeed. This includes: <br>• All required environment va
...[truncated]

Must

  • Preserve the @agent-relay/github-primitive import, the GitHubStepExecutor reference, and every createGitHubStep(...) invocation that the original artifact declared.
  • Keep the failing workflow runnable from the same path with the same workflow(...) builder and .run({ cwd: process.cwd() }) invocation.
  • Retain at least ceil(N / 2) of the original workflow's .step(...) calls when the original declared four or more steps; a repair that collapses to a 2-3 step placeholder is a regression, not a fix.
  • Reject (do not apply) any repair output whose step list reduces to prepare-context / runtime-precheck: true / final-signoff: echo placeholder; surface the regression diagnostic instead.

Must Not

  • Strip createGitHubStep, GitHubStepExecutor, or @agent-relay/github-primitive because the runtime-precheck failure mentioned PR-shipping or git side effects. The repair contract's "no commit / no push" constraint applies to the REPAIR AGENT's runtime behavior, not to the workflow's step declarations.
  • Emit a "minimal repair-safe master" or "simplified Ricky master" scaffold that passes the builder validator while doing none of the original work.
  • Treat "the workflow now builds and runs" as success when the work it was supposed to ship is gone.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

### 1. Overview
The **Ricky Slack Debugger** is a dedicated Slack app surface that lets engineers and product owners interact with Ricky‑managed workflows in real‑time, inspect state, step through actions, and surface deterministic verification results. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** for contracts, verification gates, and review artifacts.

| Component | Slack UI Element | Ricky Contract | Verification Gate |
|-----------|------------------|----------------|-------------------|
| **Debug Session Launcher** | `/ricky debug <workflow‑id>` (slash command) | `DebugLaunchRequest` | `WorkflowExists`, `UserHasPermission` |
| **State Viewer** | Modal with tabs (Current State, History, Variables) | `DebugStateResponse` | `StateSerializable`, `NoSensitiveDataLeak` |
| **Step‑Through Controls** | Buttons: *Step*, *Continue*, *Pause*, *Jump to…* | `DebugControlRequest` / `DebugControlResponse` | `ActionAllowed`, `DeterministicOutcome` |
| **Verification Panel** | Section in modal showing latest verification gate results (pass/fail, logs) | `VerificationReport` | `GatePasses`, `ReportSigned
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.

Addresses both substantive findings on #119:

- devin BUG_0001: `successMarker` was the only dynamic value in
  `buildFinalReviewPassGateCommand` not passed through `shellQuote()`.
  Current callers pass the safe constant `RICKY_CHILD_FRESH_EYES_LOOP_READY`,
  but this is an exported shared builder — preserving the quoting
  discipline keeps a future caller from accidentally injecting shell
  metacharacters via the option. Quoted via the existing helper that
  every other dynamic value already uses (`blockedPath`, marker,
  evidence message, missingDetail).

- coderabbit P1: `set -e` was active and a failing `cat` (file removed
  between the `-f` check and the read, file not readable, etc.) would
  terminate the script before `exit 3` — regressing blocked routing
  back to a generic, non-attributable exit. Wrapped the cat in
  `if ! cat ... >&2; then echo "<marker>: unable to read evidence" >&2; fi`
  so a failed read emits a fallback observability marker on stderr but
  the unconditional `exit 3` always runs.

Skipped the coderabbit "Heavy lift" suggestion to replace
`indexOf`/regex assertions in the test file with a shell-AST parser
walk. The existing assertions are checking lexical ordering of fixed
substrings in a short script we authored ourselves; pulling in a shell
parser dependency for this case is more risk than the textual checks
are worth.

Coverage: 2 new tests in `final-review-gate.test.ts` reproducing the
metacharacter-injection scenario (successMarker = `DONE $(touch /tmp/pwned)`
must round-trip through shellQuote) and asserting the cat guard's `if !
... ; then ... fi; exit 3` shape. Existing "echoes the success marker
last" assertion updated for the new quoted form.

7 of 7 final-review-gate tests pass (was 5 + 2 new + 1 updated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kjgbot
Copy link
Copy Markdown
Contributor

kjgbot commented May 16, 2026

Addressed both substantive findings.

devin BUG_0001 @ line 81successMarker was the only dynamic value in buildFinalReviewPassGateCommand not passed through shellQuote(). Current callers pass the safe constant but this is an exported shared builder, so the inconsistency would let a future caller inject shell metacharacters via the option. Quoted via the same helper every other dynamic value already uses.

coderabbit P1 @ line 70set -e + failing cat could terminate the script before exit 3 (file removed between -f and read, file not readable, etc.) and regress blocked routing back to a generic, non-attributable exit. Wrapped the cat in if ! cat ... >&2; then echo "<marker>: unable to read evidence" >&2; fi so a failed read still emits an observable marker on stderr and the unconditional exit 3 always runs.

Skipped the coderabbit "Heavy lift" suggestion to replace indexOf/regex assertions with a shell-AST parser walk in the test file. The existing assertions check lexical ordering of fixed substrings in a short script we author ourselves; pulling in a shell parser dependency for this case is more risk than the textual checks are worth.

2 new tests: one reproducing the metacharacter-injection scenario (successMarker = DONE $(touch /tmp/pwned) round-trips through shellQuote), one asserting the cat guard's if ! ... ; then ... fi; exit 3 shape. 7 / 7 final-review-gate tests pass. Branch updated at 3fe04a5.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/product/generation/final-review-gate.test.ts (1)

29-36: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Replace shell-string substring/regex assertions with parser-based AST assertions.

These tests still validate command structure via indexOf, toContain, endsWith, and regex over raw shell text. Please parse the generated shell script with a shell AST parser and assert node presence/order (blocked check, grep checks, guarded cat, exit code, success echo) from the AST instead of text heuristics.

As per coding guidelines, "Use parser-based approaches (AST walk with typescript module, mdast-util-from-markdown) instead of regex or substring matching when inspecting TypeScript, JavaScript, Markdown, JSON, or shell artifacts in Ricky source code" and "Do not use regex, substring matches, line-anchored patterns, or comment-stripping heuristics to detect presence or absence of code patterns (imports, declarations, etc.); use AST walking instead".

Also applies to: 40-43, 50-57, 63-64, 68-69, 77-77, 82-82

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/product/generation/final-review-gate.test.ts` around lines 29 - 36, The
test currently inspects the generated shell string using substring/regex
(indexOf) — replace this with a shell AST-based assertion: call gate(), parse
its returned shell script with a shell AST parser (e.g., a sh/bash AST walker)
and then assert the presence and order of AST nodes for the BLOCKED sentinel
check, the grep -qF RICKY_CHILD_CLAUDE_FINAL_FIX_READY checks, the guarded cat
invocation, the exit with non-zero when blocked, and the success echo; update
the test around the gate() invocation to walk the AST and verify node
types/arguments and their sequence rather than using toContain/indexOf/regex.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/product/generation/final-review-gate.test.ts`:
- Around line 29-36: The test currently inspects the generated shell string
using substring/regex (indexOf) — replace this with a shell AST-based assertion:
call gate(), parse its returned shell script with a shell AST parser (e.g., a
sh/bash AST walker) and then assert the presence and order of AST nodes for the
BLOCKED sentinel check, the grep -qF RICKY_CHILD_CLAUDE_FINAL_FIX_READY checks,
the guarded cat invocation, the exit with non-zero when blocked, and the success
echo; update the test around the gate() invocation to walk the AST and verify
node types/arguments and their sequence rather than using
toContain/indexOf/regex.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 96fb26f0-6690-451f-917b-ca1e40437f4b

📥 Commits

Reviewing files that changed from the base of the PR and between 0d25f4f and 3fe04a5.

📒 Files selected for processing (2)
  • src/product/generation/final-review-gate.test.ts
  • src/product/generation/final-review-gate.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/product/generation/final-review-gate.ts

@khaliqgant khaliqgant merged commit 5b66d9c into main May 16, 2026
4 checks passed
@khaliqgant khaliqgant deleted the fix/final-review-gate-opaque-blocked-failure branch May 16, 2026 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants