
fix: harden auto-salvage gates #130

Merged. kjgbot merged 1 commit into main from fix/auto-salvage-gates on May 20, 2026.
Conversation

@kjgbot
Contributor

@kjgbot kjgbot commented May 20, 2026

Summary

  • normalize annotated Target repo metadata before auto-salvage opens GitHub PRs
  • report already-shipped when a declared worktree path is gone but an open PR already exists for the target branch
  • replace targeted implementation file gates with diff-based evidence so broad/context target lists do not fail on test -f or generic source-shape grep checks

Why

The cloud pr-09/pr-10/pr-11 rerun on Ricky 0.1.73 produced implementation branches, but all three were blocked at post-implementation-file-gate. Auto-salvage also skipped PR-09 because the worktree record was prunable, even though PR #724 already existed, and PR-11 tried to create a PR for the malformed repo name "AgentWorkforce/`cloud` (mainly) + `relaycast` docs".
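To make the malformed-name case concrete, here is a minimal sketch of how an annotated Target repo value could be reduced to a plain repo slug. The helper names normalizeRepoValue and isRepoToken come from the review walkthrough in this PR, but the bodies, the regex, and the choice to take the first inline-code span are assumptions made for illustration, not the code in spec-metadata.ts.

```typescript
// Hypothetical sketch: the real helpers in spec-metadata.ts may behave differently.
// Assumed token shape: "repo" or "owner/repo" built from letters, digits, ".", "_" and "-".
const REPO_TOKEN = /^[A-Za-z0-9._-]+(\/[A-Za-z0-9._-]+)?$/;

function isRepoToken(token: string): boolean {
  return REPO_TOKEN.test(token);
}

function normalizeRepoValue(raw: string): string | undefined {
  // Prefer the first inline-code span, e.g. "`cloud` (mainly) + `relaycast` docs".
  const inlineCode = raw.match(/`([^`]+)`/);
  const candidates = inlineCode ? [inlineCode[1]] : raw.split(/\s+/);
  for (const candidate of candidates) {
    // Strip surrounding punctuation such as commas or parentheses.
    const token = candidate.replace(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/g, "");
    if (token && isRepoToken(token)) {
      // Reduce "owner/repo" to the bare repo slug.
      return token.split("/").pop();
    }
  }
  return undefined;
}
```

Under these assumptions, the annotated value from the PR-11 failure resolves to the first repo it names, and an owner-qualified value loses its owner prefix. Whether the real helper also lowercases the slug is not stated in the PR, so this sketch leaves case untouched.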

Test plan

  • NODE_OPTIONS=... npx vitest run src/local/auto-salvage/spec-metadata.test.ts src/local/auto-salvage/run-auto-salvage.test.ts src/product/generation/pipeline.test.ts
  • NODE_OPTIONS=... npx vitest run test/generated-workflow-reliability-contract.test.ts src/local/auto-salvage/integration-smoke.test.ts
  • npx tsc --noEmit --pretty false --target ES2022 --module NodeNext --moduleResolution NodeNext --types node src/local/auto-salvage/spec-metadata.ts src/local/auto-salvage/run-auto-salvage.ts src/product/generation/template-renderer.ts

Known local typecheck note

Full npm run typecheck -- --pretty false is still blocked in this checkout by the existing missing @agent-relay/agent dependency in src/scheduled-agent.ts.

@coderabbitai
Contributor

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 854748d6-af61-486c-b319-3e1bf87db0f8

📥 Commits

Reviewing files that changed from the base of the PR and between cb0df00 and fdec34f.

📒 Files selected for processing (6)
  • src/local/auto-salvage/run-auto-salvage.test.ts
  • src/local/auto-salvage/run-auto-salvage.ts
  • src/local/auto-salvage/spec-metadata.test.ts
  • src/local/auto-salvage/spec-metadata.ts
  • src/product/generation/pipeline.test.ts
  • src/product/generation/template-renderer.ts

📝 Walkthrough

This PR introduces GitHub PR detection in auto-salvage when local worktrees are missing, normalizes Target repo metadata values, and refactors workflow gate generation from inline shell commands to dedicated Node-based helpers that validate implementation changes via repository diff evidence.

Changes

Auto-salvage PR detection and metadata

| Layer / File(s) | Summary |
|-----------------|---------|
| Target repo metadata normalization<br>src/local/auto-salvage/spec-metadata.ts, src/local/auto-salvage/spec-metadata.test.ts | parseSpecMetadata normalizes extracted Target repo values using new normalizeRepoValue and isRepoToken helpers that handle inline-code forms, validate token shape against regex, strip punctuation, and reduce owner/repo to canonical repo slug form. New tests assert normalization of multi-token and Owner/Repo inputs. |
| GitHub PR detection for missing worktree paths<br>src/local/auto-salvage/run-auto-salvage.ts, src/local/auto-salvage/run-auto-salvage.test.ts | runAutoSalvage computes owner earlier and probes GitHub for existing open PRs when the target worktree does not exist. If a PR is found, it returns already-shipped status with the PR URL; otherwise it skips with worktree-path-missing. Redundant owner declarations in clean and dirty worktree branches are removed. New test verifies the already-shipped path does not trigger git.status checks. |
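The missing-worktree branch described above can be sketched as a small decision function. Only the already-shipped and worktree-path-missing labels are taken from this PR; the SalvageOutcome type, the OpenPrLookup interface, the "skipped" and "continue" statuses, and the function name are hypothetical stand-ins, and the real runAutoSalvage may be shaped quite differently.

```typescript
// Hypothetical types and names: only "already-shipped" and "worktree-path-missing"
// appear in the PR; everything else is an illustrative assumption.
type SalvageOutcome =
  | { status: "already-shipped"; prUrl: string }
  | { status: "skipped"; reason: "worktree-path-missing" }
  | { status: "continue" };

interface OpenPrLookup {
  // Resolves to the URL of an open PR for the branch, or undefined if none exists.
  findOpenPrUrl(owner: string, repo: string, branch: string): Promise<string | undefined>;
}

async function resolveMissingWorktree(
  worktreeExists: boolean,
  owner: string,
  repo: string,
  branch: string,
  github: OpenPrLookup,
): Promise<SalvageOutcome> {
  if (worktreeExists) {
    // Normal path: the caller goes on to inspect the worktree.
    return { status: "continue" };
  }
  // The worktree is gone, so probe GitHub before giving up.
  const prUrl = await github.findOpenPrUrl(owner, repo, branch);
  return prUrl
    ? { status: "already-shipped", prUrl }
    : { status: "skipped", reason: "worktree-path-missing" };
}
```

Keeping the GitHub probe behind an interface is one way to let a test confirm that the already-shipped path returns without any local git calls, which is the property the new test is described as checking.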

Workflow gate evidence generation

| Layer / File(s) | Summary |
|-----------------|---------|
| Node-based gate command builders<br>src/product/generation/template-renderer.ts | buildGates refactors to route target-driven workflows through new buildTargetImplementationGateCommand and buildTargetGitDiffCommand helpers instead of inline shell logic. These emit Node scripts that gather tracked and untracked file changes via git diff --name-only and git ls-files --others --exclude-standard, filter out workflow artifact paths, and produce evidence files (implementation-file-gate.txt, git-diff.txt). Post-implementation and post-fix-verification gates now use the unified fileGateCommand builder. |
| Pipeline test validation for gate evidence<br>src/product/generation/pipeline.test.ts | Tests updated to assert repository diff evidence commands (diff --name-status, ls-files --others --exclude-standard) and gate markers (IMPLEMENTATION_FILE_GATE_OK, GIT_DIFF_GATE_OK). New regression test verifies post-implementation-file-gate uses repository diff evidence rather than filesystem test -f checks and does not include globbed target paths. Explicit-target git-diff-gate assertions updated to match new diff/ls-files evidence contract. |
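A rough sketch of what a diff-evidence gate of this kind might look like follows. The two git commands and the IMPLEMENTATION_FILE_GATE_OK marker are taken from the summary above. The function names, the .workflow-artifacts/ prefix used as the artifact filter, and the failure message are assumptions, and the scripts actually emitted by template-renderer.ts are not shown in this PR page.

```typescript
import { execFileSync } from "node:child_process";

// Assumption: workflow artifacts live under .workflow-artifacts/ and do not count as evidence.
const ARTIFACT_PREFIX = ".workflow-artifacts/";

// Split command output into non-empty, trimmed path lines.
function parsePaths(output: string): string[] {
  return output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Merge tracked and untracked changes, drop artifact paths, and de-duplicate.
function implementationChanges(tracked: string[], untracked: string[]): string[] {
  const unique = Array.from(new Set([...tracked, ...untracked]));
  return unique.filter((path) => !path.startsWith(ARTIFACT_PREFIX)).sort();
}

// Hypothetical gate: passes when the repository diff shows at least one non-artifact change.
function runImplementationFileGate(cwd: string): string {
  const tracked = parsePaths(
    execFileSync("git", ["diff", "--name-only"], { cwd, encoding: "utf8" }),
  );
  const untracked = parsePaths(
    execFileSync("git", ["ls-files", "--others", "--exclude-standard"], { cwd, encoding: "utf8" }),
  );
  const changed = implementationChanges(tracked, untracked);
  if (changed.length === 0) {
    // The failure wording is invented for this sketch.
    throw new Error("implementation file gate failed: no non-artifact changes found");
  }
  return ["IMPLEMENTATION_FILE_GATE_OK", ...changed].join("\n");
}
```

Because the gate asks what changed rather than whether each listed target exists, a broad or context-only target list no longer fails the way a per-file test -f check would.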

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • AgentWorkforce/ricky#106: Updates workflow-generation gate construction in template-renderer.ts and pipeline test expectations.
  • AgentWorkforce/ricky#52: Modifies workflow gate generation logic in template-renderer.ts and pipeline.test.ts with focus on grep/git fallback handling.

Poem

🐰 With early owner and a GitHub glance,
We spot shipped PRs before local chance;
Repo slugs cleaned, gates flow as Node script light,
Evidence gathered, no artifacts in sight!
The worktree skips safe, the workflow takes flight. 🚀

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title 'fix: harden auto-salvage gates' directly summarizes the main objective: hardening auto-salvage gates through normalization and diff-based evidence checks across the changeset. |
| Description check | ✅ Passed | The description comprehensively covers all major changes: metadata normalization, already-shipped detection, and replacement of file-existence gates with diff-based evidence, all directly related to the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



@kjgbot kjgbot merged commit fdb18f6 into main May 20, 2026
3 of 5 checks passed
@kjgbot kjgbot deleted the fix/auto-salvage-gates branch May 20, 2026 08:01
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

@github-actions
Contributor

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-20T07-57-32-953Z-HEAD-provider
Mode: provider
Git SHA: 416549f

Passed: 10 | Needs human: 44 | Reviewable: 44 | Missing output: 0 | Failed: 1 | Skipped: 0

Blocking Cases

FAIL workflow-authoring.fresh-eyes-loop-medium-source-and-test (workflow-authoring/ricky-cli)

User message

Generate a Ricky workflow that changes one source file and one test file for a CLI parsing bug, with scoped diff evidence and a targeted Vitest command.

Ricky output

[
  {
    "stage": "generate",
    "status": "ok",
    "artifact": {
      "path": "workflows/generated/ricky-generate-a-ricky-workflow-that-changes-one-sourc.ts",
      "workflow_id": "ricky-generate-a-ricky-workflow-that-changes-one-sourc",
      "spec_digest": "85568bcd48c9b628ee005f8ff5fb27b10c4f54d152245f551c7de2f48969a3cb",
      "target_files": [
        "src/surfaces/cli/flows/power-user-parser.ts",
        "src/surfaces/cli/flows/power-user-parser.test.ts"
      ]
    },
    "next": {
      "run_command": "ricky run workflows/generated/ricky-generate-a-ricky-workflow-that-changes-one-sourc.ts",
      "run_mode_hint": "ricky run workflows/generated/ricky-generate-a-ricky-workflow-that-changes-one-sourc.ts"
    },
    "decisions": {
      "skill_matches": [
        {
          "id": "relay-80-100-workflow",
          "name": "relay-80-100-workflow",
          "path": "/home/runner/work/ricky/ricky/.agents/skills/relay-80-100-workflow/SKILL.md",
          "confidence": 0.6,
          "reason": "Spec text mentions \"must\". Spec text mentions \"test\". Spec text mentions \"tests\".",
          "evidence": [
            {
              "trigger": "must",
              "source":
...[truncated]

Must

  • Preserve the Claude-then-Codex review/fix/final-review/final-fix order before final acceptance.
  • Keep deterministic file gates and scoped git diff --name-only / untracked-file checks limited to the declared source and test targets.
  • Feed review findings into fix steps and require fixers to harden tests when findings are testable.
  • Write review, fix, final-review, final-fix, validation, and signoff artifacts under .workflow-artifacts/.
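As an illustration of the scoped-diff requirement in the list above, the following sketch separates changed paths into declared targets and out-of-scope files. The function name, the ScopeReport shape, and the default ignore prefix are hypothetical; only the two target paths in the usage note and the .workflow-artifacts/ directory come from this eval case.

```typescript
// Hypothetical helper: not taken from Ricky's generated workflows.
interface ScopeReport {
  touchedTargets: string[]; // declared targets that appear in the change set
  outOfScope: string[]; // changed paths that are neither targets nor artifacts
}

function checkScopedDiff(
  changed: string[],
  declaredTargets: string[],
  ignorePrefix: string = ".workflow-artifacts/",
): ScopeReport {
  const targets = new Set(declaredTargets);
  // Workflow artifacts are expected output, so they are ignored rather than flagged.
  const relevant = changed.filter((path) => !path.startsWith(ignorePrefix));
  return {
    touchedTargets: relevant.filter((path) => targets.has(path)),
    outOfScope: relevant.filter((path) => !targets.has(path)),
  };
}
```

For this case the declared targets would be src/surfaces/cli/flows/power-user-parser.ts and src/surfaces/cli/flows/power-user-parser.test.ts, and a scoped gate could require that both are touched and that outOfScope is empty.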

Must Not

  • Use broad repo-wide change detection as the only proof.
  • Allow a single reviewer to rubber-stamp its own work without a distinct fresh-eyes pass.
  • Skip the Codex final review/fix loop because Claude already reviewed.
  • Move final hard validation before the Codex final fix.

Deterministic checks

  • PASS status: case executed
  • PASS ok: expected ok=true, got true
  • PASS contentIncludes:stage": "generate: expected content to include "stage": "generate"
  • PASS contentIncludes:status": "ok: expected content to include "status": "ok"
  • PASS contentIncludes:src/surfaces/cli/flows/power-user-parser.ts: expected content to include "src/surfaces/cli/flows/power-user-parser.ts"
  • PASS contentIncludes:src/surfaces/cli/flows/power-user-parser.test.ts: expected content to include "src/surfaces/cli/flows/power-user-parser.test.ts"
  • PASS contentIncludes:npx vitest run src/surfaces/cli/flows/power-user-parser.test.ts: expected content to include "npx vitest run src/surfaces/cli/flows/power-user-parser.test.ts"
  • PASS contentIncludes:git diff --name-only: expected content to include "git diff --name-only"
  • FAIL contentIncludes:git ls-files --others --exclude-standard: expected content to include "git ls-files --others --exclude-standard"
  • PASS contentIncludes:review-claude.md: expected content to include "review-claude.md"
  • PASS contentIncludes:final-review-codex.md: expected content to include "final-review-codex.md"
  • PASS contentIncludes:codex-final-fix.md: expected content to include "codex-final-fix.md"
  • PASS contentIncludes:dependsOn: ["final-fix-codex"]: expected content to include "dependsOn: ["final-fix-codex"]"
  • PASS contentMatches:\.step\("review-claude"[\s\S]*\.step\("fix-loop"[\s\S]*\.step\("final-review-claude"[\s\S]*\.step\("final-fix-claude"[\s\S]*\.step\("review-codex"[\s\S]*\.step\("fix-loop-codex"[\s\S]*\.step\("final-review-codex"[\s\S]*\.step\("final-fix-codex"[\s\S]*\.step\("final-review-pass-gate"[\s\S]*\.step\("final-hard-validation": expected content to match /\.step\("review-claude"[\s\S]*\.step\("fix-loop"[\s\S]*\.step\("final-review-claude"[\s\S]*\.step\("final-fix-claude"[\s\S]*\.step\("review-codex"[\s\S]*\.step\("fix-loop-codex"[\s\S]*\.step\("final-review-codex"[\s\S]*\.step\("final-fix-codex"[\s\S]*\.step\("final-review-pass-gate"[\s\S]*\.step\("final-hard-validation"/i
  • PASS forbidPhrase:TypeError: content must not include "TypeError"
  • PASS forbidPhrase:ReferenceError: content must not include "ReferenceError"
  • PASS forbidPhrase:needs_clarification: content must not include "needs_clarification"
  • PASS maxToolCalls: expected <= 1 tool calls, got 1

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Scope of Change**  
- **Docs**: Update the README, `AGENTS.md`, and the “Agent Assistant Integration” section to describe the deeper coupling with the Agent Assistant (AA) library (e.g., shared state, callback hooks, and AA‑driven intent routing).  
- **Code**: Add a thin AA wrapper (`src/aa_wrapper.py`) and modify `ricky/__init__.py` and the main coordination loop to invoke AA for intent extraction, validation, and fallback handling.  
- **Tests**: Add unit tests for the new wrapper and integration tests that verify the AA‑driven path is exercised.  

All changes will be confined to the `feature/aa‑deep‑integration` branch and submitted as a single PR.

---

## 1. Workflow Contract (RFC)

| Item | Description |
|------|-------------|
| **Branch** | `feature/aa-deep-integration` (protected, requires PR review) |
| **PR Title** | `feat(ricky): integrate Agent Assistant more deeply` |
| **PR Description** | Summarize docs updates, new wrapper, modified coordination loop, and added tests. |
| **Verification Gates** | 1. **Static analysis** – `flake8` passes.<br>2. **Unit test coverage** – ≥ 90 % for new modules.<br>3. **Integration test** – runs the full workflow with a mock AA serv
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑20)**  

| Metric | Current State | Target / SLA | Gap / Action |
|--------|---------------|--------------|--------------|
| **Workflow contracts defined** | 12 of 18 active workflows include a `turn‑context` contract (67 %). | 100 % of production workflows | Identify the 6 missing workflows; add `turn‑context` schema (see Contract Template below). |
| **Deterministic verification gates** | 8/12 contracts have automated gate checks in CI (66 %). | 100 % gate coverage | Implement missing gate scripts for the 4 contracts lacking them. |
| **Review artifacts** | PRs for `turn‑context` changes contain review checklists 70 % of the time. | Checklist present on every PR | Update the PR template to enforce the “Turn‑Context Review Checklist”. |
| **Validation loops (80‑100)** | Average loops per workflow: 42. | ≥80 loops before merge to production | Introduce a “Loop‑Boost” job that auto‑generates additional synthetic turn scenarios until the loop count reaches 80. |
| **Blocker reporting** | 3 blockers logged in the last sprint; 2 were not escalated to the coordination channel. | 100 % blocker escalation within 1 h | Re
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all four** capabilities—workflow generation, validation, debugging, staged‑CLI UX, and blocker/evidence wording—into the **Agent Assistant** module.

---

## Rationale (aligned with AGENTS.md & workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|--------------------------------------|
| **Workflow Generation** | Core “authoring” function; Agent Assistant already owns the *contract‑first* pattern for creating deterministic workflow specs. Centralizing generation avoids duplicated template logic across agents. | Guarantees a single source of truth for workflow skeletons; downstream agents consume a stable contract. |
| **Workflow Validation** | Validation gates (type‑checking, schema compliance, deterministic‑state‑transition tests) are deterministic checks that the Assistant already performs before handing a workflow to the executor. | Early failure detection → fewer runtime blockers; validation artifacts become part of the PR review checklist. |
| **Debugging (runtime & design)** | The Assistant already maintains the *debug‑trace* context (input → 
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 12 weeks (Q3 2026)*  

---

## 1. Slice Goal
Deliver a production‑ready, self‑service adoption package that lets any engineering team onboard Ricky as a **workflow‑reliability coordinator** with zero‑touch integration into their CI/CD pipelines.

Success is measured by **≥ 2 pilot teams** running ≥ 5 distinct workflows each, with **≥ 95 % automated verification pass** and **≤ 1 critical blocker** reported.

---

## 2. Scope & Boundaries  

| Area | In‑Scope | Out‑of‑Scope |
|------|----------|--------------|
| **Core runtime** | Ricky engine v2.0, deterministic contract layer, built‑in verification gates | Legacy v1.x runtime (deprecation handled separately) |
| **CLI/SDK** | `ricky-cli` (installable via pip/npm), language‑agnostic SDK (Python, Go, TypeScript) | UI‑only web console (future slice) |
| **Workflow authoring** | Template repo `ricky‑templates/standard` (5 pre‑approved patterns) + contract schema `ricky-workflow.yaml` | Custom DSL extensions |
| **CI/CD integration** | GitHub Actions, GitLab CI, Azure Pipelines wrappers | Jenkins (planned later) |
| **Observability** | Structured logs, Promethe
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Agent Assistant – Future Slack / Web Support Design  

Below is a **contract‑first workflow package** that can be dropped into the existing Ricky repository (see `AGENTS.md`). It defines a **Slack‑bot** and a **Web‑chat widget** built on the **Agent Assistant SDK**. The design follows the workflow standards: deterministic verification gates, review artifacts, 80‑100 validation loops, and clear blocker reporting.

---

### 1. High‑Level Architecture  

| Component | Responsibility | Tech Stack | Interaction |
|-----------|----------------|------------|-------------|
| **Ricky‑Slack‑Bot** | Real‑time Slack support, command routing, escalation to human agents | Node.js 14+, Bolt for Slack, Agent‑Assistant SDK | Listens to Slack events → invokes Agent Assistant services → posts replies |
| **Ricky‑Web‑Widget** | Embedded chat on internal portals, persistent session, UI/UX | React 18, TypeScript, WebSocket, Agent‑Assistant SDK | Loads widget → opens WS to backend → streams Agent Assistant messages |
| **Agent‑Assistant Service Layer** | Core reasoning, tool execution, policy enforcement | Python 3.11, FastAPI, LangChain, Redis cache | Exposes `/v1/execute` endpoint, determini
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happened**

You invoked Ricky with the `--mode local` flag and supplied a spec, but you didn’t include the `--run` flag. In local mode the `--run` flag is required to actually generate and materialize the workflow; without it Ricky only parses the request and returns a dry‑run summary.

**How to fix it**

Run the command again, adding `--run` (or `-r`) at the end:

```bash
ricky --mode local --spec "generate a workflow for package checks" --run
```

What to expect

  • Ricky will parse the spec, resolve any required templates, and emit a concrete workflow contract in the ./ricky/workflows/ directory (or the path you’ve configured via --out-dir).
  • The generated workflow will include:
    1. Input validation gate – checks that the package list is non‑empty and that each package name matches the ^[a-z0-9._-]+$ regex.
    2. Deterministic verification steps – e.g., lint, unit-test, security-scan, each with explicit success/failure criteria.
    3. Review artifact – a markdown summary (workflow-summary.md) that lists all steps, required reviewers, and any blockers detected during generation.
    4. Validation loop scaffolding – a placeholder for the 80‑to
      ...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. Follow the steps in order; each step is a deterministic gate that either succeeds (green) or fails (red) with an explicit error message. If any gate fails, stop, fix the reported blocker, and re‑run the failed step before proceeding.

---

## 1️⃣ Prerequisites (Verification Gate)

| Check | Command | Expected Output |
|-------|---------|-----------------|
| **Node ≥18** | `node -v` | `v18.x.x` or higher |
| **npm ≥9** | `npm -v` | `9.x.x` or higher |
| **Git** | `git --version` | `git version 2.x.x` |
| **Supported OS** | `uname -a` (Linux/macOS) or `systeminfo` (Windows) | Linux, macOS, or Windows 10+ |

> **If any check fails**, install the missing tool and re‑run the corresponding command until it passes.

---

## 2️⃣ Install the Ricky CLI (Deterministic Installation Gate)

```bash
# Global install (recommended)
npm i -g @ricky/agentworkforce-cli
```

Verification:

ricky --version

Expected: ricky-cli vX.Y.Z

If the version string is missing or malformed, uninstall (`npm uninstall -g @ricky/agentworkforce-cli
...[truncated]


**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

```text
**Ricky’s Standard Response to an Empty or Missing Spec File**

---

### 1. Immediate Detection & Blocker Report  

| Step | Action | Output |
|------|--------|--------|
| **1.1** | Verify the presence of the spec file at the path declared in the workflow manifest (`workflow.yaml` → `spec:`). | If `stat` fails → **Blocker:** “Spec file `{{path}}` not found.” |
| **1.2** | If the file exists, read its contents and check for non‑whitespace characters. | If empty → **Blocker:** “Spec file `{{path}}` is empty.” |
| **1.3** | Emit a **GitHub Issue** (or Jira ticket) with the blocker label, linking the workflow PR/branch. | Issue title: *“Missing/Empty Spec for {{workflow_name}}”* |

> **Ricky always surfaces a blocker before any downstream validation** so that the pipeline halts deterministically.

---

### 2. Automated Placeholder Generation (Optional “quick‑fix” mode)

If the team has opted‑in to a *fallback stub* (controlled by the repo flag `workflow.allow_stub_spec: true`), Ricky can auto‑create a minimal spec skeleton:

```yaml
# {{path}} – autogenerated placeholder
name: {{workflow_name}}
description: "TODO: Add a proper description."
steps:
  - id: placeholder
    type: noop
   
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks succeeded)**  

=== AgentWorkforce – Ricky ===
🟡 Overall Status: PARTIAL (some services unavailable)

🔌 Provider Connections
• Google : DISCONNECTED – no successful auth/check detected
• GitHub : DISCONNECTED – no successful auth/check detected
• (other providers) : UNKNOWN – no checks run

🛠 Workflow Engine
• Scheduler : RUNNING
• Last Run : N/A (no jobs executed due to missing providers)

📊 Recent Checks
• Google check : FAILED (no valid token / API unreachable)
• GitHub check : FAILED (no valid token / API unreachable)

🚨 Blockers
• Google OAuth credentials missing or invalid.
• GitHub personal‑access token missing or invalid.
• Dependent workflows paused until providers are re‑connected.

📝 Next Steps

  1. Verify Google OAuth client ID/secret and re‑run ricky provider connect google.
  2. Verify GitHub PAT (with repo & workflow scopes) and re‑run ricky provider connect github.
  3. After successful connections, run ricky healthcheck to confirm.
  4. Re‑trigger any pending workflows with ricky workflow run <name>.

💡 Tips
• Use ricky provider status to see detailed tok
...[truncated]


**Must**

- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.

**Must Not**

- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-matcher-registry-backed</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

**Ricky output**

```text
**Workflow: “GitHub Primitive Change”**

| Step | Owner | Action | Input | Output | Verification Gate |
|------|-------|--------|-------|--------|--------------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a scoped branch** | Issue/feature description | New branch `feat/<short‑slug>` from `main` | `git checkout -b` succeeds; branch name matches regex `^feat\/[a-z0-9-]+$` |
| 2️⃣ | **Ricky (Reliability)** | **Run pre‑change lint & static analysis** | Source code at branch tip | Lint report (`.lint-report.json`) | All lint rules PASS (no `error` severity) |
| 3️⃣ | **Ricky (Authoring)** | **Apply the primitive change** | Code edit (e.g., bump version, add comment) | Updated files committed | `git diff --cached` shows exactly the intended diff; commit message follows template `feat: <short description>` |
| 4️⃣ | **Ricky (Reliability)** | **Execute unit‑test suite** | Updated code | Test results (`junit.xml`) | 100 % of tests PASS; any failure aborts workflow |
| 5️⃣ | **Ricky (Coordination)** | **Open a Pull Request (PR)** | Branch, title, description | PR #\<n\> in GitHub | PR created via API; label `primitive-change` applied |
| 6️⃣ | **Ricky (Reliability)** | **Automated PR 
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
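
The registry-backed selection described in the rubric above could be sketched roughly as follows. This is a minimal illustration, not Ricky's matcher: `SkillEntry`, `matchSkills`, the keyword-overlap score, and the `0.5` threshold are all assumptions introduced here.

```typescript
// Hypothetical registry entry; the real registry shape is not shown in this PR.
interface SkillEntry {
  id: string;
  triggers: string[]; // keywords that suggest the skill applies
}

interface SkillMatch {
  id: string;
  confidence: number; // fraction of the skill's triggers found in the spec
  evidence: string[]; // the triggers that actually matched
}

interface SkillSelection {
  matches: SkillMatch[]; // ranked, highest confidence first
  fallbackReason?: string; // set when nothing was selected
}

function matchSkills(
  registry: SkillEntry[] | undefined,
  specText: string,
  threshold = 0.5, // assumed cut-off, for illustration only
): SkillSelection {
  // A missing registry is a fallback, not a generation failure.
  if (!registry || registry.length === 0) {
    return { matches: [], fallbackReason: "registry missing or empty" };
  }
  const spec = specText.toLowerCase();
  const matches: SkillMatch[] = registry
    .map((skill) => {
      const evidence = skill.triggers.filter((t) => spec.includes(t.toLowerCase()));
      const confidence =
        skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
      return { id: skill.id, confidence, evidence };
    })
    .filter((m) => m.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
  return matches.length > 0
    ? { matches }
    : { matches: [], fallbackReason: "no skill cleared the threshold" };
}
```

Returning `fallbackReason` alongside an empty `matches` array keeps the selection evidence visible in JSON output even when no skill is applied.
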
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow Title:** Claude‑Review → Codex‑Implement Pipeline  
**Owner:** AgentWorkforce (Ricky)  
**Version:** 1.0.0  
**Status:** Draft (ready for PR)

---

## 1. Overview
This deterministic pipeline orchestrates two LLM agents:

| Stage | Agent | Responsibility |
|-------|-------|----------------|
| **Review** | **Claude** (Anthropic) | Perform semantic, security, and style review of the incoming specification. |
| **Implementation** | **Codex** (OpenAI) | Generate production‑ready code based on Claude‑approved spec. |

The workflow is bounded to a single feature branch (`feature/claude‑review‑codex‑impl`) and produces three immutable artifacts: `review_report.json`, `generated_code.zip`, and `verification_log.txt`.

---

## 2. Contract (Inputs / Outputs)

| Item | Type | Description |
|------|------|-------------|
| `spec.yaml` | YAML | Feature specification submitted by the product owner. |
| `review_report.json` | JSON | Claude’s structured review (pass/fail, comments, risk tags). |
| `generated_code.zip` | ZIP | Codex‑produced source tree (compiled, linted). |
| `verification_log.txt` | TXT | Deterministic gate outcomes and loop counts. |

**Schema – `review_report.json`**


...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
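
The precedence in the rubric above (explicit spec hint first, then skill metadata, then project default, each with a recorded reason) can be sketched like this. The names `selectRunner` and `ToolDecision`, and the shape of the hint maps, are hypothetical; only the ordering comes from the rubric.

```typescript
type DecisionSource = "spec-hint" | "skill-default" | "project-default";

interface ToolDecision {
  step: string;
  runner: string;
  source: DecisionSource;
  reason: string; // why this runner was chosen, for the audit record
}

// Precedence: explicit spec hint, then skill default, then project default.
function selectRunner(
  step: string,
  specHints: Record<string, string>,
  skillDefaults: Record<string, string>,
  projectDefault: string,
): ToolDecision {
  const hinted = specHints[step];
  if (hinted !== undefined) {
    return { step, runner: hinted, source: "spec-hint", reason: `spec names ${hinted} for ${step}` };
  }
  const fromSkill = skillDefaults[step];
  if (fromSkill !== undefined) {
    return { step, runner: fromSkill, source: "skill-default", reason: `spec is silent; skill metadata suggests ${fromSkill}` };
  }
  return { step, runner: projectDefault, source: "project-default", reason: "no spec hint or skill default" };
}
```

Serializing the returned decisions (for example with `JSON.stringify`) would give an audit record along the lines of the `tool-selection.json` the rubric mentions.
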
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

Below is a compact “how‑to” for using the **`--refine`** option when you want to tighten a workflow’s step‑level task descriptions and acceptance‑gate definitions. The pattern follows the **Ricky** repository conventions (AGENTS.md, workflow standards, shared authoring rules).

---

## 1️⃣ When to invoke `--refine`

| Situation | Why you need refinement |
|-----------|------------------------|
| **Vague task text** (e.g., “process data”) | To make the intent deterministic and testable. |
| **Missing acceptance criteria** (no gate or gate is “manual review”) | To add concrete, automated verification gates. |
| **Over‑broad scope** (step covers multiple responsibilities) | To split into atomic sub‑steps with clear ownership. |
| **Repeated blocker reports** | To surface hidden dependencies and add pre‑conditions. |

If any of the above appear in a generated workflow, run the refinement pass.

---

## 2️⃣ Command syntax

```bash
ricky workflow generate <template> \
    --output ./workflows/my_new_flow.yml \
    --refine \
    --refine‑mode=task,gates \
    --max‑loops=80
```
  • --refine – activates the refinement engine.
  • --refine‑mode – pick one or both of:
    • task – rewr
      ...[truncated]

**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
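
The opt-in, fall-back-on-failure behavior in the rubric above might look roughly like the sketch below. `refineOrFallback` and its callbacks are hypothetical stand-ins; the timeout, token budget, and editable-region bounds are assumed to live inside `refine` and are not modeled here.

```typescript
interface RefineOutcome {
  artifact: string;
  refined: boolean;
  warnings: string[]; // surfaced even when the artifact is returned unchanged
}

// `refine` and `validate` are hypothetical callbacks standing in for the
// provider call and the post-refinement validation pass.
function refineOrFallback(
  deterministic: string,
  refineRequested: boolean,
  refine: (artifact: string) => string,
  validate: (artifact: string) => boolean,
): RefineOutcome {
  // Default path: refinement is opt-in, so nothing changes without the flag.
  if (!refineRequested) {
    return { artifact: deterministic, refined: false, warnings: [] };
  }
  try {
    const candidate = refine(deterministic);
    if (validate(candidate)) {
      return { artifact: candidate, refined: true, warnings: [] };
    }
    return {
      artifact: deterministic,
      refined: false,
      warnings: ["refined artifact failed validation; returned the deterministic artifact"],
    };
  } catch (error) {
    // Provider failure or timeout must not fail the whole generation.
    const detail = error instanceof Error ? error.message : String(error);
    return {
      artifact: deterministic,
      refined: false,
      warnings: [`refinement failed (${detail}); returned the deterministic artifact`],
    };
  }
}
```
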

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Specification & Delivery**  
*Repository:* `github.com/agentworkforce/ricky`  
*Target Branch:* `main` → `feat/version-cli`  
*PR:* `feat/version-cli` → `main`  

---

## 1. Scope & Acceptance Criteria  

| # | Requirement | Acceptance Test |
|---|-------------|-----------------|
| 1 | `ricky --version` prints a semantic version string (`MAJOR.MINOR.PATCH`) matching the package version in `pyproject.toml`. | Run `ricky --version`; output matches regex `^\d+\.\d+\.\d+$` and equals `$(poetry version -s)`. |
| 2 | The command exits with status `0`. | `echo $?` after execution is `0`. |
| 3 | The version flag is discoverable via `ricky -h`/`--help`. | Help output contains `--version` entry. |
| 4 | The implementation is type‑checked, lint‑free, and covered by unit tests. | `mypy`, `ruff` pass; `pytest` coverage ≥ 90% for `cli/version.py`. |
| 5 | CI pipeline validates the contract on every push to the feature branch. | GitHub Actions job `verify-version-cli` succeeds. |
| 6 | Documentation in `docs/cli.md` includes a “Version” section with example output. | Markdown file contains heading `## Version` and sample block. |

---

## 2. Contract (YAML)  

```ya
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
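
A behavior-grounded gate, in contrast to a source-shape grep, checks what the command actually does. The sketch below assumes that `ricky --version` prints the bare version string; the PR does not show the real output format, and `CommandResult` and `versionGate` are hypothetical names.

```typescript
// Captured result of running the CLI; how it is captured is out of scope here.
interface CommandResult {
  stdout: string;
  exitCode: number;
}

interface GateResult {
  pass: boolean;
  reason: string;
}

// Passes only when the CLI exited cleanly and printed exactly the expected
// version. Assumption: the CLI prints the bare version and nothing else.
function versionGate(result: CommandResult, expectedVersion: string): GateResult {
  if (result.exitCode !== 0) {
    return { pass: false, reason: `exit code ${result.exitCode}` };
  }
  const printed = result.stdout.trim();
  if (printed !== expectedVersion) {
    return { pass: false, reason: `printed "${printed}", expected "${expectedVersion}"` };
  }
  return { pass: true, reason: "version output matches" };
}
```

The expected version would come from whatever resolution order the spec defines, not from a value hardcoded in the gate.
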
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic validation pipeline, automatically apply fix‑scripts where possible, and rerun until all artifacts are either **PASS** or **BLOCKED**. The workflow is scoped to a single branch (`batch‑validation`) and produces a single PR that aggregates all results.

---  

## 1. Contract (YAML)

```yaml
name: batch_artifact_validation
description: |
  Validate, auto‑fix, and rerun a collection of independent artifacts.
scope:
  branch: batch-validation
  pr_target: main
inputs:
  artifacts_dir: ./artifacts          # root folder containing artifact sub‑folders
  config_file:   ./validation.yml    # per‑artifact validation config
  max_iterations: 5                  # safety guard against infinite loops
outputs:
  report: ./reports/validation_summary.json
  logs:   ./logs/
  artifacts_fixed: ./artifacts_fixed/   # copies of artifacts that were successfully fixed
gates:
  - name: iteration_limit
    type: hard
    condition: iteration <= inputs.max_iterations
  - name: all_passed_or_blocked
    type: soft
    
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

| Skill | What the Skill Does | Direct Effect on the Generated Workflow | Example of the Resulting Artifact |
|-------|---------------------|----------------------------------------|-----------------------------------|
| **Deterministic Contract Definition** | Forces the author to write explicit input‑output schemas, versioned contracts, and type‑checked data models. | • Every step declares the exact shape of its payload (JSON‑Schema or Pydantic model). <br>• The workflow engine can validate messages before routing, preventing “shape‑mismatch” runtime errors. | `contract: { name: "UserCreated", schema: { id: "string", email: "string" } }` |
| **Verification Gate Insertion** | Inserts automated checkpoints (unit test, integration test, schema validation) that must pass before the workflow proceeds to the next stage. | • The generated DAG includes `gate: "verify_user_schema"` nodes that run a test suite. <br>• If a gate fails, the engine halts and reports a **blocker** instead of silently continuing. | `gate verify_user_schema { run: npm test --filter=user-schema }` |
| **Review Artifact Generation** | Auto‑
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Blocker – Insufficient Specification**

The request is to “generate a workflow from a vague product spec.” Because the source specification is undefined, I cannot produce a deterministic, verifiable workflow that meets our standards for:

1. **Concrete workflow contracts** – required input/output schemas, pre‑/post‑conditions, and success/failure codes.  
2. **Verification gates** – unit‑test suites, integration‑test matrices, and CI/CD checks.  
3. **Review artifacts** – design‑review checklist, security‑review sign‑off, and performance‑budget analysis.  
4. **Validation loops** – the mandated 80‑to‑100 iteration loop cannot be scoped without clear acceptance criteria.

**Next Steps to Resolve the Blocker**

| Action | Owner | Deliverable |
|--------|-------|-------------|
| Provide the product specification (or at least the core functional requirements, data model, and success criteria). | You (the requester) | A concise spec document (≈1‑2 pages) covering: <br>• Primary use case(s) <br>• Input payload schema <br>• Expected output schema <br>• Error handling expectations <br>• Performance / SLA targets |
| Identify any non‑functional constraints (security, compliance, runtime e
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response to the Failure**

| Phase | Action | Rationale | Artifact |
|-------|--------|-----------|----------|
| 1️⃣ Detect & Log | • Capture the timeout event (step X) and the “alive‑but‑idle” worker (worker Y). <br>• Write a structured log entry with timestamps, step ID, worker ID, and error codes. | Provides an immutable audit trail for post‑mortem and for the verification gate. | `workflow‑run‑<run‑id>.log` |
| 2️⃣ Abort & Isolate | • Send a **graceful termination** signal to worker Y. <br>• If the worker does not acknowledge within 30 s, force‑kill it and mark the host as *quarantined* for the next health‑check cycle. | Prevents resource leakage and ensures no stray processes continue to consume capacity. | `worker‑termination‑report.json` |
| 3️⃣ Mark Step Failure | • Update the run state machine: set step X status = **FAILED** and propagate a **FAIL** flag to downstream steps. <br>• Record the timeout reason (`ERR_TIMEOUT`) and the worker‑termination outcome. | Guarantees deterministic downstream behavior – downstream steps will be skipped unless they are explicitly marked *optional*. | `run‑state‑<run‑id>.json` |
| 4️⃣ Trigger Recovery Gate | • Invoke th
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
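
One way to picture "classify before retry" is a function that maps observed signals to the failure classes named in the rubric and keeps the raw evidence. The class names follow the rubric; the boolean signals in `FailureEvidence` are hypothetical and far coarser than real run evidence would be.

```typescript
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical, simplified signals extracted from a failed run.
interface FailureEvidence {
  stepTimedOut: boolean;
  workerAliveWithoutArtifacts: boolean;
  missingEnvOrTool: boolean;
  invalidWorkflowGraph: boolean;
  gateChecksWrongThing: boolean;
  rawLog: string;
}

interface Classification {
  classes: FailureClass[];
  rawLog: string; // raw evidence is preserved, never rewritten
}

function classifyFailure(evidence: FailureEvidence): Classification {
  const classes: FailureClass[] = [];
  if (evidence.stepTimedOut) classes.push("timeout");
  if (evidence.workerAliveWithoutArtifacts) classes.push("agent-runtime-opacity");
  if (evidence.missingEnvOrTool) classes.push("environment-blocker");
  if (evidence.invalidWorkflowGraph) classes.push("workflow-structure-bug");
  if (evidence.gateChecksWrongThing) classes.push("validation-strategy-mismatch");
  // No recognizable signal: record the uncertainty instead of guessing.
  if (classes.length === 0) classes.push("unknown");
  return { classes, rawLog: evidence.rawLog };
}
```

For the scenario in this case (one step timed out, one worker alive without artifacts), the sketch yields both `timeout` and `agent-runtime-opacity`, and the retry decision would be made only after that classification exists.
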
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Stale‑State Guard for Local AgentWorkforce Runs**  

| **Element** | **Definition** |
|-------------|----------------|
| **Scope** | Local execution of any `.agent/`‑based workflow on a developer workstation. |
| **Inputs** | 1. Working directory root (absolute path).<br>2. Optional `--force` flag. |
| **Outputs** | - `exit 0` – clean state, workflow may start.<br>- `exit 1` – stale state detected; detailed report written to `./.agent-relay/stale-report.json`. |
| **Pre‑conditions** | - The repository contains an `agent.yaml` (or `agent.yml`) manifest.<br>- The CLI version is ≥ 2.4.0 (supports guard). |
| **Post‑conditions** | - If clean, no side‑effects.<br>- If stale, the guard aborts the launch and leaves a reproducible remediation plan. |
| **Deterministic Verification Gate** | `ricky guard --verify` must return `PASS` before any `ricky run` is allowed. |
| **Review Artifact** | `stale-report.json` is auto‑generated and attached to the PR as a comment (via CI). |
| **Blocker Reporting** | Any detection of stale state is reported as a **hard blocker**; the workflow will not start until the user resolves the issue or supplies `--force`. |

---

## 1. Detecti
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
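
A read-only pre-launch check along the lines of the rubric above could be sketched as follows. It treats the mere presence of the three directories as the signal, which is a simplification; how Ricky actually decides that state is stale is not shown in this report. `checkStaleState` and the injected `exists` predicate are hypothetical.

```typescript
// Runtime state directories named in the user message for this case.
const RUNTIME_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

interface StaleStateReport {
  classification: "environment-contamination" | "clean";
  observedPaths: string[];
  recommendedAction: string;
}

// `exists` is injected so the check never touches the state itself;
// in Node it could be `fs.existsSync`.
function checkStaleState(
  workspaceRoot: string,
  exists: (path: string) => boolean,
): StaleStateReport {
  const observedPaths = RUNTIME_STATE_DIRS
    .map((dir) => `${workspaceRoot}/${dir}`)
    .filter((path) => exists(path));
  if (observedPaths.length === 0) {
    return { classification: "clean", observedPaths, recommendedAction: "none" };
  }
  // Report and recommend; do not delete or overwrite anything.
  return {
    classification: "environment-contamination",
    observedPaths,
    recommendedAction: "quarantine the listed paths or launch in an isolated workspace",
  };
}
```
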
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky/Relay run is already active in this workspace”**

The marker is a safety lock that prevents two Ricky/Relay executions from stepping on each other. It is created in the workspace root at **`.ricky/run.lock`** (or `.relay/run.lock` for Relay). When the file exists, the CLI assumes a live run is in progress.

Below is the deterministic, 5‑step repair workflow that satisfies the AgentWorkforce contract:

| Step | Action | Command | Verification Gate |
|------|--------|---------|--------------------|
| 1️⃣ | **Inspect the lock** – confirm it is stale. | `cat .ricky/run.lock` | Output contains a PID and a timestamp. |
| 2️⃣ | **Check the PID** – see if the process is still alive. | `ps -p <PID>` | If `ps` returns *no such process*, the lock is stale. |
| 3️⃣ | **Remove the stale lock** – only after confirming the PID is dead. | `rm -f .ricky/run.lock` | `test -f .ricky/run.lock && echo "still present"` must return *nothing*. |
| 4️⃣ | **Run a quick health check** – ensure the workspace is clean. | `ricky status --brief` | Should report *“idle”* and *no pending jobs*. |
| 5️⃣ | **Restart the intended run** – with optional `--force` flag if you want the CLI to igno
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is parsed before any step runs.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Leaves the existing step bodies untouched** – even when they contain HEREDOC‑style `node --input-type=module` snippets that embed literal `import { … } from 'node:fs'` or `from 'node:path'` strings.

---

## 1. Patch contract

| Section | Action | Deterministic guarantee |
|---------|--------|--------------------------|
| **Header** | Insert `import { loadRickyWorkflowEnv, assertRickyWorkflowEnv } from './rickyEnv';` | Guarantees the loader is available before any step code executes. |
| **Init block** | Call `loadRickyWorkflowEnv();` immediately after imports. | Ensures `.env.local` (or `.env`) is parsed exactly once per workfl
...[truncated]

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
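
The difference between substring matching and matching a real import statement can be illustrated with a line-anchored check. This is a heuristic sketch, not the repair code from this PR: `hasNamespaceImport` and `addMissingAliasImports` are hypothetical names, and a regex anchored to line starts can still be fooled by a HEREDOC that contains the exact alias import on its own line, so a parser-based check would be more robust.

```typescript
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True only when a whole line is `import * as <alias> from '<module>'`.
// A bare occurrence of the module specifier elsewhere does not count.
function hasNamespaceImport(source: string, alias: string, moduleName: string): boolean {
  const pattern = new RegExp(
    `^import\\s+\\*\\s+as\\s+${escapeRegExp(alias)}\\s+from\\s+['"]${escapeRegExp(moduleName)}['"]\\s*;?\\s*$`,
    "m",
  );
  return pattern.test(source);
}

// Prepends whichever alias imports are missing and leaves the rest of the
// artifact, including embedded HEREDOC text, byte-for-byte unchanged.
function addMissingAliasImports(source: string): string {
  const required = [
    { alias: "rickyWorkflowFs", moduleName: "node:fs" },
    { alias: "rickyWorkflowPath", moduleName: "node:path" },
  ];
  const missing = required
    .filter((r) => !hasNamespaceImport(source, r.alias, r.moduleName))
    .map((r) => `import * as ${r.alias} from '${r.moduleName}';`);
  return missing.length === 0 ? source : `${missing.join("\n")}\n${source}`;
}
```

Because the check looks for the alias statement rather than the string `from 'node:fs'`, a step command that embeds `import { readFileSync } from 'node:fs'` does not suppress the injection, and running the repair twice adds nothing the second time.
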
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, contract‑driven procedure you can copy‑paste into your terminal (or CI step) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. It assumes you have the **Ricky CLI** (`ricky`) installed and that the workflow repository follows the standard `ricky/` layout.

---

## 1️⃣ Preconditions (contract)

| Item | Requirement | How to verify |
|------|-------------|---------------|
| **Workflow definition** | `workflow.yaml` present at repo root | `test -f workflow.yaml` |
| **Auto‑fix enabled** | `auto_fix: true` in `ricky/config.yaml` | `grep -A1 "auto_fix:" ricky/config.yaml` |
| **Failed step ID** | e.g. `step-3` (provided) | variable `FAILED_STEP` |
| **Previous run ID** | e.g. `run‑20240515‑a1b2c3` (provided) | variable `PREV_RUN_ID` |
| **Ricky version** | ≥ 2.4.0 (auto‑fix support) | `ricky --version` |
| **Local execution environment** | Docker daemon running, enough disk space | `docker info` |

If any check fails, abort and raise a blocker ticket (see § 7).

---

## 2️⃣ Set up the execution context

```bash
# 2.1 Clone the repo (if not already)
git clone https:
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
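
The bounded repair/resume loop in the rubric above can be sketched as a plain loop with an attempt budget. Everything here is illustrative: `runWithAutoFix`, the callback signatures, and the summary shape are assumptions, and the repair step is reduced to a callback that reports whether the artifact could be repaired.

```typescript
interface AttemptResult {
  ok: boolean;
  runId: string; // id of this individual attempt
  failedStep?: string;
}

interface ResumeOptions {
  startFromStep?: string;
  previousRunId?: string;
}

interface AttemptSummary extends ResumeOptions {
  attempt: number;
  ok: boolean;
  failedStep?: string;
}

interface AutoFixOutcome {
  trackingRunId: string; // one id for the whole repair/resume sequence
  status: "ok" | "blocker";
  attempts: AttemptSummary[];
}

function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  runWorkflow: (resume: ResumeOptions) => AttemptResult,
  repairArtifact: (failedStep: string | undefined) => boolean,
): AutoFixOutcome {
  const attempts: AttemptSummary[] = [];
  let resume: ResumeOptions = {};
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const result = runWorkflow(resume);
    attempts.push({ attempt, ok: result.ok, failedStep: result.failedStep, ...resume });
    if (result.ok) return { trackingRunId, status: "ok", attempts };
    // Stop when the budget is spent or the artifact cannot be repaired.
    if (attempt === maxAttempts || !repairArtifact(result.failedStep)) break;
    // Resume from the failed step, carrying the previous attempt's run id.
    resume = { startFromStep: result.failedStep, previousRunId: result.runId };
  }
  return { trackingRunId, status: "blocker", attempts };
}
```

In this sketch, calling it with `maxAttempts` set to `1` gives single-attempt behavior: one run, no repair call, and a `blocker` outcome on failure.
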
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Immediate troubleshooting checklist**

1. **Capture the failure output**  
   ```bash
   ricky run workflows/foo.ts --no-auto-fix 2>&1 | tee ricky-run.log
   ```

Keep the full ricky-run.log – it is the primary artifact for any further analysis.

  1. Identify the verification gate that stopped the run
    The log will contain a line like:

    [VERIFICATION GATE] <gate-name> – FAILED
    

    Note the exact gate name (e.g., SchemaValidation, DependencyResolution, RuntimeSafety) and the accompanying error message.

  2. Locate the corresponding contract file
    Each workflow has a contract under contracts/. For workflows/foo.ts the contract is:

    contracts/foo.contract.json
    

    Open it and verify that:

    • All required inputs are declared.
    • Output schemas match the expectations of downstream steps.
    • Version constraints ("engine": ">=1.4.0 <2.0.0") are satisfied.
  3. Run the deterministic verification gate manually
    Most gates expose a CLI helper. For the failing gate, run:

    ricky verify --gate <gate-name> --contract contracts/foo.contract.json

    This will produce a deterministic report (`veri
    ...[truncated]


**Must**

- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.

**Must Not**

- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.in-process-local-runner</code> (runtime-recovery/manual)</summary>

**User message**

> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

**Ricky output**

```text
**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeatable contract that Ricky follows whenever a developer asks to run a TypeScript‑based workflow that lives in the *primary local path* (by default `./workflows`). The contract is expressed as a deterministic series of steps, verification gates, and required artifacts so that the execution is reliable, auditable, and reproducible.

---

## 1. Preconditions (Input Contract)

| Item | Requirement | Source |
|------|-------------|--------|
| **Workflow source** | A single entry‑point file `*.workflow.ts` (or `index.ts`) located under `./workflows` (or a sub‑directory explicitly passed as `--path`). | Developer |
| **Node runtime** | Node ≥ 18.x installed and available on `$PATH`. | System |
| **Package manager** | `npm` (or `pnpm`/`yarn` if the repo declares a lockfile). | System |
| **TypeScript compiler** | `typescript` (≥ 5.0) present in `devDependencies`. | `package.json` |
| **Execution wrapper** | `ts-node` (≥ 10.x) present in `devDependencies`. | `package.json` |
| **Environment config** | Optional `.env` file in the repo root; if present, it must be loaded via `dotenv`.
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Escalation Workflow for Persistent Structural Failures**

When a structural failure remains after an initial fix attempt, Ricky follows a deterministic, contract‑driven escalation path. The workflow is bounded, auditable, and designed to surface blockers early while preserving branch and PR hygiene.

---

## 1. Preconditions (Contract)

| Item | Requirement |
|------|--------------|
| **Failure Ticket** | Must exist in the issue tracker (e.g., `WF‑1234`) with status **“Open – Structural Failure”**. |
| **Initial Fix Branch** | A feature branch `fix/struct‑<short‑id>` that has been merged to `staging` and deployed to the verification environment. |
| **Verification Gate** | Automated regression suite (`struct‑suite`) must have run at least once post‑merge. |
| **Owner** | The workflow owner (Ricky) must have `write` permission on the repo and `triage` permission on the issue tracker. |

If any precondition is missing, Ricky aborts and raises a **Pre‑condition Blocker** (see §7).

---

## 2. Deterministic Verification Loop (80‑100 iterations)

1. **Run Full Structural Test Suite**  
   - Trigger `ci/structural-test` pipeline.  
   - Capture **pass/fail** and **arti
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Aggregated from Recent Runs**  
*(Generated on 2026‑05‑20, covering the last 7 days of execution across all active Ricky pipelines)*  

| Metric | Definition | Current Value | Target / SLA | Status |
|--------|------------|---------------|--------------|--------|
| **Total Runs** | Number of workflow executions (including retries) | 1 842 | — | ✅ |
| **Successful Runs** | Runs that passed all deterministic verification gates | 1 735 | ≥ 95 % of total | ✅ |
| **Failed Runs** | Runs that hit a hard blocker (e.g., contract violation, missing artifact) | 107 | ≤ 5 % of total | ⚠️ |
| **Average Runtime** | Mean wall‑clock time from start to final gate | 3 m 27 s | ≤ 5 m | ✅ |
| **90‑th‑Percentile Runtime** | Runtime at which 90 % of runs complete | 4 m 12 s | ≤ 6 m | ✅ |
| **Deterministic Gate Pass Rate** | % of runs that cleared every contract‑based verification gate | 98.2 % | ≥ 97 % | ✅ |
| **Review Artifact Turn‑around** | Avg. time from gate‑fail to reviewer comment | 22 min | ≤ 30 min | ✅ |
| **Validation Loop Count** | Avg. number of 80‑to‑100‑iteration loops per run (for stochastic steps) | 3.4 | ≤ 5 | ✅ |
| **Branch‑Scope Violations** | PRs that
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
