98 changes: 86 additions & 12 deletions README.md
@@ -2,14 +2,14 @@

# ShellForge

**Governed AI agent runtime — one Go binary, local or cloud.**
**Governed AI coding CLI and agent runtime — one Go binary, local or cloud.**

[![Go](https://img.shields.io/badge/Go-1.18+-00ADD8?style=for-the-badge&logo=go&logoColor=white)](https://go.dev)
[![GitHub Pages](https://img.shields.io/badge/Live_Site-agentguardhq.github.io/shellforge-ff6b2b?style=for-the-badge)](https://agentguardhq.github.io/shellforge)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue?style=for-the-badge)](LICENSE)
[![AgentGuard](https://img.shields.io/badge/Governed_by-AgentGuard-green?style=for-the-badge)](https://github.com/AgentGuardHQ/agentguard)

*Run autonomous AI agents with policy enforcement on every tool call. Local via Ollama or cloud via Anthropic API — your choice.*
*Interactive pair-programming with local models + autonomous multi-task execution — with governance on every tool call.*

[Website](https://agentguardhq.github.io/shellforge) · [Docs](docs/architecture.md) · [Roadmap](docs/roadmap.md) · [AgentGuard](https://github.com/AgentGuardHQ/agentguard)

@@ -54,12 +54,17 @@ shellforge setup # creates agentguard.yaml + output dirs

This creates `agentguard.yaml` (governance policy) in your project root. Edit it to customize which actions are allowed/denied.

### 5. Run an agent
### 5. Start a chat session

```bash
shellforge chat # interactive REPL — pair-program with a local model
```

Or run a one-shot agent:

```bash
shellforge agent "describe what this project does"
shellforge agent "find test gaps and suggest improvements"
shellforge agent "create a hello world program"
```

Every tool call (file reads, writes, shell commands) passes through governance before execution.
@@ -70,17 +75,55 @@ Every tool call (file reads, writes, shell commands) passes through governance b

## What Is ShellForge?

ShellForge is a **governed agent runtime** — not an agent framework, not an orchestration layer, not a prompt wrapper.
ShellForge is a **governed AI coding CLI and agent runtime** — like Claude Code or Cursor, but with local models and policy enforcement built in.

It sits between any agent driver and the real world. The agent decides what it wants to do. ShellForge decides whether it's allowed.
Two modes:

1. **Interactive REPL** (`shellforge chat`) — pair-program with a local or cloud model. Persistent conversation history, shell escapes, color output.
2. **Autonomous agents** (`shellforge agent`, `shellforge ralph`) — one-shot tasks or multi-task loops with automatic validation and commit.

Both modes share the same governance layer. Every tool call passes through [AgentGuard](https://github.com/AgentGuardHQ/agentguard) policy enforcement before execution.

```
Agent Driver (Goose, Claude Code, Copilot CLI)
→ ShellForge Governance (allow / deny / correct)
→ Your Environment (files, shell, git)
You (chat) or Octi Pulpo (dispatch)
→ ShellForge Agent Loop (tool calling, drift detection)
→ AgentGuard Governance (allow / deny / correct)
→ Your Environment (files, shell, git)
```

**The core insight:** ShellForge's value is governance, not the agent loop. [Goose](https://block.github.io/goose) handles local agent execution. [Dagu](https://github.com/dagu-org/dagu) handles workflow orchestration. ShellForge wraps them all with [AgentGuard](https://github.com/AgentGuardHQ/agentguard) policy enforcement on every tool call.
---

## Interactive REPL (`shellforge chat`)

Pair-programming mode. Persistent conversation history across prompts — the model remembers what you discussed.

```bash
shellforge chat # local model via Ollama (default)
shellforge chat --provider anthropic # Anthropic API (Haiku/Sonnet/Opus)
shellforge chat --model qwen3:14b # pick a specific model
```

Features:
- **Color output** — green prompt, red errors, yellow governance denials
- **Shell escapes** — `!git status` runs a command without leaving the session
- **Ctrl+C** — interrupts the current agent run without killing the session
- **Governance** — every tool call checked against `agentguard.yaml`, same as autonomous mode

---

## Ralph Loop (`shellforge ralph`)

Stateless-iterative multi-task execution. Each task gets a fresh context window — no accumulated confusion across tasks.

```bash
shellforge ralph tasks.json # run tasks from a JSON file
shellforge ralph --validate "go test ./..." # validate after each task
shellforge ralph --dry-run # preview without executing
```

The loop: **PICK** a task → **IMPLEMENT** it → **VALIDATE** (run tests) → **COMMIT** on success → **RESET** context → next task.

Tasks come from a JSON file or Octi Pulpo MCP dispatch. Failed validations skip the commit and move on — no broken code lands.

---

@@ -112,8 +155,14 @@ shellforge status

| Command | Description |
|---------|-------------|
| `shellforge agent "prompt"` | Run a governed agent (Ollama, default) |
| `shellforge agent --provider anthropic "prompt"` | Run via Anthropic API (Haiku/Sonnet/Opus, prompt caching) |
| `shellforge chat` | Interactive REPL — pair-program with a local or cloud model |
| `shellforge chat --provider anthropic` | REPL via Anthropic API (Haiku/Sonnet/Opus) |
| `shellforge chat --model qwen3:14b` | REPL with a specific Ollama model |
| `shellforge ralph tasks.json` | Multi-task loop — stateless-iterative execution |
| `shellforge ralph --validate "go test ./..."` | Ralph Loop with post-task validation |
| `shellforge ralph --dry-run` | Preview tasks without executing |
| `shellforge agent "prompt"` | One-shot governed agent (Ollama, default) |
| `shellforge agent --provider anthropic "prompt"` | One-shot via Anthropic API (prompt caching) |
| `shellforge agent --thinking-budget 8000 "prompt"` | Enable extended thinking (Sonnet/Opus) |
| `shellforge run <driver> "prompt"` | Run a governed CLI driver (goose, claude, copilot, codex, gemini) |
| `shellforge setup` | Install Ollama, create governance config, verify stack |
@@ -125,6 +174,23 @@ shellforge status

---

## Built-in Tools

The agent loop (used by `chat`, `agent`, and `ralph`) has 8 built-in tools, all governed:

| Tool | What It Does |
|------|-------------|
| `read_file` | Read file contents |
| `write_file` | Write a complete file |
| `edit_file` | Targeted find-and-replace (like Claude Code's Edit tool) |
| `glob` | Pattern-based file discovery with recursive `**` support |
| `grep` | Regex content search with `file:line` output |
| `run_shell` | Execute shell commands (via RTK for token compression) |
| `list_directory` | List directory contents |
| `search_files` | Search files by name pattern |

---

## Multi-Driver Governance

ShellForge governs any CLI agent driver via AgentGuard hooks. Each driver keeps its own model and agent loop — ShellForge ensures governance is active and spawns the driver as a subprocess.
@@ -151,13 +217,20 @@ See `dags/multi-driver-swarm.yaml` and `dags/workspace-swarm.yaml` for examples.

```
┌───────────────────────────────────────────────────┐
│ Entry Points │
│ chat (REPL) · agent (one-shot) · ralph (multi) │
│ run <driver> · serve (daemon) │
└────────────────────┬──────────────────────────────┘
│ prompt / task
┌────────────────────▼──────────────────────────────┐
│ Octi Pulpo (Coordination) │
│ Budget-aware dispatch · Memory · Model cascading │
└────────────────────┬──────────────────────────────┘
│ task
┌────────────────────▼──────────────────────────────┐
│ ShellForge Agent Loop │
│ LLM provider · Tool calling · Drift detection │
│ Sub-agent orchestrator (spawn sync/async) │
│ Anthropic API or Ollama │
└────────────────────┬──────────────────────────────┘
│ tool call
@@ -171,6 +244,7 @@ See `dags/multi-driver-swarm.yaml` and `dags/workspace-swarm.yaml` for examples.
┌────────────────────▼──────────────────────────────┐
│ Your Environment │
│ Files · Shell (RTK) · Git · Network │
│ 8 tools: read/write/edit/glob/grep/shell/ls/find │
│ Sandboxed by OpenShell │
└───────────────────────────────────────────────────┘
```
61 changes: 51 additions & 10 deletions docs/architecture.md
@@ -4,6 +4,41 @@

ShellForge is a single Go binary (~7.5MB) that provides governed AI agent execution. Its core value is **governance** — every agent driver, whether a CLI tool, browser session, or local model, runs through AgentGuard policy enforcement on every action.

## Entry Points

ShellForge provides multiple entry points, all sharing the same agent loop and governance layer:

| Entry Point | Mode | Context |
|-------------|------|---------|
| `shellforge chat` | Interactive REPL | Persistent — conversation history across prompts |
| `shellforge agent "prompt"` | One-shot | Single task, single context window |
| `shellforge ralph tasks.json` | Multi-task loop | Stateless-iterative — fresh context per task |
| `shellforge run <driver>` | CLI driver | Governed subprocess (Goose, Claude Code, etc.) |
| `shellforge serve agents.yaml` | Daemon | 24/7 swarm with memory-aware scheduling |

### Interactive REPL (`chat`)

Pair-programming mode. The user and model share a persistent conversation — the model remembers previous prompts and results within the session. Color output (green prompt, red errors, yellow governance denials). Shell escapes via `!command`. Ctrl+C interrupts the current agent run without killing the session.

### Ralph Loop (`ralph`)

Stateless-iterative execution for multi-task workloads. Each task gets a fresh context window to prevent accumulated confusion:

```
PICK task from queue → IMPLEMENT → VALIDATE (run tests) → COMMIT on success → RESET context → next
```

Tasks come from a JSON file or Octi Pulpo MCP dispatch. `--validate` runs a command (e.g., `go test ./...`) after each task. `--dry-run` previews without executing.

### Sub-Agent Orchestrator

The agent loop can spawn sub-agents for parallel work:

- **SpawnSync** — block and wait for a sub-agent to complete
- **SpawnAsync** — fire multiple sub-agents, collect results
- Concurrency controlled via semaphore
- Sub-agent results compressed to ~750 tokens before returning to parent

## Execution Model

ShellForge supports three classes of agent driver, all governed uniformly:
@@ -110,7 +145,6 @@ Octi Pulpo routes tasks to the cheapest capable driver:
| **Optimize** | [RTK](https://github.com/rtk-ai/rtk) | Token compression — 70-90% reduction on shell output |
| **Execute** | [Goose](https://block.github.io/goose) / [OpenClaw](https://github.com/openclaw/openclaw) | Agent execution + browser automation |
| **Coordinate** | [Octi Pulpo](https://github.com/AgentGuardHQ/octi-pulpo) | Budget-aware dispatch, episodic memory, model cascading |
| **Coordinate** | [Octi Pulpo](https://github.com/AgentGuardHQ/octi-pulpo) | Swarm coordination via MCP |
| **Govern** | [AgentGuard](https://github.com/AgentGuardHQ/agentguard) | Policy enforcement on every action |
| **Sandbox** | [OpenShell](https://github.com/NVIDIA/OpenShell) | Kernel-level isolation (Docker on macOS) |
| **Scan** | [DefenseClaw](https://github.com/cisco-ai-defense/defenseclaw) | Supply chain scanner — AI Bill of Materials |
@@ -120,6 +154,8 @@ Octi Pulpo routes tasks to the cheapest capable driver:
```
cmd/shellforge/
├── main.go # CLI entry point (cobra-style subcommands)
├── chat.go # Interactive REPL (`shellforge chat`)
├── ralph.go # Multi-task loop (`shellforge ralph`)
└── status.go # Ecosystem health check

internal/
@@ -128,10 +164,13 @@ internal/
│ └── anthropic.go# Anthropic API adapter (stdlib HTTP, prompt caching, tool_use)
├── agent/ # Agentic loop
│ ├── loop.go # runProviderLoop (Anthropic) + runOllamaLoop, drift detection wiring
│ └── drift.go # Drift detector — self-score every 5 calls, steer/kill on low scores
│ ├── drift.go # Drift detector — self-score every 5 calls, steer/kill on low scores
│ └── repl.go # Interactive REPL — persistent history, color output, shell escapes
├── ralph/ # Ralph Loop — stateless-iterative multi-task execution
│ └── loop.go # PICK → IMPLEMENT → VALIDATE → COMMIT → RESET cycle
├── governance/ # agentguard.yaml parser + policy engine
├── ollama/ # Ollama HTTP client (chat, generate)
├── tools/ # 5 tool implementations + RTK wrapper
├── tools/ # 8 tool implementations (read/write/edit/glob/grep/shell/ls/find) + RTK wrapper
├── engine/ # Pluggable engine interface (Goose, OpenClaw, OpenCode)
├── logger/ # Structured JSON logging
├── scheduler/ # Memory-aware scheduling + cron
@@ -146,17 +185,19 @@ internal/

ShellForge uses a pluggable engine system:

1. **Goose** (preferred local driver) — subprocess, native Ollama support, SHELL wrapped via `govern-shell.sh`
2. **OpenClaw** (browser + integrations) — browser automation, web app access, 100+ skills
3. **NemoClaw** (enterprise) — OpenClaw + NVIDIA OpenShell sandbox + Nemotron local models
4. **CLI Drivers** (cloud coding) — Claude Code, Codex, Copilot CLI, Gemini CLI
5. **Native** (fallback) — built-in multi-turn loop with Ollama + tool calling
1. **Native REPL** (`shellforge chat`) — interactive pair-programming, persistent history, 8 built-in tools
2. **Native Agent** (`shellforge agent`) — one-shot autonomous execution with the same tool set
3. **Ralph Loop** (`shellforge ralph`) — stateless-iterative multi-task with validation and auto-commit
4. **Goose** (local driver) — subprocess, native Ollama support, SHELL wrapped via `govern-shell.sh`
5. **OpenClaw** (browser + integrations) — browser automation, web app access, 100+ skills
6. **NemoClaw** (enterprise) — OpenClaw + NVIDIA OpenShell sandbox + Nemotron local models
7. **CLI Drivers** (cloud coding) — Claude Code, Codex, Copilot CLI, Gemini CLI

## Governance Flow

```
User Request → Engine (Goose/OpenClaw/CLI/Native)
→ Tool Call → Governance Check (agentguard.yaml)
User Request → Entry Point (chat/agent/ralph/run/serve)
→ Agent Loop → Tool Call → Governance Check (agentguard.yaml)
→ ALLOW → Execute Tool → Return Result
→ DENY → Log Violation → Correction Feedback → Retry
```
25 changes: 22 additions & 3 deletions docs/roadmap.md
@@ -40,7 +40,7 @@
- [x] Fixed catch-all deny bug (bounded-execution policy was denying everything)
- [x] Dagu DAG templates (sdlc-swarm, studio-swarm, workspace-swarm, multi-driver)

### v0.7.0 — Anthropic API Provider ← CURRENT
### v0.7.0 — Anthropic API Provider
- [x] LLM provider interface (`llm.Provider`) — pluggable Ollama vs Anthropic backends
- [x] Anthropic API adapter — stdlib HTTP, structured `tool_use` blocks, multi-turn history
- [x] Prompt caching — `cache_control: ephemeral` on system + tools, ~90% savings on cached tokens
@@ -49,6 +49,21 @@
- [x] Drift detection — self-score every 5 tool calls, steer below 7, kill below 5 twice
- [x] RTK token compression wired into `runShellWithRTK()` (70-90% savings on shell output)

### v0.8.0 — UMAAL (Interactive REPL + Ralph Loop + Enhanced Tools)
- [x] Interactive REPL (`shellforge chat`) — pair-programming with persistent conversation history
- [x] Color output (green prompt, red errors, yellow governance denials)
- [x] Shell escapes (`!command`) and Ctrl+C interrupt without session kill
- [x] Ollama (local) and Anthropic API provider support in REPL
- [x] Ralph Loop (`shellforge ralph`) — stateless-iterative multi-task execution
- [x] PICK → IMPLEMENT → VALIDATE → COMMIT → RESET cycle
- [x] Task input from JSON file or Octi Pulpo MCP dispatch
- [x] `--validate` flag for post-task test commands, `--dry-run` for preview
- [x] Sub-agent orchestrator — SpawnSync (block), SpawnAsync (fire and collect)
- [x] Concurrency control via semaphore, context compression (~750 tokens)
- [x] `edit_file` tool — targeted find-and-replace
- [x] `glob` tool — pattern-based file discovery with recursive `**` support
- [x] `grep` tool — regex content search with `file:line` output

---

## In Progress
@@ -142,17 +157,21 @@ Bugs identified during v0.6.x development. Fix before v1.0.

---

## Stack (as of v0.6.1)
## Stack (as of v0.8.0)

| Component | Role | Status |
|---|---|---|
| `shellforge chat` | Interactive REPL | Working |
| `shellforge ralph` | Multi-task loop | Working |
| `shellforge agent` | One-shot agent | Working |
| Goose (Block) | Local model driver | Working |
| Claude Code | API driver (Linux) | Working (via hooks) |
| Copilot CLI | API driver (Linux) | Working (via hooks) |
| Codex CLI | API driver (Linux) | Coming soon |
| Gemini CLI | API driver (Linux) | Coming soon |
| Ollama | Local inference | Working |
| Anthropic API | Cloud inference | Working (prompt caching) |
| AgentGuard | Governance kernel | Working (YAML eval + Go kernel) |
| Dagu | Orchestration | Working (DAGs + web UI) |
| Octi Pulpo | Swarm coordination | Working (MCP) |
| RTK | Token compression | Optional |
| Docker | Sandbox | Optional |