Governed AI coding CLI and agent runtime – one Go binary, local or cloud.
Interactive pair-programming with local models + autonomous multi-task execution – with governance on every tool call.
Website · Docs · Roadmap · AgentGuard
```sh
brew tap AgentGuardHQ/tap
brew install shellforge
```

Or from source:

```sh
git clone https://github.com/AgentGuardHQ/shellforge.git && cd shellforge && go build -o shellforge ./cmd/shellforge/
```

```sh
brew install ollama
ollama serve                 # start the model server (leave running)
ollama pull qwen3:8b         # 8B – good balance (needs ~6GB RAM)
# or: ollama pull qwen3:30b  # 30B – best quality (needs ~19GB, M4 Pro recommended)
# or: ollama pull qwen3:1.7b # 1.7B – fastest, minimal RAM
```

```sh
cd ~/your-project  # navigate to any repo you want to work in
shellforge setup   # creates agentguard.yaml + output dirs
```

This creates `agentguard.yaml` (governance policy) in your project root. Edit it to customize which actions are allowed/denied.

```sh
shellforge chat  # interactive REPL – pair-program with a local model
```

Or run a one-shot agent:

```sh
shellforge agent "describe what this project does"
shellforge agent "find test gaps and suggest improvements"
```

Every tool call (file reads, writes, shell commands) passes through governance before execution.
Requirements: macOS (Apple Silicon or Intel) or Linux
ShellForge is a governed AI coding CLI and agent runtime – like Claude Code or Cursor, but with local models and policy enforcement built in.
Two modes:
- Interactive REPL (`shellforge chat`) – pair-program with a local or cloud model. Persistent conversation history, shell escapes, color output.
- Autonomous agents (`shellforge agent`, `shellforge ralph`) – one-shot tasks or multi-task loops with automatic validation and commit.
Both modes share the same governance layer. Every tool call passes through AgentGuard policy enforcement before execution.
```
You (chat) or Octi Pulpo (dispatch)
  ↓ ShellForge Agent Loop (tool calling, drift detection)
  ↓ AgentGuard Governance (allow / deny / correct)
  ↓ Your Environment (files, shell, git)
```
Pair-programming mode. Persistent conversation history across prompts – the model remembers what you discussed.
```sh
shellforge chat                      # local model via Ollama (default)
shellforge chat --provider anthropic # Anthropic API (Haiku/Sonnet/Opus)
shellforge chat --model qwen3:14b    # pick a specific model
```

Features:

- Color output – green prompt, red errors, yellow governance denials
- Shell escapes – `!git status` runs a command without leaving the session
- Ctrl+C – interrupts the current agent run without killing the session
- Governance – every tool call checked against `agentguard.yaml`, same as autonomous mode
Stateless-iterative multi-task execution. Each task gets a fresh context window – no accumulated confusion across tasks.
```sh
shellforge ralph tasks.json                 # run tasks from a JSON file
shellforge ralph --validate "go test ./..." # validate after each task
shellforge ralph --dry-run                  # preview without executing
```

The loop: PICK a task → IMPLEMENT it → VALIDATE (run tests) → COMMIT on success → RESET context → next task.

Tasks come from a JSON file or Octi Pulpo MCP dispatch. Failed validations skip the commit and move on – no broken code lands.
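For orientation, a task file might look like the sketch below. The source does not document the schema, so the field names here are assumptions – check the project docs for the actual format:

```json
[
  { "id": 1, "task": "Add unit tests for the parser" },
  { "id": 2, "task": "Fix the race condition in the scheduler" }
]
```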
| Layer | Project | What It Does |
|---|---|---|
| Infer | Ollama | Local LLM inference (Metal GPU on Mac) |
| Optimize | RTK | Token compression – 70-90% reduction on shell output |
| Execute | Goose | AI coding agent with native Ollama support (headless) |
| Coordinate | Octi Pulpo | Budget-aware dispatch, episodic memory, model cascading |
| Govern | AgentGuard | Policy enforcement on every action – allow/deny/correct |
| Sandbox | OpenShell | Kernel-level isolation (Docker on macOS) |
| Scan | DefenseClaw | Supply chain scanner – AI Bill of Materials |
```sh
shellforge status
# Ollama running (qwen3:30b loaded)
# RTK v0.4.2
# AgentGuard enforce mode (5 rules)
# Octi Pulpo connected (http://localhost:8080)
# OpenShell Docker sandbox active
# DefenseClaw scanner ready
```

| Command | Description |
|---|---|
| `shellforge chat` | Interactive REPL – pair-program with a local or cloud model |
| `shellforge chat --provider anthropic` | REPL via Anthropic API (Haiku/Sonnet/Opus) |
| `shellforge chat --model qwen3:14b` | REPL with a specific Ollama model |
| `shellforge ralph tasks.json` | Multi-task loop – stateless-iterative execution |
| `shellforge ralph --validate "go test ./..."` | Ralph Loop with post-task validation |
| `shellforge ralph --dry-run` | Preview tasks without executing |
| `shellforge agent "prompt"` | One-shot governed agent (Ollama, default) |
| `shellforge agent --provider anthropic "prompt"` | One-shot via Anthropic API (prompt caching) |
| `shellforge agent --thinking-budget 8000 "prompt"` | Enable extended thinking (Sonnet/Opus) |
| `shellforge run <driver> "prompt"` | Run a governed CLI driver (goose, claude, copilot, codex, gemini) |
| `shellforge setup` | Install Ollama, create governance config, verify stack |
| `shellforge qa [dir]` | QA analysis – find test gaps and issues |
| `shellforge report [repo]` | Generate a status report from git + logs |
| `shellforge serve agents.yaml` | Daemon mode – run a 24/7 agent swarm |
| `shellforge status` | Show ecosystem health |
| `shellforge version` | Print version |
The agent loop (used by chat, agent, and ralph) has 8 built-in tools, all governed:
| Tool | What It Does |
|---|---|
| `read_file` | Read file contents |
| `write_file` | Write a complete file |
| `edit_file` | Targeted find-and-replace (like Claude Code's Edit tool) |
| `glob` | Pattern-based file discovery with recursive `**` support |
| `grep` | Regex content search with file:line output |
| `run_shell` | Execute shell commands (via RTK for token compression) |
| `list_directory` | List directory contents |
| `search_files` | Search files by name pattern |
ShellForge governs any CLI agent driver via AgentGuard hooks. Each driver keeps its own model and agent loop – ShellForge ensures governance is active and spawns the driver as a subprocess.
```sh
# Run any driver with governance
shellforge run claude "review this code"
shellforge run codex "generate tests"
shellforge run copilot "update docs"
shellforge run gemini "security audit"
```

Orchestrate multiple drivers in a single Dagu DAG:

```sh
dagu start dags/multi-driver-swarm.yaml
```

See `dags/multi-driver-swarm.yaml` and `dags/workspace-swarm.yaml` for examples.
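A minimal two-driver DAG might look like the sketch below. It uses Dagu's `steps`/`command`/`depends` syntax, but the step names and prompts are illustrative, not the contents of the shipped DAG files:

```yaml
# illustrative Dagu DAG – run two governed drivers in sequence
steps:
  - name: review
    command: shellforge run claude "review this code"
  - name: tests
    command: shellforge run codex "generate tests"
    depends:
      - review
```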
```
┌────────────────────────────────────────────────────┐
│                    Entry Points                    │
│  chat (REPL) · agent (one-shot) · ralph (multi)    │
│  run <driver> · serve (daemon)                     │
└─────────────────────┬──────────────────────────────┘
                      │ prompt / task
┌─────────────────────┴──────────────────────────────┐
│             Octi Pulpo (Coordination)              │
│  Budget-aware dispatch · Memory · Model cascading  │
└─────────────────────┬──────────────────────────────┘
                      │ task
┌─────────────────────┴──────────────────────────────┐
│               ShellForge Agent Loop                │
│  LLM provider · Tool calling · Drift detection     │
│  Sub-agent orchestrator (spawn sync/async)         │
│  Anthropic API or Ollama                           │
└─────────────────────┬──────────────────────────────┘
                      │ tool call
           ┌──────────┴──────────┐
           │      AgentGuard     │
           │  Governance Kernel  │
           │ allow · deny · audit│
           │ every. single. call.│
           └──────────┬──────────┘
                      │ approved
┌─────────────────────┴──────────────────────────────┐
│                  Your Environment                  │
│  Files · Shell (RTK) · Git · Network               │
│  8 tools: read/write/edit/glob/grep/shell/ls/find  │
│  Sandboxed by OpenShell                            │
└────────────────────────────────────────────────────┘
```
ShellForge's core value. Every tool call passes through `agentguard.yaml` before execution.
```yaml
# agentguard.yaml – policy-as-code for every agent action
mode: enforce  # enforce | monitor
policies:
  - name: no-force-push
    action: deny
    pattern: "git push --force"
  - name: no-destructive-rm
    action: deny
    pattern: "rm -rf"
  - name: no-secret-access
    action: deny
    pattern: "*.env|*id_rsa|*id_ed25519"
```

When an action is denied, ShellForge's correction engine feeds structured feedback back to the model so it can self-correct – not just fail.
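To make the file-access rule concrete, here is a minimal sketch of how `|`-separated glob deny patterns could be evaluated. This is an illustration, not AgentGuard's actual implementation, and it only covers file globs – shell-command rules like `git push --force` would need substring or regex matching instead:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// denied reports whether target matches any |-separated glob
// alternative in pattern, as in the no-secret-access rule above.
func denied(pattern, target string) bool {
	for _, alt := range strings.Split(pattern, "|") {
		if ok, _ := filepath.Match(alt, target); ok {
			return true
		}
	}
	return false
}

func main() {
	secrets := "*.env|*id_rsa|*id_ed25519"
	fmt.Println(denied(secrets, "prod.env")) // true – blocked
	fmt.Println(denied(secrets, "main.go"))  // false – allowed
}
```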
Run a 24/7 agent swarm on your Mac with memory-aware scheduling:
```sh
shellforge serve agents.yaml
```

Auto-detects RAM, calculates max parallel Ollama slots, queues the rest.

```yaml
# agents.yaml
max_parallel: 0   # 0 = auto-detect from RAM
model_ram_gb: 19  # qwen3:30b Q4
agents:
  - name: qa-agent
    system: "You are a QA engineer."
    prompt: "Analyze the repo for test gaps."
    schedule: "4h"
    priority: 2
    timeout: 300
    enabled: true
```

Memory budget (qwen3:30b Q4):
| Mac | RAM | Free for KV | Max Parallel |
|---|---|---|---|
| M4 Pro 48GB | 48 GB | ~25 GB | 3-4 agents |
| M4 32GB | 32 GB | ~9 GB | 1-2 agents |
Tip: `OLLAMA_KV_CACHE_TYPE=q8_0` halves KV cache memory – doubles agent capacity.
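The budget table above follows from simple arithmetic: what remains after model weights and OS overhead, divided by the KV cache each agent needs. The sketch below reproduces that math; the 4 GB overhead and 7 GB KV-per-agent figures are illustrative assumptions consistent with the table, not values taken from shellforge:

```go
package main

import "fmt"

// maxParallel estimates agent slots: RAM left after model weights
// and OS overhead, divided by the KV cache each agent needs.
// Overhead and per-agent KV are illustrative assumptions.
func maxParallel(totalGB, modelGB, osOverheadGB, kvPerAgentGB int) int {
	free := totalGB - modelGB - osOverheadGB
	if free < kvPerAgentGB {
		return 0 // not enough headroom for even one agent
	}
	return free / kvPerAgentGB
}

func main() {
	fmt.Println(maxParallel(48, 19, 4, 7)) // M4 Pro 48GB → 3
	fmt.Println(maxParallel(32, 19, 4, 7)) // M4 32GB     → 1
}
```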
| Model | Params | RAM | Best For |
|---|---|---|---|
| `qwen3:1.7b` | 1.7B | ~1.2 GB | Fast tasks, prototyping |
| `qwen3:4b` | 4B | ~3 GB | Balanced reasoning |
| `qwen3:30b` | 30B | ~19 GB | Production quality (M4 Pro 48GB) |
| `mistral:7b` | 7B | ~5 GB | Complex analysis |
- Ollama uses Metal GPU acceleration – no CUDA needed
- KV cache quantization (`OLLAMA_KV_CACHE_TYPE=q8_0`) halves memory per agent slot
- OpenShell requires Docker via Colima
| Project | Role | What It Does |
|---|---|---|
| ShellForge | Orchestration | Governed agent runtime – CLI drivers + OpenClaw + local models |
| Octi Pulpo | Coordination | Swarm brain – shared memory, model routing, budget-aware dispatch |
| AgentGuard | Governance | Policy enforcement, telemetry, invariants – on every tool call |
| AgentGuard Cloud | Observability | SaaS dashboard – session replay, compliance, analytics |
ShellForge orchestrates. Octi Pulpo coordinates. AgentGuard governs.
| Runtime | What It Adds | Best For |
|---|---|---|
| CLI Drivers | Claude Code, Codex, Copilot, Gemini, Goose | Coding, PRs, commits |
| OpenClaw | Browser automation, 100+ skills, web app access | Integrations, NotebookLM, ChatGPT |
| NemoClaw | OpenClaw + NVIDIA OpenShell sandbox + Nemotron | Enterprise, air-gapped, zero-cost local inference |
| Ollama | Local model inference (Metal GPU) | Privacy, zero API cost |
```sh
git checkout -b feat/my-feature
go build ./cmd/shellforge/
go test ./...
```

See docs/roadmap.md for what's planned.