
ShellForge

Governed AI coding CLI and agent runtime — one Go binary, local or cloud.


Interactive pair-programming with local models + autonomous multi-task execution — with governance on every tool call.

Website · Docs · Roadmap · AgentGuard

ShellForge — Local Governed Agent Runtime

Quick Start (Mac)

1. Install ShellForge

brew tap AgentGuardHQ/tap
brew install shellforge

Or from source: git clone https://github.com/AgentGuardHQ/shellforge.git && cd shellforge && go build -o shellforge ./cmd/shellforge/

2. Install Ollama (if you haven't already)

brew install ollama
ollama serve                     # start the model server (leave running)

3. Pull a model

ollama pull qwen3:8b             # 8B — good balance (needs ~6GB RAM)
# or: ollama pull qwen3:30b      # 30B — best quality (needs ~19GB, M4 Pro recommended)
# or: ollama pull qwen3:1.7b     # 1.7B — fastest, minimal RAM

4. Run setup inside any repo

cd ~/your-project                # navigate to any repo you want to work in
shellforge setup                 # creates agentguard.yaml + output dirs

This creates agentguard.yaml (governance policy) in your project root. Edit it to customize which actions are allowed/denied.

5. Start a chat session

shellforge chat                     # interactive REPL — pair-program with a local model

Or run a one-shot agent:

shellforge agent "describe what this project does"
shellforge agent "find test gaps and suggest improvements"

Every tool call (file reads, writes, shell commands) passes through governance before execution.

Requirements: macOS (Apple Silicon or Intel) or Linux


What Is ShellForge?

ShellForge is a governed AI coding CLI and agent runtime — like Claude Code or Cursor, but with local models and policy enforcement built in.

Two modes:

  1. Interactive REPL (shellforge chat) — pair-program with a local or cloud model. Persistent conversation history, shell escapes, color output.
  2. Autonomous agents (shellforge agent, shellforge ralph) — one-shot tasks or multi-task loops with automatic validation and commit.

Both modes share the same governance layer. Every tool call passes through AgentGuard policy enforcement before execution.

You (chat) or Octi Pulpo (dispatch)
  → ShellForge Agent Loop (tool calling, drift detection)
    → AgentGuard Governance (allow / deny / correct)
      → Your Environment (files, shell, git)

Interactive REPL (shellforge chat)

Pair-programming mode. Persistent conversation history across prompts — the model remembers what you discussed.

shellforge chat                          # local model via Ollama (default)
shellforge chat --provider anthropic     # Anthropic API (Haiku/Sonnet/Opus)
shellforge chat --model qwen3:14b        # pick a specific model

Features:

  • Color output — green prompt, red errors, yellow governance denials
  • Shell escapes — !git status runs a command without leaving the session
  • Ctrl+C — interrupts the current agent run without killing the session
  • Governance — every tool call checked against agentguard.yaml, same as autonomous mode

Ralph Loop (shellforge ralph)

Stateless-iterative multi-task execution. Each task gets a fresh context window — no accumulated confusion across tasks.

shellforge ralph tasks.json                    # run tasks from a JSON file
shellforge ralph --validate "go test ./..."    # validate after each task
shellforge ralph --dry-run                     # preview without executing

The loop: PICK a task → IMPLEMENT it → VALIDATE (run tests) → COMMIT on success → RESET context → next task.

Tasks come from a JSON file or Octi Pulpo MCP dispatch. Failed validations skip the commit and move on — no broken code lands.
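The task file is easiest to see by example. A minimal sketch, assuming a plain JSON array of task objects with `id`, `prompt`, and an optional per-task `validate` command (the field names here are illustrative, not the canonical schema; check the docs for the exact format):

```json
[
  { "id": "t1", "prompt": "Add unit tests for the parser package", "validate": "go test ./..." },
  { "id": "t2", "prompt": "Fix the TODOs in README.md" }
]
```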


The Stack

| Layer | Project | What It Does |
|---|---|---|
| Infer | Ollama | Local LLM inference (Metal GPU on Mac) |
| Optimize | RTK | Token compression — 70-90% reduction on shell output |
| Execute | Goose | AI coding agent with native Ollama support (headless) |
| Coordinate | Octi Pulpo | Budget-aware dispatch, episodic memory, model cascading |
| Govern | AgentGuard | Policy enforcement on every action — allow/deny/correct |
| Sandbox | OpenShell | Kernel-level isolation (Docker on macOS) |
| Scan | DefenseClaw | Supply chain scanner — AI Bill of Materials |

shellforge status
# Ollama        running (qwen3:30b loaded)
# RTK           v0.4.2
# AgentGuard    enforce mode (5 rules)
# Octi Pulpo    connected (http://localhost:8080)
# OpenShell     Docker sandbox active
# DefenseClaw   scanner ready

CLI Commands

| Command | Description |
|---|---|
| `shellforge chat` | Interactive REPL — pair-program with a local or cloud model |
| `shellforge chat --provider anthropic` | REPL via Anthropic API (Haiku/Sonnet/Opus) |
| `shellforge chat --model qwen3:14b` | REPL with a specific Ollama model |
| `shellforge ralph tasks.json` | Multi-task loop — stateless-iterative execution |
| `shellforge ralph --validate "go test ./..."` | Ralph Loop with post-task validation |
| `shellforge ralph --dry-run` | Preview tasks without executing |
| `shellforge agent "prompt"` | One-shot governed agent (Ollama, default) |
| `shellforge agent --provider anthropic "prompt"` | One-shot via Anthropic API (prompt caching) |
| `shellforge agent --thinking-budget 8000 "prompt"` | Enable extended thinking (Sonnet/Opus) |
| `shellforge run <driver> "prompt"` | Run a governed CLI driver (goose, claude, copilot, codex, gemini) |
| `shellforge setup` | Install Ollama, create governance config, verify stack |
| `shellforge qa [dir]` | QA analysis — find test gaps and issues |
| `shellforge report [repo]` | Generate a status report from git + logs |
| `shellforge serve agents.yaml` | Daemon mode — run a 24/7 agent swarm |
| `shellforge status` | Show ecosystem health |
| `shellforge version` | Print version |

Built-in Tools

The agent loop (used by chat, agent, and ralph) has 8 built-in tools, all governed:

| Tool | What It Does |
|---|---|
| `read_file` | Read file contents |
| `write_file` | Write a complete file |
| `edit_file` | Targeted find-and-replace (like Claude Code's Edit tool) |
| `glob` | Pattern-based file discovery with recursive `**` support |
| `grep` | Regex content search with file:line output |
| `run_shell` | Execute shell commands (via RTK for token compression) |
| `list_directory` | List directory contents |
| `search_files` | Search files by name pattern |

Multi-Driver Governance

ShellForge governs any CLI agent driver via AgentGuard hooks. Each driver keeps its own model and agent loop — ShellForge ensures governance is active and spawns the driver as a subprocess.

# Run any driver with governance
shellforge run claude "review this code"
shellforge run codex "generate tests"
shellforge run copilot "update docs"
shellforge run gemini "security audit"

Orchestrate multiple drivers in a single Dagu DAG:

dagu start dags/multi-driver-swarm.yaml

See dags/multi-driver-swarm.yaml and dags/workspace-swarm.yaml for examples.
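For orientation, a multi-driver DAG might be sketched like this, using Dagu's basic `steps`/`command`/`depends` layout (the file below is an illustrative sketch, not the shipped `dags/multi-driver-swarm.yaml`):

```yaml
# illustrative sketch of a governed multi-driver DAG
steps:
  - name: review
    command: shellforge run claude "review this code"
  - name: generate-tests
    command: shellforge run codex "generate tests"
    depends:
      - review   # run only after the review step succeeds
```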


Architecture

┌────────────────────────────────────────────────────┐
│  Entry Points                                      │
│  chat (REPL) · agent (one-shot) · ralph (multi)    │
│  run <driver> · serve (daemon)                     │
└────────────────────┬───────────────────────────────┘
                     │ prompt / task
┌────────────────────▼───────────────────────────────┐
│  Octi Pulpo (Coordination)                         │
│  Budget-aware dispatch · Memory · Model cascading  │
└────────────────────┬───────────────────────────────┘
                     │ task
┌────────────────────▼───────────────────────────────┐
│  ShellForge Agent Loop                             │
│  LLM provider · Tool calling · Drift detection     │
│  Sub-agent orchestrator (spawn sync/async)         │
│  Anthropic API or Ollama                           │
└────────────────────┬───────────────────────────────┘
                     │ tool call
          ═══════════╪═══════════
          ║  AgentGuard          ║
          ║  Governance Kernel   ║
          ║  allow · deny · audit║
          ║  every. single. call.║
          ═══════════╪═══════════
                     │ approved
┌────────────────────▼───────────────────────────────┐
│  Your Environment                                  │
│  Files · Shell (RTK) · Git · Network               │
│  8 tools: read/write/edit/glob/grep/shell/ls/find  │
│  Sandboxed by OpenShell                            │
└────────────────────────────────────────────────────┘

Governance

ShellForge's core value. Every tool call passes through agentguard.yaml before execution.

# agentguard.yaml — policy-as-code for every agent action
mode: enforce  # enforce | monitor

policies:
  - name: no-force-push
    action: deny
    pattern: "git push --force"

  - name: no-destructive-rm
    action: deny
    pattern: "rm -rf"

  - name: no-secret-access
    action: deny
    pattern: "*.env|*id_rsa|*id_ed25519"

When an action is denied, ShellForge's correction engine feeds structured feedback back to the model so it can self-correct — not just fail.


Swarm Mode

Run a 24/7 agent swarm on your Mac with memory-aware scheduling:

shellforge serve agents.yaml

Auto-detects RAM, calculates max parallel Ollama slots, queues the rest.

# agents.yaml
max_parallel: 0     # 0 = auto-detect from RAM
model_ram_gb: 19    # qwen3:30b Q4

agents:
  - name: qa-agent
    system: "You are a QA engineer."
    prompt: "Analyze the repo for test gaps."
    schedule: "4h"
    priority: 2
    timeout: 300
    enabled: true

Memory budget (qwen3:30b Q4):

| Mac | RAM | Free for KV | Max Parallel |
|---|---|---|---|
| M4 Pro 48GB | 48 GB | ~25 GB | 3-4 agents |
| M4 32GB | 32 GB | ~9 GB | 1-2 agents |
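As a rough sanity check, the M4 Pro row can be reproduced with integer division, assuming a per-agent KV footprint of about 7 GB for qwen3:30b (the 7 GB figure is an assumption for illustration; the actual scheduler heuristic may differ):

```shell
free_kv_gb=25        # ~25 GB left for KV cache on a 48 GB M4 Pro
kv_per_agent_gb=7    # assumed KV footprint per qwen3:30b agent slot
echo $(( free_kv_gb / kv_per_agent_gb ))   # prints 3, in line with the 3-4 agent estimate
```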

Tip: OLLAMA_KV_CACHE_TYPE=q8_0 halves KV cache memory — doubles agent capacity.
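Concretely, the variable must be set in the environment of the Ollama server process, for example:

```shell
# Quantize the KV cache to q8_0 before launching the model server.
export OLLAMA_KV_CACHE_TYPE=q8_0
# ollama serve    # restart the server so the setting takes effect
```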


Model Options

| Model | Params | RAM | Best For |
|---|---|---|---|
| qwen3:1.7b | 1.7B | ~1.2 GB | Fast tasks, prototyping |
| qwen3:4b | 4B | ~3 GB | Balanced reasoning |
| qwen3:30b | 30B | ~19 GB | Production quality (M4 Pro 48GB) |
| mistral:7b | 7B | ~5 GB | Complex analysis |

macOS (Apple Silicon / M4)

  • Ollama uses Metal GPU acceleration — no CUDA needed
  • KV cache quantization (OLLAMA_KV_CACHE_TYPE=q8_0) halves memory per agent slot
  • OpenShell requires Docker via Colima

The Governed Swarm Platform

| Project | Role | What It Does |
|---|---|---|
| ShellForge | Orchestration | Governed agent runtime — CLI drivers + OpenClaw + local models |
| Octi Pulpo | Coordination | Swarm brain — shared memory, model routing, budget-aware dispatch |
| AgentGuard | Governance | Policy enforcement, telemetry, invariants — on every tool call |
| AgentGuard Cloud | Observability | SaaS dashboard — session replay, compliance, analytics |

ShellForge orchestrates. Octi Pulpo coordinates. AgentGuard governs.

Supported Runtimes

| Runtime | What It Adds | Best For |
|---|---|---|
| CLI Drivers | Claude Code, Codex, Copilot, Gemini, Goose | Coding, PRs, commits |
| OpenClaw | Browser automation, 100+ skills, web app access | Integrations, NotebookLM, ChatGPT |
| NemoClaw | OpenClaw + NVIDIA OpenShell sandbox + Nemotron | Enterprise, air-gapped, zero-cost local inference |
| Ollama | Local model inference (Metal GPU) | Privacy, zero API cost |

Contributing

git checkout -b feat/my-feature
go build ./cmd/shellforge/
go test ./...

See docs/roadmap.md for what's planned.


Website · Star on GitHub · AgentGuard

Built by humans and agents

MIT License
