- Go binary with Ollama integration
- 3 agents (QA, report, prototype)
- agentguard.yaml governance (enforce/monitor)
- Cron-based scheduling
- Goreleaser + Homebrew tap (`brew install shellforge`)
- GitHub Pages site
- `shellforge serve`: daemon mode with memory-aware scheduling
- Terminal Bench 2.0 Harbor adapter
- `shellforge run <driver>`: launch governed agents
- Driver support: Claude Code, Copilot CLI, Codex, Gemini
- Format-agnostic intent parser (extracts tool calls from any model output)
- Normalizer (raw tool call → Canonical Action Representation)
- Correction engine (denial → feedback → retry)
- Setup wizard (6-step interactive installer)
- Server mode (Linux, no GPU): skips Ollama, shows API drivers
- Mac mode: local models via Ollama
- `shellforge evaluate`: JSON governance evaluation endpoint
- `shellforge swarm`: starts Dagu orchestration dashboard
- Tested Crush (broken: OpenAI-compat shim loses tool calls)
- Tested Aider (file editing only, no shell execution)
- Evaluated Goose (Block): native Ollama, actually executes tools
- Goose as local model driver (`shellforge run goose`)
- `govern-shell.sh`: shell wrapper that evaluates every command through AgentGuard
- `shellforge run goose` sets SHELL to governed wrapper automatically
- Fixed catch-all deny bug (bounded-execution policy was denying everything)
- Dagu DAG templates (sdlc-swarm, studio-swarm, workspace-swarm, multi-driver)
- LLM provider interface (`llm.Provider`): pluggable Ollama vs Anthropic backends
- Anthropic API adapter: stdlib HTTP, structured `tool_use` blocks, multi-turn history
- Prompt caching: `cache_control: ephemeral` on system + tools, ~90% savings on cached tokens
- Extended thinking budget (`--thinking-budget` flag)
- Model cascading via Octi Pulpo (Haiku → Sonnet → Opus by `TaskComplexity` score)
- Drift detection: self-score every 5 tool calls, steer below 7, kill below 5 twice
- RTK token compression wired into `runShellWithRTK()` (70-90% savings on shell output)
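
The drift-detection thresholds listed above (self-score every 5 tool calls, steer below 7, kill below 5 twice) can be sketched as a small monitor. This is an illustration only, not ShellForge's implementation: the type and method names are hypothetical, the score is assumed to be an integer on a 0-10 scale, and "twice" is assumed to mean two consecutive checks.

```go
package main

// Action is what the monitor recommends after a self-score check.
type Action int

const (
	Continue Action = iota // score is acceptable
	Steer                  // score is low: inject a corrective prompt
	Kill                   // score fell below the kill threshold twice in a row
)

// DriftMonitor is a hypothetical helper; field names are assumptions.
type DriftMonitor struct {
	CheckEvery  int // tool calls between self-score checks
	SteerBelow  int // scores under this value trigger steering
	KillBelow   int // scores under this value count as a strike
	KillStrikes int // strikes needed before the run is killed
	calls       int
	strikes     int
}

// NewDriftMonitor uses the thresholds stated in the roadmap.
func NewDriftMonitor() *DriftMonitor {
	return &DriftMonitor{CheckEvery: 5, SteerBelow: 7, KillBelow: 5, KillStrikes: 2}
}

// RecordToolCall counts one tool call and reports whether a self-score
// check is due (every CheckEvery calls).
func (m *DriftMonitor) RecordToolCall() bool {
	m.calls++
	return m.calls%m.CheckEvery == 0
}

// Evaluate maps a self-reported score to an action.
func (m *DriftMonitor) Evaluate(score int) Action {
	if score < m.KillBelow {
		m.strikes++
		if m.strikes >= m.KillStrikes {
			return Kill
		}
		return Steer // first strike: still try to steer
	}
	m.strikes = 0 // assumption: strikes must be consecutive
	if score < m.SteerBelow {
		return Steer
	}
	return Continue
}
```

A caller would invoke `RecordToolCall` after each tool call and, when it returns true, ask the model for a self-score and pass it to `Evaluate`.
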
- Interactive REPL (`shellforge chat`): pair-programming with persistent conversation history
- Color output (green prompt, red errors, yellow governance denials)
- Shell escapes (`!command`) and Ctrl+C interrupt without session kill
- Ollama (local) and Anthropic API provider support in REPL
- Ralph Loop (`shellforge ralph`): stateless-iterative multi-task execution
- PICK → IMPLEMENT → VALIDATE → COMMIT → RESET cycle
- Task input from JSON file or Octi Pulpo MCP dispatch
- `--validate` flag for post-task test commands, `--dry-run` for preview
- Sub-agent orchestrator: SpawnSync (block), SpawnAsync (fire and collect)
- Concurrency control via semaphore, context compression (~750 tokens)
- `edit_file` tool: targeted find-and-replace
- `glob` tool: pattern-based file discovery with recursive `**` support
- `grep` tool: regex content search with `file:line` output
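
The sub-agent orchestrator items above (blocking SpawnSync, fire-and-collect SpawnAsync, concurrency control via semaphore) follow a common Go pattern. The sketch below shows one way that pattern could look, using a buffered channel as a counting semaphore. The `Task`, `Orchestrator`, and `Collect` names and signatures are assumptions for illustration; the real orchestrator's API may differ, and context compression is not modeled here.

```go
package main

// Task is a hypothetical unit of sub-agent work. The returned string
// stands in for whatever a sub-agent reports back to its parent.
type Task func() string

// Orchestrator caps how many sub-agents run at once. The buffered
// channel acts as a counting semaphore: its capacity is the limit.
type Orchestrator struct {
	sem chan struct{}
}

func NewOrchestrator(maxConcurrent int) *Orchestrator {
	return &Orchestrator{sem: make(chan struct{}, maxConcurrent)}
}

// SpawnSync runs the task and blocks until it finishes.
func (o *Orchestrator) SpawnSync(t Task) string {
	o.sem <- struct{}{}        // acquire a slot (blocks when the limit is reached)
	defer func() { <-o.sem }() // release the slot
	return t()
}

// SpawnAsync starts the task in a goroutine and returns a channel that
// yields its result exactly once.
func (o *Orchestrator) SpawnAsync(t Task) <-chan string {
	out := make(chan string, 1) // buffered so the goroutine never blocks on send
	go func() {
		out <- o.SpawnSync(t)
	}()
	return out
}

// Collect waits for every pending result, preserving spawn order.
func Collect(pending []<-chan string) []string {
	results := make([]string, 0, len(pending))
	for _, ch := range pending {
		results = append(results, <-ch)
	}
	return results
}
```

A buffered channel keeps the sketch free of dependencies. A weighted semaphore would be an alternative if sub-agents had different costs.
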
Foundation types exist (`internal/action/`, `internal/orchestrator/`, `internal/scheduler/queue.go`) but are not yet wired into execution.
- Connect orchestrator state machine to `shellforge run`
- Proposal → Governance → Result flow through kernel
- Run-level audit trail (structured events, not just logs)
- Planner agent: task decomposition via Ollama
- Worker agent: Goose executes subtasks with governance
- Evaluator agent: validates results
- State machine: PLANNING → WORKING → EVALUATING → COMPLETE
- Anti-loop hash detection
- Escalation thresholds (auto-fail after N denials)
- Circuit breaker on Ollama failures
- Structured event emission to SQLite
- Run summaries with governance stats
- 24h soak test
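
As one possible shape for the anti-loop hash detection item above, the sketch below fingerprints each proposed action and flags a loop once the same fingerprint recurs too often within a run. Everything here is an assumption for illustration: the `LoopDetector` type, the choice of SHA-256, the tool-plus-arguments key, and the repeat limit are not taken from the ShellForge codebase.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// LoopDetector counts how often an identical action is proposed in one run.
type LoopDetector struct {
	MaxRepeats int            // assumption: repeats tolerated before flagging
	seen       map[string]int // action fingerprint -> times observed
}

func NewLoopDetector(maxRepeats int) *LoopDetector {
	return &LoopDetector{MaxRepeats: maxRepeats, seen: make(map[string]int)}
}

// Fingerprint hashes the tool name and its raw arguments. The zero byte
// separates the two fields so ("ab", "c") and ("a", "bc") hash differently.
func Fingerprint(tool, args string) string {
	h := sha256.New()
	h.Write([]byte(tool))
	h.Write([]byte{0})
	h.Write([]byte(args))
	return hex.EncodeToString(h.Sum(nil))
}

// Observe records the action and reports true once it has been seen
// more than MaxRepeats times.
func (d *LoopDetector) Observe(tool, args string) bool {
	key := Fingerprint(tool, args)
	d.seen[key]++
	return d.seen[key] > d.MaxRepeats
}
```

A true result from `Observe` could feed the escalation thresholds listed above, for example by counting toward an auto-fail.
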
ShellForge orchestrates, Octi Pulpo coordinates, AgentGuard governs. This phase wires the three together.
- Consume Octi Pulpo MCP tools (route_recommend, coord_claim, coord_signal)
- Budget-aware driver selection: query Octi Pulpo before choosing model/driver
- Duplicate work prevention via coord_claim (prevents agent stampedes)
- Driver health signals: broadcast ShellForge agent status to Octi Pulpo
- OpenClaw as execution runtime for browser-based agents
- NemoClaw as optional adapter (never a dependency, to protect kernel independence)
- Browser driver support in `shellforge run` (alongside Goose, Claude Code, Copilot, Codex, Gemini)
- Governed browser actions through AgentGuard kernel
- ShellForge agents auto-connect to Octi Pulpo MCP server on startup
- Shared memory across ShellForge-managed agents via Octi Pulpo memory_store/recall
- Model routing delegation: ShellForge defers to Octi Pulpo route_recommend
- MCP server exposing governed tools
- Goose → MCP → AgentGuard → execute
- Dual-layer: kernel enforces, MCP integrates
- Harbor adapter
- Dry run on single task with Goose
- Full 89-task evaluation
- Leaderboard submission
- AgentGuard Go kernel integration (in-process, not subprocess)
- Publish Go module (`github.com/AgentGuardHQ/agentguard/go/pkg/hook`)
- Move `internal/` types to `pkg/` for external import
- Cloud telemetry opt-in (AgentGuard Cloud)
- Migrated to API-driven dispatch: Octi Pulpo → ShellForge → Anthropic API
- GH Actions Copilot Agent workflow (`dispatch-agent.yml`) for free-tier automation
- ShellForge is now the execution harness for the agentguard-workspace swarm
Bugs identified during v0.6.x development. Fix before v1.0.
| Issue | Package | Severity | Description |
|---|---|---|---|
| #69 | `agentguard.yaml` | High | Governance gap: plain `rm` and `rm -r` bypass `no-destructive-rm` policy |
| #67 | `scripts/govern-shell.sh` | Medium | Fragile `sed`-based JSON parsing: denial reason extraction can fail or corrupt |
| #65 | `internal/scheduler` | Medium | `os.WriteFile` error silently ignored: audit log loss |
| #63 | `internal/normalizer` | Medium | `classifyShellRisk` prefix match too broad: `catalog_tool` classified as read-only |
| #62 | `cmd/shellforge` | Medium | `cmdEvaluate` ignores JSON unmarshal error: malformed input defaults to allow |
| #61 | `internal/intent` | Low | Dead code in `flattenParams`: first assignment immediately overwritten |
| #60 | all packages | High | Zero test coverage, critical for a governance runtime |
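
Issue #62 in the table above is a fail-open bug: a parse error is ignored, so malformed input ends up allowed. The sketch below illustrates the general fail-closed alternative, where any parse or validation failure produces a denial. It is not the actual `cmdEvaluate` code; `EvalRequest`, `Decision`, `Evaluate`, and the JSON field names are hypothetical stand-ins.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EvalRequest is a hypothetical request shape; the real payload may differ.
type EvalRequest struct {
	Tool    string `json:"tool"`
	Command string `json:"command"`
}

// Decision is a hypothetical governance verdict.
type Decision struct {
	Allowed bool   `json:"allowed"`
	Reason  string `json:"reason"`
}

// Evaluate parses the raw request and consults the policy only when the
// input is well formed. Every failure path returns a denial.
func Evaluate(raw []byte, policy func(EvalRequest) Decision) Decision {
	var req EvalRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		// Fail closed: never fall through to the policy with a zero-value request.
		return Decision{Allowed: false, Reason: fmt.Sprintf("malformed request: %v", err)}
	}
	if req.Tool == "" {
		// Valid JSON that omits the required field is also denied.
		return Decision{Allowed: false, Reason: "missing tool"}
	}
	return policy(req)
}
```

Because the zero value of `Decision` has `Allowed: false`, a caller that forgets to set a verdict also ends up with a denial, not an allow.
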
| Component | Role | Status |
|---|---|---|
| `shellforge chat` | Interactive REPL | Working |
| `shellforge ralph` | Multi-task loop | Working |
| `shellforge agent` | One-shot agent | Working |
| Goose (Block) | Local model driver | Working |
| Claude Code | API driver (Linux) | Working (via hooks) |
| Copilot CLI | API driver (Linux) | Working (via hooks) |
| Codex CLI | API driver (Linux) | Coming soon |
| Gemini CLI | API driver (Linux) | Coming soon |
| Ollama | Local inference | Working |
| Anthropic API | Cloud inference | Working (prompt caching) |
| AgentGuard | Governance kernel | Working (YAML eval + Go kernel) |
| Octi Pulpo | Swarm coordination | Working (MCP) |
| RTK | Token compression | Optional |
| Docker | Sandbox | Optional |