Skip to content

Latest commit

Β 

History

History
177 lines (143 loc) Β· 8.5 KB

File metadata and controls

177 lines (143 loc) Β· 8.5 KB

ShellForge Roadmap

Completed

v0.1.0 β€” Foundation

  • Go binary with Ollama integration
  • 3 agents (QA, report, prototype)
  • agentguard.yaml governance (enforce/monitor)
  • Cron-based scheduling

v0.2.0 β€” Release Pipeline

  • Goreleaser + Homebrew tap (brew install shellforge)
  • GitHub Pages site
  • shellforge serve β€” daemon mode with memory-aware scheduling
  • Terminal Bench 2.0 Harbor adapter

v0.3.x β€” Multi-Driver

  • shellforge run <driver> β€” launch governed agents
  • Driver support: Claude Code, Copilot CLI, Codex, Gemini
  • Format-agnostic intent parser (extracts tool calls from any model output)
  • Normalizer (raw tool call β†’ Canonical Action Representation)
  • Correction engine (denial β†’ feedback β†’ retry)
  • Setup wizard (6-step interactive installer)

v0.4.x β€” Environment Awareness

  • Server mode (Linux, no GPU) β€” skips Ollama, shows API drivers
  • Mac mode β€” local models via Ollama
  • shellforge evaluate β€” JSON governance evaluation endpoint
  • shellforge swarm β€” starts Dagu orchestration dashboard

v0.5.x β€” Driver Iteration

  • Tested Crush (broken β€” OpenAI-compat shim loses tool calls)
  • Tested Aider (file editing only, no shell execution)
  • Evaluated Goose (Block) β€” native Ollama, actually executes tools

v0.6.0 β€” Goose + Governed Shell

  • Goose as local model driver (shellforge run goose)
  • govern-shell.sh β€” shell wrapper that evaluates every command through AgentGuard
  • shellforge run goose sets SHELL to governed wrapper automatically
  • Fixed catch-all deny bug (bounded-execution policy was denying everything)
  • Dagu DAG templates (sdlc-swarm, studio-swarm, workspace-swarm, multi-driver)

v0.7.0 β€” Anthropic API Provider

  • LLM provider interface (llm.Provider) β€” pluggable Ollama vs Anthropic backends
  • Anthropic API adapter β€” stdlib HTTP, structured tool_use blocks, multi-turn history
  • Prompt caching β€” cache_control: ephemeral on system + tools, ~90% savings on cached tokens
  • Extended thinking budget (--thinking-budget flag)
  • Model cascading via Octi Pulpo (Haikuβ†’Sonnetβ†’Opus by TaskComplexity score)
  • Drift detection β€” self-score every 5 tool calls, steer below 7, kill below 5 twice
  • RTK token compression wired into runShellWithRTK() (70-90% savings on shell output)

v0.8.0 β€” UMAAL (Interactive REPL + Ralph Loop + Enhanced Tools)

  • Interactive REPL (shellforge chat) β€” pair-programming with persistent conversation history
  • Color output (green prompt, red errors, yellow governance denials)
  • Shell escapes (!command) and Ctrl+C interrupt without session kill
  • Ollama (local) and Anthropic API provider support in REPL
  • Ralph Loop (shellforge ralph) β€” stateless-iterative multi-task execution
  • PICK β†’ IMPLEMENT β†’ VALIDATE β†’ COMMIT β†’ RESET cycle
  • Task input from JSON file or Octi Pulpo MCP dispatch
  • --validate flag for post-task test commands, --dry-run for preview
  • Sub-agent orchestrator β€” SpawnSync (block), SpawnAsync (fire and collect)
  • Concurrency control via semaphore, context compression (~750 tokens)
  • edit_file tool β€” targeted find-and-replace
  • glob tool β€” pattern-based file discovery with recursive ** support
  • grep tool β€” regex content search with file:line output

In Progress

Phase 7 β€” Governed Multi-Agent Architecture

Foundation types exist (internal/action/, internal/orchestrator/, internal/scheduler/queue.go) but not wired into execution.

7.1 β€” Wire Orchestrator

  • Connect orchestrator state machine to shellforge run
  • Proposal β†’ Governance β†’ Result flow through kernel
  • Run-level audit trail (structured events, not just logs)

7.2 β€” Turn-Based Swarm

  • Planner agent β€” task decomposition via Ollama
  • Worker agent β€” Goose executes subtasks with governance
  • Evaluator agent β€” validates results
  • State machine: PLANNING β†’ WORKING β†’ EVALUATING β†’ COMPLETE

7.3 β€” Resilience

  • Anti-loop hash detection
  • Escalation thresholds (auto-fail after N denials)
  • Circuit breaker on Ollama failures

7.4 β€” Observability

  • Structured event emission to SQLite
  • Run summaries with governance stats
  • 24h soak test

Planned

Phase 7.5 β€” Octi Pulpo Integration + Browser Drivers

ShellForge orchestrates, Octi Pulpo coordinates, AgentGuard governs. This phase wires the three together.

7.5.1 β€” Octi Pulpo Coordination

  • Consume Octi Pulpo MCP tools (route_recommend, coord_claim, coord_signal)
  • Budget-aware driver selection β€” query Octi Pulpo before choosing model/driver
  • Duplicate work prevention via coord_claim (prevents agent stampedes)
  • Driver health signals β€” broadcast ShellForge agent status to Octi Pulpo

7.5.2 β€” OpenClaw / NemoClaw Browser Driver

  • OpenClaw as execution runtime for browser-based agents
  • NemoClaw as optional adapter (never a dependency β€” protect kernel independence)
  • Browser driver support in shellforge run (alongside Goose, Claude Code, Copilot, Codex, Gemini)
  • Governed browser actions through AgentGuard kernel

7.5.3 β€” Ecosystem Wiring

  • ShellForge agents auto-connect to Octi Pulpo MCP server on startup
  • Shared memory across ShellForge-managed agents via Octi Pulpo memory_store/recall
  • Model routing delegation β€” ShellForge defers to Octi Pulpo route_recommend

Phase 8 β€” AgentGuard MCP Server

  • MCP server exposing governed tools
  • Goose β†’ MCP β†’ AgentGuard β†’ execute
  • Dual-layer: kernel enforces, MCP integrates

Phase 9 β€” Terminal Bench 2.0

  • Harbor adapter
  • Dry run on single task with Goose
  • Full 89-task evaluation
  • Leaderboard submission

Phase 10 β€” Production Hardening

  • AgentGuard Go kernel integration (in-process, not subprocess)
  • Publish Go module (github.com/AgentGuardHQ/agentguard/go/pkg/hook)
  • Move internal/ types to pkg/ for external import
  • Cloud telemetry opt-in (AgentGuard Cloud)

Phase 11 β€” Replace Workspace Bash Swarm βœ… DONE

  • Migrated to API-driven dispatch: Octi Pulpo β†’ ShellForge β†’ Anthropic API
  • GH Actions Copilot Agent workflow (dispatch-agent.yml) for free-tier automation
  • ShellForge is now the execution harness for the agentguard-workspace swarm

Bug Backlog (Open Issues)

Bugs identified during v0.6.x development. Fix before v1.0.

Issue Package Severity Description
#69 agentguard.yaml High Governance gap: plain rm and rm -r bypass no-destructive-rm policy
#67 scripts/govern-shell.sh Medium Fragile sed-based JSON parsing β€” denial reason extraction can fail or corrupt
#65 internal/scheduler Medium os.WriteFile error silently ignored β€” audit log loss
#63 internal/normalizer Medium classifyShellRisk prefix match too broad β€” catalog_tool classified as read-only
#62 cmd/shellforge Medium cmdEvaluate ignores JSON unmarshal error β€” malformed input defaults to allow
#61 internal/intent Low Dead code in flattenParams β€” first assignment immediately overwritten
#60 all packages High Zero test coverage β€” critical for a governance runtime

Stack (as of v0.8.0)

Component Role Status
shellforge chat Interactive REPL Working
shellforge ralph Multi-task loop Working
shellforge agent One-shot agent Working
Goose (Block) Local model driver Working
Claude Code API driver (Linux) Working (via hooks)
Copilot CLI API driver (Linux) Working (via hooks)
Codex CLI API driver (Linux) Coming soon
Gemini CLI API driver (Linux) Coming soon
Ollama Local inference Working
Anthropic API Cloud inference Working (prompt caching)
AgentGuard Governance kernel Working (YAML eval + Go kernel)
Octi Pulpo Swarm coordination Working (MCP)
RTK Token compression Optional
Docker Sandbox Optional