@lambertmt

Summary

This PR adds an autonomous agent capability that executes tools internally, keeping raw data away from Claude's context window. Real-world testing shows 40-95% Claude token savings on analysis tasks.

Key Features

  • Autonomous agent loop - the local LLM decides which tools to call, executes them, and analyzes the results
  • Built-in SSH execution - the agent can run commands on configured hosts
  • GPG-encrypted credentials - secure storage for SSH passwords
  • Strict output formatting - clean JSON tool calls, plain-text answers

Real-World Results

| Task | Claude Direct (tokens) | Claude w/ Agent (tokens) | Local LLM (free) | Savings |
|---|---|---|---|---|
| Debugging workflow (7 calls) | ~56,000 | ~4,100 | ~35,000 | 93% |
| Security audit | ~11,800 | ~800 | ~11,000 | 93% |
| Docker logs analysis | ~10,500 | ~500 | ~10,000 | 95% |
| System health check | ~5,500 | ~1,500 | ~4,000 | 73% |
| Log analysis (journalctl) | ~4,000 | ~800 | ~3,200 | 80% |
| Code gen (w/ exploration) | ~2,700 | ~1,700 | ~1,000 | 37% |
| Disk analysis | ~1,500 | ~500 | ~1,000 | 65% |
| Code gen (small input) | ~1,550 | ~1,600 | ~1,500 | 0% |
| Simple query | ~500 | ~300 | ~200 | 40% |

Note: The 0% case matters - when raw data is small, there's no benefit. This shines on data-heavy tasks.
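
The Savings column is consistent with counting Claude-side tokens only, i.e. roughly 1 - (Claude w/ Agent / Claude Direct); for example, Docker logs analysis: 1 - 500 / 10,500 ≈ 95%. Local LLM tokens are not counted against savings because they are free.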

How It Works

Claude sends task → Agent (local LLM) executes SSH internally → 
Agent analyzes 40K chars locally → Returns 800-token summary to Claude

Claude never sees the raw output. Tokens shift from paid (Claude) to free (local LLM).
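
A minimal sketch of what such a loop can look like (TypeScript; the helper names `callLocalLlm`, `runTool`, and `tryParseToolCall` are placeholders, not the actual code in this PR):

```ts
// Sketch of the agentic loop: the local LLM picks a tool, the server runs it,
// and only the final plain-text answer is returned to Claude.
type ToolCall = { tool: string; arguments: Record<string, unknown> };

async function runAgentTask(
  task: string,
  callLocalLlm: (prompt: string) => Promise<string>, // assumed helper
  runTool: (call: ToolCall) => Promise<string>,      // assumed helper (e.g. ssh_exec)
  maxIterations = 10,
): Promise<{ answer: string; toolsExecuted: string[] }> {
  const toolsExecuted: string[] = [];
  let prompt = task;

  for (let i = 0; i < maxIterations; i++) {
    const reply = await callLocalLlm(prompt);
    const call = tryParseToolCall(reply);
    if (!call) {
      // Plain text (no JSON tool call) is treated as the final answer.
      return { answer: reply, toolsExecuted };
    }
    const output = await runTool(call); // raw output stays local, never sent to Claude
    toolsExecuted.push(call.tool);
    prompt = `Tool ${call.tool} returned:\n${output}\nContinue, or give the final answer.`;
  }
  return { answer: "Stopped at max_iterations without a final answer.", toolsExecuted };
}

// Naive parse: assumes a tool call arrives as a bare JSON object.
function tryParseToolCall(text: string): ToolCall | null {
  try {
    const parsed = JSON.parse(text.trim());
    return typeof parsed?.tool === "string" ? (parsed as ToolCall) : null;
  } catch {
    return null;
  }
}
```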

Test Plan

  • Health check tool working
  • SSH execution with GPG-encrypted credentials
  • Agent loop with auto_execute=true
  • Multi-iteration tool calls
  • Token measurements validated

🤖 Generated with Claude Code

Michael Lambert and others added 8 commits January 12, 2026 16:30
Features:
- New agent_chat tool with stateful conversation management
- Tool definition schema for describing available tools
- Few-shot prompt format for reliable JSON tool call output
- Multi-strategy JSON parsing (code blocks, inline, permissive)
- Conversation continuation with tool results
- Automatic conversation cleanup after 30 minutes
- list_conversations debug tool

Enables Claude to delegate tasks to local LLMs while maintaining
control of tool execution - a cost-effective hybrid approach.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
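
As an illustration of the pieces this commit describes (the shapes below are assumptions, not the actual schema), a tool definition and a conversation record with the 30-minute cleanup might look like:

```ts
// Illustrative shapes only; the real schema in this PR may differ.
interface ToolDefinition {
  name: string;        // e.g. "ssh_exec"
  description: string; // included in the few-shot prompt for the local LLM
  parameters: Record<string, { type: string; description: string }>;
}

interface Conversation {
  id: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  lastUsed: number;    // epoch ms, used for TTL cleanup
}

const conversations = new Map<string, Conversation>();
const TTL_MS = 30 * 60 * 1000; // 30-minute cleanup, per the commit

// Drop conversations that have been idle longer than the TTL.
function cleanupConversations(now = Date.now()): void {
  for (const [id, convo] of conversations) {
    if (now - convo.lastUsed > TTL_MS) conversations.delete(id);
  }
}
```
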
The parseToolCall method now properly handles:
- Nested JSON objects (e.g., {"tool": "x", "arguments": {...}})
- Braces inside string values (e.g., "command": "awk '{print}'")

Replaced regex-based extraction with balanced brace tracking that
respects string escaping, enabling reliable tool call detection
from LLM output.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
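
A sketch of what balanced-brace extraction with string and escape awareness can look like (illustrative only, not the PR's parseToolCall):

```ts
// Return the first balanced {...} block in the text, tracking string boundaries
// and backslash escapes so braces inside string values are ignored.
function extractFirstJsonObject(text: string): string | null {
  let depth = 0;
  let start = -1;
  let inString = false;
  let escaped = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') { inString = true; continue; }
    if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}") {
      if (depth > 0 && --depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // no balanced object found
}
```

The extracted slice can then be handed to `JSON.parse`, which is where handling input like `{"command": "awk '{print}'"}` pays off.
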
- Remove dangerous single-quote replacement in tryParseToolJson that
  broke JSON containing single quotes in string values (e.g., awk '$3')
- Add DEBUG_MCP env var to enable detailed logging of:
  - parseToolCall input/output and strategies
  - agent_chat conversation flow and LLM responses
- Try JSON parsing strategies in order: as-is, trailing comma fix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
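
A hedged sketch of trying parse strategies in order (the strategy list and the DEBUG_MCP gate here are illustrative):

```ts
// Try progressively more permissive parses; log failed attempts when DEBUG_MCP is set.
function tryParseToolJson(raw: string): unknown | null {
  const strategies: [string, (s: string) => string][] = [
    ["as-is", (s) => s],
    // Naive trailing-comma fix; a real implementation should avoid touching strings.
    ["trailing-comma fix", (s) => s.replace(/,\s*([}\]])/g, "$1")],
  ];
  for (const [name, transform] of strategies) {
    try {
      return JSON.parse(transform(raw));
    } catch (err) {
      if (process.env.DEBUG_MCP) console.error(`parse strategy "${name}" failed:`, err);
    }
  }
  return null;
}
```
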
Major changes:
- agent_chat now auto-executes ssh_exec internally (no CC middleman)
- Agentic loop runs until final_answer or max_iterations
- Strict prompt guidelines for clean JSON/text output formatting
- Remove max_tokens limit (local tokens are free with 128K context)
- ssh_exec added as built-in tool automatically
- Reports tools_executed in response for transparency

This enables massive CC token savings - raw tool output (e.g., logs)
never touches Claude's context; only the final analysis is returned.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Documents real-world testing showing 70-90% Claude token savings:
- Log analysis: 15,000 tokens → 1,500 tokens (90% reduction)
- System health check: 15,000 tokens → 4,500 tokens (70% reduction)

Includes architecture diagrams, usage examples, and configuration
guide for the autonomous agent with internal tool execution.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add "How the Token Math Works" section explaining token breakdown
- Correct savings percentages: 40-80% (was 70-90%)
- Add Local Tokens column to comparison tables
- Update video script with consistent numbers and explanations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Show tokens shifting from Claude (paid) to Local LLM (free)
- Add security audit (93%) and Docker logs (95%) as top examples
- Update all claims to "up to 95%" based on actual testing
- Include 8 test cases sorted by Claude Direct tokens
- Add test-scripts/health_check.sh generated by agent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Acknowledge CC Token Saver, Ollama Claude, Rubber Duck MCP as prior art
- Position as infrastructure-focused implementation, not novel invention
- Add comparison tables: when to use this vs other options
- Add PORTABILITY.md for others evaluating the tool
- Add posts/ with Reddit and Substack drafts
- Update video script with honest intro and metadata
- Add Nextcloud debugging use case to stats table

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>