Summary

This PR adds support for multi-turn agent conversations where the local LLM can request tool calls that the orchestrating system (Claude) executes.

Features

  • agent_chat tool - Start or continue agent conversations with tool-calling support
  • Stateful conversation management - Conversations persist with unique IDs for multi-turn workflows
  • Few-shot prompt format - Reliable JSON tool call output from instruction-following models
  • Multi-strategy JSON parsing - Handles code blocks, inline JSON, and permissive matching
  • Tool result continuation - Feed tool results back to continue the conversation
  • Auto-cleanup - Conversations expire after 30 minutes (see the sketch after this list)
  • list_conversations debug tool - Monitor active agent sessions
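
For orientation, the stateful store and 30-minute expiry can be pictured roughly as below. This is a minimal TypeScript sketch, not the PR's actual code; Conversation, conversations, and CONVERSATION_TTL_MS are illustrative names.

// Sketch of the stateful store with 30-minute auto-cleanup.
// All identifiers here are illustrative, not the PR's actual ones.
interface Conversation {
  id: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  lastActivity: number; // epoch ms, refreshed on each agent_chat call
}

const conversations = new Map<string, Conversation>();
const CONVERSATION_TTL_MS = 30 * 60 * 1000; // 30 minutes

// Sweep periodically and drop conversations idle past the TTL.
setInterval(() => {
  const now = Date.now();
  for (const [id, conv] of conversations) {
    if (now - conv.lastActivity > CONVERSATION_TTL_MS) {
      conversations.delete(id);
    }
  }
}, 60_000);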

Use Case

This enables a hybrid architecture where Claude (or other orchestrators) can:

  1. Delegate analysis/reasoning tasks to local LLMs
  2. Receive structured tool call requests from the local model
  3. Execute tools using the orchestrator's capabilities
  4. Feed results back to continue the agent loop
  5. Get final answers at zero API cost for the delegated reasoning

Example

// Start agent conversation
agent_chat({
  task: "Find Python files and count lines",
  tools: [{ name: "run_command", description: "Run shell command", parameters: {...} }],
  context: "Working in /home/user/project"
})

// Returns: { type: "tool_call", tool_call: { name: "run_command", arguments: {...} }, conversation_id: "..." }

// Continue with tool result
agent_chat({
  conversation_id: "...",
  tool_result: { tool_name: "run_command", result: "file1.py\nfile2.py" }
})

// Returns: { type: "final_answer", content: "Found 2 Python files..." }

Testing

Tested with the gpt-oss-120b model via llama-server. The few-shot prompt format produces reliable JSON tool calls.
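
The direct llama-server check can be reproduced with a request along these lines. This is a minimal sketch assuming llama-server's OpenAI-compatible /v1/chat/completions endpoint on the default port 8080; the prompt text is illustrative, not the PR's actual few-shot template.

// Direct request to llama-server (assumed default port 8080; the
// system prompt here stands in for the PR's few-shot template).
const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [
      { role: "system", content: "When a tool is needed, reply with a JSON tool call." },
      { role: "user", content: "Find Python files and count lines" },
    ],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content); // expect a JSON tool call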

Test Plan

  • TypeScript compiles without errors
  • Tool call JSON parsing handles multiple formats
  • Conversation state persists across calls
  • Direct llama-server testing confirms tool call output
  • Full integration test with Claude Code as orchestrator

🤖 Generated with Claude Code

Michael Lambert and others added 8 commits January 12, 2026 16:30

Features:
- New agent_chat tool with stateful conversation management
- Tool definition schema for describing available tools
- Few-shot prompt format for reliable JSON tool call output
- Multi-strategy JSON parsing (code blocks, inline, permissive)
- Conversation continuation with tool results
- Automatic conversation cleanup after 30 minutes
- list_conversations debug tool

Enables Claude to delegate tasks to local LLMs while maintaining
control of tool execution - a cost-effective hybrid approach.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The parseToolCall method now properly handles:
- Nested JSON objects (e.g., {"tool": "x", "arguments": {...}})
- Braces inside string values (e.g., "command": "awk '{print}'")

Replaced regex-based extraction with balanced brace tracking that
respects string escaping, enabling reliable tool call detection
from LLM output.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
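
A balanced-brace scanner of this kind might look as follows. This is an illustrative reconstruction, not the PR's parseToolCall source; the function name is hypothetical.

// Extract the first complete top-level {...} object from LLM output,
// tracking nesting depth and ignoring braces inside string literals
// and after backslash escapes (illustrative reconstruction).
function extractJsonObject(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Braces inside strings don't count toward nesting depth.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') { inString = true; continue; }
    if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) return text.slice(start, i + 1);
  }
  return null; // no balanced object found
}

On input like {"command": "awk '{print}'"} the braces inside the string value are skipped, so the full object is returned rather than a truncated fragment.
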
- Remove dangerous single-quote replacement in tryParseToolJson that
  broke JSON containing single quotes in string values (e.g., awk '$3')
- Add DEBUG_MCP env var to enable detailed logging of:
  - parseToolCall input/output and strategies
  - agent_chat conversation flow and LLM responses
- Try JSON parsing strategies in order: as-is, trailing comma fix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
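
The ordered strategies and the DEBUG_MCP gate can be pictured roughly as below; the trailing-comma regex and the logging helper are assumptions, not the PR's exact code.

// Try parse strategies in order; the first valid JSON wins.
// The trailing-comma regex is an assumed implementation detail.
const strategies: Array<(s: string) => string> = [
  (s) => s,                                // 1. as-is
  (s) => s.replace(/,\s*([}\]])/g, "$1"),  // 2. strip trailing commas
];

function tryParseToolJson(candidate: string): unknown {
  for (const fix of strategies) {
    try {
      return JSON.parse(fix(candidate));
    } catch {
      debug("strategy failed, trying next");
    }
  }
  return null;
}

// Detailed logging only when DEBUG_MCP is set (written to stderr,
// which keeps the MCP stdout protocol stream clean).
function debug(...args: unknown[]): void {
  if (process.env.DEBUG_MCP) console.error("[agent_chat]", ...args);
}
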
Major changes:
- agent_chat now auto-executes ssh_exec internally (no CC middleman)
- Agentic loop runs until final_answer or max_iterations
- Strict prompt guidelines for clean JSON/text output formatting
- Remove max_tokens limit (local tokens are free with 128K context)
- ssh_exec added as built-in tool automatically
- Reports tools_executed in response for transparency

This enables massive CC token savings - raw tool output (e.g., logs)
never touches Claude's context; only the final analysis is returned.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
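
The internal loop can be sketched as follows, reusing the Conversation shape from the earlier sketch. The helpers (callLocalLlm, executeTool, and the parseToolCall signature) are declared but not implemented, and the iteration cap is illustrative; this is an assumed shape, not the PR's actual code.

// Assumed helper signatures for the sketch (not the PR's real API).
declare function callLocalLlm(messages: Conversation["messages"]): Promise<string>;
declare function parseToolCall(text: string): { name: string; arguments: unknown } | null;
declare function executeTool(name: string, args: unknown): Promise<string>;

// Loop until the model stops requesting tools or the iteration cap hits.
// Raw tool output (e.g., full logs) stays in the local conversation;
// only the final answer is returned to the orchestrator.
async function runAgentLoop(
  conv: Conversation,
  maxIterations = 10, // cap is illustrative
): Promise<{ type: string; content: string; tools_executed: string[] }> {
  const toolsExecuted: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const reply = await callLocalLlm(conv.messages);
    const toolCall = parseToolCall(reply);
    if (!toolCall) {
      // No tool call: treat the reply as the final answer.
      return { type: "final_answer", content: reply, tools_executed: toolsExecuted };
    }
    const result = await executeTool(toolCall.name, toolCall.arguments);
    toolsExecuted.push(toolCall.name);
    conv.messages.push(
      { role: "assistant", content: reply },
      { role: "user", content: `Tool result:\n${result}` },
    );
  }
  return { type: "error", content: "max_iterations reached", tools_executed: toolsExecuted };
}
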
Documents real-world testing showing 70-90% Claude token savings:
- Log analysis: 15,000 tokens → 1,500 tokens (90% reduction)
- System health check: 15,000 tokens → 4,500 tokens (70% reduction)

Includes architecture diagrams, usage examples, and configuration
guide for the autonomous agent with internal tool execution.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add "How the Token Math Works" section explaining token breakdown
- Correct savings percentages: 40-80% (was 70-90%)
- Add Local Tokens column to comparison tables
- Update video script with consistent numbers and explanations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Show tokens shifting from Claude (paid) to Local LLM (free)
- Add security audit (93%) and Docker logs (95%) as top examples
- Update all claims to "up to 95%" based on actual testing
- Include 8 test cases sorted by Claude Direct tokens
- Add test-scripts/health_check.sh generated by agent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Acknowledge CC Token Saver, Ollama Claude, Rubber Duck MCP as prior art
- Position as infrastructure-focused implementation, not novel invention
- Add comparison tables: when to use this vs other options
- Add PORTABILITY.md for others evaluating the tool
- Add posts/ with Reddit and Substack drafts
- Update video script with honest intro and metadata
- Add Nextcloud debugging use case to stats table

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>