
Plugin: ghost_judge#316

Open: 0sicario wants to merge 1 commit into agent0ai:main from 0sicario:add-ghost-judge

@0sicario

Ghost Judge — Silent LLM Quality Gate

Your agent says "done." But is it?

Ghost Judge uses a separate LLM to evaluate your agent's work after every response while a goal is active. Set a /goal, and the judge checks the work for completeness, cross-referencing, and supporting evidence before results are presented. If the work isn't done, your agent keeps refining automatically.
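The refine-until-approved loop described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `run_gate`, `Verdict`, and the `agent_step` and `judge` callables are hypothetical names, and the default turn budget of 5 is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    done: bool          # judge believes the goal has been met
    feedback: str = ""  # what is still missing, passed back to the agent


def run_gate(
    goal: Optional[str],
    agent_step: Callable[[str], str],      # hypothetical: agent revises given feedback
    judge: Callable[[str, str], Verdict],  # hypothetical: separate judge model call
    first_response: str,
    max_turns: int = 5,                    # assumed turn budget
) -> tuple[str, int]:
    """Return the final response and the number of judge calls made."""
    if not goal:
        # No active goal: the judge is never called.
        return first_response, 0

    response = first_response
    calls = 0
    for _ in range(max_turns):
        verdict = judge(goal, response)
        calls += 1
        if verdict.done:
            break
        # Work is not done yet: the agent refines using the judge's feedback.
        response = agent_step(verdict.feedback)
    return response, calls
```

When the turn budget runs out, this sketch returns the latest response without judging it again. Whether the real plugin does the same is not stated in this description.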

What it does

  • /goal and /subgoal slash commands (requires Commands plugin)
  • Domain-aware evaluation: auto-classifies tasks as OSINT, design, or research based on tool usage
  • Configurable judge model (default: Grok 4.3 via OpenRouter, works with any model)
  • Subgoals with evidence-gated evaluation — judge demands concrete proof per criterion
  • Parse failure tracking, turn budget, fail-open on API errors
  • Zero cost when no goal is active
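One way the fail-open and parse-failure behavior listed above might fit together is sketched below. Everything here is an assumption for illustration: `evaluate`, `GateState`, the `{"done": ...}` JSON shape, and the limit of three parse failures are hypothetical and not taken from the plugin.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateState:
    parse_failures: int = 0  # consecutive unparseable judge replies


def evaluate(
    call_judge: Callable[[], str],  # hypothetical: returns the judge's raw reply
    state: GateState,
    max_parse_failures: int = 3,    # assumed limit
) -> bool:
    """Return True if the agent's work should be accepted."""
    try:
        raw = call_judge()
    except Exception:
        # Fail open: an API error must not block the agent.
        return True

    try:
        done = bool(json.loads(raw)["done"])
    except (json.JSONDecodeError, KeyError, TypeError):
        # Track the failure; accept only once the limit is reached.
        state.parse_failures += 1
        return state.parse_failures >= max_parse_failures

    state.parse_failures = 0
    return done
```

Failing open trades strictness for availability: a judge outage degrades to ungated agent output instead of stalling the task.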

The pitch

A $0.03 judge call lifts a cheap LLM's output to premium-tier quality. The capability was always there; the accountability wasn't.

Tested on

  • OSINT person research (caught lazy delegation, forced cross-referenced dossier)
  • Company deep dives (caught single-source data, forced multi-source verification)
  • Property owner resolution (prevented identity merging that previously failed)
  • Cross-surface coherence tests (verified data chaining across Console, Browser, Desktop)

Requirements

  • Agent Zero v1.13+
  • OpenRouter API key (OPENROUTER_API_KEY env var)
  • Commands plugin for /goal and /subgoal

Repository

https://github.com/Kironkeys/ghost-judge

Looking for community testers

Especially interested in results from local LLM setups (Llama, Qwen, Mistral, DeepSeek). Can a small local judge effectively gate a larger agent? Help us find out.

Post-task quality gate using a separate judge model to verify agent work.
Domain-aware evaluation for OSINT, design, and research tasks.
Configurable judge model with reasoning support via OpenRouter.

Co-Authored-By: Claude <noreply@anthropic.com>