Self-hosted LLM/Agent batch evaluation platform with OpenAI-compatible API testing, streaming response parsing, LLM-as-a-Judge scoring, Docker deployment, and CSV/XLSX export.
docker-compose sqlite nextjs self-hosted streaming-api model-evaluation prisma chatbot-evaluation ai-evaluation llm ai-testing private-deployment llm-evaluation llm-as-a-judge dataset-evaluation openai-compatible agent-evaluation llm-benchmark batch-evaluation eval-platform
Updated May 26, 2026 - TypeScript