Agent and LLM evaluation harness — golden datasets, multi-scorer execution, regression detection across model versions, cost-quality leaderboards, and CI gates for model promotion.
express typescript platform-engineering regression-detection ml-ops ai-platform ai-governance llm-eval agent-eval ci-gate
-
Updated
May 7, 2026 - TypeScript