Summary
Set up a CI workflow to run the test/init-eval/ eval suite automatically (on PRs and/or on a schedule).
Context
The eval tests exercise sentry init against real project templates and use an LLM judge to score correctness. They currently run only locally because they require a running Mastra API server.
The Mastra agent lives in https://github.com/getsentry/cli-init-api/.
Requirements
- Mastra server: Either deploy a persistent dev instance or spin one up as a service container in the workflow (pull from getsentry/cli-init-api).
- GitHub environment: Create an init-eval environment with secrets:
  - MASTRA_API_URL: URL of the Mastra server
  - OPENAI_API_KEY: API key for the LLM judge
- Workflow: Add .github/workflows/init-eval.yml with push/pull_request/workflow_dispatch triggers. A previous version was removed in d5d0b22 and can be used as a starting point.
- Concurrency: Use a concurrency group to cancel in-progress runs on new pushes.
- Matrix: Run each platform (express, nextjs, python-fastapi, python-flask, react-vite, sveltekit) as a separate matrix job with fail-fast: false.
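The requirements above could be combined into a workflow along these lines. This is a minimal sketch, not the removed d5d0b22 version: it assumes the persistent-server option (so MASTRA_API_URL points at an already running instance), and the branch filter, install step, and test command are placeholders that need to be replaced with whatever the repository actually uses.

```yaml
# Sketch of .github/workflows/init-eval.yml.
# Assumptions: default branch is "main"; npm-based install and test commands
# are placeholders; per-platform test path layout is hypothetical.
name: init-eval

on:
  push:
    branches: [main]
  pull_request:
  workflow_dispatch:

# Cancel an in-progress run when a newer commit arrives on the same ref.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  eval:
    name: eval (${{ matrix.platform }})
    runs-on: ubuntu-latest
    environment: init-eval
    strategy:
      fail-fast: false
      matrix:
        platform:
          - express
          - nextjs
          - python-fastapi
          - python-flask
          - react-vite
          - sveltekit
    env:
      MASTRA_API_URL: ${{ secrets.MASTRA_API_URL }}
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      # Placeholder: swap in the repository's real install command.
      - run: npm ci
      # Placeholder: run only the eval for this matrix entry.
      - run: npm test -- test/init-eval/${{ matrix.platform }}
```

Note that secrets are not passed to workflows triggered by pull requests from forks, so the eval jobs would fail or need to be skipped for external contributions.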
Open questions
- Should the Mastra server be a persistent deployment (e.g. Cloudflare Worker) or spun up per-run?
- Should this run on every PR or only on a schedule / manual trigger to save costs?
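If the cost concern wins out, the schedule/manual option would only change the trigger block. A sketch, where the cron expression is an arbitrary example rather than a recommendation:

```yaml
# Alternative "on:" block: no per-PR runs, only scheduled and manual ones.
on:
  schedule:
    # Assumption: weekdays at 06:00 UTC; pick whatever cadence fits the budget.
    - cron: "0 6 * * 1-5"
  workflow_dispatch:
```

Scheduled workflows run against the default branch, so regressions on a PR would only surface after merge unless someone triggers the workflow manually on that branch.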