Demo repository for showcasing CodexOpt on intentionally messy instruction assets.
This demo is the companion repository for the main CodexOpt project.
This repo contains:

- `AGENTS.md` with duplicate and conflicting guidance
- `SKILL.md` examples:
  - missing frontmatter
  - verbose/redundant text
  - duplicated lines
- `tasks.md` with 5 evaluation tasks
- `issues.md` with recurring feedback themes
- a tiny Python package under `src/codexopt_demo`
- GEPA local/cloud setup guide: `docs/gepa-local-and-cloud.md`
Setup and checks:

```shell
uv lock
uv sync --extra dev
uv run --no-sync pytest -q
uv run --no-sync ruff check src tests
```

From this repo root:
```shell
codexopt init
codexopt scan
codexopt benchmark
codexopt optimize agents --file AGENTS.md
codexopt optimize skills --glob ".codex/skills/**/SKILL.md"
codexopt apply --kind skills --dry-run
codexopt report --output codexopt-report.md
```

This demo is meant to mirror how a team would use CodexOpt in a real repository.
Inputs in this demo:

- `AGENTS.md`
- demo skills under `.codex/skills/`
- repo task evidence in `tasks.md`
- recurring feedback themes in `issues.md`
Suggested flow:

- Run `benchmark` to get a baseline score plus feedback.
- Run `optimize agents` and `optimize skills`.
- Review `.codexopt/runs/*/optimize.json` and generated reports.
- Use `apply --dry-run` before writing any changes.
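The review step can be scripted. Here is a minimal sketch, assuming run directories under `.codexopt/runs/` carry sortable (e.g. timestamped) names; that naming is an assumption about CodexOpt's layout, not documented behavior:

```shell
# Sketch: locate the newest run directory and its optimize.json.
# Assumes run directories under .codexopt/runs/ sort chronologically
# by name (e.g. timestamps) -- an assumption, not a documented contract.
latest=$(ls -1d .codexopt/runs/*/ 2>/dev/null | sort | tail -n 1)
if [ -n "$latest" ]; then
  echo "newest run artifact: ${latest}optimize.json"
else
  echo "no runs yet"
fi
```

In a fresh checkout (no runs recorded yet) this simply prints `no runs yet`.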
Example:
```shell
cp codexopt.gepa.example.yaml codexopt.yaml
codexopt --config codexopt.yaml benchmark
codexopt --config codexopt.yaml optimize agents
codexopt --config codexopt.yaml optimize skills
codexopt apply --kind agents --dry-run
codexopt --config codexopt.yaml report --output codexopt-report.md
```

Command reference used in the demo:
```shell
cd /path/to/codexopt-demo
export GEMINI_API_KEY="YOUR_REAL_KEY"
export GOOGLE_API_KEY="$GEMINI_API_KEY"

rm -rf .codexopt codexopt-report.md
ls

codexopt --config codexopt.gepa.example.yaml benchmark
codexopt --config codexopt.gepa.example.yaml optimize agents --engine heuristic --file AGENTS.md
codexopt --config codexopt.gepa.example.yaml optimize skills --engine heuristic --glob ".codex/skills/**/SKILL.md"
codexopt apply --kind agents --dry-run
codexopt apply --kind skills --dry-run
codexopt --config codexopt.gepa.example.yaml report --output codexopt-report.md
sed -n '1,120p' codexopt-report.md

codexopt --config codexopt.gepa.example.yaml optimize agents \
  --engine gepa \
  --reflection-model gemini/gemini-2.5-pro \
  --max-metric-calls 2 \
  --file AGENTS.md
```

What each command does:

- `benchmark`: baseline score plus evidence-aware feedback
- `optimize agents`: optimize `AGENTS.md`
- `optimize skills`: optimize demo skill files
- `apply --dry-run`: preview changes without writing files
- `report`: generate a markdown summary from the latest runs
- `optimize ... --engine gepa`: optional low-budget GEPA example with Gemini 2.5 Pro
To benchmark against repo tasks and issue themes, copy the demo config first:
```shell
cp codexopt.gepa.example.yaml codexopt.yaml
codexopt --config codexopt.yaml benchmark
```

That config enables:

- `tasks.md` as task evidence
- `issues.md` as recurring feedback evidence
The benchmark and report artifacts will then include:
- criterion sub-scores
- natural-language feedback
- task/issue evidence counts
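If you want to eyeball those values from the shell, here is a minimal sketch; the stand-in file below exists only to keep the example self-contained, and its `- <criterion>: <score>` line shape is an assumed report layout, not a documented format:

```shell
# Build a tiny stand-in report so this snippet is self-contained;
# in practice you would point the greps at codexopt-report.md.
# The line shapes here are assumptions about the report layout.
printf '%s\n' '- clarity: 0.62' '- consistency: 0.41' '- evidence: tasks=5 issues=3' > sample-report.md

# Count criterion sub-score lines and show the evidence-counts line.
score_lines=$(grep -c '^- [a-z]*: 0\.' sample-report.md)
echo "criterion sub-scores: $score_lines"
grep '^- evidence:' sample-report.md
rm -f sample-report.md
```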
The current demo shows evidence-aware instruction optimization. It does not yet run full agent task simulations from `tasks.md`; those tasks currently shape scoring and feedback.
Use this example file: `codexopt.gepa.example.yaml`

```shell
cp codexopt.gepa.example.yaml codexopt.yaml
```

Edit `codexopt.yaml`:
```yaml
evidence:
  task_files:
    - tasks.md
  issue_files:
    - issues.md

optimization:
  engine: "gepa"
  max_metric_calls: 120
  reflection_model: "your-provider/your-reflection-model"
```

GEPA in CodexOpt is model-agnostic. You can use OpenAI, Gemini, local models, or other GEPA/LiteLLM-compatible providers for reflection and candidate feedback.
OpenAI example:
```shell
export OPENAI_API_KEY="YOUR_KEY"
```

```yaml
optimization:
  engine: "gepa"
  reflection_model: "openai/gpt-5-mini"
```

Gemini example:
```shell
export GEMINI_API_KEY="YOUR_KEY"
export GOOGLE_API_KEY="$GEMINI_API_KEY"
```

```yaml
optimization:
  engine: "gepa"
  reflection_model: "gemini/gemini-2.5-pro"
```

Run the optimizers with the config:

```shell
codexopt --config codexopt.yaml optimize agents
codexopt --config codexopt.yaml optimize skills
```

Or pass the GEPA settings directly on the command line:

```shell
codexopt optimize skills \
  --engine gepa \
  --reflection-model your-provider/your-reflection-model \
  --max-metric-calls 200
```

Current CodexOpt exposes GEPA tuning via `max_metric_calls` and `reflection_model`.
A direct `iterations` field is not exposed yet; use `max_metric_calls` as the primary search-budget control.
If GEPA is unavailable or the requested model path fails, CodexOpt records that fallback in the optimization artifact and report.
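To spot such a fallback without opening each file by hand, here is a minimal grep-based sketch; the `"fallback"` key name is an assumption about the artifact's contents, not a documented schema:

```shell
# Sketch: scan run artifacts for a recorded GEPA fallback.
# Assumes artifacts at .codexopt/runs/*/optimize.json (this repo's layout);
# the "fallback" key name is an assumed field, not a documented schema.
hits=$(grep -ls '"fallback"' .codexopt/runs/*/optimize.json 2>/dev/null || true)
if [ -n "$hits" ]; then
  echo "fallback recorded in:"
  echo "$hits"
else
  echo "no fallback recorded"
fi
```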
For step-by-step local and cloud GEPA setup (including low-budget runs), see:
docs/gepa-local-and-cloud.md