feat(evolve): default to gepa improvement-or-equal acceptance criterion (Path D) #74
Merged
Adds the gepa_acceptance string ("strict" or "improvement_or_equal")
to the run_inputs payload so a third party holding only
gate_decision.json can tell which acceptance criterion produced the
run. Threaded through all 5 build_run_inputs call sites; updated
schema tests for the new key.
The shipped --gepa-acceptance flag offered "strict" as a choice, but
gepa.optimize rejects that string and only accepts "strict_improvement"
or "improvement_or_equal". The smoke surfaced this: a --gepa-acceptance
strict run raised ValueError("Unknown acceptance_criterion: strict")
inside gepa, triggering the MIPROv2 fallback path.
Rename the CLI choice to "strict-improvement" so the hyphen→underscore
conversion produces gepa's canonical "strict_improvement". Update tests,
config docstring, help text, calibration_findings reference.
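The rename described in the commit above can be illustrated with a minimal sketch. It assumes the flag is parsed with argparse and converted with a plain hyphen-to-underscore replace. The helper names build_parser and to_gepa_criterion are hypothetical and not taken from the repository; only the flag name, its two choices, and the two canonical gepa strings come from this PR.

```python
import argparse

# The two criterion strings this PR says gepa.optimize accepts.
GEPA_CRITERIA = {"strict_improvement", "improvement_or_equal"}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the flag discussed in this PR."""
    parser = argparse.ArgumentParser(prog="evolve_skill")
    parser.add_argument(
        "--gepa-acceptance",
        choices=["strict-improvement", "improvement-or-equal"],
        default="improvement-or-equal",
        help="Minibatch acceptance criterion forwarded to gepa.",
    )
    return parser


def to_gepa_criterion(cli_value: str) -> str:
    """Convert the hyphenated CLI value to gepa's underscore form.

    Validating here (an assumption, not necessarily what the PR does)
    surfaces a bad value before it can trigger a fallback path downstream.
    """
    criterion = cli_value.replace("-", "_")
    if criterion not in GEPA_CRITERIA:
        raise ValueError(f"Unknown acceptance_criterion: {criterion}")
    return criterion


args = build_parser().parse_args(["--gepa-acceptance", "strict-improvement"])
criterion = to_gepa_criterion(args.gepa_acceptance)
print(criterion)
```

With the old choice "strict", the replace is a no-op because there is no hyphen to convert, which is why the value reached gepa unchanged and was rejected.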
Summary
Defaults acceptance_criterion to improvement_or_equal (was implicit strict in gepa<0.1.2) for GEPA's minibatch acceptance test. Adds a --gepa-acceptance {strict-improvement,improvement-or-equal} CLI flag on evolve_skill and evolve_tool, plumbed through dspy.GEPA's gepa_kwargs passthrough to gepa.optimize. Closes the last remaining downstream item from the deploy-gate arc.
Why
GEPA's minibatch acceptance test at gepa/core/engine.py:493 historically hard-coded strict improvement (new_sum > old_sum). Under LM-judge noise on small minibatches (3-8 examples), this rejects "true zero-difference" candidates roughly half the time, narrowing the search and reducing downstream Pareto-frontier diversity.
The strict-vs-non-strict choice has no motivation in the GEPA paper (arXiv:2507.19457): Algorithm 1 just says "if σ′ improved", with an undefined operator, no ablation, and no discussion of minibatch noise. The paper's first author (Lakshya Agrawal) shipped this as a configurable choice in gepa-ai/gepa#304 with the explicit rationale that improvement-or-equal "allow[s] lateral moves that don't improve the score but may explore different regions of the solution space."
Adjacent literature (Beyer 2000, Aizawa & Wah 1994, Rakshit et al. 2017) treats strict-elitist acceptance under noisy fitness as a known anti-pattern. Improvement-or-equal is the lightest-touch mainstream fix.
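The difference between the two criteria can be made concrete with a toy simulation. This is a sketch under stated assumptions and not gepa's engine code: the judge is modelled as an independent pass/fail draw per example, the old and new candidates share the same true pass probability (a "true zero-difference" candidate), and the names accept and acceptance_rate are hypothetical.

```python
import random


def accept(new_sum: float, old_sum: float, criterion: str) -> bool:
    """Minibatch acceptance test under the two criteria named in this PR."""
    if criterion == "strict_improvement":
        return new_sum > old_sum
    if criterion == "improvement_or_equal":
        return new_sum >= old_sum
    raise ValueError(f"Unknown acceptance_criterion: {criterion}")


def acceptance_rate(
    criterion: str,
    minibatch_size: int = 5,   # within the 3-8 range mentioned above
    pass_prob: float = 0.7,    # assumed true quality, identical for both candidates
    trials: int = 10_000,
    seed: int = 42,
) -> float:
    """Fraction of trials in which an equally good candidate is accepted."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(trials):
        # Noisy judge: each example independently scores 1 or 0.
        old_sum = sum(rng.random() < pass_prob for _ in range(minibatch_size))
        new_sum = sum(rng.random() < pass_prob for _ in range(minibatch_size))
        accepted += accept(new_sum, old_sum, criterion)
    return accepted / trials


strict_rate = acceptance_rate("strict_improvement")
lenient_rate = acceptance_rate("improvement_or_equal")
print(f"strict_improvement:   {strict_rate:.3f}")
print(f"improvement_or_equal: {lenient_rate:.3f}")
```

Because both runs reuse the same seed, they see identical score draws, so the gap between the two rates is exactly the share of tied minibatches. With discrete scores on a small minibatch ties are common, and those are the lateral moves that only improvement-or-equal lets through.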
Empirical validation (A/B smoke)
Same skill (nano-pdf), same seed (42), same everything else; only --gepa-acceptance varies:
run_inputs.gepa_acceptance (sanity): strict_improvement ✓, improvement_or_equal ✓
Improvement-or-equal accepted 2x as many candidates, exactly the prediction from theory. More candidates → better Pareto front → +0.041 holdout improvement and a higher bootstrap lower bound, at no extra cost.
Dependency setup
The gepa PR landed 2026-04-06 but no PyPI release contains it yet (latest gepa==0.1.1 was uploaded 2026-03-16; DSPy 3.2.1 still pins gepa==0.0.27). Bridged by:
- Git-pinning gepa to PR #304's merge SHA 5e24ee5c8e1857a62a1ba19731de9da45ffb6f1b
- [tool.uv] override-dependencies to bypass DSPy 3.2.0's hard pin on gepa[dspy]==0.0.27
Documented inline in pyproject.toml. When gepa 0.1.2 ships (or DSPy bumps via stanfordnlp/dspy#9673, which is merged but unreleased), the git pin + override can be swapped to a version pin in a one-line change.
Bug caught during validation
Initial CLI choice was ["strict", "improvement-or-equal"]. The hyphen-to-underscore conversion left "strict" unchanged for the first option, but gepa rejects unknown criteria: only "strict_improvement" and "improvement_or_equal" are valid. Caught by the A/B smoke (first attempt's strict run fell back to MIPROv2 → optuna ImportError), fixed by renaming the CLI value to strict-improvement.
Test plan
- gepa_kwargs passthrough at the DSPy constructor (skill + tool sides, both criteria)
- run_inputs records the gepa_acceptance value for forensic replay
- --help shows both choices with explanation
Files
- pyproject.toml: git pin gepa to PR #304 merge SHA + uv override-dependencies
- evolution/core/config.py: gepa_acceptance: str = "improvement_or_equal" field
- evolution/skills/evolve_skill.py + evolution/tools/evolve_tool.py: CLI flag, plumbing
- evolution/core/run_inputs.py: record gepa_acceptance in the run_inputs payload
- tests/{skills,tools}/test_evolve_*_validation_flow.py: passthrough regression tests
- tests/core/test_run_inputs.py: new field assertion
- reports/calibration_findings.md: Path D section with the rationale
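How the config field, the gepa_kwargs passthrough, and the run_inputs record fit together might look roughly like the sketch below. Only the gepa_acceptance field with its default, the gepa_kwargs and acceptance_criterion names, and the build_run_inputs function name come from this PR. The class name EvolutionConfig, the helper gepa_constructor_kwargs, and both signatures are hypothetical stand-ins, and the real code passes the kwargs to dspy.GEPA, which is not imported here.

```python
from dataclasses import dataclass


@dataclass
class EvolutionConfig:
    """Hypothetical stand-in for the config class in evolution/core/config.py."""
    gepa_acceptance: str = "improvement_or_equal"


def gepa_constructor_kwargs(config: EvolutionConfig) -> dict:
    """Keyword arguments intended for the dspy.GEPA constructor.

    Assumption: gepa_kwargs is forwarded unchanged to gepa.optimize,
    as the PR summary describes.
    """
    return {"gepa_kwargs": {"acceptance_criterion": config.gepa_acceptance}}


def build_run_inputs(config: EvolutionConfig, **other_fields) -> dict:
    """Hypothetical signature: record the criterion next to the other inputs."""
    payload = dict(other_fields)
    payload["gepa_acceptance"] = config.gepa_acceptance
    return payload


config = EvolutionConfig()
print(gepa_constructor_kwargs(config))
print(build_run_inputs(config, seed=42))
```

Recording the same string that is handed to the optimizer keeps the run_inputs payload and the actual run from drifting apart, which is the point of the forensic-replay item in the test plan.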