feat: --gepa-minibatch-size CLI flag (Path E) by jramos · Pull Request #65 · jramos/agent-self-evolution

jramos · 2026-05-22T03:36:32Z

Summary

Exposes GEPA's existing reflection_minibatch_size kwarg as --gepa-minibatch-size on both evolve_skill and evolve_tool. Default unchanged at 3 (matches GEPA's own default; no behavior change for existing scripts/CI). Users who land in the saturation pre-flight's weak_signal band now see a panel recommending a bump to 8, plus pipeline-specific budget compensation.
New EvolutionConfig.reflection_minibatch_size field (canonical home, auto-serialized into metrics.json).
Post-dataset-build trainset-ceiling guard that aborts at startup with an actionable message if --gepa-minibatch-size exceeds the trainset size. Without the guard, GEPA's EpochShuffledBatchSampler asserts mid-optimization at gepa/strategies/batch_sampler.py:71.
Updated saturation_check.py's weak_signal suggestions to recommend the new flag concretely (replaces the "Path E follow-up would help once landed" placeholder).
Help text is pipeline-aware: tool side recommends --iterations bump (uses max_full_evals); skill side recommends --budget heavy (uses auto).
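
The trainset-ceiling guard in the list above could look roughly like the sketch below. This is a minimal illustration rather than the code in this PR: the helper name check_minibatch_ceiling and the full message wording are assumptions. Only the "exceeds trainset size" phrase and exit code 1 come from the test plan.

```python
import sys


def check_minibatch_ceiling(minibatch_size: int, trainset_size: int) -> None:
    """Abort at startup if the minibatch cannot be drawn from the trainset.

    Hypothetical helper: the name and message wording are illustrative only.
    """
    if minibatch_size > trainset_size:
        print(
            f"--gepa-minibatch-size {minibatch_size} exceeds trainset size "
            f"{trainset_size}; lower the flag or add training examples.",
            file=sys.stderr,
        )
        # Exit before optimization starts instead of failing deep inside GEPA.
        raise SystemExit(1)


# A minibatch that fits returns silently.
check_minibatch_ceiling(3, 56)
```

Running the check after the dataset is built, but before the optimizer is constructed, is what turns a mid-run assertion into an immediate, readable failure.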

Background

Implements Path E from reports/pareto_frontier_feasibility.md. Spike #2 showed that on a saturated baseline with ~15% behavioral failure rate, GEPA's sum(subsample_scores) acceptance gate rejected all 40 candidate proposals over 116 iterations — because a random 3-example minibatch contains a discriminating example only ~34% of the time (hypergeometric, K=7 / N=56). Bumping to 8 raises that to ~68%. Path E ships the user-facing knob; Path F (saturation pre-flight, already merged) is the user-facing surface that recommends it.
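
The ~34% and ~68% figures can be reproduced with a short hypergeometric calculation. The sketch below assumes the minibatch is drawn without replacement from N=56 examples, K=7 of which are discriminating. The function name is illustrative.

```python
from math import comb


def p_at_least_one(n_total: int, n_discriminating: int, batch: int) -> float:
    """P(a minibatch drawn without replacement holds >= 1 discriminating example)."""
    # Probability that every drawn example is non-discriminating.
    p_none = comb(n_total - n_discriminating, batch) / comb(n_total, batch)
    return 1.0 - p_none


p3 = p_at_least_one(56, 7, 3)
p8 = p_at_least_one(56, 7, 8)
print(f"minibatch=3: {p3:.3f}")  # 0.335
print(f"minibatch=8: {p8:.3f}")  # 0.683
```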

Test plan

Pull, run uv run pytest -q — expect 1082 passed (up from 1078 pre-branch, +4 new tests across TestGepaMinibatchSizeFlag in both pipelines).
Inspect help text: uv run python -m evolution.tools.evolve_tool --help | grep -A 12 gepa-minibatch shows the tool-side help mentioning --iterations; uv run python -m evolution.skills.evolve_skill --help | grep -A 12 gepa-minibatch shows the skill-side mentioning --budget heavy.
Invoke with --gepa-minibatch-size 1000 against a tiny synthetic dataset; expect a clean exit 1 with "exceeds trainset size" in stderr instead of a deep GEPA assertion.
(Optional, ~$35) Multi-seed smoke against the spike #2 saturated baseline: 3 seeds at --gepa-minibatch-size 8 (success: ≥2 of 3 produce at least one accepted proposal) + 1 reverse-control at --gepa-minibatch-size 3 (success: reproduces the spike #2 all-rejected pattern). Verifies the mechanism actually moves selection.

Scope notes

Path D (Pareto-dominance acceptance) and Path C (stratified sampling) remain future work. Path C is shippable without an upstream PR via dspy.GEPA(gepa_kwargs={"batch_sampler": StratifiedBatchSampler(...)}) if Path E proves insufficient on harder cases.
Type-design refactor from the review of PR #64, feat: saturation pre-flight for evolve_skill and evolve_tool (bundle closed_loop_* triple, DEFAULT_THRESHOLDS → frozen dataclass), is a separate follow-up, not in this PR.
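
For context on Path C, the idea behind stratified sampling can be sketched independently of GEPA. The function below is a hypothetical illustration, not the StratifiedBatchSampler mentioned above. It does not implement GEPA's batch sampler interface, and it assumes the failing examples are already known by index.

```python
import random
from typing import List, Sequence


def stratified_minibatch(
    failing_ids: Sequence[int],
    passing_ids: Sequence[int],
    batch_size: int,
    rng: random.Random,
    min_failing: int = 1,
) -> List[int]:
    """Draw a minibatch that contains at least `min_failing` failing ids when available."""
    n_fail = min(min_failing, len(failing_ids), batch_size)
    picked = rng.sample(list(failing_ids), n_fail)
    # Fill the remaining slots uniformly from everything not yet picked.
    remaining = [i for i in list(failing_ids) + list(passing_ids) if i not in picked]
    picked += rng.sample(remaining, min(batch_size - n_fail, len(remaining)))
    rng.shuffle(picked)
    return picked


rng = random.Random(0)
batch = stratified_minibatch(range(7), range(7, 56), batch_size=3, rng=rng)
print(batch)
```

With a guaranteed failing example in every batch, a default-sized minibatch would always give the acceptance gate some contrast, which is why Path C is kept as a fallback if a larger uniform minibatch is not enough.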

Exposes GEPA's existing reflection_minibatch_size kwarg as a CLI flag so users hitting the saturation pre-flight's weak_signal band can widen the sampling window without an upstream PR.

Background: GEPA's acceptance gate is sum(subsample_scores) over a small random minibatch (gepa/core/engine.py:491-493). At the default minibatch=3, on a saturated baseline with ~15% failure rate, the discriminating examples appear in only ~34% of minibatches (spike #2 in reports/pareto_frontier_feasibility.md: 40 proposals rejected over 116 GEPA iterations with sum(N.0) not better than N.0 patterns). Bumping minibatch to 8 raises that probability to ~68%, giving the acceptance gate the contrast it needs.

Changes:
- evolution/core/config.py: new EvolutionConfig.reflection_minibatch_size field (default 3, matches GEPA's own default).
- evolution/skills/evolve_skill.py + evolution/tools/evolve_tool.py: new --gepa-minibatch-size click option with IntRange(min=1) validation. Threaded through main → evolve → EvolutionConfig → dspy.GEPA(reflection_minibatch_size=...). Help text is pipeline-aware: tool side recommends --iterations bump (uses max_full_evals); skill side recommends --budget heavy (uses auto).
- Both pipelines: post-dataset-build guard that aborts at startup if --gepa-minibatch-size exceeds the trainset size, with an actionable message. Without the guard, GEPA's EpochShuffledBatchSampler asserts mid-optimization at gepa/strategies/batch_sampler.py:71.
- evolution/core/saturation_check.py: weak_signal band suggestions now recommend the specific flag (replaces the "Path E follow-up would help once landed" placeholder).
- Tests: new TestGepaMinibatchSizeFlag classes in both pipelines, which patch dspy.GEPA.__init__ to verify the kwarg reaches self.reflection_minibatch_size post-construction (catches future DSPy renames), plus a test that the trainset-ceiling guard fires with the expected message and exit code 1.

Default unchanged at 3: no behavior change for existing scripts / CI. Users hitting weak_signal get the panel telling them to bump to 8 + pipeline-specific budget compensation.

Full suite: 1082 passed (was 1078 → +4 new tests). Verified locally with env -i ... OPENAI_API_KEY=sk-fake-test-key uv run pytest to match CI conditions.

Implements Path E from reports/pareto_frontier_feasibility.md. Path D (Pareto-dominance acceptance) and Path C (stratified sampling) remain future work; Path C is shippable without an upstream PR via dspy.GEPA(gepa_kwargs={"batch_sampler": ...}) if Path E proves insufficient on harder cases.
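
The all-rejected pattern described in the commit message can be illustrated with a simplified model of the gate. This sketch assumes a candidate is accepted only when its summed minibatch score strictly exceeds the parent's. That strictness is inferred from the "not better than" log pattern, and the function name is hypothetical rather than part of GEPA's API.

```python
from typing import Sequence


def accepts(parent_scores: Sequence[float], candidate_scores: Sequence[float]) -> bool:
    """Simplified gate: accept only on a strict improvement of the summed score."""
    return sum(candidate_scores) > sum(parent_scores)


# Saturated minibatch: the parent already scores 1.0 on every example, so even
# a genuinely better candidate can only tie and is rejected.
print(accepts([1.0, 1.0, 1.0], [1.0, 1.0, 1.0]))  # False

# Minibatch containing one discriminating example: the improvement is visible.
print(accepts([1.0, 0.0, 1.0], [1.0, 1.0, 1.0]))  # True
```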

jramos merged commit 24f4e42 into main May 22, 2026
4 checks passed

jramos deleted the path-e-larger-minibatch branch May 22, 2026 20:51

jramos mentioned this pull request May 23, 2026

test: weakened write_file fixture + ambiguous-task suite for Path E retro-validation #68 (merged; 3 tasks)

jramos merged 1 commit into main from path-e-larger-minibatch
