feat: --gepa-minibatch-size CLI flag (Path E) #65

Merged
jramos merged 1 commit into main from path-e-larger-minibatch on May 22, 2026
jramos (Owner) commented May 22, 2026

Summary

  • Exposes GEPA's existing reflection_minibatch_size kwarg as --gepa-minibatch-size on both evolve_skill and evolve_tool. The default stays at 3 (GEPA's own default), so existing scripts and CI see no behavior change. Users who land in the saturation pre-flight's weak_signal band now see a panel recommending --gepa-minibatch-size 8 plus pipeline-specific budget compensation.
  • New EvolutionConfig.reflection_minibatch_size field (canonical home, auto-serialized into metrics.json).
  • Post-dataset-build trainset-ceiling guard that aborts at startup with an actionable message if --gepa-minibatch-size exceeds the trainset size. Without the guard, GEPA's EpochShuffledBatchSampler asserts mid-optimization at gepa/strategies/batch_sampler.py:71.
  • Updated saturation_check.py's weak_signal suggestions to recommend the new flag concretely (replaces the "Path E follow-up would help once landed" placeholder).
  • Help text is pipeline-aware: tool side recommends --iterations bump (uses max_full_evals); skill side recommends --budget heavy (uses auto).
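The trainset-ceiling guard from the summary can be sketched roughly as follows. This is a minimal illustration rather than the code merged in this PR: the helper name `check_minibatch_ceiling` and the full message wording are assumptions, and only the "exceeds trainset size" phrase and exit code 1 are taken from the PR's own description of the expected behavior.

```python
import sys


def check_minibatch_ceiling(minibatch_size: int, trainset_size: int) -> None:
    """Abort before optimization starts if the minibatch cannot be filled.

    Hypothetical helper: the name and the message wording are illustrative.
    """
    if minibatch_size > trainset_size:
        print(
            f"--gepa-minibatch-size {minibatch_size} exceeds trainset size "
            f"{trainset_size}; lower the flag or provide more training examples.",
            file=sys.stderr,
        )
        # Exit code 1 mirrors the behavior described in the test plan.
        raise SystemExit(1)
```

Calling the check right after the dataset is built, and before `dspy.GEPA` is constructed, is what turns a mid-optimization assertion into a startup failure with an actionable message.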

Background

Implements Path E from reports/pareto_frontier_feasibility.md. Spike #2 showed that on a saturated baseline with a ~15% behavioral failure rate, GEPA's sum(subsample_scores) acceptance gate rejected all 40 candidate proposals over 116 iterations, because a random 3-example minibatch contains a discriminating example only ~34% of the time (hypergeometric, K=7 discriminating examples in a pool of N=56). Bumping the minibatch to 8 raises that to ~68%. Path E ships the user-facing knob; Path F (saturation pre-flight, already merged) is the surface that recommends it.
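The ~34% and ~68% figures can be checked with a short hypergeometric calculation. The sketch below assumes the reading of "K=7 / N=56" as 7 discriminating examples in a pool of 56, sampled without replacement; the function name `p_at_least_one` is illustrative.

```python
from math import comb


def p_at_least_one(pool: int, discriminating: int, minibatch: int) -> float:
    """P(a minibatch drawn without replacement holds >= 1 discriminating example)."""
    # Probability that every drawn example comes from the non-discriminating part.
    p_none = comb(pool - discriminating, minibatch) / comb(pool, minibatch)
    return 1.0 - p_none


for size in (3, 8):
    p = p_at_least_one(pool=56, discriminating=7, minibatch=size)
    print(f"minibatch={size}: {p:.1%}")
```

Under these assumptions, the two probabilities round to the ~34% and ~68% quoted above.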

Test plan

  • Pull the branch and run uv run pytest -q; expect 1082 passed (up from 1078 pre-branch: +4 new tests across the TestGepaMinibatchSizeFlag classes in both pipelines).
  • Inspect help text: uv run python -m evolution.tools.evolve_tool --help | grep -A 12 gepa-minibatch shows the tool-side help mentioning --iterations; uv run python -m evolution.skills.evolve_skill --help | grep -A 12 gepa-minibatch shows the skill-side help mentioning --budget heavy.
  • Invoke with --gepa-minibatch-size 1000 against a tiny synthetic dataset; expect a clean exit 1 with "exceeds trainset size" in stderr instead of a deep GEPA assertion.
  • (Optional, ~$35) Multi-seed smoke against the spike #2 saturated baseline: 3 seeds at --gepa-minibatch-size 8 (success: ≥2 of 3 produce at least one accepted proposal) + 1 reverse-control at --gepa-minibatch-size 3 (success: reproduces the spike #2 all-rejected pattern). Verifies the mechanism actually moves selection.

Scope notes

  • Path D (Pareto-dominance acceptance) and Path C (stratified sampling) remain future work. Path C is shippable without an upstream PR via dspy.GEPA(gepa_kwargs={"batch_sampler": StratifiedBatchSampler(...)}) if Path E proves insufficient on harder cases.
  • Type-design refactor from the PR #64 review (feat: saturation pre-flight for evolve_skill and evolve_tool), which bundles the closed_loop_* triple and turns DEFAULT_THRESHOLDS into a frozen dataclass, is a separate follow-up, not in this PR.
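The idea behind Path C can be illustrated independently of GEPA's sampler interface. The sketch below is not a drop-in `batch_sampler` and does not implement whatever protocol GEPA expects there. The function `stratified_minibatch`, its arguments, and the choice to stratify by known passing versus failing examples are all assumptions made for illustration. It only shows the core idea of reserving minibatch slots for discriminating examples.

```python
from __future__ import annotations

import random


def stratified_minibatch(
    passing_ids: list[int],
    failing_ids: list[int],
    size: int,
    min_failing: int = 1,
    rng: random.Random | None = None,
) -> list[int]:
    """Draw `size` distinct ids, reserving up to `min_failing` slots for failures.

    Hypothetical helper; assumes the two id lists are disjoint and duplicate-free.
    """
    rng = rng or random.Random()
    if size > len(passing_ids) + len(failing_ids):
        raise ValueError("minibatch size exceeds the number of available examples")
    # Reserve slots for failing examples, capped by what is actually available.
    n_fail = min(min_failing, len(failing_ids), size)
    chosen = rng.sample(failing_ids, n_fail)
    # Fill the remaining slots uniformly from everything not yet chosen.
    remaining = [i for i in passing_ids + failing_ids if i not in chosen]
    chosen += rng.sample(remaining, size - n_fail)
    rng.shuffle(chosen)
    return chosen
```

With 49 passing and 7 failing examples, a call such as `stratified_minibatch(list(range(49)), list(range(49, 56)), size=3)` always includes at least one failing id, whereas uniform sampling at the same size would do so only about a third of the time.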

Exposes GEPA's existing reflection_minibatch_size kwarg as a CLI flag
so users hitting the saturation pre-flight's weak_signal band can
widen the sampling window without an upstream PR.

Background: GEPA's acceptance gate compares sum(subsample_scores) over
a small random minibatch (gepa/core/engine.py:491-493). At the default
minibatch size of 3, on a saturated baseline with a ~15% failure rate,
a discriminating example appears in only ~34% of minibatches (spike #2
in reports/pareto_frontier_feasibility.md: 40 proposals rejected over
116 GEPA iterations, all with "sum(N.0) not better than N.0"
patterns). Bumping the minibatch to 8 raises that probability to ~68%,
giving the acceptance gate the contrast it needs.

Changes:
- evolution/core/config.py: new EvolutionConfig.reflection_minibatch_size
  field (default 3, matches GEPA's own default).
- evolution/skills/evolve_skill.py + evolution/tools/evolve_tool.py:
  new --gepa-minibatch-size click option with IntRange(min=1)
  validation. Threaded through main → evolve → EvolutionConfig →
  dspy.GEPA(reflection_minibatch_size=...). Help text is
  pipeline-aware: tool side recommends --iterations bump (uses
  max_full_evals); skill side recommends --budget heavy (uses auto).
- Both pipelines: post-dataset-build guard that aborts at startup if
  --gepa-minibatch-size exceeds the trainset size, with an actionable
  message. Without the guard, GEPA's EpochShuffledBatchSampler asserts
  mid-optimization at gepa/strategies/batch_sampler.py:71.
- evolution/core/saturation_check.py: weak_signal band suggestions now
  recommend the specific flag (replaces the "Path E follow-up would
  help once landed" placeholder).
- Tests: new TestGepaMinibatchSizeFlag classes in both pipelines —
  patches dspy.GEPA.__init__ to verify the kwarg reaches
  self.reflection_minibatch_size post-construction (catches future
  DSPy renames), plus a test that the trainset-ceiling guard fires
  with the expected message and exit code 1.

Default unchanged at 3: no behavior change for existing scripts /
CI. Users hitting weak_signal now see a panel recommending
--gepa-minibatch-size 8 plus pipeline-specific budget compensation.

Full suite: 1082 passed (was 1078 → +4 new tests). Verified locally
with env -i ... OPENAI_API_KEY=sk-fake-test-key uv run pytest to
match CI conditions.

Implements Path E from reports/pareto_frontier_feasibility.md. Path D
(Pareto-dominance acceptance) and Path C (stratified sampling)
remain future work; Path C is shippable without an upstream PR via
dspy.GEPA(gepa_kwargs={"batch_sampler": ...}) if Path E proves
insufficient on harder cases.
jramos merged commit 24f4e42 into main on May 22, 2026
4 checks passed
jramos deleted the path-e-larger-minibatch branch on May 22, 2026, 20:51