feat(knee-point): drop selector for val-best after empirical no-op verification #73
Merged
The helper was added on the assumption of list[list[float]] for val_subscores, but DSPy returns list[dict[int, float]] (sparse-coverage native). With the empirical finding that the knee-point epsilon band is a no-op on this corpus, the helper has no production call site and the wrong type signature. Cleanest action is to revert.
…st knee-point

Regenerated calibration across nano-pdf, apple-notes, polymarket, and huggingface-hub at N*=250, ratio*=0.65 showed the epsilon-band knee-point selector picks GEPA's val-argmax 10/10 across five epsilon modes (1/n_val, 0.5/n_val, 2/n_val, 3/n_val, noise-estimated paired-bootstrap). The selector is a no-op for the default val-best path on this corpus.

evolve_skill now branches the call site on --knee-point-strategy: val-best (default) skips select_knee_point entirely and uses details.candidates[best_idx] directly; smallest keeps the existing band-walk path for compression-bias users. evolve_tool has no strategy flag, so the call site reduces to the val-best short-circuit.

The gate_decision.json knee_point block now carries either a full CandidatePick payload (smallest path) or a minimal deferred payload (fallback="gepa_default", band_roster=[]) so downstream calibration consumers don't crash on key access.

Two saturation-preflight tests previously relied on a patched select_knee_point to inject a real CandidatePick; they are updated to configure the fake GEPA's compile() output with a real-shaped detailed_results namespace instead.
… no-op

Add a 2026-05-24 update to Finding 3 documenting the regenerated four-skill (nano-pdf, apple-notes, polymarket, huggingface-hub) replay at N*=250, ratio*=0.65: 10 runs × 5 ε modes (1/n_val, 0.5/n_val, 2/n_val, 3/n_val, paired-bootstrap noise-estimated) all produced identical mean transfer error 0.0466 and 70% deploy rate. The selector is a no-op on val-best for this corpus; --knee-point-strategy smallest is preserved for compression-bias users.
Address review feedback on the val-best short-circuit:
- Drop unused select_knee_point / CandidatePick / _knee_point_payload from evolve_tool.py (no smallest branch exists there).
- Remove dead select_knee_point patches from 6 test sites where the default val-best routing no longer invokes the patched symbol.
- Refresh --knee-point-epsilon and --knee-point-strategy help text so the CLI documentation matches the post-commit semantics.
- Add a single line near the val-best short-circuit pointing at the smallest strategy as the recovery path for static-failure regressions.
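The call-site branching described in the commit messages above can be sketched roughly as follows. This is a minimal illustration, not the code in `evolve_skill`: the `CandidatePick` fields, the `index` key, and the helper names `knee_point_block` and `choose_candidate` are assumptions made for the example. Only the `fallback="gepa_default"` and `band_roster=[]` values of the deferred payload come from the commit text.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, Callable, Optional


@dataclass
class CandidatePick:
    """Hypothetical stand-in for the real CandidatePick (fields assumed)."""
    index: int
    fallback: Optional[str] = None
    band_roster: list[int] = field(default_factory=list)


def knee_point_block(pick: Optional[CandidatePick]) -> dict[str, Any]:
    """Build the knee_point block written to gate_decision.json."""
    if pick is None:
        # val-best: minimal deferred payload. The keys are still present,
        # so consumers that index them do not hit a KeyError.
        return {"fallback": "gepa_default", "band_roster": []}
    # smallest: full payload (the "index" key is an assumption).
    return {
        "index": pick.index,
        "fallback": pick.fallback,
        "band_roster": list(pick.band_roster),
    }


def choose_candidate(
    details: Any,
    strategy: str,
    select_knee_point: Optional[Callable[[Any], CandidatePick]] = None,
) -> tuple[Any, dict[str, Any]]:
    """Return (candidate, knee_point block) for the given strategy."""
    if strategy == "val-best":
        # Short-circuit: defer to GEPA's own argmax, never call the selector.
        return details.candidates[details.best_idx], knee_point_block(None)
    if strategy == "smallest":
        if select_knee_point is None:
            raise ValueError("smallest strategy needs a selector")
        pick = select_knee_point(details)
        return details.candidates[pick.index], knee_point_block(pick)
    raise ValueError(f"unknown knee-point strategy: {strategy!r}")


# Toy stand-in for GEPA's detailed results (shape assumed).
details = SimpleNamespace(candidates=["cand-0", "cand-1", "cand-2"], best_idx=2)
default_candidate, default_block = choose_candidate(details, "val-best")
```

Keeping `fallback` and `band_roster` in both payload shapes is what lets downstream calibration consumers read the block without checking which strategy produced it.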
Summary
Drops the knee-point ε-band selector for the `val-best` strategy (the default). The val-best path now defers directly to GEPA's `details.best_idx`, matching the GEPA paper's prescribed termination behavior. `--knee-point-strategy smallest` is unchanged and still routes through `select_knee_point` for compression-bias users.

Reverts the noise-estimated ε helper introduced earlier this cycle. Investigation showed it would have been a no-op too, and its type signature was incorrect relative to DSPy's actual `val_subscores` shape (`list[dict[int, float]]`, not `list[list[float]]`).
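To make the type mismatch concrete, here is a small sketch of how a helper written for `list[list[float]]` goes wrong on sparse `list[dict[int, float]]` rows. The data is made up, and reading the dict keys as validation-example indices is an assumption. `dense_mean` and `sparse_mean` are hypothetical names, not functions from this repository.

```python
def dense_mean(row):
    """Written for list[float]: iterates the row's elements directly."""
    return sum(row) / len(row)


def sparse_mean(row: dict[int, float]) -> float:
    """Written for dict[int, float]: averages the scores, not the keys."""
    return sum(row.values()) / len(row)


# One candidate's sparse subscores: only examples 5 and 9 were evaluated.
row = {5: 1.0, 9: 0.5}

# Iterating a dict yields its keys, so the dense helper silently averages
# the example indices (5 and 9) instead of the scores.
wrong = dense_mean(row)
right = sparse_mean(row)
```

The dense helper does not raise here; it returns a plausible-looking but meaningless number, which is why a wrong signature of this kind is easy to miss.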
Why
A regenerated calibration campaign (10 runs across nano-pdf, apple-notes, polymarket, huggingface-hub at N*=250, ratio*=0.65) replayed the selector with 5 ε modes: 1/n_val, 0.5/n_val, 2/n_val, 3/n_val, and a noise-estimated paired-bootstrap ε.
Every mode picks the same candidate in every run. 10/10 agreement with GEPA's val-argmax confirms the selector is a no-op on this corpus. Details in `reports/calibration_findings.md` Finding 3.
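A toy sketch of why the fixed ε modes can all agree: if no other candidate scores within ε of the best validation score, the band contains only the argmax and any selection rule returns it. The band definition used here (score ≥ best − ε) is an assumption about `select_knee_point`, the scores are invented, and the noise-estimated paired-bootstrap mode is left out.

```python
def epsilon_band(val_scores: list[float], epsilon: float) -> list[int]:
    """Indices of candidates whose score is within epsilon of the best."""
    best = max(val_scores)
    return [i for i, s in enumerate(val_scores) if s >= best - epsilon]


def fixed_epsilons(n_val: int) -> dict[str, float]:
    """The four fixed epsilon modes named in this PR, as multiples of 1/n_val."""
    return {
        "1/n_val": 1.0 / n_val,
        "0.5/n_val": 0.5 / n_val,
        "2/n_val": 2.0 / n_val,
        "3/n_val": 3.0 / n_val,
    }


# Invented aggregate validation scores for three candidates, 20 val examples.
val_scores = [0.50, 0.80, 0.60]
bands = {
    mode: epsilon_band(val_scores, eps)
    for mode, eps in fixed_epsilons(20).items()
}

# A much wider band (not one of the PR's modes) is needed before a second
# candidate becomes eligible.
wide_band = epsilon_band(val_scores, 0.25)
```

In this toy case every mode yields a band containing only candidate 1, so the selector has nothing to choose between.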
Behavior change
Intentional and narrow: when `band_size >= 2` AND `best_idx`'s candidate fails static validation AND another band member would pass, the old code picked the non-best candidate; the new val-best path will reject. In the calibration, `pick == best_idx` held in 10/10 runs, so the band walk was empirically never invoked. Users who need the band-walk recovery can switch to `--knee-point-strategy smallest`.
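The narrow case can be illustrated with a sketch under stated assumptions: the band is given best-first, `passes_static` stands in for static validation, and `None` stands in for a rejection. `old_val_best` and `new_val_best` are hypothetical names. The real band-walk order in `select_knee_point` is not shown in this PR, so the best-first walk is an assumption.

```python
from typing import Callable, Optional


def old_val_best(band: list[int], passes_static: Callable[[int], bool]) -> Optional[int]:
    """Old behavior (assumed): walk the band best-first, take the first
    candidate that passes static validation."""
    for idx in band:
        if passes_static(idx):
            return idx
    return None


def new_val_best(best_idx: int, passes_static: Callable[[int], bool]) -> Optional[int]:
    """New behavior: defer to best_idx; reject if it fails validation."""
    return best_idx if passes_static(best_idx) else None


# Band of two candidates, best first; only the runner-up passes validation.
band = [0, 1]
only_runner_up_passes = lambda idx: idx == 1

old_pick = old_val_best(band, only_runner_up_passes)     # recovers candidate 1
new_pick = new_val_best(band[0], only_runner_up_passes)  # rejects
```

When the best candidate passes validation, both functions return the same index, which is the situation observed in all 10 calibration runs.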
Commits
Test plan