refactor: regenerate baselines for the loader API refactor (q2mm#281 companion) #4
Merged
Companion to ericchansen/q2mm PR. The loader API refactor (frozen-as- invariant + qfuerza_fresh/qfuerza_into split + load_system dispatch through q2mm/models/loaders.py) changes the contract every loader exposes: published OPT values are now preserved as-published instead of being silently overwritten by raw QFUERZA projections. As a result, every convergence baseline in this repo had to be regenerated against the new loaders.

Headline result: the ratio gate now passes for four of five published-FF systems (was two).

| System         | Ratio before | Ratio after | Gate after |
|----------------|-------------:|------------:|:----------:|
| ch3f           |        1.000 |       1.000 |     ✓      |
| Rh-enamide     |         1.05 |        1.07 |     ✓      |
| Pd-allyl       |         1.09 |        1.10 |     ✓      |
| Heck relay     |         1.30 |        1.30 |     ✗      |
| Pd 1,4-conj    |         1.20 |    **0.96** |   **✓**    |
| Rh 1,4-conj    |     ~4 × 10³ |    **1.04** |   **✓**    |

Pd 1,4-conj and Rh 1,4-conj were previously misclassified as gate failures because the QFUERZA overwrite was corrupting their published Wahlers OPT values, sending JaxLoss's inner geometry minimization into pathological regions. This also fully explains the pre-refactor rh-conjugate "non-determinism" tracked in q2mm#278 (ratios of 0.46 / 0.96 / ~4 × 10³ across sessions): the overwrite was chaotic.

Per-category R² (published OPT values evaluated by the JAX engine, no QFUERZA):

| System         | R²(bond_len) | R²(bond_ang) | R²(eig_diag) |
|----------------|:------------:|:------------:|:------------:|
| Rh-enamide     |    0.987     |    0.918     |    0.963     |
| Heck relay     |    0.980     |    0.781     |    −12.6     |
| Pd-allyl       |    0.042     |    0.330     |    −2.82     |
| Pd 1,4-conj    |    0.939     |    −0.177    |    −10.06    |
| Rh 1,4-conj    |    0.891     |    0.454     |    −7.86     |

Removed:
- benchmarks/rh-enamide/convergence/rh-enamide_optimized.fld
- benchmarks/pd-allyl-amination/convergence/pd-allyl_optimized.fld

Both `.fld` files were produced by the pre-refactor loader (which QFUERZA-overwrote the published OPT values) and are no longer reproducible against the current loader.
Fresh `_optimized.fld` files for all four gate-passing systems will land in a follow-up PR that runs end-to-end optimization against the refactored loader (tracked alongside ericchansen/q2mm#275).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Companion to ericchansen/q2mm#281 (loader API refactor).
TL;DR
Regenerates every convergence baseline against the refactored q2mm loaders. The headline result is that the ratio gate now passes for four of five published-FF systems (was two). Two systems — pd-conjugate and rh-conjugate — were previously misclassified as gate-failures because the pre-refactor loader was silently overwriting their published Wahlers OPT values with raw QFUERZA projections.
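To make the contract change concrete, here is a minimal Python sketch of the two merge behaviours described above. It is an illustration only: `merge_params`, its `overwrite_published` flag, and the parameter names and values in the example are all hypothetical, and the sketch does not reproduce the actual signatures of q2mm's `qfuerza_fresh` / `qfuerza_into` or `load_system`. It also assumes, as a simplification, that projections still fill parameters that have no published value.

```python
from typing import Dict, Mapping


def merge_params(
    published: Mapping[str, float],
    projected: Mapping[str, float],
    overwrite_published: bool = False,
) -> Dict[str, float]:
    """Combine published force-field values with projected estimates.

    Hypothetical helper, not a q2mm API.
    overwrite_published=True mimics the pre-refactor loader: any parameter
    with a projection is replaced, even if a published value exists.
    overwrite_published=False mimics the post-refactor contract: published
    values are kept exactly as published, and (assumption) projections only
    fill parameters that have no published value.
    """
    merged = dict(published)  # work on a copy so the inputs are never mutated
    for name, value in projected.items():
        if overwrite_published or name not in published:
            merged[name] = value
    return merged


# Placeholder parameter names and values, invented for illustration only.
published = {"bond_a": 2.50, "angle_b": 0.60}
projected = {"bond_a": 4.10, "angle_b": 0.20, "bond_c": 3.00}

old = merge_params(published, projected, overwrite_published=True)
new = merge_params(published, projected)
print(old)  # {'bond_a': 4.1, 'angle_b': 0.2, 'bond_c': 3.0}
print(new)  # {'bond_a': 2.5, 'angle_b': 0.6, 'bond_c': 3.0}
```

Under the pre-refactor behaviour the published `bond_a` and `angle_b` are silently replaced by their projections; under the post-refactor behaviour they survive, and only `bond_c`, which has no published value, comes from the projection.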
What changed
Per-category R² of the published OPT values, evaluated by the q2mm JAX engine:
Geometry reproduction is now strong (bond_length R² ≥ 0.89 for every published-OPT system except pd-allyl). The eigenmatrix (eig_diag) R² is negative for every system except rh-enamide; that gap is the real cross-engine MM3* ↔ JAX-engine difference, not a loader artifact.
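A negative R² can look alarming, so a short sketch of the metric may help. Assuming the per-category numbers use the standard coefficient of determination, R² = 1 − SS_res/SS_tot, the value drops below zero whenever the computed values fit the reference worse than a flat line at the reference mean, even if the two are perfectly correlated. The `r_squared` helper below is a hypothetical, stand-alone illustration on toy data; it is not q2mm's implementation, and how q2mm pairs reference and computed values within each category is not shown here.

```python
from typing import Sequence


def r_squared(reference: Sequence[float], computed: Sequence[float]) -> float:
    """Standard coefficient of determination, R^2 = 1 - SS_res / SS_tot.

    SS_res: sum of squared residuals (reference - computed).
    SS_tot: sum of squared deviations of the reference from its mean.
    R^2 is 1 for a perfect fit, 0 when the computed values do no better than
    the reference mean, and negative when they do worse.
    """
    if len(reference) != len(computed) or len(reference) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - c) ** 2 for r, c in zip(reference, computed))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("constant reference values: R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Toy data, not taken from any benchmark in this repo.
reference = [1.0, 2.0, 3.0]
close_fit = [1.1, 1.9, 3.2]   # small scatter around the reference
offset_fit = [3.0, 4.0, 5.0]  # perfectly correlated, but shifted by +2

print(round(r_squared(reference, close_fit), 3))  # 0.97
print(r_squared(reference, offset_fit))           # -5.0
```

The offset case shows that a systematic shift alone, with the trend otherwise reproduced, is enough to push R² well below zero.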
What got removed
- benchmarks/rh-enamide/convergence/rh-enamide_optimized.fld
- benchmarks/pd-allyl-amination/convergence/pd-allyl_optimized.fld

Both `.fld` files were produced by the pre-refactor loader (which QFUERZA-overwrote the published OPT values) and are no longer reproducible against the current loader. Fresh `_optimized.fld` files for all four gate-passing systems will land in a follow-up PR that runs end-to-end optimization against the refactored loader (alongside q2mm#275).
Provenance
Every regenerated JSON carries the standard provenance block:
Companion PR
ericchansen/q2mm#281 must merge before this PR's data fully reflects committed q2mm SHAs (the convergence JSONs reference the refactor branch's HEAD).