feat(io.yaml): factor model_from_yaml_data out of read_yaml_model + back-port docs / known-issues / localization / scripts #11
Merged
read_yaml_model now opens+parses the file then delegates the post-parse work (capturing per-entry side-fields onto notes, restoring legacy metaData id/name, stashing unknown top-level sections onto model.notes['_yaml_sections']) to a new model_from_yaml_data(raw: dict) helper. This lets downstream packages that need to pre-normalise their YAML before cobra reads it (e.g. geckopy, which lifts legacy MATLAB ec-model quirks like top-level per-metabolite `smiles` into `annotation` and merges bare-`-` sequence-of-single-key-maps back to a mapping) hand the cleaned dict directly to the post-parse pipeline, without round-tripping through a temp file. Both functions are exported from raven_python.io.yaml. Pure refactor on the read side; no behaviour change for existing read_yaml_model callers.
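The kind of pre-normalisation described above can be sketched on plain dicts. This is a minimal illustration, not geckopy's actual code: the function names `merge_single_key_maps`, `normalise_metabolite` and `normalise_legacy_layout` are hypothetical, and the sketch assumes metabolites live under a top-level `metabolites` list as in cobra's YAML layout.

```python
from typing import Any


def merge_single_key_maps(value: Any) -> Any:
    """Merge a list of single-key dicts into one dict; return anything else unchanged.

    Later entries win if the same key appears twice.
    """
    if isinstance(value, list) and value and all(
        isinstance(item, dict) and len(item) == 1 for item in value
    ):
        merged: dict = {}
        for item in value:
            merged.update(item)
        return merged
    return value


def normalise_metabolite(entry: Any) -> dict:
    """Hypothetical clean-up of one legacy metabolite entry."""
    met = dict(merge_single_key_maps(entry))
    if "smiles" in met:
        # Lift the top-level `smiles` field into `annotation`,
        # keeping any smiles value that is already there.
        annotation = dict(merge_single_key_maps(met.get("annotation") or {}))
        annotation.setdefault("smiles", met.pop("smiles"))
        met["annotation"] = annotation
    return met


def normalise_legacy_layout(raw: dict) -> dict:
    """Return a cleaned copy of the parsed YAML dict; the input is not mutated."""
    cleaned = dict(raw)
    cleaned["metabolites"] = [
        normalise_metabolite(m) for m in raw.get("metabolites", [])
    ]
    return cleaned


raw = {"metabolites": [[{"id": "s_0001"}, {"name": "glucose"}, {"smiles": "OCC"}]]}
cleaned = normalise_legacy_layout(raw)
# A downstream package would then pass `cleaned` to
# raven_python.io.yaml.model_from_yaml_data instead of writing a temp file.
```

Because the clean-up works on the already-parsed dict, no second parse or temp file is needed before the post-parse pipeline runs.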
cobra's model_to_dict serialises model.notes verbatim into the output doc as the 'notes' section. write_yaml_model already pops these three management keys from a local copy of model.notes to use them as top-level YAML fields, but the originals remained on model.notes and therefore also leaked into doc['notes'], producing duplicate sections in the file (the legitimate top-level emit AND a nested copy inside notes). Strip them from doc['notes'] post-model_to_dict and drop the notes section entirely when nothing else is left. Discovered while round-tripping a geckopy ecModel (it stashes ec-rxns / ec-enzymes / gecko_light on model.notes['_yaml_sections']); was visible as duplicated GECKO sections in the written YAML.
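The strip-and-drop step can be sketched on a plain dict standing in for the output of model_to_dict. This is an illustration under assumptions, not the code in write_yaml_model: `strip_management_keys` is a hypothetical name, and of the keys below only `_yaml_sections` appears in this conversation; the other two are placeholders.

```python
# Only "_yaml_sections" is named in the PR; the other two are placeholders.
MANAGEMENT_KEYS = ("_yaml_sections", "_placeholder_key_a", "_placeholder_key_b")


def strip_management_keys(doc: dict, management_keys: tuple = MANAGEMENT_KEYS) -> dict:
    """Return a copy of `doc` whose 'notes' section has no management keys.

    The 'notes' section is dropped entirely when nothing else is left in it.
    """
    cleaned = dict(doc)
    notes = {
        key: value
        for key, value in (doc.get("notes") or {}).items()
        if key not in management_keys
    }
    if notes:
        cleaned["notes"] = notes  # same position in the dict, filtered content
    else:
        cleaned.pop("notes", None)
    return cleaned


doc = {
    "id": "ecModel",
    "notes": {"_yaml_sections": {"ec-rxns": []}, "curator": "someone"},
}
print(strip_management_keys(doc))
```

With this filter in place the GECKO sections would be emitted once, at the top level, rather than a second time inside `notes`.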
…d-matlab-docs # Conflicts: # src/raven_python/io/yaml.py
edkerk added a commit to SysBioChalmers/geckopy that referenced this pull request on May 30, 2026
save_ec_model and load_ec_model now use raven-python for the cobra-side round-trip and the opaque preservation of GECKO top-level keys (ec-rxns, ec-enzymes, gecko_light, metaData). geckopy still owns the typed EcData interpretation, the legacy MATLAB ec-model normalisation (bare-`-` YAML sequences, top-level per-met smiles -> annotation, metaData id/name/version -> top level), and the protein-direction flip for older models with reverse-sign usage_prot_* reactions.

save_ec_model: builds ec-rxns / ec-enzymes / gecko_light from EcData (omitting empty source/notes/eccodes, kcat==0 == 'no kcat assigned', omitting NaN mw/concs and empty sequence), pre-coerces numpy/ruamel scalars via _to_native (raven-python's writer only coerces the cobra-shaped portion), stashes everything on model.notes within a try/finally so the caller's model is restored after the write, then delegates to raven_python.io.yaml.write_yaml_model.

load_ec_model: keeps its own _read_yaml (handles legacy bare-`-` sequence-of-single-key-maps), applies _normalize_legacy_layout (metaData lifting + smiles -> annotation), hands the cleaned dict to raven_python.io.yaml.model_from_yaml_data, then reads GECKO sections back off cobra_model.notes['_yaml_sections'] to build the typed EcData. _flip_legacy_prot_direction stays in geckopy.

Requires raven_python >= the commit adding model_from_yaml_data (currently pending PR SysBioChalmers/raven-python#11). geckopy/develop should not be pushed until that PR merges.

docs/raven_integration.md updated (corrects the earlier note saying geckopy had no YAML I/O; in fact it always had it, and now delegates the heavy lifting to raven-python).

Full suite: 1245 passed, 1 xfailed.
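The stash-then-restore pattern mentioned for save_ec_model can be sketched as follows. This is a hedged illustration, not geckopy's implementation: `StandInModel` and `write_with_temporary_notes` are hypothetical names, and the `writer` callable stands in for raven_python.io.yaml.write_yaml_model.

```python
from typing import Callable


class StandInModel:
    """Minimal stand-in for a cobra model: only the `notes` attribute matters here."""

    def __init__(self, notes: dict = None):
        self.notes = dict(notes or {})


def write_with_temporary_notes(
    model: StandInModel, extra_notes: dict, writer: Callable[[StandInModel], None]
) -> None:
    """Expose `extra_notes` on model.notes only for the duration of the write."""
    original_notes = model.notes
    # Build a new dict rather than mutating the caller's notes in place.
    model.notes = {**original_notes, **extra_notes}
    try:
        writer(model)
    finally:
        # Runs whether the writer succeeded or raised.
        model.notes = original_notes


model = StandInModel({"curator": "someone"})
write_with_temporary_notes(
    model,
    {"_yaml_sections": {"ec-rxns": [], "ec-enzymes": []}},
    writer=lambda m: print(sorted(m.notes)),
)
print(model.notes)
```

Swapping in a fresh dict, instead of updating the existing one, means the caller's original notes object is never touched, so the restore in `finally` is a plain reassignment.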
This branch is several commits ahead of main. The change that motivates landing it now is the YAML refactor at the tip (b20a89e):

YAML refactor (commit b20a89e)

`read_yaml_model` now opens+parses the file then delegates the post-parse work (capturing per-entry side-fields onto notes, restoring legacy metaData id/name, stashing unknown top-level sections onto `model.notes['_yaml_sections']`) to a new `model_from_yaml_data(raw: dict)` helper.

This lets downstream packages that need to pre-normalise their YAML before cobra reads it hand the cleaned dict directly to the post-parse pipeline, without round-tripping through a temp file.

Why now: geckopy is preparing to delegate its `save_ec_model`/`load_ec_model` to raven-python (currently in flight on geckopy's `develop`). geckopy still needs to pre-normalise legacy MATLAB ec-model quirks (top-level per-metabolite `smiles` → annotation, bare-`-` sequence-of-single-key-maps → mapping merge) before cobra reads, so calling `read_yaml_model(path)` doesn't fit. `model_from_yaml_data(cleaned_dict)` is the natural seam.

Both functions are exported from `raven_python.io.yaml`. Pure refactor on the read side; no behaviour change for existing `read_yaml_model` callers.

Also on this branch
Five earlier commits unrelated to the YAML refactor (RAVEN back-port docs, known-issues catalogue, localization benchmark, scripts). They're all additive and already on the feature branch; landing them on main makes the commit graph the natural shape.
Safe to squash or rebase per your usual preference.