feat(io.yaml): factor model_from_yaml_data out of read_yaml_model + back-port docs / known-issues / localization / scripts#11

Merged
edkerk merged 29 commits into develop from feature/quality-and-matlab-docs on May 30, 2026

@edkerk
Member

@edkerk edkerk commented May 29, 2026

This branch is several commits ahead of main. The change that motivates landing it now is the YAML refactor at the tip (b20a89e):

YAML refactor (commit b20a89e)

read_yaml_model now opens and parses the file, then delegates the post-parse work to a new model_from_yaml_data(raw: dict) helper. That work covers capturing per-entry side-fields onto notes, restoring the legacy metaData id/name, and stashing unknown top-level sections onto model.notes['_yaml_sections'].

This lets downstream packages that need to pre-normalise their YAML before cobra reads it hand the cleaned dict directly to the post-parse pipeline, without round-tripping through a temp file.

Why now: geckopy is preparing to delegate its save_ec_model / load_ec_model to raven-python (currently in flight on geckopy's develop). geckopy still needs to pre-normalise legacy MATLAB ec-model quirks (top-level per-metabolite smiles → annotation, bare-`-` sequence-of-single-key-maps → mapping merge) before cobra reads the data, so calling read_yaml_model(path) doesn't fit. model_from_yaml_data(cleaned_dict) is the natural seam.

Both functions are exported from raven_python.io.yaml. Pure refactor on the read side; no behaviour change for existing read_yaml_model callers.
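To make the seam concrete, here is a minimal sketch of the kind of pre-normalisation a downstream package might run before handing the dict over. It is not geckopy's actual code: the helper names (`merge_single_key_maps`, `lift_smiles`, `normalise_legacy`) and the sample layout are illustrative assumptions, and only plain dicts and lists are handled.

```python
from typing import Any


def merge_single_key_maps(entry: Any) -> Any:
    """Collapse [{'id': 'x'}, {'name': 'y'}] into {'id': 'x', 'name': 'y'}.

    Anything that is not a non-empty list of single-key dicts passes through.
    """
    is_legacy = (
        isinstance(entry, list)
        and bool(entry)
        and all(isinstance(item, dict) and len(item) == 1 for item in entry)
    )
    if not is_legacy:
        return entry
    merged: dict = {}
    for item in entry:
        merged.update(item)
    return merged


def lift_smiles(metabolite: dict) -> dict:
    """Return a copy with a top-level 'smiles' moved under 'annotation'."""
    met = dict(metabolite)
    if "smiles" in met:
        smiles = met.pop("smiles")
        annotation = dict(merge_single_key_maps(met.get("annotation")) or {})
        annotation.setdefault("smiles", smiles)  # keep an existing annotation value
        met["annotation"] = annotation
    return met


def normalise_legacy(raw: dict) -> dict:
    """Hypothetical normaliser: merge legacy sequences, then lift smiles."""
    cleaned = dict(raw)
    mets = [merge_single_key_maps(m) for m in raw.get("metabolites", [])]
    cleaned["metabolites"] = [
        lift_smiles(m) if isinstance(m, dict) else m for m in mets
    ]
    return cleaned


# Assumed legacy layout, for illustration only.
legacy = {
    "metabolites": [
        [{"id": "m1"}, {"name": "glucose"}, {"smiles": "OCC1OC(O)C(O)C(O)C1O"}],
        {"id": "m2", "annotation": {"kegg": "C00001"}, "smiles": "O"},
    ],
    "ec-rxns": [],
}
cleaned = normalise_legacy(legacy)
```

With raven-python installed, `cleaned` would then be passed to `model_from_yaml_data(cleaned)` from `raven_python.io.yaml` instead of being written to a temp file and re-read.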

Also on this branch

The branch also carries five earlier commits unrelated to the YAML refactor (RAVEN back-port docs, known-issues catalogue, localization benchmark, scripts). They're all additive and already on the feature branch; landing them on main keeps the commit graph in its natural shape.

Safe to squash or rebase per your usual preference.

edkerk added 29 commits May 29, 2026 22:45
read_yaml_model now opens+parses the file then delegates the
post-parse work (capturing per-entry side-fields onto notes,
restoring legacy metaData id/name, stashing unknown top-level
sections onto model.notes['_yaml_sections']) to a new
model_from_yaml_data(raw: dict) helper.

This lets downstream packages that need to pre-normalise their YAML
before cobra reads it (e.g. geckopy, which lifts legacy MATLAB
ec-model quirks like top-level per-metabolite `smiles` into
`annotation` and merges bare-`-` sequence-of-single-key-maps
back to a mapping) hand the cleaned dict directly to the post-parse
pipeline, without round-tripping through a temp file.

Both functions are exported from raven_python.io.yaml. Pure
refactor on the read side; no behaviour change for existing
read_yaml_model callers.

cobra's model_to_dict serialises model.notes verbatim into the output
doc as the 'notes' section. write_yaml_model already pops these three
management keys from a local copy of model.notes to use them as
top-level YAML fields, but the originals remained on model.notes and
therefore also leaked into doc['notes'], producing duplicate sections
in the file (the legitimate top-level emit AND a nested copy inside
notes).

Strip them from doc['notes'] post-model_to_dict and drop the notes
section entirely when nothing else is left. Discovered while
round-tripping a geckopy ecModel (it stashes ec-rxns / ec-enzymes /
gecko_light on model.notes['_yaml_sections']); was visible as
duplicated GECKO sections in the written YAML.
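
The stripping step described in the commit message above can be sketched roughly as follows. This is an illustration, not the code in write_yaml_model: the function name is hypothetical, and apart from `_yaml_sections` the key names in `MANAGEMENT_KEYS` are placeholders because the message does not list all three.

```python
# Only '_yaml_sections' is named in the source; the other two are placeholders.
MANAGEMENT_KEYS = frozenset({"_yaml_sections", "_example_key_a", "_example_key_b"})


def strip_management_notes(doc: dict, keys=MANAGEMENT_KEYS) -> dict:
    """Return a copy of doc whose 'notes' section has the given keys removed.

    The 'notes' section is dropped entirely when nothing else is left in it.
    """
    out = dict(doc)
    notes = out.get("notes")
    if not isinstance(notes, dict):
        return out
    remaining = {k: v for k, v in notes.items() if k not in keys}
    if remaining:
        out["notes"] = remaining
    else:
        del out["notes"]
    return out


doc = {
    "id": "ecModel",
    "notes": {"_yaml_sections": {"ec-rxns": []}, "curator": "someone"},
}
stripped = strip_management_notes(doc)
```

Returning a copy rather than mutating `doc` keeps the sketch side-effect free; the real writer may well edit the dict in place.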
…d-matlab-docs

# Conflicts:
#	src/raven_python/io/yaml.py
@edkerk edkerk changed the base branch from main to develop May 30, 2026 00:05
@edkerk edkerk merged commit 8c7672e into develop May 30, 2026
8 checks passed
@edkerk edkerk deleted the feature/quality-and-matlab-docs branch May 30, 2026 00:08
edkerk added a commit to SysBioChalmers/geckopy that referenced this pull request May 30, 2026
save_ec_model and load_ec_model now use raven-python for the cobra-side
round-trip and the opaque preservation of GECKO top-level keys
(ec-rxns, ec-enzymes, gecko_light, metaData). geckopy still owns the
typed EcData interpretation, the legacy MATLAB ec-model normalisation
(bare-`-` YAML sequences, top-level per-met smiles -> annotation,
metaData id/name/version -> top level), and the protein-direction
flip for older models with reverse-sign usage_prot_* reactions.

save_ec_model: builds ec-rxns / ec-enzymes / gecko_light from EcData
(omitting empty source/notes/eccodes, treating kcat==0 as 'no kcat
assigned', and omitting NaN mw/concs and empty sequence), pre-coerces
numpy/ruamel scalars via _to_native (raven-python's writer only
coerces the cobra-shaped portion), stashes everything on model.notes
within a try/finally so the caller's model is restored after the
write, then delegates to raven_python.io.yaml.write_yaml_model.
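
The stash-and-restore pattern mentioned above might look roughly like this. It is a sketch under assumptions: `write_with_stashed_sections` is a hypothetical name, `writer` stands in for raven_python.io.yaml.write_yaml_model, and a `SimpleNamespace` with a `notes` dict stands in for a cobra model.

```python
from types import SimpleNamespace
from typing import Callable, Mapping


def write_with_stashed_sections(model, sections: Mapping, writer: Callable) -> None:
    """Temporarily stash sections on model.notes, call writer, then restore."""
    saved = dict(model.notes)  # shallow snapshot of the caller's notes
    try:
        model.notes["_yaml_sections"] = dict(sections)
        writer(model)
    finally:
        # Restore in place so the caller keeps the same notes dict object.
        model.notes.clear()
        model.notes.update(saved)


model = SimpleNamespace(notes={"curator": "someone"})
seen = []
write_with_stashed_sections(
    model,
    {"ec-rxns": [], "ec-enzymes": []},
    lambda m: seen.append(dict(m.notes)),  # stand-in writer records what it saw
)
```

Because the restore sits in `finally`, the caller's notes come back unchanged even if the writer raises.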

load_ec_model: keeps its own _read_yaml (handles legacy bare-`-`
sequence-of-single-key-maps), applies _normalize_legacy_layout
(metaData lifting + smiles -> annotation), hands the cleaned dict to
raven_python.io.yaml.model_from_yaml_data, then reads GECKO sections
back off cobra_model.notes['_yaml_sections'] to build the typed
EcData. _flip_legacy_prot_direction stays in geckopy.
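
Two of the load-side steps can be illustrated with plain dicts. This is a hedged sketch, not geckopy's `_normalize_legacy_layout`: the helper names are hypothetical, and leaving `metaData` in place after lifting is an assumption.

```python
GECKO_KEYS = ("ec-rxns", "ec-enzymes", "gecko_light")


def lift_metadata(raw: dict) -> dict:
    """Copy metaData id/name/version to the top level without overwriting."""
    cleaned = dict(raw)
    meta = cleaned.get("metaData")
    if isinstance(meta, dict):
        for key in ("id", "name", "version"):
            if key in meta:
                cleaned.setdefault(key, meta[key])
    return cleaned


def gecko_sections(notes: dict) -> dict:
    """Pick the GECKO sections back out of notes['_yaml_sections']."""
    sections = notes.get("_yaml_sections") or {}
    return {key: sections[key] for key in GECKO_KEYS if key in sections}


raw = {"metaData": {"id": "ecYeast", "name": "demo", "version": "1"}, "name": "kept"}
lifted = lift_metadata(raw)

notes = {"_yaml_sections": {"ec-rxns": [{"id": "r1"}], "other": 1}}
picked = gecko_sections(notes)
```

In the real flow the `notes` dict would come from the cobra model returned by `model_from_yaml_data`, and the picked sections would feed the typed EcData.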

Requires raven_python >= the commit adding model_from_yaml_data
(currently pending PR SysBioChalmers/raven-python#11).
geckopy/develop should not be pushed until that PR merges.

docs/raven_integration.md updated (corrects the earlier note saying
geckopy had no YAML I/O; in fact it always had it, and now delegates
the heavy lifting to raven-python). Full suite: 1245 passed, 1 xfailed.