Commit 41c2b72

feat(io.yaml): typed model.ec via EcData; absorb legacy GECKO normalisations (#12)
* feat(io.yaml): typed model.ec via EcData; absorb legacy GECKO normalisations

Mirrors RAVEN MATLAB's readYAMLmodel.m / writeYAMLmodel.m, which populate the model.ec struct whenever the YAML defines it. Downstream consumers (geckopy / GECKO) operate on the populated struct rather than re-parsing the YAML themselves.

New: src/raven_python/io/ec_data.py
- EcData dataclass with the MATLAB-GECKO field shape (per-rxn arrays: rxns/kcat/source/notes/eccodes; per-enzyme arrays: genes/enzymes/mw/sequence/concs; sparse rxn_enz_mat coupling; gecko_light flag).
- ec_data_from_yaml_sections: parses ec-rxns/ec-enzymes/gecko_light into a typed EcData, validating that every enzyme referenced from an ec-rxns row exists in ec-enzymes (catches the common authoring bug where the two sections drift apart).
- ec_data_to_yaml_sections: serialises an EcData back to the list-of-mappings YAML form. Empty source/notes/eccodes/sequence and NaN mw/concs are omitted to keep files compact; kcat is always written (0 == "no kcat assigned", matching MATLAB GECKO).
- _canonicalize_eccodes / _eccodes_to_yaml handle the scalar-or-list YAML representation for EC numbers.

Extended: src/raven_python/io/yaml.py
- model_from_yaml_data now pulls ec-rxns / ec-enzymes / gecko_light out of the foreign-keys stash, builds an EcData, and attaches it as model.ec. Other unknown top-level keys still round-trip opaquely via model.notes['_yaml_sections'].
- write_yaml_model now serialises model.ec to the top-level ec-rxns / ec-enzymes / gecko_light sections when present, and drops any stale ec-* in _yaml_sections so the file isn't ambiguous.
- read_yaml_model also accepts the very old RAVEN shape where the document root is a bare `-` sequence of single-key mappings; the reader merges them into one dict before parsing.
- model_from_yaml_data now also normalises two legacy ecModel quirks in line with MATLAB GECKO behaviour:
  * per-metabolite top-level `smiles` -> annotation['smiles'] (older writers placed SMILES at the metabolite top level);
  * `usage_prot_*` / `prot_pool_exchange` reactions with negative lower bound and swapped stoichiometry are flipped to the forward convention (warns once per load).

Tests
- tests/test_io_yaml_ec_data.py (new): 18 focused tests covering load (model.ec population, sentinel handling for omitted optional fields, gecko_light flag, eccodes scalar-or-list, no-ec models), save (sections emitted, NaN/empty omission, numpy-scalar coercion, stale _yaml_sections overridden), legacy quirks (top-level smiles lifted, reverse-direction prot flip with warn, bare-sequence root merge), and error paths (half-pair of ec-* sections, dangling enzyme reference).
- tests/test_io_yaml.py (updated): RAVEN_DOC fixture grew a complete ec-rxns/ec-enzymes pair so the round-trip test now verifies typed EcData survives, not just an opaque _yaml_sections stash.

* feat(io.ec_data): add EcData.validate + EcData.empty

Two shape-management helpers that consumers (geckopy's pipeline, test fixtures) need on top of the raw dataclass:
- validate(): raise ValueError when per-rxn array lengths, per-enzyme array lengths, or the rxn_enz_mat shape drift from one another. Cheap; callable after each mutation in a builder pipeline.
- EcData.empty(n_rxns, n_enzymes, *, gecko_light=False): preallocate with the canonical sentinels (empty strings for the string fields, 0 for kcat, NaN for mw/concs, empty CSR matrix). Used by builders that allocate up-front and fill row by row.

Both methods are shape-level operations, not algorithm, so they live with the dataclass rather than on a downstream consumer.
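The empty / validate pairing described above can be sketched with the standard library alone. This is a minimal illustration, not the raven_python class: `EcDataSketch`, its reduced field set, and the dict-plus-shape stand-in for the sparse rxn_enz_mat are all assumptions made to keep the example dependency-free.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class EcDataSketch:
    """Hypothetical, reduced stand-in for EcData (not the real class)."""

    # Per-reaction arrays: must all share one length.
    rxns: list[str] = field(default_factory=list)
    kcat: list[float] = field(default_factory=list)
    eccodes: list[str] = field(default_factory=list)
    # Per-enzyme arrays: must all share one length.
    enzymes: list[str] = field(default_factory=list)
    mw: list[float] = field(default_factory=list)
    # Assumption: coupling kept as {(rxn_idx, enz_idx): value} plus an
    # explicit shape, standing in for a sparse matrix.
    rxn_enz: dict[tuple[int, int], float] = field(default_factory=dict)
    rxn_enz_shape: tuple[int, int] = (0, 0)
    gecko_light: bool = False

    @classmethod
    def empty(cls, n_rxns: int, n_enzymes: int, *, gecko_light: bool = False) -> EcDataSketch:
        """Preallocate with sentinels: '' for strings, 0 for kcat, NaN for mw."""
        return cls(
            rxns=[""] * n_rxns,
            kcat=[0.0] * n_rxns,
            eccodes=[""] * n_rxns,
            enzymes=[""] * n_enzymes,
            mw=[math.nan] * n_enzymes,
            rxn_enz={},
            rxn_enz_shape=(n_rxns, n_enzymes),
            gecko_light=gecko_light,
        )

    def validate(self) -> None:
        """Raise ValueError if array lengths or the coupling shape drift."""
        n_rxns, n_enz = len(self.rxns), len(self.enzymes)
        for name in ("kcat", "eccodes"):
            if len(getattr(self, name)) != n_rxns:
                raise ValueError(f"per-rxn field {name!r} does not match rxns ({n_rxns})")
        for name in ("mw",):
            if len(getattr(self, name)) != n_enz:
                raise ValueError(f"per-enzyme field {name!r} does not match enzymes ({n_enz})")
        if self.rxn_enz_shape != (n_rxns, n_enz):
            raise ValueError(f"coupling shape {self.rxn_enz_shape} != {(n_rxns, n_enz)}")
```

In a builder pipeline the intended pattern would be to call `empty(...)` once, fill rows in place, and call `validate()` after each mutation so a length drift is reported at the step that caused it.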
Tests: 6 new EcData tests covering empty's sentinels, validate's three drift paths (per-rxn length, per-enzyme length, coupling-matrix shape), the empty -> validate round-trip, and the gecko_light flag on empty.

* fix(io.ec_data): satisfy ruff (UP037, B905, I001)

- UP037: drop the string-quoted forward reference on EcData.empty's return annotation; the module already uses `from __future__ import annotations`, so the bare class name is fine.
- B905: zip(coo.row, coo.col, coo.data) now passes strict=True. The three arrays come from the same COO matrix and are guaranteed equal length; strict=True turns any future drift into a loud ValueError instead of silent truncation.
- I001: drop the stray blank line between the import block and the first section comment in tests/test_io_yaml_ec_data.py.
1 parent 8c7672e commit 41c2b72
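Three of the load-side steps named in the commit message (bare-sequence root merge, scalar-or-list eccodes, dangling enzyme reference check) can be sketched on plain Python objects, as they would look after YAML parsing. The function names and the `id` / `enzymes` keys below are assumptions for illustration; the commit message does not show the actual helpers' signatures or the exact row layout.

```python
from __future__ import annotations

from typing import Any


def merge_bare_sequence_root(root: Any) -> dict[str, Any]:
    """Merge a root given as a list of single-key mappings into one dict."""
    if isinstance(root, dict):
        return dict(root)  # already the modern shape
    merged: dict[str, Any] = {}
    for item in root:
        if not isinstance(item, dict) or len(item) != 1:
            raise ValueError("expected a sequence of single-key mappings")
        ((key, value),) = item.items()
        if key in merged:
            raise ValueError(f"duplicate top-level section {key!r}")
        merged[key] = value
    return merged


def canonicalize_eccodes(value: Any) -> list[str]:
    """Accept a missing value, a scalar string, or a list; return a list."""
    if value is None or value == "":
        return []
    if isinstance(value, str):
        return [value]
    return [str(v) for v in value]


def check_enzyme_refs(ec_rxns: list[dict], ec_enzymes: list[dict]) -> None:
    """Raise ValueError if an ec-rxns row names an enzyme absent from ec-enzymes."""
    known = {row["id"] for row in ec_enzymes}  # 'id' key is an assumption
    for row in ec_rxns:
        missing = [e for e in row.get("enzymes", {}) if e not in known]
        if missing:
            raise ValueError(f"{row.get('id')!r} references unknown enzymes: {missing}")
```

Running the reference check at load time, rather than leaving it to consumers, matches the stated goal of catching ec-rxns / ec-enzymes drift where the file is read.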

5 files changed

Lines changed: 998 additions & 27 deletions

src/raven_python/io/__init__.py

Lines changed: 5 additions & 2 deletions
@@ -1,12 +1,15 @@
-"""RAVEN-specific I/O: YAML (cobra + Metabolic Atlas / Human-GEM extensions), SIF,
-Excel export, and the Standard-GEM ``model/<fmt>/…`` git layout.
+"""RAVEN-specific I/O: YAML (cobra + Metabolic Atlas / Human-GEM extensions, plus
+the GECKO ec-model substructure), SIF, Excel export, and the Standard-GEM
+``model/<fmt>/…`` git layout.
 """
+from raven_python.io.ec_data import EcData
 from raven_python.io.excel import export_to_excel
 from raven_python.io.git import export_for_git
 from raven_python.io.sif import export_model_to_sif
 from raven_python.io.yaml import read_yaml_model, write_yaml_model
 
 __all__ = [
+    "EcData",
     "export_for_git",
     "export_model_to_sif",
     "export_to_excel",
