Commit 41c2b72
feat(io.yaml): typed model.ec via EcData; absorb legacy GECKO normalisations (#12)
* feat(io.yaml): typed model.ec via EcData; absorb legacy GECKO normalisations
Mirrors RAVEN MATLAB's readYAMLmodel.m / writeYAMLmodel.m, which
populate the model.ec struct whenever the YAML defines it. Downstream
consumers (geckopy / GECKO) operate on the populated struct rather
than re-parsing the YAML themselves.
New: src/raven_python/io/ec_data.py
- EcData dataclass with the MATLAB-GECKO field shape (per-rxn arrays:
rxns/kcat/source/notes/eccodes; per-enzyme arrays: genes/enzymes/mw/
sequence/concs; sparse rxn_enz_mat coupling; gecko_light flag).
- ec_data_from_yaml_sections: parses ec-rxns/ec-enzymes/gecko_light
into a typed EcData, validating that every enzyme referenced from
an ec-rxns row exists in ec-enzymes (catches the common authoring
bug where the two sections drift apart).
- ec_data_to_yaml_sections: serialises an EcData back to the
list-of-mappings YAML form. Empty source/notes/eccodes/sequence and
NaN mw/concs are omitted to keep files compact; kcat is always
written (0 == "no kcat assigned", matching MATLAB GECKO).
- _canonicalize_eccodes / _eccodes_to_yaml handle the scalar-or-list
YAML representation for EC numbers.
Extended: src/raven_python/io/yaml.py
- model_from_yaml_data now pulls ec-rxns / ec-enzymes / gecko_light
out of the foreign-keys stash, builds an EcData, and attaches it
as model.ec. Other unknown top-level keys still round-trip
opaquely via model.notes['_yaml_sections'].
- write_yaml_model now serialises model.ec to the top-level
ec-rxns / ec-enzymes / gecko_light sections when present, and
drops any stale ec-* entries from _yaml_sections so the file isn't ambiguous.
- read_yaml_model also accepts the very old RAVEN shape where the
document root is a bare `-` sequence of single-key mappings; the
reader merges them into one dict before parsing.
- model_from_yaml_data now also normalises two legacy ecModel
quirks in line with MATLAB GECKO behaviour:
* per-metabolite top-level `smiles` -> annotation['smiles']
(older writers placed SMILES at the metabolite top level);
* `usage_prot_*` / `prot_pool_exchange` reactions with negative
lower bound and swapped stoichiometry are flipped to the
forward convention (warns once per load).
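The three reader-side normalisations above can be sketched roughly as follows. This sketch assumes plain-dict metabolites and reactions with keys `id`, `lower_bound`, `upper_bound`, `metabolites`, and `annotation`. All function names are hypothetical. The "reverse-only" test (lb < 0 and ub <= 0) is an assumed stand-in for the commit's "negative lower bound and swapped stoichiometry" check. The flip itself negates every coefficient and maps bounds [lb, ub] to [-ub, -lb], which describes the same flux space with v replaced by -v.

```python
import warnings

USAGE_PREFIX = "usage_prot_"
POOL_RXN = "prot_pool_exchange"


def merge_bare_sequence_root(doc):
    """Very old RAVEN files: root is a `-` sequence of single-key mappings."""
    if isinstance(doc, dict):
        return doc
    merged = {}
    for item in doc:
        if not (isinstance(item, dict) and len(item) == 1):
            raise ValueError("expected a sequence of single-key mappings")
        merged.update(item)  # assumption: a repeated key silently wins
    return merged


def lift_top_level_smiles(met: dict) -> dict:
    """Move a legacy top-level `smiles` key into annotation['smiles']."""
    if "smiles" not in met:
        return met
    met = dict(met)
    smiles = met.pop("smiles")
    annotation = dict(met.get("annotation") or {})
    annotation.setdefault("smiles", smiles)  # an existing annotation wins
    met["annotation"] = annotation
    return met


def flip_reverse_prot_reactions(rxns: list[dict]) -> int:
    """Flip reverse-only protein reactions in place; warn once per call."""
    flipped = 0
    for rxn in rxns:
        rid = rxn["id"]
        if not (rid.startswith(USAGE_PREFIX) or rid == POOL_RXN):
            continue
        lb, ub = rxn["lower_bound"], rxn["upper_bound"]
        if lb < 0 and ub <= 0:  # hypothetical "legacy reverse" criterion
            rxn["metabolites"] = {m: -c for m, c in rxn["metabolites"].items()}
            rxn["lower_bound"], rxn["upper_bound"] = -ub, -lb
            flipped += 1
    if flipped:
        warnings.warn(
            f"flipped {flipped} legacy reverse-direction protein reaction(s) "
            "to the forward convention",
            stacklevel=2,
        )
    return flipped
```

Emitting a single aggregated warning, rather than one per reaction, keeps a large ecModel from flooding the log on load.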
Tests
- tests/test_io_yaml_ec_data.py (new): 18 focused tests covering
load (model.ec population, sentinel handling for omitted optional
fields, gecko_light flag, eccodes scalar-or-list, no-ec models),
save (sections emitted, NaN/empty omission, numpy-scalar coercion,
stale _yaml_sections overridden), legacy quirks (top-level smiles
lifted, reverse-direction prot flip with warn, bare-sequence root
merge), and error paths (only one of ec-rxns/ec-enzymes present,
dangling enzyme reference).
- tests/test_io_yaml.py (updated): RAVEN_DOC fixture grew a complete
ec-rxns/ec-enzymes pair so the round-trip test now verifies typed
EcData survives, not just an opaque _yaml_sections stash.
* feat(io.ec_data): add EcData.validate + EcData.empty
Two shape-management helpers that consumers (geckopy's pipeline,
test fixtures) need on top of the raw dataclass:
- validate(): raise ValueError when per-rxn array lengths, per-enzyme
array lengths, or the rxn_enz_mat shape drift from one another.
Cheap; callable after each mutation in a builder pipeline.
- EcData.empty(n_rxns, n_enzymes, *, gecko_light=False): preallocate
with the canonical sentinels (empty strings for the string fields,
0 for kcat, NaN for mw/concs, empty CSR matrix). Used by builders
that allocate up-front and fill row by row.
Both methods are shape-level operations rather than algorithmic logic,
so they live on the dataclass rather than in a downstream consumer.
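A minimal sketch of the empty/validate pair is shown below. It is not the real EcData. The class name ShapeSketch is hypothetical and only one per-rxn and one per-enzyme array are modelled. A plain `mat_shape` tuple stands in for the CSR rxn_enz_mat's `.shape`. The unquoted `-> ShapeSketch` return annotation works because of `from __future__ import annotations`, the same point behind the UP037 fix.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ShapeSketch:
    rxns: list[str]           # per-rxn
    kcat: list[float]         # per-rxn
    enzymes: list[str]        # per-enzyme
    mw: list[float]           # per-enzyme
    mat_shape: tuple[int, int]  # stand-in for rxn_enz_mat.shape
    gecko_light: bool = False

    @classmethod
    def empty(cls, n_rxns: int, n_enzymes: int, *, gecko_light: bool = False) -> ShapeSketch:
        """Preallocate with sentinels: "" for strings, 0 kcat, NaN mw."""
        return cls(
            rxns=[""] * n_rxns,
            kcat=[0.0] * n_rxns,
            enzymes=[""] * n_enzymes,
            mw=[math.nan] * n_enzymes,
            mat_shape=(n_rxns, n_enzymes),
            gecko_light=gecko_light,
        )

    def validate(self) -> None:
        """Raise ValueError if any array length or the coupling shape drifts."""
        n_rxns, n_enz = len(self.rxns), len(self.enzymes)
        if len(self.kcat) != n_rxns:
            raise ValueError(f"kcat has {len(self.kcat)} entries, expected {n_rxns}")
        if len(self.mw) != n_enz:
            raise ValueError(f"mw has {len(self.mw)} entries, expected {n_enz}")
        if self.mat_shape != (n_rxns, n_enz):
            raise ValueError(f"coupling shape {self.mat_shape} != {(n_rxns, n_enz)}")
```

Because validate() is just a few length comparisons, a builder can afford to call it after every row it fills.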
Tests: 6 new EcData tests covering empty's sentinels, validate's
three drift paths (per-rxn length, per-enzyme length, coupling-matrix
shape), the empty -> validate round-trip, and the gecko_light flag
on empty.
* fix(io.ec_data): satisfy ruff (UP037, B905, I001)
- UP037: drop the string-quoted forward reference on EcData.empty's
return annotation; the module already uses `from __future__ import
annotations`, so the bare class name is fine.
- B905: zip(coo.row, coo.col, coo.data) now passes strict=True. The
three arrays come from the same COO matrix and are guaranteed
equal length; strict=True turns any future drift into a loud
ValueError instead of silent truncation.
- I001: drop the stray blank line between the import block and the
first section comment in tests/test_io_yaml_ec_data.py.
1 parent 8c7672e
5 files changed
Lines changed: 998 additions & 27 deletions