Human-GEM cell-type model validation: raven-python vs RAVEN

Validation of raven-python's tINIT/ftINIT against MATLAB RAVEN on a real genome-scale reconstruction (Human-GEM) using the Hart2015 RNA-seq dataset (5 cell lines: DLD1, GBM, HCT116, HELA, RPE1). The goal is functional equivalence — do raven-python and RAVEN extract the same context-specific reaction sets from the same inputs?

Method

Template & inputs. RAVEN built the ftINIT reference model from Human-GEM (prepHumanModelForftINIT: remove drug/exchange/artificial reactions, set spontaneous/custom lists) and exported it as raven_refModel.xml (10198 reactions). raven-python builds on that same exported model, so the candidate reaction universe is identical and set comparison is exact.
Scoring. Gene scores from log2(TPM+1)-style expression via gene_scores_from_expression, mapped to reactions through the GPR (score_reactions_from_genes), matching RAVEN's getExprForRxnScore.
ftINIT. Series 1+1 (2 staged MILP steps). RAVEN run via ftINIT.m with Gurobi; raven-python via raven_python.init.ftinit with Gurobi (mip_gap=0.001, time_limit=600).
tINIT. raven-python get_init_model (classic single-MILP INIT) on HCT116, compared to the ftINIT result for the same cell line.
Tasks. Two raven-python ftINIT variants: no-task (expression only) and task-constrained (essential metabolic tasks, metabolicTasks_Essential.txt, force task-essential reactions to be kept). RAVEN's reference is task-constrained.
Solver. Gurobi 13.0.1 for both tools.

Engineering findings (raven-python tractability)

Getting ftINIT to run at genome scale surfaced three issues, all now fixed and matching RAVEN's design:

O(n²) constraint construction. Building the steady-state balances with Python sum() re-canonicalises a growing sympy expression at each term; hub metabolites (ATP/H⁺/H₂O in ~10³ reactions) made one constraint take ~minutes (≈154 s total build, benchmark: 1500-term sum = 59 s vs optlang.symbolics.add = 0.01 s). Fixed by building flat term lists once per reaction and summing with optlang.symbolics.add (in both ftINIT and tINIT).
Big-M too loose. The on/off indicator constraints used each reaction's own bound (±1000) as big-M; with force_on=0.1 that is a ~10⁴ ratio → very weak LP relaxation → Gurobi never closes the gap. RAVEN uses a fixed big-M = 100. Adopted.
Stoichiometric rescaling. A fixed big-M=100 is only valid if no reaction needs flux ≫100; ported RAVEN's rescaleModelForINIT (cap each reaction's coefficient dynamic range at 25×, normalise mean |coeff| to 1) into prep_init_model. Without it the staged MILP is infeasible (step-1 caps transports that step-0 used freely).

Net effect: a full ftINIT cell-line solve went from not finishing to ~200 s, comparable to RAVEN.

Results

Reaction counts

cell line	RAVEN ftINIT	raven-python ftINIT (no-task)	raven-python ftINIT (task)
DLD1	7782	7744	7774
GBM	7668	7667	7680
HCT116	7780	7752	7776
HELA	7832	7789	7816
RPE1	7569	7564	7570

Counts agree within ~0.5 % everywhere; the task-constrained run is closest (e.g. RPE1 7570 vs 7569, HCT116 7776 vs 7780). raven-python tINIT (HCT116) gives 6024 reactions — a smaller model, as expected from the different (classic INIT) objective.

Agreement — raven-python (no-task) ftINIT vs RAVEN ftINIT

cell line	shared	only raven-python	only RAVEN	Jaccard
DLD1	7667	77	115	0.976
GBM	7562	105	106	0.973
HCT116	7675	77	105	0.977
HELA	7707	82	125	0.974
RPE1	7470	94	99	0.975

~97.5 % of reactions are identical between the two independent implementations, even though this run is expression-only while RAVEN's reference is task-constrained. The "only RAVEN" surplus (≈99–125) is expected to include task-essential reactions that the task-constrained run (below) recovers.

Agreement — raven-python (task-constrained) ftINIT vs RAVEN ftINIT

cell line	shared	only raven-python	only RAVEN	Jaccard
DLD1	7699	75	83	0.980
GBM	7588	92	80	0.978
HCT116	7696	80	84	0.979
HELA	7735	81	97	0.978
RPE1	7493	77	76	0.980

Adding the essential metabolic tasks (same task list RAVEN uses) raises agreement to Jaccard 0.978–0.980 and makes the disagreement symmetric (only-raven-python ≈ only-RAVEN ≈ 80), confirming the prediction: the task constraints recover RAVEN's task-essential reactions. The residual ≈80 reactions each way out of ~7700 is at the level expected from MIP-gap tolerance (both accept near-optimal incumbents) and alternate optima.

raven-python tINIT vs ftINIT (HCT116)

tINIT 6024 rxns vs ftINIT 7752; shared 5957, Jaccard 0.762. tINIT is nearly a subset (only 67 reactions unique to it) — the two methods agree on a common core, with ftINIT keeping more (its staged formulation and task handling are less aggressive at removal). This matches the expected tINIT/ftINIT relationship rather than indicating a defect.

Conclusions

From identical inputs on a genome-scale human reconstruction, raven-python reproduces RAVEN's ftINIT reaction selection to 97.5 % (no-task) and 98 % (task-constrained) set identity across five cell lines — strong evidence of functional equivalence between the two independent implementations. Agreement is symmetric and the residual (~80 reactions each way) is consistent with MIP near-optimality and alternate optima rather than any systematic divergence.

Reaching genome-scale tractability required matching RAVEN's numerical-conditioning choices and fixing optlang-specific construction costs (see Engineering findings): fixed big-M = 100, rescaleModelForINIT, optlang.symbolics.add instead of Python sum() in every MILP build (ftINIT, tINIT, and the gap-fill). With these, a task-constrained cell-line model builds in ~15–25 min (dominated by the essential-forced staged MILP) and a no-task one in ~3 min, comparable to RAVEN.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Human-GEM cell-type model validation: raven-python vs RAVEN

Method

Engineering findings (raven-python tractability)

Results

Reaction counts

Agreement — raven-python (no-task) ftINIT vs RAVEN ftINIT

Agreement — raven-python (task-constrained) ftINIT vs RAVEN ftINIT

raven-python tINIT vs ftINIT (HCT116)

Conclusions

FilesExpand file tree

humangem_validation.md

Latest commit

History

humangem_validation.md

File metadata and controls

Human-GEM cell-type model validation: raven-python vs RAVEN

Method

Engineering findings (raven-python tractability)

Results

Reaction counts

Agreement — raven-python (no-task) ftINIT vs RAVEN ftINIT

Agreement — raven-python (task-constrained) ftINIT vs RAVEN ftINIT

raven-python tINIT vs ftINIT (HCT116)

Conclusions