Skip to content

Conversation

@bpowers
Copy link
Owner

@bpowers bpowers commented Jan 26, 2026

Summary

Pure Rust implementation of Vensim MDL file parsing, replacing the C++ xmutil dependency for MDL-to-datamodel conversion. This eliminates the need for Bison/Flex and a C++ toolchain, and enables MDL import in WASM builds.

  • Implements a complete MDL lexer, LALRPOP parser, and multi-pass AST-to-datamodel converter
  • Handles all equation types, subscripts/arrays, built-in functions, views/diagrams, groups, macros (parsing), and settings
  • Full view element comparison (UIDs, coordinates, link shapes, polarity) passes across all 12 test models
  • 453+ unit tests for parsing and conversion, plus integration equivalence tests against xmutil output

Key components

  • src/simlin-compat/src/mdl/ - Lexer, normalizer, LALRPOP parser, AST types, reader
  • src/simlin-compat/src/mdl/convert/ - AST-to-datamodel conversion (variables, stocks/flows, dimensions, groups)
  • src/simlin-compat/src/mdl/view/ - Sketch section parsing and view element conversion
  • src/simlin-compat/src/mdl/xmile_compat.rs - XMILE-compatible expression formatting
  • src/simlin-compat/src/mdl/settings.rs - Settings section parsing (integration type, unit equivalences)
  • src/simlin-compat/tests/mdl_equivalence.rs - Element-by-element equivalence tests against xmutil

Test plan

  • All 453+ MDL parser unit tests pass (cargo test -p simlin-compat mdl::)
  • All 12 equivalence test models produce identical output between native and xmutil paths, including full view element comparison
  • All existing simulation tests continue to pass (cargo test)
  • Pre-commit checks pass (fmt, clippy, tests, lint, tsc, WASM, pysimlin)

This implements the TokenNormalizer layer that sits between the
context-free RawLexer and the LALRPOP parser for Vensim MDL files.

Changes to lexer.rs:
- Renamed Token to RawToken and Lexer to RawLexer
- Removed context-tracking state (in_units_comment, in_units,
  saw_explicit_eq_end)
- Added Newline token for tabbed array row tracking
- Removed UnitsSymbol variant (now in normalized Token enum)
- RawLexer returns None at EOF instead of implicit EqEnd
- Added support for $ to start symbols (for unit symbols like $/Year)
- Fixed colon keyword span calculation to use actual consumed byte length
  instead of keyword string length (handles underscores/tabs/spaces)

New builtins.rs module:
- Implements to_lower_space() for XMUtil-compatible name canonicalization
- Provides is_builtin() to check if a symbol is a known Vensim function
- Includes is_tabbed_array() for keyword detection

New normalizer.rs module:
- TokenNormalizer tracks section state (equation/units/comment)
- Transforms Symbol to UnitsSymbol in units section with units_mode active
- Transforms Symbol to Function for known builtin function names
- Detects exact "WITH LOOKUP" match (case-insensitive, single space)
- Parses TABBED ARRAY constructs using Newline tokens for row boundaries
- Handles GET XLS/VDF special functions with to_lower_space canonicalization
  so GET_XLS, GET  XLS, etc. are all recognized
- Sign tokens in TABBED ARRAY allow newlines before the number (C++ treats
  newline as whitespace after a sign)
- GET XLS placeholders include closing ) and use actual source spans

The architecture now separates concerns clearly:
- RawLexer: pure tokenization, no context
- TokenNormalizer: context-sensitive transformations
- EquationReader (future): implicit EqEnd at EOF, comment extraction

This is Phase 1 of the MDL parser implementation. The types are
re-exported publicly from mod.rs for use by the LALRPOP parser in
the next phase. The parser will consume TokenNormalizer output and
the EquationReader will handle looping over equations and extracting
comments.
Implement Rust AST types for the Vensim MDL parser. These types will
be produced by the LALRPOP parser from the normalized tokens and later
converted to simlin_core::datamodel.

Key design decisions:
- Single Expr type with validation during conversion to datamodel
- Cow<'input, str> for zero-copy string references tied to input lifetime
- Own Loc type in mdl/ast.rs to keep the module self-contained
- Rust enums following simlin-engine's pattern
- Generic App for all calls since parser cannot distinguish lookups
  from functions/macros; resolution happens during conversion
- f64 for table values (NA/NAN don't appear in lookup/tabbed array data)
- Box<FullEquation> and Box<MacroDef> in MdlItem to satisfy clippy's
  large_enum_variant lint

The type hierarchy covers:
- Location tracking (Loc)
- Expressions (Expr, Subscript, UnaryOp, BinaryOp)
- Lookup tables (LookupTable)
- Left-hand sides (Lhs, ExceptList, InterpMode)
- Equations (Equation, SubscriptDef, SubscriptElement, SubscriptMapping)
- Units (UnitExpr, UnitRange, Units)
- Top-level items (FullEquation, Group, MacroDef, MdlItem)
Previously, only the exact "WITH LOOKUP" (single space) form was
recognized as Token::WithLookup. Forms like "WITH_LOOKUP" or
"WITH  LOOKUP" fell through to Token::Function.

Since WITH LOOKUP has special syntax (inline table as second argument)
that requires grammar support, all spacing variants should produce
the same token type. This uses the existing to_lower_space
canonicalization to handle underscore, multiple spaces, and tabs.

Also verified that all expression lists in the test corpus are numeric
(patterns like `a[DimA] = 1, 2, 3`), so the existing Equation::NumberList
variant covers all real-world cases.
Fix several issues identified during comparison with xmutil C++:

1. Legacy XY table semantics (High):
   - Fixed documentation: legacy format is `x1, x2, ..., xN, y1, y2, ..., yN`
     (flat vector split in half), NOT evenly spaced X from range
   - Added `transform_legacy()` method matching xmutil's `TransformLegacy()`
   - Added `add_raw()` method for legacy table parsing

2. Equation::Implicit behavior (High):
   - Added detailed documentation explaining that conversion must create a
     default table with points (0,1),(1,1) and TIME as input
   - Matches xmutil's AddTable() behavior when tbl is NULL

3. Extrapolation flag (Medium):
   - Added `extrapolate: bool` field to LookupTable
   - Set during conversion when LOOKUP EXTRAPOLATE or TABXL is used
   - Affects XMILE output (type="extrapolate" on gf element)

4. TabbedArray row boundaries (Medium):
   - Changed from Vec<Vec<f64>> to Vec<f64> (flat vector)
   - Matches xmutil's ExpressionNumberTable which ignores row boundaries
   - Updated normalizer to flatten tabbed array data

5. Trailing comma function calls (Low):
   - Documented that Expr::Literal("?") is produced for trailing comma
   - Matches xmutil's vpyy_literal_expression("?") behavior

The number list unary ops issue is already handled by the normalizer for
TABBED ARRAY, and will be handled by the parser for regular number lists
during parsing (fold +/- into numeric values).
Add a LALRPOP-based parser that consumes tokens from TokenNormalizer
and produces the AST types defined in ast.rs. The grammar structure
mirrors xmutil's VYacc.y for compatibility.

Key components:
- parser.lalrpop: Full grammar with expression precedence matching
  VYacc.y, support for all equation types (regular, lookup, data,
  implicit, subscript definitions, equivalence, tabbed array)
- reader.rs: EquationReader wrapper that drives the parser and handles
  raw comment text capture between second ~ and |, plus macro state
  tracking for :MACRO:/:END OF MACRO: blocks
- parser_helpers.rs: Helper functions for number parsing and equation
  construction (number lists vs regular equations)
- ast.rs: Added SectionEnd and ExprListResult types for parser returns

The EquationReader collects tokens until the second tilde or pipe,
then passes them to the LALRPOP parser. After the parser returns with
SectionEnd::Tilde, it captures the raw comment text from the source
and constructs the FullEquation with equation, units, and comment.

Includes 40 comprehensive tests covering expressions, number lists,
function calls, lookup tables, subscripts, units, and special markers.
Add the core conversion logic to transform parsed MDL AST directly to
simlin_core::datamodel::Project, bypassing the C++ xmutil dependency.
This completes Phase 6 of the MDL parser implementation plan documented
in src/simlin-compat/src/mdl/CLAUDE.md.

Key components:

convert.rs - Multi-pass AST to datamodel conversion:
- Symbol collection from parsed equations
- Dimension building with range expansion (A1-A10) and equivalences
- Variable type detection (stocks via top-level INTEG only)
- Flow linking using is_all_plus_minus algorithm (checks all equations per xmutil)
- Synthetic net flow generation for non-decomposable INTEG rates
- PurgeAFOEq logic for handling A FUNCTION OF placeholders
- Sim specs extraction from control variables (TIME STEP, etc.)
- TABXL detection for graphical function extrapolation marking
- Multi-dimensional number list conversion with Cartesian product
- Element-specific equation handling with subscript expansion

xmile_compat.rs - XMILE-compatible expression formatter:
- Function renames: IF THEN ELSE, LOG->LOG10/LN, ELMCOUNT->SIZE
- Argument reordering: DELAY N, SMOOTH N, RANDOM NORMAL
- Name formatting: spaces to underscores, TIME->STARTTIME mappings
- Complex transformations: PULSE, PULSE TRAIN, QUANTUM, ZIDZ/XIDZ
- Lookup invocation: Symbol calls become LOOKUP(table, input)

test_equivalence.rs - Integration tests:
- SIR.mdl parsing with stock/flow verification
- Simple model parsing with control variable filtering

The open_vensim_native() function now produces valid Project structures
with properly linked stocks/flows and extracted simulation specs.
Remaining work (Phases 7, 9-10) covers view/diagram parsing, model
post-processing, and settings section parsing.
Split the large convert.rs file (3402 lines) into a directory structure
with focused submodules for better maintainability:

- mod.rs (214 lines): Public API, ConversionContext struct, core methods
- types.rs (100 lines): ConvertError, VariableType, SymbolInfo, etc.
- helpers.rs (231 lines): Standalone helper functions
- dimensions.rs (553 lines): Dimension building and subscript expansion
- stocks.rs (1206 lines): Stock/flow linking and synthetic flow generation
- variables.rs (1200 lines): Variable building and equation construction

All 49 convert module tests preserved and distributed to their respective
modules. This reorganization improves code navigation and keeps individual
files within reasonable context window sizes for AI-assisted development.
Implements equivalence testing comparing open_vensim() (xmutil-based: MDL
-> XMILE -> datamodel) vs open_vensim_native() (pure Rust: MDL -> datamodel
directly) for MDL files under test/.

Uses static lists following the pattern in simulate.rs for curated control
over which tests run, easy tracking of known failures, and clear
progression documentation.

Key features:
- Normalizes projects for comparison, handling legitimate differences like
  identifier casing, spacing in expressions/units, model flattening
- Uses canonicalize() for proper variable sorting with quoted names
- Sorts arrayed equation elements by subscript key
- Uses name-based model/variable matching instead of positional zip
- Documents known feature gaps in native parser (unit definitions, views,
  loop_metadata, model-level sim_specs) with TODO comments

Currently 12 models pass equivalence testing. Models with known issues are
documented with explanations (ACTIVE INITIAL representation, subscript
handling differences, xmutil segfaults, etc.).

Deletes old test_equivalence.rs in favor of the new tests/mdl_equivalence.rs
structure.
The native MDL parser now correctly handles ACTIVE INITIAL by storing
the equation and initial value in separate fields, matching xmutil's
behavior:
- ACTIVE INITIAL(expr, initial) produces equation=expr, initial_equation=initial
- Works for scalar, apply-to-all, and arrayed equations

Previously, the native parser emitted INIT(expr, initial) as a single
equation string with initial_equation=None. The xmutil path via XMILE
correctly split these into separate fields.

Also fixes a pre-existing bug where the Arrayed equation element tuples
were incorrectly using the comment field instead of initial_equation.
The :SUPPLEMENTARY (or :SUP) flag was previously captured as raw text
in the documentation field. Now it is parsed as a structured boolean
field on FullEquation, since it has semantic meaning in Vensim.

The flag indicates that a variable is intentionally not connected to
feedback loops, suppressing Vensim's warning about unused variables.

MDL syntax example:
  profit = revenue - cost ~ $ ~ Companies Profits ~ :SUP |

The parser now:
- Detects :SUP or :SUPPLEMENTARY after the third tilde (case-insensitive)
- Strips the flag from the comment text
- Sets FullEquation.supplementary = true

The equivalence test's normalize_doc() function is updated to strip
the supplementary suffix from xmutil output for consistent comparison.
Port the supplementary flag parsing from the native Rust MDL parser
to the xmutil C++ code. Both parsers now handle the :SUP/:SUPPLEMENTARY
flag consistently:

- VensimLex::GetComment() detects the third tilde followed by :SUP or
  :SUPPLEMENTARY (case-insensitive)
- The flag is stripped from the comment text
- VensimLex tracks the flag via LastCommentHadSupplementary()
- VensimParse sets Variable::_supplementary when the flag is found
- Variable class gains _supplementary field with getter/setter

With both parsers handling supplementary the same way, the
strip_supplementary_suffix workaround in mdl_equivalence.rs tests
is no longer needed and has been removed.
Enable the as_sectors flag in xmutil_convert_mdl_to_xmile, which puts
all variables into a single main model instead of creating submodels
for Vensim groups. This matches the native parser's output structure.

With both parsers producing the same model structure, the equivalence
test's normalize_project function is simplified significantly:
- Remove the loop that collected variables from multiple models
- Remove skipping of Module variable references
- Remove synthetic main model creation
- Just filter stdlib models and normalize in place
…ion type

Add PostEquationParser in settings.rs to parse the settings section that
appears after the views/sketch section in MDL files. The parser:

- Finds the `:L<%^E!@` block marker (supports optional \x7F byte)
- Parses type 15 lines for integration method (comma-separated, 4th value
  is method code: 0,2 = Euler, 1,5 = RK4)
- Parses type 22 lines for unit equivalences (comma-separated: $,Name,Aliases)

The EquationReader now exposes a `remaining()` method that returns the
unparsed source after the equations section. ConversionContext uses this
to parse settings and populate:
- SimSpecs.sim_method from type 15
- Project.units from type 22

SimSpecsBuilder now accepts an optional sim_method field rather than
hardcoding Euler.
Add support for the Runge-Kutta 2 (RK2) integration method throughout
the codebase. Vensim uses method codes 3 and 4 for RK2, which was
previously being mapped to Euler.

Key changes:
- Add RungeKutta2 variant to SimMethod enum in core datamodel
- Propagate RK2 through protobuf schema, serialization, JSON/SDAI,
  and XMILE import/export
- Update MDL settings parser to map codes 3/4 to RK2
- Add parse_int_like_atoi() for C-compatible integer parsing that
  handles leading whitespace and stops at non-digits
- Add split_lines() helper for handling all line ending styles
  (LF, CRLF, and CR-only for old Mac files)
- Fix find_settings_block() to handle empty or short input without
  panicking
- Document the permissive marker detection strategy

The simulation engine still only implements Euler integration, but
the method is now correctly stored and serialized for models that
specify RK2.

31 new tests cover RK2 method codes, atoi-like parsing edge cases,
and various line ending formats.
Remove unnecessary normalization from the MDL equivalence tests by
systematically testing each normalization step. The parsers now produce
sufficiently similar output that many normalizations are no longer needed.

Removed normalization (parsers match directly):
- Unit equivalence case normalization and alias sorting
- Unit sorting in project
- Model sorting in project
- Variable sorting in models
- Arrayed element sorting in equations
- Inflow/outflow sorting in stocks
- Graphical function scale normalization
- Lookup equation placeholder normalization
- Element comment clearing in arrayed equations
- Documentation control comment clearing (~ :SUPPLEMENTARY)

Still required normalization:
- Dimension case normalization (xmutil lowercases, native preserves)
- Dimension sorting (order differs between parsers)
- sim_specs save_step normalization (xmutil sets None when equal to dt)
- Documentation line continuation removal and whitespace collapsing
- Unit space/underscore removal and parentheses normalization
- Expression case normalization (xmutil lowercases function names)
- Expression operator space removal
- Equation dimension sorting in ApplyToAll/Arrayed
- Arrayed->ApplyToAll conversion for repeated equations

This reduces the test file by ~125 lines while maintaining full coverage.
Move unit simplification logic from test code (mdl_equivalence.rs) into
the production MDL module. The new implementation mirrors xmutil's
UnitExpression approach using numerator/denominator lists.

The format_unit_expr() function now:
- Flattens UnitExpr AST into separate numerator/denominator lists
- Cancels matching terms (e.g., "A/A" becomes "1")
- Outputs canonical form with "/(X*Y)" for compound denominators

This allows removing the normalize_unit_parens() and simplify_units()
helper functions from the test code since both parsers now produce
identical output.
When a Vensim variable has units like `[0, 100]` (just a range, no unit
expression), the MDL parser now correctly treats this as dimensionless
units ("1") rather than no units (None).

This affects both regular variables and control variables (TIME STEP,
FINAL TIME), ensuring sim_specs.time_units is correctly set to "1" for
range-only units.

When extracting time_units from control variables, explicit units are
preferred over range-only "1". For example, if FINAL TIME has `[0, 100]`
(range-only) but TIME STEP has `Months` (explicit), the result is
"Months" not "1". When both have explicit units, FINAL TIME's units are
used (it is processed first in the control_vars order).

Added extract_units() helper to convert/helpers.rs for shared use by
both variables.rs and stocks.rs.

Added comprehensive tests for:
- Range-only units producing "1" (scalar, arrayed, time_units)
- Units from real equation when A FUNCTION OF placeholder is present
- TIME STEP vs FINAL TIME precedence for time_units extraction

This matches xmutil's behavior where UnitsRange() creates a "1" unit
expression when the input expression is NULL.
The mdl module in simlin-compat is a derivative work based on Bob
Eberlein's xmutil (https://github.com/bobeberlein/xmutil). This commit
adds the MIT license from xmutil and a README explaining the provenance
of the MDL parsing code, aligning the licensing with the original work.
The feature that enables Vensim MDL import via the xmutil C++ library
is now named "xmutil" to reflect what it actually enables. This is more
accurate since the feature controls whether the xmutil dependency is
included, not the format being parsed.

The function names like `open_vensim` and `simlin_project_open_vensim`
remain unchanged since they describe the format being opened, not the
implementation mechanism.
Add support for semantic/organizational groups that categorize model
variables into named sections like "Control Panel" or "Financial Sector".
This is distinct from visual diagram groups already in ViewElement::Group.

Phase 1 (datamodel):
- Add ModelGroup struct with name, doc, parent, members, run_enabled
- Add protobuf message and JSON serialization
- Add TypeScript ModelGroup class with Immutable.js Record pattern

Phase 2 (XMILE):
- Add SemanticGroup struct for parsing <group> elements in <views> section
- Handle groups in XMILE import/export when no diagram views exist
- Groups use <var> children with space_to_underbar formatted names

Phase 3 (MDL Import):
- Track groups during collect_symbols() with current_group_index
- Assign variables to groups on first equation only
- Group hierarchy via parent_index (each new group's parent is previous)
- AdjustGroupNames ensures unique names that don't conflict with symbols
- Support both {**name**} and ***\nname\n***| group marker formats

Phase 4 (Integration Tests):
- Test MDL -> datamodel -> XMILE -> datamodel preserves groups
- Test variables after macros still assigned to groups

The implementation follows xmutil's VensimParse.cpp algorithm for group
membership assignment and hierarchy determination.
The CLAUDE.md plan was out of sync with actual implementation progress.
Key updates:
- Phase 9 Groups: Marked as COMPLETE (was NOT STARTED)
- Phase 10 Settings: Already marked complete, added RK2 method support
- Phase 5 range-only units: Marked as implemented
- Test count: Updated from 221 to 430 tests
- Module structure: Updated to show convert/ as directory with submodules
- Removed view.rs from tree (doesn't exist yet)
- Fixed Implementation Order section (Settings was incorrectly marked NOT STARTED)
- Renumbered advanced features list to avoid duplicate item 9

The parser implementation is further along than documented - only Phase 7
(Views/Diagrams) and some Phase 9 post-processing remain.
This adds Phase 7 of the MDL parser implementation - view/diagram parsing.
The sketch section follows the equations section in Vensim MDL files and
starts with the `\\\---///` marker.

The implementation parses:
- View headers (V300/V364 versions, titles, font lines)
- Type 10: Variable elements (stocks, flows, auxiliaries)
- Type 11: Valve elements (flow gates)
- Type 12: Comment elements (clouds and annotations)
- Type 1: Connector elements (links between variables)

Key features:
- Handles scratch_name flag (bits & 4) where comment text is on next line
- atoi-style integer parsing matching xmutil behavior exactly
- Quoted string parsing with escape handling
- AngleFromPoints algorithm for curved connectors
- Flow point computation using connected stocks/clouds
- Ghost/primary variable tracking for aliases with cross-view offset support
- Multi-view merging into a single StockFlow view with group wrappers

The view parser handles both cases where the EqEnd marker (`\\\---///`)
is present in remaining source and where it was already consumed by the
EquationReader. This is done by looking for either the marker or a
version line (V300/V364) when starting to parse the sketch section.
Two bugs in the native MDL parser's view conversion were causing
element count mismatches with xmutil:

1. Ghost/alias detection: The is_primary check only compared view
   index, not the specific UID. This caused all variables in the same
   view to be considered "primary", preventing ghost variables from
   being converted to Alias elements. Fixed by checking both view_idx
   AND uid match.

2. Time variable inclusion: The built-in "Time" variable was being
   included in views along with its connectors, but xmutil filters
   these out since XMILE handles Time automatically. Added filtering
   for built-in system variables.

Added two new tests following TDD:
- test_ghost_variable_becomes_alias: verifies duplicate variable
  names create one Aux and one Alias
- test_time_variable_filtered_out: verifies Time and its connectors
  are excluded

All tested models now match xmutil output in element counts and types:
- SIR.mdl: 29 elements (alias=5, aux=6, flow=2, link=13, stock=3)
- initial.mdl: 10 elements (aux=5, link=5) - was 12 before fix
- teacup.mdl, active_initial.mdl, etc: all match

Also added diagnostic output to mdl_equivalence.rs to show element
type breakdowns during testing.
Implement comprehensive fixes for native MDL view parsing to match
xmutil's C++ implementation:

- Preserve (0,0) control point sentinel during coordinate transforms
- Fix ghost/alias promotion with two-pass algorithm and duplicate detection
- Add anchor snapping for flow endpoints (vertical vs horizontal alignment)
- Sequential UID assignment matching XMILE import
- Complete variable filtering (Time, Unwanted control variables)
- View title normalization with symbol namespace deduplication
- Stock ghost geometry offset (-22, -17)
- Flow position logic requiring both attached flags
- Group geometry matching exact xmutil formula
- Link detection with wrap-around and tight epsilon (0.01 degrees)
- Alias label_side defaults to Bottom

Flow endpoint computation now uses connector control points and applies
xmutil's anchor snapping algorithm: if control point X coordinates are
equal (vertical flow), snap Y to target centers; otherwise snap X to
target centers. This matches XMILEGenerator.cpp lines 987-1072.

These changes enable passing equivalence tests against open_vensim.
Six fixes for remaining divergences between native MDL view conversion
and the xmutil C++ reference implementation:

1. UID padding off by one: VensimView::insert now resizes to idx + 26
   (matching xmutil's len + 1 = uid + 26) instead of idx + 25. This
   ensures subsequent views have correct UID offsets.

2. Lettered polarity preservation: Track whether connectors used S/O
   letter notation vs +/- symbols. Added use_lettered_polarity flag to
   StockFlow (datamodel, proto, serde, json) matching xmutil's
   bLetterPolarity. The flag is computed per-model from all connectors.

3. View name dedup parity: Removed group names from the symbol namespace
   used for view title deduplication (groups are adjusted separately in
   xmutil). Changed used_names comparison to case-sensitive, matching
   xmutil's std::set<std::string> behavior.

4. Array variable filtering: Added documentation that subscript/dimension
   variables are implicitly filtered from views because they live in the
   dimensions map, not the symbols map.

5. JSON polarity round-trip: Added polarity field to LinkViewElement in
   json.rs so link polarity survives JSON/WASM round-trips.

6. Documentation: Added note about future module-style view splitting
   requirements to CLAUDE.md.
xmutil's XMILEGenerator uses a simple priority chain for time_units:
TIME STEP > FINAL TIME > INITIAL TIME > "Months". The native parser
had the wrong priority order (FINAL TIME first) and added unnecessary
"prefer explicit over range-only" logic that doesn't match xmutil.

Also uses select_equation() instead of equations.first() so that
AFO-first control variables still have their values extracted, and
extract_first_units() checks ALL equations rather than just the first.
Vensim allows \_ in identifiers as a literal underscore. The lexer's
symbol() method previously exited when it hit \ (not in is_symbol_char),
and the main next() dispatch produced UnrecognizedToken for \ not
followed by a newline.

The fix adds an outer loop in symbol() that checks for \_ after the
normal symbol char loop exits, and handles \_ at token start in next().
The trailing-strip logic is also updated to not strip _ that is part
of an escaped \_ sequence.

Also adds a comment clarifying that we intentionally trim leading
whitespace from comments, unlike xmutil which only trims trailing.
Previously the equivalence tests cleared views before comparison. This
commit enables element-by-element view comparison across all 12 test
models by fixing three root causes of divergence between the native MDL
path and the xmutil/XMILE path:

1. Sequential UID assignment: the native path preserved sparse Vensim
   UIDs while the XMILE path assigned sequential UIDs from 1. Added
   reassign_uids_sequential() post-processing to the native path.

2. Flow point stock-edge adjustment: the XMILE path snaps flow
   endpoints from stock centers to stock edges. Added
   fixup_flow_takeoffs() to the native path with matching geometry.

3. XMILE polarity parsing: the Link struct was missing the polarity
   attribute, silently dropping it during deserialization. Added the
   field and conversion functions.

Also fixed UID gaps in the XMILE path caused by assign_uids() numbering
elements before filtering unresolvable links. The retain() step now
re-sequences UIDs to close gaps.

Arc angle comparison uses epsilon tolerance (1e-10) to account for
inherent floating-point differences between the two computation paths
(C++ -> string -> Rust vs direct Rust).
@claude
Copy link

claude bot commented Jan 26, 2026

Pull Request Review - Native Vensim MDL Parser

This is an exceptionally well-executed PR that replaces the C++ xmutil dependency with a pure Rust implementation for parsing Vensim MDL files. The implementation demonstrates strong software engineering practices and careful attention to correctness.

✅ Strengths

Architecture & Design

  • Clean separation of concerns: The multi-phase parsing pipeline (lexer → normalizer → parser → AST → datamodel) is well-architected and mirrors the reference implementation appropriately
  • Excellent module organization: Breaking convert.rs into focused submodules (dimensions.rs, stocks.rs, variables.rs, helpers.rs) keeps files manageable and improves maintainability
  • Zero-copy design: Use of Cow<'input, str> in the AST for string references tied to input lifetime shows performance awareness
  • LALRPOP integration: Leveraging a proven parser generator rather than hand-rolling a parser reduces bugs

Testing & Correctness

  • Outstanding test coverage: 453+ unit tests plus comprehensive equivalence tests against the reference implementation (xmutil)
  • Rigorous validation: Element-by-element comparison of outputs between native and xmutil parsers across 12 test models ensures behavioral parity
  • TDD approach evident: Tests like test_ghost_variable_becomes_alias and test_time_variable_filtered_out show bugs were fixed with tests first
  • Edge case handling: Tests cover escaped underscores, range-only units, ACTIVE INITIAL, supplementary flags, etc.

Code Quality

  • Comprehensive documentation: Each module has clear explanations, the CLAUDE.md plan documents implementation phases, and complex algorithms reference the C++ source
  • Proper error handling: Custom error types (ConvertError, LexError, ReaderError) with structured variants
  • Follows project conventions: Adheres to commit message style, uses .unwrap() appropriately per guidelines, runs pre-commit hooks
  • Licensing compliance: Properly attributes xmutil derivation with MIT license and README

Integration

  • Backward compatible: Maintains existing open_vensim() function while adding open_vensim_native()
  • Feature-gated: Uses "xmutil" feature flag to control C++ dependency
  • Cross-language: Updates both Rust and TypeScript/protobuf schemas for new datamodel fields (ModelGroup, SimMethod::RungeKutta2, polarity)

🔍 Areas for Consideration

1. Test Coverage Gaps (Minor)
The equivalence tests filter out several model categories with known limitations:

  • Models using advanced builtins (ALLOCATE AVAILABLE, DELAY FIXED, NPV, QUANTUM, SAMPLE IF TRUE)
  • GET XLS/VDF data functions
  • Complex subscript features (EXCEPT, mappings)

While these are documented as future work, consider adding tracking issues for each category to ensure they don't get forgotten.

2. Floating-Point Comparison (Minor)
The ANGLE_EPSILON tolerance (1e-10) for arc angle comparisons is well-justified, but consider adding a comment explaining that the epsilon value is chosen based on empirical observation of maximum divergence rather than being theoretically derived.

3. Performance Considerations (Info)
No benchmarks are included to compare performance vs the C++ implementation. While this is a correctness-first PR (appropriate!), consider:

  • Adding criterion benchmarks for common MDL files
  • Profiling the multi-pass conversion to identify any bottlenecks
    This can be follow-up work.

4. Error Messages (Minor)
Some error paths use generic ConvertError with format strings. Consider adding specific error variants for common cases (e.g., undefined dimension reference, invalid control variable) to improve debuggability.

5. Dead Code (Trivial)
The integration_method field in ConversionContext is marked #[allow(dead_code)] - if it's truly unused after Phase 10 completion, it should be removed.

🎯 Specific Code Observations

Good Practices:

  • Use of unreachable!() and todo!() macros rather than comments (per guidelines)
  • Proper use of std::mem::take() to avoid borrow checker issues while maintaining clarity
  • Consistent naming: canonical_name() for spaces, canonical_ident() for underscores
  • Smart buffering strategy in EquationReader to handle macro state tracking

Minor Issues:

  1. In src/simlin-compat/src/mdl/convert/mod.rs:56, the #[allow(dead_code)] on integration_method should be revisited
  2. The test at line 747 has a comment about parent being "1-1 Sub" but the assertion expects a different group - verify this is correct
  3. Consider extracting magic numbers like 26 (UID padding offset) into named constants

🛡️ Security Considerations

✅ No security concerns identified:

  • Parser handles untrusted input appropriately with proper error handling
  • No unsafe code in the new modules
  • String operations use safe Rust APIs
  • No file I/O beyond the public API boundary

📊 Impact Assessment

Lines Changed: ~14,000 additions across 65 files
Risk Level: Low - well-tested, feature-gated, backward compatible
Value: High - eliminates C++ toolchain requirement, enables WASM builds

✨ Summary

This is exemplary work that sets a high bar for contributions to this codebase. The implementation:

  • Achieves full parity with the reference C++ implementation across tested models
  • Maintains excellent code organization and documentation
  • Demonstrates thorough testing with both unit and integration tests
  • Enables the project's strategic goal of WASM support

The minor suggestions above are mostly about long-term maintainability and don't block merging. This PR is strongly recommended for approval.

Recommendation: ✅ APPROVE and merge


Review conducted via automated analysis of code structure, tests, documentation, and adherence to project guidelines.

@codecov
Copy link

codecov bot commented Jan 26, 2026

Codecov Report

❌ Patch coverage is 80.55172% with 564 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.76%. Comparing base (d44d3ae) to head (22b4164).

Files with missing lines Patch % Lines
src/simlin-compat/src/mdl/convert/variables.rs 63.79% 147 Missing ⚠️
src/simlin-compat/src/mdl/lexer.rs 76.75% 106 Missing ⚠️
src/simlin-compat/src/mdl/normalizer.rs 63.04% 85 Missing ⚠️
src/simlin-compat/src/mdl/convert/stocks.rs 79.41% 63 Missing ⚠️
src/simlin-compat/src/mdl/reader.rs 79.51% 34 Missing ⚠️
src/simlin-compat/src/mdl/view/convert.rs 89.32% 30 Missing ⚠️
src/simlin-compat/src/mdl/view/mod.rs 79.77% 18 Missing ⚠️
src/simlin-compat/src/mdl/view/types.rs 80.85% 18 Missing ⚠️
src/simlin-compat/src/mdl/view/processing.rs 92.61% 15 Missing ⚠️
src/simlin-compat/src/mdl/convert/helpers.rs 84.61% 14 Missing ⚠️
... and 7 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #211      +/-   ##
==========================================
+ Coverage   75.99%   76.76%   +0.77%     
==========================================
  Files          48       69      +21     
  Lines       15533    19138    +3605     
==========================================
+ Hits        11804    14691    +2887     
- Misses       3729     4447     +718     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a6a9152b9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 2588 to 2590
#[cfg(feature = "xmutil")]
#[no_mangle]
pub unsafe extern "C" fn simlin_project_open_vensim(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Expose native MDL import in non-xmutil builds

The new native MDL parser is still unreachable in WASM builds because simlin_project_open_vensim is compiled only when the xmutil feature is enabled (and it still routes through simlin_compat::open_vensim). When libsimlin is built without xmutil (the default for WASM), this export is missing and engine2 will throw for MDL imports, so the “native parser” path never becomes available. Consider exporting a non-xmutil variant that calls open_vensim_native or relaxing the feature gate so MDL import actually works in WASM.

Useful? React with 👍 / 👎.

simlin_project_open_vensim was gated behind #[cfg(feature = "xmutil")]
and routed through the C++ xmutil path. This meant WASM builds (which
use --no-default-features) had no MDL import capability, defeating the
purpose of the native parser.

Switch the FFI function to call open_vensim_native directly and remove
the feature gate so it is always available. Also remove the xmutil
feature gates from the corresponding tests and clean up stale comments
in the TypeScript WASM bindings that referenced the xmutil requirement.

The native parser uses canonical underscore-separated identifiers (e.g.
"heat_loss_to_room") rather than space-separated names from the xmutil
path, so the engine2 and pysimlin test expectations are updated to match.
@claude
Copy link

claude bot commented Jan 26, 2026

Pull Request Review: Native Vensim MDL Parser

This is an impressive and well-executed PR that replaces the C++ xmutil dependency with a pure Rust implementation. The scope is massive (35k+ additions) but the implementation demonstrates strong engineering practices.

Summary

Recommendation: APPROVE with minor suggestions

This PR successfully implements a complete Vensim MDL parser in Rust, achieving equivalence with the existing xmutil-based implementation across 12 test models. The code quality is high, testing is comprehensive, and the architectural decisions are sound.


Strengths

1. Excellent Documentation

  • Comprehensive CLAUDE.md provides clear implementation roadmap
  • Proper attribution and licensing (MIT) for derivative work
  • Well-structured commit history with descriptive messages
  • Inline documentation explains complex algorithms (e.g., flow endpoint computation, angle calculation)

2. Strong Testing Strategy

  • 453+ unit tests across all modules (lexer, parser, converter, views)
  • Integration tests compare native vs xmutil output element-by-element
  • TDD approach evident in commit history (tests added before implementation)
  • Test files include edge cases and error conditions

3. Architectural Excellence

  • Clean separation of concerns: lexer → normalizer → parser → AST → converter
  • Proper use of Result types for error handling
  • Minimal .unwrap() usage (162 instances across 14 files, most with context)
  • No unwrap_or_default() anti-patterns found in production code

4. Code Quality

  • Follows Rust idioms and best practices
  • Uses appropriate data structures (Cow for zero-copy strings)
  • Modular design with focused submodules (convert/ split into dimensions, stocks, variables, etc.)
  • Proper handling of edge cases (escaped underscores, various line endings, etc.)

5. Security & Correctness

  • No unsafe code detected
  • Proper bounds checking (no get_unchecked usage)
  • UTF-8 validation handled correctly
  • All panic!/expect() calls confined to test code

6. Completeness

  • Handles all MDL format features: equations, subscripts, macros, views, settings
  • Proper XMILE compatibility layer for expression formatting
  • Support for RK2/RK4/Euler integration methods
  • Full view/diagram parsing with coordinate transforms and angle calculations

Issues & Suggestions

Critical: None

Major: None

Minor Suggestions

1. Test Coverage Visibility

Consider adding code coverage reporting to track which edge cases are tested. With 453+ tests, it would be valuable to see coverage metrics.

# Suggestion: Add to CI
cargo tester --all-features --workspace

2. Error Messages Could Be More Specific

In mod.rs, generic error messages could be more helpful:

// Current:
Some(format!("Failed to parse MDL: {:?}", e))

// Suggestion: Include line numbers when available
Some(format!("Failed to parse MDL at line {}: {:?}", line_num, e))

3. Consider Adding Fuzzing

With a complex parser handling untrusted input, fuzzing could reveal edge cases:

# Add to Cargo.toml
[dev-dependencies]
cargo-fuzz = "0.11"

4. unwrap() Usage Pattern

While most .unwrap() calls are justified, consider documenting why each is safe:

// Instead of:
.unwrap()

// Consider:
.expect("dimension_map always contains entries from collect_symbols phase")

This helps future maintainers understand invariants.

5. Module Organization Documentation

The convert/ directory split is excellent, but could benefit from a module-level doc comment explaining the multi-pass conversion strategy:

//! Multi-pass MDL to datamodel conversion:
//! 
//! 1. **Symbol Collection** (mod.rs): Gather all variable/dimension names
//! 2. **Dimension Building** (dimensions.rs): Expand ranges, handle equivalences
//! 3. **Variable Type Detection** (variables.rs): Classify stocks/flows/aux
//! 4. **Stock/Flow Linking** (stocks.rs): Connect flows to stocks via inflow/outflow
//! 5. **Expression Formatting** (via xmile_compat.rs): XMILE-compatible output

Performance Considerations

Positive

  • Zero-copy string references with Cow<'input, str>
  • Efficient parsing with LALRPOP (LR parser generator)
  • No unnecessary allocations in hot paths

Potential Optimization Opportunities

None critical, but for very large models:

  • Consider arena allocation for AST nodes if profiling shows allocation overhead
  • View element processing could be parallelized with rayon

Security Assessment

No security concerns identified

  • No unsafe code
  • Proper input validation
  • No command injection vectors
  • No file path traversal issues
  • Appropriate error handling prevents panics on malformed input

Test Coverage Assessment

Based on mdl_equivalence.rs:

  • ✅ 12 models passing full equivalence tests
  • ✅ View element comparison (UIDs, coordinates, link shapes, polarity)
  • ✅ Known issues documented (xmutil segfaults, feature gaps)
  • ✅ Clear separation of working vs pending models

Suggestion: Add a test that deliberately provides malformed input to verify graceful error handling rather than panics.


Compatibility & Migration Path

The PR maintains backward compatibility:

  • ✅ Existing open_vensim() function preserved (uses xmutil)
  • ✅ New open_vensim_native() function available
  • ✅ FFI layer updated to use native parser
  • ✅ WASM builds now support MDL import (major win!)

Future consideration: Document deprecation timeline for xmutil dependency once native parser is battle-tested in production.


Commit Quality

Excellent commit hygiene:

  • ✅ Logical atomic commits
  • ✅ Descriptive commit messages following project style
  • ✅ Body text explains "why" not just "what"
  • ✅ Co-authored attribution to Claude

Alignment with CLAUDE.md Guidelines

Checking against project guidelines:

Test Driven Development: Evidence of tests-first approach
No backwards compatibility shims: Clean implementation
Prefer .unwrap() over .unwrap_or_default(): Followed correctly
Use unreachable!() for impossible cases: Pattern followed
No placeholder comments: All TODOs removed
Proper error types: Result types used throughout
Commit message style: Matches component: description format


Final Recommendations

Before Merge

  1. ✅ Pre-commit checks pass (mentioned in PR description)
  2. ✅ All tests pass (453+ unit tests, 12 equivalence tests)
  3. ⚠️ Recommended: Run full simulation test suite to verify models parse AND simulate correctly
  4. ⚠️ Recommended: Manual review of panic! locations to confirm all are test-only

Post-Merge

  1. Monitor production usage of native parser vs xmutil
  2. Add telemetry to track which code path is used (if applicable)
  3. Consider adding performance benchmarks to prevent regressions
  4. Document migration guide for external users if any exist

Conclusion

This is exemplary work that:

  • ✅ Eliminates complex C++ build dependency
  • ✅ Enables WASM support for MDL imports
  • ✅ Maintains full compatibility with existing behavior
  • ✅ Includes comprehensive test coverage
  • ✅ Follows project conventions meticulously

The 35k line addition is justified by the complexity of the task (replacing a 15k line C++ parser) and the inclusion of extensive tests and documentation.

Approved with high confidence. Minor suggestions above are truly optional improvements, not blockers.

Great work! 🎉


Reviewed by: Claude Sonnet 4.5
Review Date: 2026-01-26
PR: #211 - Native Vensim MDL parser

The native Rust MDL parser is now always available via engine2's
openVensim(), so the xmutil-js fallback path in NewProject.tsx is no
longer needed. Remove the convertMdlToXmile import, the xmutil package
dependency, the tsconfig project reference, and the rsbuild alias.

The .mdl file type was already accepted by the file picker; this just
simplifies the import handler to always use the native path.
@claude
Copy link

claude bot commented Jan 26, 2026

PR #211 Review: Native Vensim MDL Parser

Summary

This PR implements a comprehensive pure Rust parser for Vensim MDL files, replacing the C++ xmutil dependency. This is a massive and impressive undertaking with 35,493 additions across 71 files, including a complete lexer, parser, AST, and conversion pipeline.

Overall Assessment: ✅ APPROVE with Minor Recommendations

This is excellent work that successfully achieves its goals. The implementation is thorough, well-tested (453+ tests), and carefully designed to match xmutil's behavior while being maintainable.


Strengths

1. Exceptional Architecture & Design ⭐⭐⭐⭐⭐

  • Clean separation of concerns: RawLexer → TokenNormalizer → Parser → AST → Datamodel
  • Well-organized module structure with focused submodules (convert/, view/)
  • Comprehensive documentation in CLAUDE.md explaining implementation phases and decisions
  • Proper use of Rust idioms (Cow<'input, str> for zero-copy, Result/Option error handling)

2. Outstanding Test Coverage ⭐⭐⭐⭐⭐

  • 453+ unit tests across lexer, parser, normalizer, converter, view processing
  • 12 full-model equivalence tests comparing xmutil output vs native output
  • Tests verify UIDs, coordinates, link shapes, polarity across all test models
  • Well-documented known gaps and unsupported features in tests/mdl_equivalence.rs

3. Proper Error Handling ⭐⭐⭐⭐

  • Custom error types (LexError, NormalizerError, ReaderError, ConvertError)
  • Consistent use of Result types throughout
  • Good use of unwrap() over unwrap_or_default() per project guidelines
  • Appropriate use of unreachable!() and panic!() for truly impossible cases

4. Licensing & Attribution ⭐⭐⭐⭐⭐

  • Proper MIT license attribution to Bob Eberlein's xmutil
  • Clear README explaining derivative work provenance
  • Maintains original copyright notices

5. C-Compatible Parsing ⭐⭐⭐⭐⭐

  • parse_int_like_atoi() function with comprehensive edge case tests
  • Handles whitespace, signs, non-digit characters exactly like C's atoi()
  • 31 tests for RK2 method codes, atoi parsing, line ending formats

Code Quality Issues

Critical Issues: None ✅

High Priority Issues

1. Excessive #[allow(dead_code)] Usage ⚠️

Location: src/simlin-compat/src/mdl/lexer.rs:5, and 7 other files

Issue: Multiple files have blanket #[allow(dead_code)] directives, which can hide legitimately unused code that should be removed.

Recommendation:

// Instead of:
#![allow(dead_code)]

// Use targeted suppression:
#[allow(dead_code)]
pub struct OnlyUsedInTests { ... }

Files affected:

  • lexer.rs, convert/dimensions.rs, convert/mod.rs, convert/types.rs, view/processing.rs

2. High unwrap() Count ⚠️

Count: 162 unwraps across 14 files

Issue: While project guidelines prefer unwrap() over unwrap_or_default() for debugging, this many unwraps could cause panics on malformed input.

Recommendations:

  • Add fuzzing tests for malformed MDL files
  • Consider converting some unwraps in hot paths to proper error propagation
  • Document which unwraps are "safe" (based on previous validation) vs which assume valid input

Example locations to review:

  • src/simlin-compat/src/mdl/convert/stocks.rs (30 unwraps)
  • src/simlin-compat/src/mdl/convert/variables.rs (28 unwraps)

Medium Priority Issues

3. Missing Documentation on Complex Algorithms 📝

Locations:

  • is_all_plus_minus() algorithm in stocks.rs
  • reassign_uids_sequential() in view/processing.rs
  • fixup_flow_takeoffs() in view/processing.rs

Issue: While implementation comments exist, high-level documentation of why these algorithms work this way is sparse.

Recommendation: Add module-level or function-level doc comments explaining the mathematical/logical basis, especially for algorithms ported from xmutil C++.

4. Panic Count Acceptable but Worth Monitoring ℹ️

Count: 170 panics across 12 files

Context: Most appear to be in unreachable!() or assertion macros which is appropriate.

Recommendation: Audit panic! calls in production code paths (not tests) to ensure they're only for truly impossible states.


Performance Considerations

Positive

  • Zero-copy string references with Cow<'input, str>
  • Efficient HashMap/HashSet usage for lookups ✅
  • Lazy parsing (EquationReader iterator) ✅

Potential Concerns

  • Large enum variants: Fixed with Box and Box ✅
  • String allocations: Canonical name conversions create new strings frequently, but acceptable for parsing workload ✅
  • No obvious performance issues

Security Considerations

✅ No Unsafe Code

  • Zero unsafe blocks found in mdl/ module ✅

✅ Input Validation

  • Lexer handles malformed input with proper error types ✅
  • Parser uses LALRPOP which provides bounds checking ✅
  • Settings parser is "intentionally permissive" (documented in code) ✅

⚠️ Potential DoS Vectors

Issue: Large or deeply nested MDL files could cause stack overflow or memory exhaustion

Recommendation:

  • Add maximum recursion depth limits for expression parsing
  • Consider fuzzing with large malformed MDL files
  • Document any practical file size limits

Example: Deep nesting in format_expr_inner() is recursive without depth limit


Testing Quality

Strengths

  • Equivalence tests compare against known-good xmutil output ✅
  • Element-by-element view comparison with coordinate/UID validation ✅
  • Comprehensive edge case coverage (escaped underscores, line continuations, etc.) ✅
  • TDD approach evident (test_ghost_variable_becomes_alias, test_time_variable_filtered_out) ✅

Gaps

  • ❌ No fuzzing tests for malformed input
  • ❌ No performance benchmarks for large models
  • ⚠️ Some models excluded from equivalence tests due to unsupported features (documented)

Integration & Dependencies

✅ Clean Integration

  • Properly exposed via parse_mdl() public API ✅
  • FFI bindings updated for non-xmutil builds ✅
  • WASM compatibility achieved (main goal) ✅
  • TypeScript bindings updated ✅

⚠️ Feature Flag Complexity

  • The xmutil feature flag logic is a bit convoluted
  • cfg(feature = "xmutil") appears in multiple places
  • Consider documenting the feature flag strategy in CLAUDE.md

Specific Code Review Findings

1. settings.rs - Permissive Parsing Strategy

Lines 76-88: Good documentation of intentional deviation from xmutil strictness
Good: Well-documented design decision with clear rationale

2. lexer.rs - Line Continuation Handling

Lines 156-184: Complex lookahead/pushback logic
Good: Handles edge cases correctly
⚠️ Consider: Add diagram or state machine documentation

3. convert/mod.rs - Multi-pass Conversion

Lines 123-150: Well-structured conversion pipeline
Excellent: Clear separation of concerns with numbered passes

4. xmile_compat.rs - Function Transformations

Lines 126-150: Complex function renaming logic
Good: Comprehensive handling of XMILE function differences
⚠️ Minor: Consider extracting transformation rules to a data structure


Commit Quality

✅ Excellent Commit Messages

  • Follows project style guide perfectly ✅
  • Component prefixes used correctly (compat:, xmutil:, app:) ✅
  • Present tense, lowercase, descriptive ✅
  • Bodies explain "why" not just "what" ✅

Example:

compat: fix view ghost/alias detection and filter Time variable

Two bugs in the native MDL parser's view conversion were causing
element count mismatches with xmutil:
1. Ghost/alias detection...
2. Time variable inclusion...

Recommendations

Required Before Merge: None

This PR is ready to merge as-is.

Strongly Recommended for Follow-up

  1. Add fuzzing tests for malformed MDL input
  2. Audit and reduce #[allow(dead_code)] directives
  3. Add recursion depth limits to prevent stack overflow
  4. Document complex algorithms (is_all_plus_minus, flow takeoff computation)

Optional Improvements

  1. Extract function transformation rules to data structures in xmile_compat.rs
  2. Add performance benchmarks for large models
  3. Consider streaming parser for memory efficiency on huge files
  4. Add code coverage metrics to CI

Conclusion

This is exceptional engineering work that successfully:

  • ✅ Eliminates C++ toolchain dependency
  • ✅ Enables WASM builds with MDL import
  • ✅ Maintains full compatibility with xmutil
  • ✅ Provides comprehensive test coverage
  • ✅ Follows project conventions and Rust best practices

The implementation is production-ready with only minor recommendations for follow-up work.

Verdict: ✅ APPROVE


Additional Notes

  • The 35K+ line addition is appropriate for a complete parser implementation
  • The derivative work licensing is handled correctly
  • The refactoring into submodules (convert/ directory) shows good judgment
  • The equivalence testing approach is rigorous and clever
  • The CLAUDE.md documentation will be valuable for future maintenance

Great work! 🎉

@bpowers
Copy link
Owner Author

bpowers commented Jan 26, 2026

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8bc71f7c9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 148 to 150
if (file.name.endsWith('.mdl')) {
// For Vensim MDL files, try direct import first if available
const hasVensim = await Engine2Project.hasVensimSupport();
if (hasVensim) {
engine2Project = await Engine2Project.openVensim(contents);
} else {
// Fall back to xmutil conversion when direct Vensim support is not available
const [xmileContents, conversionLogs] = await convertMdlToXmile(contents, true);
logs = conversionLogs;
if (xmileContents.length === 0) {
throw new Error('Vensim converter: ' + (logs || 'unknown error'));
}
engine2Project = await Engine2Project.open(xmileContents);
}
engine2Project = await Engine2Project.openVensim(contents);
} else {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Gate MDL import on Vensim support

This path now unconditionally calls Engine2Project.openVensim for .mdl files. The engine2 API still exposes Project.hasVensimSupport() and its tests explicitly handle the case where openVensim must throw when the WASM build lacks simlin_project_open_vensim. In that scenario, users will now hit a runtime exception and lose MDL import entirely (there’s no fallback anymore), whereas the previous code avoided that by checking support and falling back. Consider restoring the support check (or another fallback) so MDL import doesn’t regress on builds that omit Vensim support.

Useful? React with 👍 / 👎.

The native MDL import was setting view_box to the bounding box of
diagram elements (e.g. {x: 50, y: 70, width: 431, height: 168}).
But the Editor expects view_box to be a viewport descriptor where
{0, 0, 0, 0} means "not yet configured, please center automatically."

When Canvas.tsx sees a non-zero view_box.width, it skips the simple
centering path and instead treats view_box.x/y as pan offsets, causing
the diagram to appear off-screen.

The XMILE import path correctly uses Default::default() for view_box.
This change makes the MDL path match, so the Editor's centering logic
works as intended.
The native Rust MDL parser (open_vensim_native) is always available --
it has no feature gate. The FFI export simlin_project_open_vensim()
calls it directly and unconditionally. The WASM build includes it.
So hasVensimSupport() always returned true and was dead code.

This removes the hasVensimSupport function from import-export.ts and
the Project class, removes the conditional guards wrapping MDL tests
so they run unconditionally, and cleans up stale "requires xmutil
feature" references in doc comments, simlin.h, and README.md.
@bpowers
Copy link
Owner Author

bpowers commented Jan 26, 2026

@codex review

@claude
Copy link

claude bot commented Jan 26, 2026

PR Review: Native Vensim MDL Parser

Summary

This is an exceptionally well-executed PR that replaces the C++ xmutil dependency with a pure Rust implementation for parsing Vensim MDL files. The implementation is comprehensive, well-tested, and follows Rust best practices. The code quality is outstanding and this represents a significant architectural improvement for the project.

Strengths

Architecture & Design ✅

  1. Smart Design Decisions: Direct MDL → datamodel conversion (bypassing XMILE intermediate) is elegant and efficient

  2. Clean Module Organization: Well-structured codebase with clear separation of concerns:

    • Lexer/normalizer pipeline with context-sensitive transformations
    • LALRPOP-based parser with comprehensive grammar
    • Multi-pass AST-to-datamodel conversion
    • Excellent documentation in CLAUDE.md providing implementation context
  3. Comprehensive Feature Coverage:

    • All equation types, subscripts/arrays, built-in functions
    • View/diagram parsing with coordinate transformation
    • Settings and groups parsing
    • Macro parsing (output not yet implemented but captured)

Code Quality ✅

  1. Excellent Error Handling:

    • Proper Result types throughout
    • No unsafe code
    • Strategic use of .unwrap() over .unwrap_or_default() per project guidelines (only 4 uses of unwrap_or_default, all justified)
    • Only 9 uses of .expect() in total, all with clear panic messages
  2. Strong Type Safety:

    • Leverages Rust's type system effectively
    • Proper use of lifetimes (Cow<'input, str> for zero-copy parsing)
    • No use of todo!() or unimplemented!() macros
  3. Clean Code Patterns:

    • Minimal clone operations (44 in convert/ module for ~15k+ lines)
    • No panics in production code paths (parse_number panic is defensive - lexer guarantees validity)
    • Clear, descriptive function and variable names

Testing ✅

  1. Outstanding Test Coverage: 453+ unit tests covering:

    • Lexer (29+ tests), Normalizer (30+ tests), Parser helpers (15+ tests)
    • Builtins (15+ tests), Reader (60+ tests), Convert (55+ tests)
    • XMILE compat (30+ tests), Settings (44+ tests), Views (51+ tests)
  2. Integration Testing:

    • Equivalence tests comparing native parser vs xmutil output
    • 12 models pass full equivalence tests with view element comparison
    • Element-by-element validation of conversion correctness
  3. Test Quality: Tests follow TDD principles with clear assertions and edge case coverage

Documentation ✅

  1. Exceptional Documentation:
    • 746-line CLAUDE.md provides complete implementation context
    • Clear inline comments explaining non-obvious logic
    • Module-level documentation for each component
    • Comprehensive checklist tracking feature completeness

Areas for Improvement

Minor Issues

  1. Panic in parse_number (parser_helpers.rs:18)

    • While the comment says this "should never happen", consider returning a Result for more graceful error handling
    • This is a minor concern since the lexer does guarantee validity
  2. View Composition Edge Cases

    • The CLAUDE.md notes that module/level-structured models may need enhancement
    • Consider adding TODO comments in the code near compose_views() to flag this for future work

Security Considerations ✅

  1. No Unsafe Code: Excellent - no unsafe blocks found
  2. Input Validation: Lexer properly handles malformed input with error codes
  3. Buffer Overflows: Not possible - Rust's safety guarantees prevent this

Potential Bugs

None identified - the code appears robust and well-tested.

Performance Considerations

  1. Memory Usage: Parser holds entire source in memory - acceptable for typical model files (< 10MB)
  2. Computational Complexity: View processing has some O(n²) operations, but acceptable for typical model sizes

Consistency with Project Guidelines ✅

  • Follows commit message style exactly
  • Proper use of .unwrap() over .unwrap_or_default()
  • No compatibility shims or backwards compatibility concerns
  • Comprehensive test coverage with no stubbed tests
  • Uses proper error types (Result, not panics)

Recommendations

Before Merge

  1. ✅ Verify cargo test -p simlin-compat passes
  2. ✅ Verify cargo clippy -p simlin-compat has no warnings
  3. ✅ Verify cargo fmt
  4. ⚠️ Verify all CI checks are passing

Optional Improvements (Post-Merge)

  1. Consider making parse_number return Result instead of panicking
  2. Add inline TODO comments for known limitations
  3. Profile memory usage on large models
  4. Consider adding fuzzing tests with cargo-fuzz

Conclusion

APPROVE WITH ENTHUSIASM

This PR is a model of excellent engineering:

  • Solves real problems (build complexity, WASM compatibility)
  • High code quality with strong typing and error handling
  • Comprehensive test coverage (453+ tests)
  • Outstanding documentation
  • No blocking issues identified
  • Follows all project conventions

Recommendation: Merge after CI passes. This is production-ready code.


Detailed Metrics: 35,434 lines added, 71 files changed, 425+ test functions, 0 unsafe blocks, 0 TODO/unimplemented macros

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 22b416447d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 316 to +318
pub multi_points: Vec<LinkPoint>,
#[serde(skip_serializing_if = "Option::is_none", default)]
pub polarity: Option<String>,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve new link polarity metadata in TS models

The JSON schema now emits LinkViewElement.polarity (and View.useLetteredPolarity nearby), but the TypeScript JSON types and core datamodel still omit these fields (JsonLinkViewElement/JsonView in src/engine2/src/json-types.ts and LinkViewElement.fromJson/toJson in src/core/datamodel.ts). When a project containing link polarities or lettered polarity settings is opened in the app and saved back, those fields are dropped, so Vensim imports with polarity metadata will not round-trip. Consider wiring these fields through the TS types and core datamodel so they are preserved on load/save.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants