Skip to content

fix: replace SheetJS dependency with custom XLSX file reader#833

Open
chrispcampbell wants to merge 11 commits into
mainfrom
chris/832-xlsx-alternative
Open

fix: replace SheetJS dependency with custom XLSX file reader#833
chrispcampbell wants to merge 11 commits into
mainfrom
chris/832-xlsx-alternative

Conversation

@chrispcampbell
Copy link
Copy Markdown
Contributor

Fixes #832

@ToddFincannonEI: See issue for details. As noted, this was largely implemented by Claude but I reviewed the code and tests and it all seems pretty straightforward. Normally I would avoid going with a custom solution but in this case it seemed warranted and the performance wins are a nice bonus.

I tested this branch with the model referenced in the issue and it produces identical values as the previous SheetJS-based reader (the generated C/JS code is identical), and our existing XLSX tests pass as well, so I feel like it should be good enough for SDE's purposes.

AI disclosure: I guided Claude Code (Opus 4.7) to plan and implement the changes, and then I reviewed and refined the changes.

Add decodeCell, encodeCell, decodeCol, and decodeRow helpers in a new
_shared/xlsx.js module, alongside a spec covering single/multi-letter
columns, case-insensitivity, round-tripping, and invalid input. These
mirror the corresponding XLSX.utils.* helpers from SheetJS and will
replace those callsites in the compile pipeline.
Add readXlsx() in _shared/xlsx.js that returns a SheetJS-shaped workbook
({ SheetNames, Sheets }) by unzipping the xlsx with fflate and scanning
the workbook, rels, sharedStrings, and worksheet XML. Sheets are
materialized lazily on first access via a Proxy and the workbook is
cached by path, mirroring the existing readXlsx helper.

Cell-type handling covers numeric cells (with and without explicit t='n'),
shared strings (including rich-text runs), inline strings, formula cells
(reads the cached <v>, ignores <f>), formula cells without a cached value
(returns undefined so the caller's missing-cell handling kicks in),
booleans, and error cells (skipped — see follow-up commit for callsite
swap).

Adds fflate as a runtime dependency. The xlsx (SheetJS) dependency
remains for now; it will be removed once the callsites are migrated.

Spec covers all cell-type paths plus a regression for the self-closing
empty-cell bug (greedy regex that swallowed following cells) and the
URL-in-attribute issue (workbook/rels parse needs to allow / in attrs).
Replace all SheetJS imports in the compile package with the new
_shared/xlsx module and drop the xlsx dependency from package.json.
The previous distribution had to be pulled from cdn.sheetjs.com rather
than the npm registry, and we only used a tiny slice of the API.

- helpers.js readXlsx() now delegates to the in-tree reader (which
  handles workbook-level caching itself).
- direct-data-helpers.js uses encodeCell instead of XLSX.utils.encode_cell.
- gen-direct-const.js and read-subscripts.js use decodeCell instead of
  XLSX.utils.decode_cell.
- gen-lookup-from-direct.js uses decodeCell/decodeCol/decodeRow.

Note: error cells in xlsx files (#REF!, #VALUE!, etc.) are now treated
as missing instead of as the SheetJS-internal numeric error code (e.g.
23 for #REF!, 15 for #VALUE!) that would previously sneak through cdbl()
into generated code as bogus constants. This is the correct behavior;
no model in the in-repo fixtures or the lauranne-dymemds reference model
has GET calls that target an error cell.

All 833 compile unit tests pass; the data-file-resolution integration
test (which exercises the GET DIRECT CONSTANTS xlsx path) passes for
both the JS and Wasm builds.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace SheetJS (xlsx package) with custom XLSX file reader

1 participant