fix: replace SheetJS dependency with custom XLSX file reader#833
Open
chrispcampbell wants to merge 11 commits into
Open
fix: replace SheetJS dependency with custom XLSX file reader#833chrispcampbell wants to merge 11 commits into
chrispcampbell wants to merge 11 commits into
Conversation
Add decodeCell, encodeCell, decodeCol, and decodeRow helpers in a new _shared/xlsx.js module, alongside a spec covering single/multi-letter columns, case-insensitivity, round-tripping, and invalid input. These mirror the corresponding XLSX.utils.* helpers from SheetJS and will replace those callsites in the compile pipeline.
Add readXlsx() in _shared/xlsx.js that returns a SheetJS-shaped workbook
({ SheetNames, Sheets }) by unzipping the xlsx with fflate and scanning
the workbook, rels, sharedStrings, and worksheet XML. Sheets are
materialized lazily on first access via a Proxy and the workbook is
cached by path, mirroring the existing readXlsx helper.
Cell-type handling covers numeric cells (with and without explicit t='n'),
shared strings (including rich-text runs), inline strings, formula cells
(reads the cached <v>, ignores <f>), formula cells without a cached value
(returns undefined so the caller's missing-cell handling kicks in),
booleans, and error cells (skipped — see follow-up commit for callsite
swap).
Adds fflate as a runtime dependency. The xlsx (SheetJS) dependency
remains for now; it will be removed once the callsites are migrated.
Spec covers all cell-type paths plus a regression for the self-closing
empty-cell bug (greedy regex that swallowed following cells) and the
URL-in-attribute issue (workbook/rels parse needs to allow / in attrs).
Replace all SheetJS imports in the compile package with the new _shared/xlsx module and drop the xlsx dependency from package.json. The previous distribution had to be pulled from cdn.sheetjs.com rather than the npm registry, and we only used a tiny slice of the API. - helpers.js readXlsx() now delegates to the in-tree reader (which handles workbook-level caching itself). - direct-data-helpers.js uses encodeCell instead of XLSX.utils.encode_cell. - gen-direct-const.js and read-subscripts.js use decodeCell instead of XLSX.utils.decode_cell. - gen-lookup-from-direct.js uses decodeCell/decodeCol/decodeRow. Note: error cells in xlsx files (#REF!, #VALUE!, etc.) are now treated as missing instead of as the SheetJS-internal numeric error code (e.g. 23 for #REF!, 15 for #VALUE!) that would previously sneak through cdbl() into generated code as bogus constants. This is the correct behavior; no model in the in-repo fixtures or the lauranne-dymemds reference model has GET calls that target an error cell. All 833 compile unit tests pass; the data-file-resolution integration test (which exercises the GET DIRECT CONSTANTS xlsx path) passes for both the JS and Wasm builds.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #832
@ToddFincannonEI: See issue for details. As noted, this was largely implemented by Claude but I reviewed the code and tests and it all seems pretty straightforward. Normally I would avoid going with a custom solution but in this case it seemed warranted and the performance wins are a nice bonus.
I tested this branch with the model referenced in the issue and it produces identical values as the previous SheetJS-based reader (the generated C/JS code is identical), and our existing XLSX tests pass as well, so I feel like it should be good enough for SDE's purposes.
AI disclosure: I guided Claude Code (Opus 4.7) to plan and implement the changes, and then I reviewed and refined the changes.