Skip to content

fix(mdxish): preserve blank lines in JSX table parsing to fix roundtrip corruption#1430

Merged
maximilianfalco merged 12 commits intonextfrom
falco/rm-16064-jsx-table-parsing-roundtrip-breaks-markdown
Apr 14, 2026
Merged

fix(mdxish): preserve blank lines in JSX table parsing to fix roundtrip corruption#1430
maximilianfalco merged 12 commits intonextfrom
falco/rm-16064-jsx-table-parsing-roundtrip-breaks-markdown

Conversation

@maximilianfalco
Copy link
Copy Markdown
Contributor

@maximilianfalco maximilianfalco commented Apr 9, 2026

🎫 Resolve RM-16064

🎯 What does this PR do?

The jsxTable micromark tokenizer captures <Table>...</Table> as a single opaque html node. mdxishTables then re-parses that node's value through a separate unified() pipeline (tableNodeProcessor) to extract structured cells.

The problem is that every node produced by that re-parse carries positions relative to the html node's value string and not the outer document. So when a downstream consumer slices originalSource by node.position.offset (as the editor's nodeToSource does for SyntaxFlow and SyntaxText), it's reading bytes from the wrong location in the outer source.

This was a gap that #1371 failed to account for mainly because:

  • The rendering path (mdxish() → hast) doesn't use positions
  • Markdown-only roundtrips (mdast()mdxishMdastToMd()) don't slice by offset
  • Only the editor's PM pipeline and only syntaxNodes uses nodeToSource and the slicing

The fix had 2 places:

  1. preserve blank lines while reassembling the html node's value. The original code joined per-line chunks with '\n', collapsing multi-line gaps. This is a prerequisite for (2): without it, inner offsets drift relative to the outer source past the first blank line.

  2. after tableNodeProcessor returns its subtree, walk every node and shift position.start.offset / position.end.offset / position.start.line / position.end.line by the outer html node's base offset and line. This translates inner coordinates into the outer source coordinate space so downstream consumers can slice correctly.

The blank-line fix isn't the actual bug (it happens at the reassembly stage), but rather simply a hard prerequisite for the position translation to produce byte-identical slices against the outer source.

🧪 QA tips

  • Paste a <Table> with <ul>/<li> inside a <td> in the mdxish editor
  • Save and re-open multiple times — closing tags should remain intact
  • The companion test PR in readme is readmeio/readme#18209

📸 Screenshot or Loom

Screen.Recording.2026-04-10.at.00.11.30.mov

@maximilianfalco maximilianfalco marked this pull request as ready for review April 9, 2026 14:13
@kevinports
Copy link
Copy Markdown
Contributor

The underlying bug here is really concerning. Do you know if this is a regression and when it started happening?

@maximilianfalco
Copy link
Copy Markdown
Contributor Author

maximilianfalco commented Apr 10, 2026

@kevinports I did a bit more digging and this is a genuine gap from my initial introduction of the table tokenizer in #1371. I had not accounted for the fact that since we are using a subparser, the position value of the tableCells are different from the base root content. apologies for not catching this earlier

this fell through the cracks since rendering didnt use positions (which is why this wasnt caught in the original PR) but the MdxishEditor does use it and specifically only uses it for syntax nodes (in nodeToSource). the fix was pretty simple, just need to translate the inner position with the offset value of the initial html node

Note

we also need the blank line preservation in jsxTableFromMarkdown to ensure the inner representation of the content is byte-identical to the html.value. it guarantees the reconstructed html.value is byte-identical to the outer source slice, so shifted offsets land at the right bytes

@maximilianfalco maximilianfalco merged commit 9fa040c into next Apr 14, 2026
7 checks passed
@maximilianfalco maximilianfalco deleted the falco/rm-16064-jsx-table-parsing-roundtrip-breaks-markdown branch April 14, 2026 01:17
rafegoldberg pushed a commit that referenced this pull request Apr 14, 2026
## Version 13.8.5
### 🛠 Fixes & Updates

* actually fix toc links with query params ([#1436](#1436)) ([b59ce5f](b59ce5f))
* **mdxish:** fix FA Emojis in Callouts ([#1421](#1421)) ([f7caa3e](f7caa3e))
* **mdxish:** preserve blank lines in JSX table parsing to fix roundtrip corruption ([#1430](#1430)) ([9fa040c](9fa040c)), closes [#1371](#1371)
* proper spacing in empty callouts ([#1433](#1433)) ([97b8a41](97b8a41))
* scope link color inheritance to headings ([#1432](#1432)) ([cd78a62](cd78a62))

<!--SKIP CI-->
@rafegoldberg
Copy link
Copy Markdown
Contributor

This PR was released!

🚀 Changes included in v13.8.5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants