fix(stripComments): preserve HTMLBlock template literals in mdxish mode #1410

Draft
eaglethrost wants to merge 8 commits into next from dimas/rm-15880-broken-view-as-markdown-for-html-block

eaglethrost commented Mar 30, 2026

🎫 Resolve RM-15880

Summary

An mdxish doc was producing messy markdown from the "View as markdown" option. This usually happens when the rendering pipeline hits an error:

Screenshot 2026-03-30 at 3 28 36 pm

The issue turned out to be with <HTMLBlock> elements. An <HTMLBlock>{ backtick...backtick }</HTMLBlock> expression caused stripComments to error with "Unexpected end of file in expression" when mdxish: true, because the mdxExpression micromark parser can't handle JS template literals at the text level. This isn't a problem in MDX mode, because that uses remarkMdx, which takes care of this JSX syntax.

To fix this, I don't think stripComments needs to touch HTMLBlocks at all: their content should pass through unchanged, and since we already leave HTML magic blocks alone, it makes sense to leave MDX magic blocks alone too. Given that, the simplest fix is to extract & replace HTMLBlocks the same way we do for magic blocks, for mdxish specifically.
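
For reviewers, here is a minimal sketch of the extract & replace idea described above. It is not the code in this PR: the names `protectHtmlBlocks`, `restoreHtmlBlocks`, and `marker`, the regex, and the `%%HTML_BLOCK_n%%` marker shape are all assumptions made for illustration, and the regex-based comment stripper is only a stand-in for the real stripComments pipeline.

```typescript
// Hypothetical names throughout; the shared helpers in
// lib/utils/extractors/html-blocks.ts may look quite different.
interface ProtectedSource {
  text: string;     // source with each HTMLBlock swapped for a marker
  blocks: string[]; // original HTMLBlock strings, indexed by marker number
}

// Matches a whole <HTMLBlock>...</HTMLBlock> element, non-greedy, across newlines.
const HTML_BLOCK_RE = /<HTMLBlock\b[^>]*>[\s\S]*?<\/HTMLBlock>/g;

// Assumed marker shape. It is deliberately not an HTML comment,
// so a comment stripper leaves it alone.
const marker = (index: number): string => `%%HTML_BLOCK_${index}%%`;

function protectHtmlBlocks(source: string): ProtectedSource {
  const blocks: string[] = [];
  const text = source.replace(HTML_BLOCK_RE, match => {
    blocks.push(match);
    return marker(blocks.length - 1);
  });
  return { text, blocks };
}

function restoreHtmlBlocks(text: string, blocks: string[]): string {
  // A replacer function avoids `$` sequences in a block being
  // interpreted as special replacement patterns.
  return blocks.reduce((acc, block, i) => acc.replace(marker(i), () => block), text);
}

// Stand-in for the real transform: strips HTML comments with a regex.
const stripHtmlComments = (text: string): string => text.replace(/<!--[\s\S]*?-->/g, '');

const source = [
  '<!-- remove me -->',
  '<HTMLBlock>{`',
  '<!-- comment -->',
  '<div>Hello world</div>',
  '`}</HTMLBlock>',
].join('\n');

const { text, blocks } = protectHtmlBlocks(source);
const result = restoreHtmlBlocks(stripHtmlComments(text), blocks);
```

In this sketch the top-level comment is stripped while the comment inside the HTMLBlock comes back untouched, because the transform only ever sees the marker and never the template literal.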

Let me know if we'd actually rather not preprocess HTMLBlocks this way.

Changes:

  • Extracted HTMLBlock protect/restore logic used in preprocess-jsx-expressions.ts and mdxish-html-blocks.ts into a shared lib/utils/extractors/html-blocks.ts module
  • Used the extractor in stripComments as well, for mdxish
  • Improved the marker format of HTML comments by adding more special characters, to reduce the chance of overlap
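
On the marker overlap point: adding special characters lowers the chance of a collision but can't rule it out. The sketch below shows one possible alternative, which is not what this PR does: pick a marker prefix that is checked against the source first. The helper name `uniqueMarkerPrefix` and the `%%HTML_BLOCK_` base are hypothetical.

```typescript
// Hypothetical helper, not part of this PR: returns a marker prefix that
// is guaranteed not to occur anywhere in the given source.
function uniqueMarkerPrefix(source: string, base = '%%HTML_BLOCK_'): string {
  let prefix = base;
  // Keep extending the prefix until it no longer appears in the source.
  // This terminates because the prefix eventually outgrows the source.
  while (source.includes(prefix)) {
    prefix += '%';
  }
  return prefix;
}

// No clash: the base prefix is returned unchanged.
const cleanPrefix = uniqueMarkerPrefix('plain text with no markers');

// Clash: the source already contains the base prefix, so it gets extended.
const clashingSource = 'docs that mention %%HTML_BLOCK_0%% literally';
const clashPrefix = uniqueMarkerPrefix(clashingSource);
```

The trade-off is an extra scan of the source per call, which may or may not be worth it compared with a fixed, unlikely-looking marker.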

Testing

  1. stripComments no longer errors on HTMLBlock in mdxish mode
  • Test in the markdown playground with the mdxish and strip comments options on, using various HTMLBlock examples:
<HTMLBlock>{`
<!-- comment -->
<div>Hello world</div>
`}</HTMLBlock>
  2. HTMLBlock in mdxish still behaves the same; the refactor shouldn't regress anything for rendering
  3. It's also good to link this markdown package into the main repo, then view the markdown of an mdxish doc with HTMLBlock content. The output should be clean markdown content rather than a dump of the HTML tree, with the <HTMLBlock> retained:
Screenshot 2026-03-30 at 3 24 36 pm

@eaglethrost eaglethrost requested review from maximilianfalco and rafegoldberg and removed request for rafegoldberg March 30, 2026 04:29
maximilianfalco left a comment


nice, thanks for finding this out! logic looks good to me but i do wonder if this is more of a bandaid than an actual fix. i wonder if we can do something like I did in #1371, where we make a tokenizer specifically just to keep the <HTMLBlock> token from being parsed by another transformer

the actual logic can still just live in mdxish-html-blocks.ts but we create a tokenizer essentially to guard the <HTMLBlock> from being parsed by the mdxJsxExpression tokenizer.

i'd prefer moving away from the whole protect and restore paradigm we have, but i do understand if we want to get a bandaid fix rolled out in the meantime.

Comment thread lib/stripComments.ts Outdated
Comment thread lib/utils/extractors/html-blocks.ts Outdated
Comment thread lib/utils/extractors/html-blocks.ts Outdated
eaglethrost commented Mar 30, 2026

> nice, thanks for finding this out! logic looks good to me but i do wonder if this is more of a bandaid than an actual fix. i wonder if we can do something like I did in #1371, where we make a tokenizer specifically just to keep the <HTMLBlock> token from being parsed by another transformer

Yeah I agree, a tokenizer for HTMLBlock is definitely the way to go in the future, and it should just follow the example in Tables. One consideration, though: I think building the tokenizer just to fix this issue would be overkill, since integrating it would mean quite a lot of changes with a big blast radius. We have quite a lot of code handling HTML blocks in mdxish and quite a few things would change, so it would take more time, and I want to get this fix out as soon as possible. Also, the extractor change in this PR should stay regardless, since it's a cleanup.

I will definitely do that in a follow-up! At that point we can remove the extract & replace from stripComments, and in the meantime the issue will already be fixed. Thanks for the suggestion.
