Skip to content

Conversation

@jakebailey
Copy link
Member

I'm on a quest to eliminate regexes from the shipped code as much as possible.

This one is not trivial, but it's not too bad either when split up.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces regex-based HTML entity parsing with handwritten code to eliminate the regexp2 dependency from the JSX transformer. The change maintains the same functionality while using a manual string parsing approach.

Changes:

  • Removed dependency on github.com/dlclark/regexp2 package
  • Replaced htmlEntityMatcher regex and htmlEntityReplacer function with handwritten parsing logic
  • Split implementation into decodeEntities (main parsing loop) and decodeEntity (single entity decoder)

if word != nil && word.Capture.String() != "" {
res, ok := entities[word.Capture.String()]

entity := text[1:semi]
Copy link

Copilot AI Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The slice operation text[1:semi] is incorrect. At this point, text has already been sliced to start at the & character (line 871: text = text[i:]), and semi is the index of ; within this already-sliced text. Therefore, text[1:semi] correctly extracts the entity name between & and ;. However, on line 883, when the entity is not recognized, the code writes text[:semi+1] which includes the & at position 0 through the ; at position semi. This is correct. But consider the edge case where text = \"&;\" (empty entity). In this case, semi = 1, so entity = text[1:1] which is an empty string, and decodeEntity(\"\") returns (0, false). Then line 883 writes text[:2] which is \"&;\". This preserves the invalid entity correctly. The logic appears sound upon closer inspection.

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hilarious circle

Copy link
Member

@weswigham weswigham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know, reading this is making me realize we probably need to be doing this in the parser stage so the .Text() of JsxText as read by the checker is post-entity-decoding - I think the only reason it ends up not mattering is because the literal type of a JsxText element isn't currently accessible - but if we ever did start typing jsx children better, it'd matter.

@jakebailey jakebailey added this pull request to the merge queue Jan 26, 2026
@jakebailey
Copy link
Member Author

Yeah, that would match other string values and so on...

Merged via the queue into main with commit 0a92d37 Jan 26, 2026
22 checks passed
@jakebailey jakebailey deleted the jabaile/jsx-entity-no-regex branch January 26, 2026 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants