
Fix HTMLStripCharFilter dropping content after =''> #15756

Merged
msfroh merged 3 commits into apache:main from Seungmin123:fix-html-strip-char-filter
Mar 3, 2026


@Seungmin123
Contributor

Description

This PR fixes an issue where HTMLStripCharFilter fails to recognize the closing double quote of an attribute value if the value ends with an equals sign and is immediately followed by the tag closer (>). This causes the filter to incorrectly discard content until the next double quote.

Resolves #15754.
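
To make the failing input shape concrete, here is a minimal sketch of the intended behavior. It is a toy tag stripper written for illustration only, not Lucene's implementation (HTMLStripCharFilter is a generated scanner and handles far more than this loop does). The class name `QuotedAttributeSketch`, the method `stripTags`, and the example URL are all hypothetical. The point it shows: once a quoted attribute value is open, only the matching quote should close it, even when the character just before that quote is `=` and the character just after it is `>`.

```java
// Toy illustration only: this is NOT how Lucene's HTMLStripCharFilter is implemented.
// All names here are hypothetical and exist only for this sketch.
final class QuotedAttributeSketch {

    /** Removes tags from html, treating everything inside a quoted attribute value as data. */
    static String stripTags(String html) {
        StringBuilder out = new StringBuilder();
        boolean inTag = false;
        char quote = 0; // 0 means "not inside a quoted attribute value"
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (!inTag) {
                if (c == '<') {
                    inTag = true;      // start of a tag: stop emitting characters
                } else {
                    out.append(c);     // ordinary text is kept
                }
            } else if (quote != 0) {
                // Inside a quoted value: only the matching quote ends it,
                // no matter what precedes it (including '=').
                if (c == quote) {
                    quote = 0;
                }
            } else if (c == '"' || c == '\'') {
                quote = c;             // opening quote of an attribute value
            } else if (c == '>') {
                inTag = false;         // unquoted '>' closes the tag
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Attribute value ends with '=' and is immediately followed by '>'.
        System.out.println(stripTags("<a href=\"https://example.com/?q=\">link</a> tail"));
        // Same shape with single quotes.
        System.out.println(stripTags("<a href='https://example.com/?q='>link</a> tail"));
    }
}
```

In this sketch both calls in `main` keep the text after the tag. The bug described above corresponds to the closing quote being missed in that position, so the text following the tag was consumed as if it were still part of the attribute value.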

This fixes an issue where HTMLStripCharFilter fails to recognize the closing double quote of an attribute value if the value ends with an equals sign and is immediately followed by the tag closer (>).

Closes apache#15754
github-actions bot added this to the 11.0.0 milestone on Feb 23, 2026
@msfroh
Contributor

msfroh commented Feb 23, 2026

I think this looks promising. (The fact that existing test cases don't break is a good sign.) Thanks a lot @Seungmin123!

It looks like this was a side-effect of #11724. Can you double-check with the unit test in the PR opened by @mjustice3 in #13157? It looks very similar to your test cases, so I'm pretty sure it's the same bug.

Adds the testForIssue10520Regression test case originally from PR apache#13157
to verify the fix correctly handles general attribute values without breaking
backwards compatibility. Also adds a test case for single quoted attributes
ending with an equals sign to ensure comprehensive test coverage.
Seungmin123 force-pushed the fix-html-strip-char-filter branch from 798375f to c221d58 on February 24, 2026 01:55
@Seungmin123
Contributor Author

Hi @msfroh, thank you for the helpful suggestion and the review!

I have added the testForIssue10520Regression test case from PR #13157 as you suggested, along with another edge case test for single-quoted attributes ending with an equals sign.

All tests, including existing ones and the newly added cases, pass successfully. I appreciate you pointing that out!

Comment thread lucene/CHANGES.txt Outdated
github-actions bot changed the milestone from 11.0.0 to 10.5.0 on Feb 26, 2026
@Seungmin123
Contributor Author

Thanks for the review! I've moved the entry in CHANGES.txt to the Lucene 10.5.0 section as suggested for backporting.

msfroh merged commit 02014a6 into apache:main on Mar 3, 2026
14 checks passed
msfroh pushed a commit that referenced this pull request Mar 3, 2026
@msfroh
Contributor

msfroh commented Mar 3, 2026

Backport commit to 10.x: b2640a8

Development

Successfully merging this pull request may close these issues.

HTMLStripCharFilter incorrectly discards content when attribute value ends with '=' followed by '>'