
JSONLayout escaping for malformed UTF-8 input #686

Open
jmestwa-coder wants to merge 1 commit into apache:master from jmestwa-coder:jsonlayout-malformed-utf8-escaping


@jmestwa-coder
Contributor

Summary

Fix JSONLayout escaping for malformed UTF-8 input.

Previously, malformed decoded input could bypass replacement escaping, so the original malformed bytes were preserved in the emitted JSON output. This happened because the replacement character (U+FFFD) was incorrectly routed through the printable fast path.

This change ensures malformed decoded input and invalid Unicode scalar values are consistently emitted through the JSON escaping path as \ufffd.

Changes

  • Add escapeReplacement handling in JSONLayout::appendItem()
  • Ensure replacement characters bypass the printable fast path
  • Preserve existing behavior for valid UTF-8 input
  • Add regression coverage for malformed UTF-8 escaping behavior
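
For illustration only, the escaping rule described above can be sketched as a standalone function over decoded code points. This is not the actual `JSONLayout::appendItem()` code: the names `isPassThrough`, `appendUtf8`, and `escapeForJson` are hypothetical, and the sketch assumes the input has already been decoded to UTF-32, with U+FFFD standing in for anything the decoder could not interpret.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper, not the log4cxx API: true when `cp` may be copied
// through unchanged. U+FFFD is deliberately excluded so it is always escaped.
bool isPassThrough(char32_t cp)
{
    if (cp < 0x20 || cp == U'"' || cp == U'\\') return false; // escapes required by JSON
    if (cp == 0xFFFD) return false;                            // replacement character
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;            // surrogates are not scalar values
    if (cp > 0x10FFFF) return false;                           // beyond the Unicode range
    return true;
}

// Append `cp` to `out` as UTF-8. The caller guarantees a valid scalar value.
void appendUtf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Escape decoded text for use inside a JSON string (sketch, not the real layout).
std::string escapeForJson(const std::u32string& input)
{
    std::string out;
    for (char32_t cp : input) {
        if (isPassThrough(cp)) {
            appendUtf8(out, cp);              // fast path: valid, printable content
        } else if (cp == U'"') {
            out += "\\\"";
        } else if (cp == U'\\') {
            out += "\\\\";
        } else if (cp < 0x20) {
            char buf[8];
            std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(cp));
            out += buf;                       // control characters as \u00XX
        } else {
            out += "\\ufffd";                 // U+FFFD and invalid scalar values
        }
    }
    return out;
}
```

In this sketch the key point is the ordering: the check for U+FFFD and invalid scalar values runs before anything is copied through, so those values can never reach the fast path.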

Regression Test

Added a focused regression test for malformed UTF-8 input to verify malformed bytes no longer survive into JSON output and are emitted as escaped replacement characters instead.
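
As a rough idea of what such a check exercises, here is a self-contained byte-level sketch. It is not the test added in this PR and does not use the log4cxx API: `escapeMalformedUtf8` is a hypothetical stand-in that only handles malformed bytes (it skips quote and control-character escaping), and emitting one `\ufffd` per offending byte is an assumption about granularity, not confirmed behavior.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the layout's escaping step, not log4cxx code.
// Valid UTF-8 sequences are copied through; every byte that cannot start or
// complete a valid sequence is emitted as the escaped replacement character.
std::string escapeMalformedUtf8(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);

        // Expected sequence length from the lead byte (0 means invalid lead).
        std::size_t len = 0;
        if (b < 0x80) len = 1;
        else if (b >= 0xC2 && b <= 0xDF) len = 2;
        else if (b >= 0xE0 && b <= 0xEF) len = 3;
        else if (b >= 0xF0 && b <= 0xF4) len = 4;

        // The sequence must fit in the input and use continuation bytes only.
        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            ok = (static_cast<unsigned char>(in[i + k]) & 0xC0) == 0x80;
        }

        // Reject overlong forms, surrogates, and values above U+10FFFF.
        if (ok && len >= 3) {
            const unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            if ((b == 0xE0 && b1 < 0xA0) || (b == 0xED && b1 > 0x9F) ||
                (b == 0xF0 && b1 < 0x90) || (b == 0xF4 && b1 > 0x8F)) {
                ok = false;
            }
        }

        if (ok) {
            out.append(in, i, len);   // valid sequence: preserved unchanged
            i += len;
        } else {
            out += "\\ufffd";         // malformed byte: escaped replacement
            ++i;
        }
    }
    return out;
}
```

With this sketch, the input bytes `a`, `0xFF`, `b` produce `a\ufffdb`, while the valid two-byte sequence `0xC3 0xA9` is returned unchanged. Those are the two properties the regression test is described as covering: malformed bytes do not survive into the output, and valid UTF-8 behaves as before.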

@jmestwa-coder jmestwa-coder changed the title SONLayout escaping for malformed UTF-8 input JSONLayout escaping for malformed UTF-8 input May 23, 2026
