
Fix UB and locale-dependent behavior in StringHelper::toLowerCase #687

Open
metsw24-max wants to merge 1 commit into apache:master from metsw24-max:stringhelper-tolower-ub-and-locale-dependence


Fix undefined behavior and locale-dependent output in StringHelper::toLowerCase.

The previous implementation passed LogString characters directly to ::tolower(int) via std::transform. This violates the C standard's requirement that the argument be either EOF or a value representable as unsigned char.

When logchar is char (commonly a signed type), any byte greater than 0x7F sign-extends to a negative int; unless that value happens to equal EOF, passing it to tolower is undefined behavior. The result also depended on the active LC_CTYPE locale, so the same configuration file could produce different lowercased values on different systems.
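
The sign-extension problem can be illustrated with a small sketch. The helper names `widenAsSignedChar` and `isValidCtypeArgument` are hypothetical and are not part of log4cxx; the sketch uses `signed char` explicitly so that it behaves the same regardless of whether plain `char` is signed on the build platform.

```cpp
#include <climits>
#include <cstdio>

// Hypothetical helper: widen a byte to int the way a signed `char`
// is widened when it is passed to a function taking int.
int widenAsSignedChar(unsigned char byte)
{
    return static_cast<int>(static_cast<signed char>(byte));
}

// Hypothetical helper: true if `value` is a legal argument for the
// <cctype> functions, i.e. EOF or representable as unsigned char.
bool isValidCtypeArgument(int value)
{
    return value == EOF || (value >= 0 && value <= UCHAR_MAX);
}
```

On common two's-complement platforms, `widenAsSignedChar(0xC9)` is negative and fails the check, while the same byte passed as `unsigned char` is accepted. Casting to `unsigned char` before the call removes the UB but not the locale dependence, which is why the patch avoids `tolower` altogether.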

This patch replaces the locale-sensitive transformation with deterministic ASCII-only folding (A-Z -> a-z) while preserving all non-ASCII bytes unchanged.
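
A minimal sketch of ASCII-only folding, for illustration only: the function name `asciiToLowerInPlace` is hypothetical, the sketch operates on `std::string` rather than LogString, and it assumes an ASCII-compatible execution character set. It is not the code from the patch.

```cpp
#include <string>

// Hypothetical name; illustrates the technique, not the patched function.
// Folds A-Z to a-z and leaves every other byte untouched, without
// consulting the locale or calling any <cctype> function.
void asciiToLowerInPlace(std::string& s)
{
    for (char& ch : s)
    {
        if (ch >= 'A' && ch <= 'Z')
        {
            ch = static_cast<char>(ch - 'A' + 'a');
        }
    }
}
```

Because the range check only matches values between 'A' and 'Z', bytes above 0x7F never satisfy it, whether `char` is signed or unsigned, so they pass through unchanged.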

Changes

src/main/cpp/stringhelper.cpp

  • Replaced std::transform(..., tolower) with a deterministic ASCII-only lowercase conversion

  • Eliminates UB from invalid tolower inputs

  • Removes locale-dependent behavior

  • Preserves non-ASCII bytes unchanged

src/test/cpp/helpers/stringhelpertestcase.cpp

Added regression coverage:

  • testToLowerCaseAscii

    • verifies normal ASCII lowercase conversion
  • testToLowerCaseNonAsciiPassesThrough

    • verifies non-ASCII bytes remain unchanged
    • validates locale-independent behavior

Reproducer

With the original implementation:

  • testToLowerCaseNonAsciiPassesThrough fails on systems using locales such as English_India.1252

  • Example:

    • 0xC9 (É) may be transformed into 0xE9 (é) through locale-sensitive tolower
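
The locale dependence described above can be probed with a sketch like the following. The helper `tolowerUnderLocale` is hypothetical, and whether a given locale name (such as English_India.1252) is available depends on the system, so the helper reports failure instead of assuming the locale exists.

```cpp
#include <cctype>
#include <clocale>
#include <string>

// Hypothetical helper: lowercase one byte with std::tolower while the
// named LC_CTYPE locale is active. Returns -1 if the locale is unavailable.
int tolowerUnderLocale(unsigned char byte, const char* localeName)
{
    // Copy the current locale name; the returned pointer may be
    // invalidated by the next setlocale call.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    const std::string saved = current ? current : "C";

    if (!std::setlocale(LC_CTYPE, localeName))
    {
        return -1;
    }
    // `byte` is unsigned char, so the argument is non-negative and valid.
    const int result = std::tolower(byte);
    std::setlocale(LC_CTYPE, saved.c_str());
    return result;
}
```

Under the "C" locale, `tolowerUnderLocale(0xC9, "C")` returns 0xC9 unchanged. Under a single-byte Latin locale the same call may return 0xE9, which is the difference the regression test is meant to catch.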

With this patch:

  • all stringhelpertestcase tests pass
  • patternparsertestcase also passes unchanged
