Skip to content

Conversation

@XingY
Copy link
Contributor

@XingY XingY commented Nov 14, 2025

Rationale

Surrogate pair characters (such as certain emoji characters) are represented by 2 characters. String.substring operates at character level, which might result in surrogate pair being split, resulting in undesired outcome. This PR switch usages of substring(0, staticEnd) to leftSurrogatePairFriendly, so avoid ending the substring in the middle of a surrogate pair.

Related Pull Requests

Changes

@XingY XingY requested a review from labkey-nicka November 18, 2025 18:18
value = String.valueOf(o);
if (value.length() > 100)
value = value.substring(0, 100) + ". . .";
value = StringUtilsLabKey.leftSurrogatePairFriendly(value, 100) + ". . .";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This isn't a part of these changes but the names of these utility functions does not describe what they do. Consider a rename refactor to something like leftSurrogateSafeTruncate() and rightSurrogateSafeTruncate().

import org.labkey.api.util.NetworkDrive;
import org.labkey.api.util.PageFlowUtil;
import org.labkey.api.util.Path;
import org.labkey.api.util.StringUtilsLabKey;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is this line in PageFlowUtil. Does it need to use these methods?

if (ext != null && mimeType != null)
{
    fileName = fileName.substring(0, fileName.length() - (ext.length() + 1)) + mimeType.getExtension();
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this one needs it. ext and mimeType both are unlikely to have special characters, so the endInd should be safe.

if (titleCase)
{
return noun.substring(0, 1).toUpperCase() + noun.substring(1);
return StringUtilsLabKey.leftSurrogatePairFriendly(noun, 1).toUpperCase() + StringUtilsLabKey.rightSurrogatePairFriendly(noun, noun.length() - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, how should someone know if they should use substring() versus using one of these utility methods? For example, I still see a lot of usages of .substring(0, some of which seem eligible for switching out for these utilities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like discussed, I think generally a static constant endIndex is a good candidate for using those new methods. When paired with indexOf... as end, then probably substring is better. But each case needs to be evaluated separately to prioritize overflow/accuracy, etc.

@XingY XingY merged commit 6a5d527 into develop Nov 18, 2025
13 checks passed
@XingY XingY deleted the fb_surrogateChar branch November 18, 2025 22:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants