-
Notifications
You must be signed in to change notification settings - Fork 7
Refactor string.substring usages to account for surrogate pair characters #7206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| value = String.valueOf(o); | ||
| if (value.length() > 100) | ||
| value = value.substring(0, 100) + ". . ."; | ||
| value = StringUtilsLabKey.leftSurrogatePairFriendly(value, 100) + ". . ."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This isn't a part of these changes but the names of these utility functions does not describe what they do. Consider a rename refactor to something like leftSurrogateSafeTruncate() and rightSurrogateSafeTruncate().
| import org.labkey.api.util.NetworkDrive; | ||
| import org.labkey.api.util.PageFlowUtil; | ||
| import org.labkey.api.util.Path; | ||
| import org.labkey.api.util.StringUtilsLabKey; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is this line in PageFlowUtil. Does it need to use these methods?
if (ext != null && mimeType != null)
{
fileName = fileName.substring(0, fileName.length() - (ext.length() + 1)) + mimeType.getExtension();
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this one needs it. ext and mimeType both are unlikely to have special characters, so the endInd should be safe.
| if (titleCase) | ||
| { | ||
| return noun.substring(0, 1).toUpperCase() + noun.substring(1); | ||
| return StringUtilsLabKey.leftSurrogatePairFriendly(noun, 1).toUpperCase() + StringUtilsLabKey.rightSurrogatePairFriendly(noun, noun.length() - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, how should someone know if they should use substring() versus using one of these utility methods? For example, I still see a lot of usages of .substring(0, some of which seem eligible for switching out for these utilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like discussed, I think generally a static constant endIndex is a good candidate for using those new methods. When paired with indexOf... as end, then probably substring is better. But each case needs to be evaluated separately to prioritize overflow/accuracy, etc.
Rationale
Surrogate pair characters (such as certain emoji characters) are represented by 2 characters. String.substring operates at character level, which might result in surrogate pair being split, resulting in undesired outcome. This PR switch usages of substring(0, staticEnd) to leftSurrogatePairFriendly, so avoid ending the substring in the middle of a surrogate pair.
Related Pull Requests
Changes