Skip to content

Comments

Optimized CommentedString.validString#1067

Open
ChrisBenua wants to merge 1 commit intotuist:mainfrom
ChrisBenua:optimisedCommentedString
Open

Optimized CommentedString.validString#1067
ChrisBenua wants to merge 1 commit intotuist:mainfrom
ChrisBenua:optimisedCommentedString

Conversation

@ChrisBenua
Copy link

@ChrisBenua ChrisBenua commented Feb 20, 2026

Resolves #1066

Short description 📝

Optimised CommendString.validString

Solution 📦

  • Replace StringProtocol.contains with checking that one UInt8 array (specifically UnsafeBufferPointer) is subarray of another. Why this approach is faster? Because StringProtocol.contains spends most of its time on string indexing (which requires much logic to preserve grapheme clusters).
  • Replace rangeOfCharacter with checking that UInt8 array (specifically UnsafeBufferPointer) contains specific value.
  • Instead of calling append for each string character in reduce function - calculate needed capacity upfront and create String using String(unsafeUninitializedCapacity:) init

Implementation 👩‍💻👨‍💻

Detail in a checklist the steps that you took to implement the PR.

  • Replace StringProtocol.contains and rangeOfCharacters with UnsafeBufferPointer<UInt8> operations
  • Get rid of reduce + append. It required too much memory allocation. Instead - calculate needed capacity upfront and create String using String(unsafeUninitializedCapacity:) init

Performance

Performance Comparison

Measured with M2 Max 32 GB RAM, MacOS 26.2, release build. If I run these benchmarks many times, average deviation is about 3-4%

Previous validString implementation performance:

simple alphanumeric (fast path)       0.47 µs/call (0.047 s total, 100000 iters)
underscore+slash (valid, special)     1.30 µs/call (0.130 s total, 100000 iters)
// and ___ (needs quoting)            0.77 µs/call (0.077 s total, 100000 iters)
needs escaping (\" \n \t \\)          0.74 µs/call (0.074 s total, 100000 iters)
spaces (invalid chars, no escape)     0.39 µs/call (0.039 s total, 100000 iters)
"true"                                0.01 µs/call (0.001 s total, 100000 iters)
"false"                               0.01 µs/call (0.001 s total, 100000 iters)
long clean (2200 chars)               119.46 µs/call (11.946 s total, 100000 iters)
long dirty (1500 chars, escaping)     36.50 µs/call (3.650 s total, 100000 iters)
mixed batch (10 strings)              161.31 µs/call (8.065 s total, 50000 iters)

Introduced validString implementation performance:

simple alphanumeric (fast path)       0.03 µs/call (0.003 s total, 100000 iters) - 15x boost
underscore+slash (valid, special)     0.06 µs/call (0.006 s total, 100000 iters) - 20x boost
// and ___ (needs quoting)            0.09 µs/call (0.009 s total, 100000 iters) - 8x boost
needs escaping (\" \n \t \\)          0.09 µs/call (0.009 s total, 100000 iters) - 8x boost
spaces (invalid chars, no escape)     0.07 µs/call (0.007 s total, 100000 iters) - 5x boost
"true"                                0.01 µs/call (0.001 s total, 100000 iters) - same
"false"                               0.00 µs/call (0.000 s total, 100000 iters) - same
long clean (2200 chars)               5.50 µs/call (0.550 s total, 100000 iters) - 20x boost
long dirty (1500 chars, escaping)     2.30 µs/call (0.230 s total, 100000 iters) - 16x boost
mixed batch (10 strings)              8.12 µs/call (0.406 s total, 50000 iters) - 20x boost

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 20, 2026
@dosubot
Copy link

dosubot bot commented Feb 20, 2026

Related Documentation

Checked 1 published document(s) in 1 knowledge base(s). No updates required.

How did I do? Any feedback?  Join Discord

@ChrisBenua
Copy link
Author

@dosubot could you review this PR? It optimized CommentedString.validString calculation by avoiding String indexing overhead by using UnsafeBufferPointer-s instead. Also avoid multiple String allocations by replacing reduce and string concatenation by calculating needed capacity upfront and create String using String(unsafeUninitializedCapacity:) init

@dosubot
Copy link

dosubot bot commented Feb 20, 2026

Thanks for working on this performance optimization! The 5-20x improvement benchmarks are impressive, especially the 20x boost for long strings (119µs → 5.5µs). Here's my review:

✅ What Looks Good

  • Optimization strategy is sound: Replacing StringProtocol.contains and rangeOfCharacter with UnsafeBufferPointer<UInt8> operations avoids expensive grapheme cluster handling
  • Memory-efficient string building: Using String(unsafeUninitializedCapacity:) with upfront capacity calculation eliminates repeated allocations from reduce + append
  • Performance impact is justified: Since validString is called during project encoding, this could save significant time on large projects

⚠️ Points to Verify

Memory Safety:

  • Ensure UnsafeBufferPointer operations stay within withContiguousStorageIfAvailable closures (no escaping pointers)
  • Handle the fallback case if withContiguousStorageIfAvailable returns nil
  • Verify exact byte count is returned from String(unsafeUninitializedCapacity:)

Correctness:

  • Pattern matching for // and ___ must handle boundary cases correctly at the byte level
  • Capacity calculation must account for worst-case escaping scenarios
  • All escape characters (\, ", \t, \n) are ASCII single-byte, so byte-level escaping is safe

Edge Cases to Test:

  • Current tests cover 51 scenarios but consider adding:
    • Long strings (1500-2200 chars matching your benchmarks)
    • Multi-byte UTF-8 characters (emoji, combining marks)
    • High escape density strings
    • Empty/single character edge cases

💡 Suggestion

The codebase has a good precedent for safe UnsafeBufferPointer usage in String+md5.swift - following those patterns (scoped wrappers, explicit cleanup) would be great.

Overall this looks like a valuable optimization! The complexity is justified by the significant performance gains. 👍

To reply, just mention @dosu.


How did I do? Good | Irrelevant | Incorrect | Verbose | Hallucination | Report 🐛 | Other

Chat with Dosu Open in Cursor Join Discord Share on X

Copy link
Member

@fortmarek fortmarek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤯

Thanks a lot for this PR, the improvements are impressive. I have a couple minor nits, but otherwise, looks good!

// calculate exact size
let escapedCapacity = buf.escapedCommentCapacity

if #available(iOS 14.0, macOS 11.0, tvOS 14.0, watchOS 7.0, visionOS 1.0, *) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can bump the minimum versions in Package.swift, we don't need to support platforms older than this.

var escapedCommentCapacity: Int {
var escapeCount = 0

for ch in self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for ch in self {
for character in self {

would limit the usage of such short variable names (I would change this across the PR)


/// Calculates escaped string size
/// Basically, `count + count(where: { [.backslash, .doubleQuotes, .tab, .newline].contains($0) }`
var escapedCommentCapacity: Int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're going through all chars of self, so I think this should be a method instead of a computed variable (same goes for the other computed properties in this extension)

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve CommentedString.validString performance

2 participants