perf: scan splitLimitR from the right #842
Merged
stephenamar-db merged 1 commit intoMay 12, 2026
Motivation: std.splitLimitR previously reversed the input and separator, split from the left, then reversed every output piece. That preserved most behavior but added avoidable string copies on bounded right-split workloads and missed the official CPP Jsonnet overlap behavior for maxsplits == -1.
Modification: Add a direct right-to-left scanner for std.splitLimitR, with a preallocated bounded path for common maxsplits values and an array-builder path for unbounded right scans. Keep maxsplits == -1 delegated to splitLimit so overlapping separators match official CPP Jsonnet semantics, and add regression coverage for Unicode, trailing separators, overlap, and splitLimit vs splitLimitR direction.
Result: Full ./mill --no-server --ticker false --color false -j 1 __.test passed. JVM split_resolve mixed benchmark improved from 0.159 ms/op to 0.148 ms/op. Scala Native splitLimitR repeat improved from 12.4 +/- 0.4 ms to 7.0 +/- 0.4 ms, and the branch ran 1.77x faster than local source-built jrsonnet on the same workload.
Motivation
std.splitLimitR previously implemented right splits by reversing the input string and separator, calling the left-to-right splitLimit, then reversing every output segment again. That creates avoidable string copies on bounded right-split workloads and is unfriendly to both JVM JIT and Scala Native/LLVM because the hot path is dominated by whole-string reversals rather than a tight index scan.
This also fixes a compatibility edge case discovered during review: official CPP Jsonnet treats maxsplits == -1 as the forward unlimited split behavior for overlapping separators.
Key Design Decision
Use a direct right-to-left scanner for the normal right-split paths, but keep maxsplits == -1 delegated to the existing left-to-right splitLimit implementation to match official CPP Jsonnet overlap semantics.
For bounded splits up to 4096, the implementation fills a preallocated result array from the right and trims only if fewer splits are found. Larger or unbounded right scans use an ArrayBuilder and reverse the small result array, avoiding whole-input and per-segment reversals.
Modification
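The design above can be modeled in a few lines. The following is a minimal Python sketch, not the Scala code in this PR: every function name, the BOUNDED_LIMIT constant name, and the use of a Python list in place of ArrayBuilder are assumptions made for illustration. Python's str.split stands in for splitLimit, and split_limit_r_via_reverse models the replaced reverse-based approach so the two can be compared.

```python
BOUNDED_LIMIT = 4096  # threshold quoted in the description; the name is hypothetical


def split_limit_r_via_reverse(s, sep, maxsplits):
    """Model of the replaced approach: reverse, split from the left, reverse back."""
    pieces = s[::-1].split(sep[::-1], maxsplits)  # copies the input and the separator
    return [p[::-1] for p in reversed(pieces)]    # copies every piece again


def _scan_bounded(s, sep, maxsplits):
    """Fill a preallocated array from the right, then trim unused leading slots."""
    out = [None] * (maxsplits + 1)
    slot = maxsplits          # next slot to fill, moving toward index 0
    end = len(s)              # exclusive end of the piece being collected
    i = len(s) - len(sep)     # candidate start index of a separator match
    while i >= 0 and slot > 0:
        if s.startswith(sep, i):
            out[slot] = s[i + len(sep):end]
            slot -= 1
            end = i
            i -= len(sep)     # matches must not overlap the one just consumed
        else:
            i -= 1
    out[slot] = s[:end]       # whatever is left becomes the leftmost piece
    return out[slot:]         # trim only if fewer splits were found


def _scan_builder(s, sep, limit):
    """Append pieces right to left, then reverse the (small) result list."""
    pieces = []               # stands in for an ArrayBuilder
    end = len(s)
    i = len(s) - len(sep)
    while i >= 0 and (limit is None or len(pieces) < limit):
        if s.startswith(sep, i):
            pieces.append(s[i + len(sep):end])
            end = i
            i -= len(sep)
        else:
            i -= 1
    pieces.append(s[:end])
    pieces.reverse()
    return pieces


def split_limit_r(s, sep, maxsplits):
    """Model of the dispatch described above (assumed, not copied from the PR)."""
    if not sep:
        raise ValueError("separator must be non-empty")
    if maxsplits == -1:
        return s.split(sep)                       # delegate to the left-to-right split
    if 0 <= maxsplits <= BOUNDED_LIMIT:
        return _scan_bounded(s, sep, maxsplits)   # preallocated bounded path
    limit = None if maxsplits < 0 else maxsplits
    return _scan_builder(s, sep, limit)           # large or unbounded path
```

In this model, split_limit_r("a::b::c", "::", 1) yields ["a::b", "c"], and for any maxsplits other than -1 the scanner and the reverse-based model agree, which is the property the regression tests in this PR are meant to protect.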
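- Replace the str.reverse / splitLimit / segment reverse implementation in std.splitLimitR with a direct right-to-left scanner.
- Keep maxsplits == -1 behavior by delegating to splitLimit.
- Add regression coverage for Unicode, trailing separators, overlap, and splitLimit vs splitLimitR direction.
Benchmark Results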
JVM / JMH mixed split workload
Command:
bench/resources/jdk17_suite/split_resolve.jsonnet
Note: this is a mixed workload containing split, splitLimit, splitLimitR, resolvePath, and joins, so it is a conservative JVM signal rather than an isolated splitLimitR-only benchmark.
Scala Native / hyperfine splitLimitR repeat
The benchmark expression repeatedly evaluates bounded right splits over a ::-joined 1024-part string with maxsplits = 512.
Command shape:
Scala Native / hyperfine vs jrsonnet
Analysis
The previous implementation copied the full input once for str.reverse, copied the separator, allocated split pieces on the reversed string, then copied every segment again while reversing each piece back. The new implementation scans indexes from the right and only allocates final substrings, so the common bounded path becomes a predictable tight loop with fewer allocations and better cache behavior.
The maxsplits == -1 branch intentionally keeps the existing left-to-right split path because official CPP Jsonnet returns ["", "a"] for std.splitLimitR("aaa", "aa", -1). Other negative values, such as -2, continue to use right-to-left unlimited splitting and return ["a", ""] for the same overlapping input.
References
- jsonnet -e 'std.splitLimitR("aaa", "aa", -1)' returns ["", "a"].
- b9ecb1d9 perf: scan splitLimitR from the right.
Result
./mill --no-server --ticker false --color false -j 1 __.test passed with 438 tests. Formatting was applied with __.reformat, and git diff --check passed.
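As an aside on the overlap case discussed under Analysis, the direction dependence is not specific to Jsonnet. The short Python sketch below uses Python's built-in str.split and str.rsplit purely as an analogy (they are not the Jsonnet functions, and the variable names are illustrative) to show why "aaa" split on "aa" gives different results depending on scan direction.

```python
text, sep = "aaa", "aa"

# Scanning from the left consumes characters 0-1 first, leaving "a" on the right.
left_to_right = text.split(sep)    # ['', 'a']

# Scanning from the right consumes characters 1-2 first, leaving "a" on the left.
right_to_left = text.rsplit(sep)   # ['a', '']

# With non-overlapping separators the direction only matters once a limit applies.
bounded_left = "a::b::c".split("::", 1)    # ['a', 'b::c']
bounded_right = "a::b::c".rsplit("::", 1)  # ['a::b', 'c']
```

The first result corresponds to what the description reports for maxsplits == -1, and the second to what it reports for other negative values such as -2.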