perf: scan splitLimitR from the right by He-Pin · Pull Request #842 · databricks/sjsonnet

He-Pin · 2026-05-12T03:50:16Z

Motivation

std.splitLimitR previously implemented right splits by reversing the input string and separator, calling the left-to-right splitLimit, then reversing every output segment again. That creates avoidable string copies on bounded right-split workloads and is unfriendly to both JVM JIT and Scala Native/LLVM because the hot path is dominated by whole-string reversals rather than a tight index scan.

This also fixes a compatibility edge case discovered during review: official CPP Jsonnet handles maxsplits == -1 as a forward (left-to-right) unlimited split, which changes the result when separators overlap.

Key Design Decision

Use a direct right-to-left scanner for the normal right-split paths, but keep maxsplits == -1 delegated to the existing left-to-right splitLimit implementation to match official CPP Jsonnet overlap semantics.

For bounded splits up to 4096, the implementation fills a preallocated result array from the right and trims only if fewer splits are found. Larger or unbounded right scans use an ArrayBuilder and reverse the small result array, avoiding whole-input and per-segment reversals.
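
As a rough illustration of the bounded path described above, the following is a minimal sketch and not the code merged in this PR. The object name, method name, and the PreallocLimit constant are hypothetical; it assumes a non-empty separator and 0 <= maxSplits <= 4096, and it uses the JDK's String.lastIndexOf in place of the specialized scanner.

```scala
object RightSplitSketch {
  // Threshold quoted in the description above; the constant name is hypothetical.
  val PreallocLimit = 4096

  /** Bounded right split: fills a preallocated array from its last slot leftwards. */
  def splitRightBounded(str: String, sep: String, maxSplits: Int): Array[String] = {
    require(sep.nonEmpty, "separator must be non-empty")
    require(maxSplits >= 0 && maxSplits <= PreallocLimit, "bounded path only")
    val out = new Array[String](maxSplits + 1) // at most maxSplits + 1 pieces
    var slot = out.length - 1                  // next slot to fill, moving left
    var end = str.length                       // exclusive end of the current piece
    var from = str.length - sep.length         // rightmost index where sep could start
    var splits = 0
    while (splits < maxSplits && from >= 0) {
      val idx = str.lastIndexOf(sep, from)
      if (idx < 0) from = -1                   // no more separators to the left
      else {
        out(slot) = str.substring(idx + sep.length, end)
        slot -= 1
        splits += 1
        end = idx
        from = idx - sep.length                // next match must end at or before idx
      }
    }
    out(slot) = str.substring(0, end)          // leftmost remainder
    // Trim unused leading slots only when fewer separators were found than allowed.
    if (slot == 0) out else out.slice(slot, out.length)
  }
}
```

In this sketch the only string allocations are the final substrings, and the trim copies the small result array rather than the input.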

Modification

Added a right-to-left separator scanner specialized for 1-char, 2-char, and longer separators.
Replaced the old str.reverse / splitLimit / segment reverse implementation in std.splitLimitR.
Preserved official maxsplits == -1 behavior by delegating to splitLimit.
Added regression coverage for Unicode separators, missing separators, trailing separators, multi-char separators, overlapping separators, and splitLimit vs splitLimitR direction.
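
The separator-length specialization in the first item might look roughly like the sketch below. This is an assumption about the general shape, not the PR's actual scanner: the object and method names are hypothetical, and returning -1 for an empty separator is an illustrative choice only.

```scala
object SeparatorScanSketch {
  /** Largest index i <= from at which sep starts in s, or -1 if there is none. */
  def lastIndexOfSep(s: String, sep: String, from: Int): Int = {
    // Clamp so that every probe below stays inside the string.
    var i = math.min(from, s.length - sep.length)
    sep.length match {
      case 0 =>
        -1 // assumption: empty separators are rejected before scanning
      case 1 =>
        val c = sep.charAt(0)
        while (i >= 0 && s.charAt(i) != c) i -= 1
        math.max(i, -1)
      case 2 =>
        val c0 = sep.charAt(0)
        val c1 = sep.charAt(1)
        while (i >= 0 && !(s.charAt(i) == c0 && s.charAt(i + 1) == c1)) i -= 1
        math.max(i, -1)
      case _ =>
        // General case: compare the whole separator at each candidate index.
        while (i >= 0 && !s.startsWith(sep, i)) i -= 1
        math.max(i, -1)
    }
  }
}
```

The 1-char and 2-char cases compare characters directly, so the inner loop is a plain index walk with no allocation.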

Benchmark Results

JVM / JMH mixed split workload

Command:

./mill --no-server --ticker false --color false -j 1 bench.runRegressions bench/resources/jdk17_suite/split_resolve.jsonnet

Workload	master	this PR	Result
`bench/resources/jdk17_suite/split_resolve.jsonnet`	0.159 ms/op	0.148 ms/op	6.9% faster

Note: this is a mixed workload containing split, splitLimit, splitLimitR, resolvePath, and joins, so it is a conservative JVM signal rather than an isolated splitLimitR-only benchmark.

Scala Native / hyperfine splitLimitR repeat

The benchmark expression repeatedly evaluates bounded right splits over a 1024-part string joined with ::, using maxsplits = 512.

Command shape:

hyperfine --warmup 3 --runs 25 -N \
  'sjsonnet-master -o /dev/null split_limitr_repeat.jsonnet' \
  'sjsonnet-this-pr -o /dev/null split_limitr_repeat.jsonnet'

Runtime	Mean +- sigma	Result
sjsonnet master Native	12.4 +- 0.4 ms	baseline
sjsonnet this PR Native	7.0 +- 0.4 ms	1.76x faster

Scala Native / hyperfine vs jrsonnet

Runtime	Mean +- sigma	Result
sjsonnet this PR Native	6.8 +- 0.4 ms	baseline
local source-built jrsonnet	12.1 +- 1.4 ms	sjsonnet is 1.77x faster

Analysis

The previous implementation copied the full input once for str.reverse, copied the separator, allocated split pieces on the reversed string, then copied every segment again while reversing each piece back. The new implementation scans indexes from the right and only allocates final substrings, so the common bounded path becomes a predictable tight loop with fewer allocations and better cache behavior.
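
For comparison, the reverse-based strategy described above can be sketched as follows. This is a reconstruction from the description, not the removed code: splitLeft is a hypothetical stand-in for splitLimit, and the final reversal of the piece order is an assumption needed to return pieces left to right.

```scala
object ReverseSplitSketch {
  /** Forward bounded split; a negative maxSplits means unlimited. Stand-in for splitLimit. */
  def splitLeft(str: String, sep: String, maxSplits: Int): Vector[String] = {
    require(sep.nonEmpty, "separator must be non-empty")
    val out = Vector.newBuilder[String]
    var start = 0
    var splits = 0
    var idx = str.indexOf(sep, start)
    while (idx >= 0 && (maxSplits < 0 || splits < maxSplits)) {
      out += str.substring(start, idx)
      start = idx + sep.length
      splits += 1
      idx = str.indexOf(sep, start)
    }
    out += str.substring(start)
    out.result()
  }

  /** Right split via reversal: copies the input, the separator, and every piece. */
  def splitRightViaReverse(str: String, sep: String, maxSplits: Int): Vector[String] =
    splitLeft(str.reverse, sep.reverse, maxSplits) // whole-input and separator copies
      .map(_.reverse)                              // one extra copy per piece
      .reverse                                     // restore left-to-right piece order
}
```

Each call pays for a full copy of the input before any splitting starts, which is the cost the index scan avoids.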

The maxsplits == -1 branch intentionally keeps the existing left-to-right split path because official CPP Jsonnet returns ["", "a"] for std.splitLimitR("aaa", "aa", -1). Other negative values, such as -2, continue to use right-to-left unlimited splitting and return ["a", ""] for the same overlapping input.
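
The difference between the two scan directions on overlapping input can be reproduced with a small sketch. The names below are hypothetical and the code only illustrates the dispatch rule described above; it assumes a non-empty separator.

```scala
object OverlapDispatchSketch {
  // Left-to-right split with at most `limit` splits.
  private def forward(s: String, sep: String, limit: Int): List[String] = {
    val out = List.newBuilder[String]
    var start = 0
    var n = 0
    var idx = s.indexOf(sep)
    while (idx >= 0 && n < limit) {
      out += s.substring(start, idx)
      start = idx + sep.length
      n += 1
      idx = s.indexOf(sep, start)
    }
    out += s.substring(start)
    out.result()
  }

  // Right-to-left split with at most `limit` splits; prepending keeps pieces in order.
  private def backward(s: String, sep: String, limit: Int): List[String] = {
    var out = List.empty[String]
    var end = s.length
    var n = 0
    var idx = s.lastIndexOf(sep)
    while (idx >= 0 && n < limit) {
      out = s.substring(idx + sep.length, end) :: out
      end = idx
      n += 1
      idx = if (end >= sep.length) s.lastIndexOf(sep, end - sep.length) else -1
    }
    s.substring(0, end) :: out
  }

  /** Hypothetical stand-in for std.splitLimitR showing only the dispatch on maxSplits. */
  def splitLimitRSketch(s: String, sep: String, maxSplits: Int): List[String] = {
    require(sep.nonEmpty, "separator must be non-empty")
    if (maxSplits == -1) forward(s, sep, Int.MaxValue) // match official CPP Jsonnet
    else backward(s, sep, if (maxSplits < 0) Int.MaxValue else maxSplits)
  }
}
```

With "aaa" and "aa", the forward scan consumes the first two characters and leaves "a" on the right, while the backward scan consumes the last two and leaves "a" on the left, which is why -1 and -2 give different results.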

References

Official CPP Jsonnet compatibility check: jsonnet -e 'std.splitLimitR("aaa", "aa", -1)' returns ["", "a"].
Local branch commit: b9ecb1d9 perf: scan splitLimitR from the right.

Result

./mill --no-server --ticker false --color false -j 1 __.test passed with 438 tests. Formatting was applied with __.reformat, and git diff --check passed.

Motivation: std.splitLimitR previously reversed the input and separator, split from the left, then reversed every output piece. That preserved most behavior but added avoidable string copies on bounded right-split workloads and missed the official CPP Jsonnet overlap behavior for maxsplits == -1.

Modification: Add a direct right-to-left scanner for std.splitLimitR, with a preallocated bounded path for common maxsplits values and an array-builder path for unbounded right scans. Keep maxsplits == -1 delegated to splitLimit so overlapping separators match official CPP Jsonnet semantics, and add regression coverage for Unicode, trailing separators, overlap, and splitLimit vs splitLimitR direction.

Result: Full ./mill --no-server --ticker false --color false -j 1 __.test passed. JVM split_resolve mixed benchmark improved 0.159 ms/op to 0.148 ms/op. Scala Native splitLimitR repeat improved 12.4 +/- 0.4 ms to 7.0 +/- 0.4 ms, and the branch ran 1.77x faster than local source-built jrsonnet on the same workload.

He-Pin marked this pull request as ready for review May 12, 2026 03:52

stephenamar-db merged commit 8c61c20 into databricks:master May 12, 2026
5 checks passed

stephenamar-db merged 1 commit into databricks:master from He-Pin:perf/split-limit-r-right-scan
