Tracking issue for a specific perf gap found while comparing sjsonnet (native, master) against jrsonnet (master). Parent comparison: #666.
Observation
`std.substr` is 1.78× slower than jrsonnet in a tight loop (mean 4.8 ms vs 2.7 ms).
Scenario: `bench/resources/go_suite/substr.jsonnet` calls `std.substr` 101 times on a ~4 KB string.
| | mean | min | range |
|---|---|---|---|
| sjsonnet (native) | 4.8 ± 0.5 ms | 4.0 ms | 4.0–5.7 ms |
| jrsonnet | 2.7 ± 0.6 ms | 1.9 ms | 1.9–4.0 ms |
Repro:

```sh
hyperfine --warmup 2 --runs 10 -N \
  "sjsonnet bench/resources/go_suite/substr.jsonnet" \
  "jrsonnet bench/resources/go_suite/substr.jsonnet"
```
Code
`sjsonnet/src/sjsonnet/stdlib/StringModule.scala:130-180`: the `Substr` builtin. On every call it allocates a new `java.lang.String` via `str.substring(...)` and wraps it in `Val.Str`.
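For reference, a codepoint-correct `std.substr` over a JVM `String` has to do roughly the following. This is a minimal sketch, not the code in `StringModule.scala`: `SubstrSketch.substrCodepoints` is a hypothetical name, and it assumes Jsonnet's documented semantics (`from` and `len` count Unicode codepoints, and the result is cut short if the string ends first). It makes the per-call costs visible: a `codePointCount` pass, an `offsetByCodePoints` walk, and a copying `substring`.

```scala
object SubstrSketch {
  /** Hypothetical codepoint-based substr (not sjsonnet's actual code).
    * `from` and `len` count codepoints; the result stops at the end of `s`. */
  def substrCodepoints(s: String, from: Int, len: Int): String = {
    require(from >= 0 && len >= 0, s"from and len must be non-negative: $from, $len")
    // O(n) pass over all UTF-16 units of the input
    val cpLen = s.codePointCount(0, s.length)
    if (from >= cpLen || len == 0) ""
    else {
      val cpEnd = math.min(cpLen.toLong, from.toLong + len).toInt
      // Translate codepoint offsets into UTF-16 indices (another linear walk)
      val begin = s.offsetByCodePoints(0, from)
      val end = s.offsetByCodePoints(begin, cpEnd - from)
      // Copies the selected range into a fresh String
      s.substring(begin, end)
    }
  }
}
```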
Hypothesis
Two sources of overhead per call:
- `String.substring` copies the selected range into a new `String` on modern JVMs (the shared-`char[]` optimization was removed in Java 7u6). For 101 substrings of a 4 KB string, that is 101 allocations of up to 4 KB each.
- `Val.Str` wrapping, plus a codepoint-length pass over the result, even though a substring of an ASCII-safe string is itself ASCII-safe (and the original here is).

jrsonnet's strings are UTF-8 `&str` slices into the original buffer, so a substring needs no copy.
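The redundant work in the second point can be illustrated with a fast path. Assuming the caller already knows the input is pure ASCII (every char below 0x80), codepoint offsets equal UTF-16 indices, so the codepoint walk disappears, and the result is ASCII too, so it needs no rescan either. `AsciiSubstrSketch` is hypothetical, and whether sjsonnet's ASCII-safe flag guarantees exactly "pure ASCII" is an assumption of this sketch.

```scala
object AsciiSubstrSketch {
  /** True when every UTF-16 unit is below 0x80. A real implementation would
    * carry this as a flag on the value instead of rescanning; it exists here
    * only so the sketch can be checked. */
  def isAscii(s: String): Boolean = s.forall(_ < 0x80)

  /** substr for input the caller already knows is pure ASCII (assumed):
    * codepoint offsets are char indices, so no codePointCount or
    * offsetByCodePoints walk is needed. */
  def substrAscii(s: String, from: Int, len: Int): String = {
    require(from >= 0 && len >= 0, s"from and len must be non-negative: $from, $len")
    if (from >= s.length || len == 0) "" // returns the interned literal, no allocation
    else s.substring(from, math.min(s.length.toLong, from.toLong + len).toInt)
  }
}
```

The fast path only pays off if the ASCII check is not repeated per call; otherwise it just moves the O(n) scan elsewhere.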
Directions
- For ASCII-safe inputs, skip the codepoint re-scan and use `Val.Str.asciiSafe`.
- Explore a lightweight "string slice" value (offset + length + base `String`) for hot substring workloads, materialized to a real `String` only at render/materialize time. This is a bigger change, and it is an open question whether the allocation win justifies the complexity, given that the rest of the stdlib assumes `String`.
- Cache `Val.Str(empty)` for `len == 0`.
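The slice and empty-cache directions could look roughly like this. `StrSlice` is a hypothetical type, not an existing sjsonnet class: it implements `java.lang.CharSequence` so code that only reads characters can use it without copying, and the single copy happens in `toString`, i.e. at materialization. The `substrAscii` helper assumes a pure-ASCII base (as in the first direction), so codepoint offsets are char indices.

```scala
/** Hypothetical lazy substring: a (base, offset, length) view that defers
  * copying until a real String is needed. Not an existing sjsonnet type. */
final class StrSlice(val base: String, val offset: Int, sliceLen: Int) extends CharSequence {
  require(offset >= 0 && sliceLen >= 0 && offset.toLong + sliceLen <= base.length,
    s"bad slice: offset=$offset len=$sliceLen base.length=${base.length}")

  private var materialized: String = null // filled on first toString

  override def length(): Int = sliceLen

  override def charAt(i: Int): Char = {
    if (i < 0 || i >= sliceLen) throw new IndexOutOfBoundsException(s"index $i, length $sliceLen")
    base.charAt(offset + i) // reads through to the base, no copy
  }

  override def subSequence(start: Int, end: Int): CharSequence = {
    if (start < 0 || end > sliceLen || start > end)
      throw new IndexOutOfBoundsException(s"range [$start, $end), length $sliceLen")
    new StrSlice(base, offset + start, end - start) // slicing a slice stays O(1)
  }

  /** Materialize once: the only place characters are copied. */
  override def toString: String = {
    if (materialized == null) materialized = base.substring(offset, offset + sliceLen)
    materialized
  }
}

object StrSlice {
  /** One shared empty value, mirroring the "cache Val.Str(empty)" idea. */
  val Empty: StrSlice = new StrSlice("", 0, 0)

  /** substr over a pure-ASCII base (assumed): O(1), no scan and no copy. */
  def substrAscii(base: String, from: Int, len: Int): StrSlice = {
    require(from >= 0 && len >= 0, s"from and len must be non-negative: $from, $len")
    if (from >= base.length || len == 0) Empty
    else new StrSlice(base, from, math.min(base.length - from, len))
  }
}
```

One trade-off to weigh: a short slice keeps its entire base string reachable, so a few small slices over a large input can retain far more memory than they use.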
Part of the jrsonnet-parity effort tracked in #666.