perf: presize std.join string buffer and propagate asciiSafe #858
Merged
stephenamar-db merged 1 commit on May 20, 2026
stephenamar-db pushed a commit that referenced this pull request on May 18, 2026
## Motivation

The output of `Format.format` (the engine behind `%`-interpolation, `std.format`, and the `mod` operator's string fallback) always ended up in a `Val.Str` constructed via the default `Val.Str(pos, s)` factory, which leaves `_asciiSafe = false`. This forced `ByteRenderer` onto the slow per-char escape-scan + UTF-8 encode path when format outputs flowed into JSON rendering, even when both the format string literals and every interpolated value were pure ASCII.

Manifest workloads heavy on `%(name)s`-style templates (Helm/Kubernetes-flavored configs) emit many such ASCII-safe strings that go on to be rendered as JSON, so the cost compounds. This is the second of two main sources of "ASCII-safe-but-flagged-unsafe" strings in those workloads (the first, `std.join`, is companion PR #858).

## Modification

`sjsonnet/src/sjsonnet/Format.scala`:

- **`RuntimeFormat.literalsAsciiSafe`**: new field, computed once at parse time by scanning the leading literal and every inter-spec literal segment via `Platform.isAsciiJsonSafe`. It is cached alongside the parsed format, so each format string pays the literal-scan cost exactly once and amortizes it across every use of that cached `RuntimeFormat`.
- **Per-spec ASCII-safety check** at format time, with two helpers:
  - `simpleStringValueAsciiSafe(rawVal)` for `%(name)s` simple-named-string paths.
  - `specOutputAsciiSafe(rawVal, conversion)` for the general path: strings forward `_asciiSafe`; numerics/booleans/null are ASCII (numerics under `%c` depend on the codepoint range); `Val.Arr` / `Val.Obj` (rendered via `Renderer`) are conservatively treated as non-ASCII.
- **`Format.format` returns `Val.Str`**: both the string-input and pre-parsed-chunks overloads, plus `formatSimpleNamedString`. The `_asciiSafe` flag is set at construction via `Val.Str.asciiSafe(pos, s)` when the literals and all spec outputs are ASCII-safe; otherwise the regular `Val.Str(pos, s)` constructor is used.
- **Callers updated** to drop the redundant `Val.Str(pos, ...)` wrapper:
  - `Evaluator`: the `%` binary operator
  - `MathModule`: `std.mod` string fallback
  - `StringModule`: `std.format`
  - `Format.PartialApplyFmt`: static-folded format closure

`sjsonnet/test/resources/new_test_suite/format_asciisafe_propagation.jsonnet`: regression test covering the simple `%(name)s` fast path, general `%s`/`%d`/`%x`/`%o`/`%c`/`%.2f` conversions, mixed ASCII literals + non-ASCII string values, and a `std.manifestJson` roundtrip exercising the ByteRenderer fast path.

Format-time overhead is two boolean ANDs per spec; literal scanning happens once at parse time.

## Result

Benchmarked on Apple Silicon, Zulu JDK 21.0.10, `-Xmx4G -XX:+UseG1GC -Xss100m`, 3 forks × (3 warmup + 5 measurement) iterations.

**JMH `bench.runRegressions`** (averaged over 3 forks, ms/op, lower is better):

| Benchmark | master | #860 | Δ |
|---|---:|---:|---:|
| `cpp_suite/large_string_template` | 0.724 ± 0.038 | 0.777 ± 0.229 | (CIs overlap; cleanest fork: 0.695 → 0.683, **−1.7%**) |
| `jdk17_suite/repeat_format` | 0.155 ± 0.032 | 0.138 ± 0.016 | **−11.0%** |
| `go_suite/manifestJsonEx` | 0.074 ± 0.042 | 0.052 ± 0.001 | **−29.7%** |

The JMH `large_string_template` mean is dominated by thermal/GC outliers on Apple Silicon (Fork 2's last two iterations spiked to 0.857 / 1.481 ms, while Forks 1 and 3 ran cleanly around 0.683 ms). The per-fork minimums and the cleanest fork consistently show the PR ahead. Confirmed via hyperfine.

**hyperfine** (30 runs, 5 warmup, full binary including JVM startup, ms, lower is better):

| Benchmark | master | #860 | Speedup |
|---|---:|---:|---:|
| `large_string_template` | 278.6 ± 79.6 | 229.5 ± 2.6 | **1.21× ± 0.35** |
| `repeat_format` | 594.9 ± 66.7 | 580.9 ± 16.3 | **1.02× ± 0.12** |
| `manifestJsonEx` | 222.7 ± 3.1 | 223.8 ± 2.2 | parity (50 µs workload buried under ~220 ms JVM startup) |

Hyperfine on `manifestJsonEx` is dominated by JVM startup; JMH (which excludes startup) is the trustworthy signal there and shows a ~30% improvement. PR-side variance on `large_string_template` is dramatically tighter (±2.6 ms vs master ±79.6 ms), consistent with eliminating a noisy escape-scan path.

## References

- Companion PR: #858 (`std.join` presize + asciiSafe propagation; the same idea, applied to join outputs)
- Bench evidence: `/tmp/bench-mmrr/master.log`, `/tmp/bench-mmrr/pr860.log`, `/tmp/bench-mmrr/hyperfine-*.md` (local artifacts)

## Test plan

- [x] New regression test `new_test_suite/format_asciisafe_propagation.jsonnet` covers:
  - Simple `%(name)s` fast path with ASCII / non-ASCII literals + values
  - General `%s` / `%d` / `%x` / `%o` / `%c` / `%.2f` conversions
  - Mixed ASCII literals + non-ASCII string values
  - `std.manifestJson` roundtrip
- [x] `./mill 'sjsonnet.jvm[3.3.7]'.test`: 46 suites pass
- [x] `./mill 'sjsonnet.native[3.3.7]'.compile`: passes
- [x] `./mill 'sjsonnet.js[3.3.7]'.compile`: passes
- [x] `./mill __.checkFormat`: passes
- [x] JMH bench (3 forks × 5 iters) on master + PR
- [x] hyperfine 30-run cross-validation on master + PR
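The flag computation described in this commit message (scan the literals once at parse time, then AND in one cached flag per interpolated value at format time) can be sketched roughly as follows. This is an illustration under stated assumptions, not the sjsonnet code: `Str`, `ParsedFormat`, `render`, and this `isAsciiJsonSafe` are hypothetical stand-ins, and the predicate (ASCII, no control characters, no `"` or `\`) is an assumption about what `Platform.isAsciiJsonSafe` checks.

```scala
object FormatAsciiSafeSketch {
  // Hypothetical stand-in for Val.Str: a string plus a cached "ASCII-safe" flag.
  final case class Str(value: String, asciiSafe: Boolean)

  // Assumed predicate: ASCII characters that need no JSON escaping.
  def isAsciiJsonSafe(s: String): Boolean =
    s.forall(c => c >= 0x20 && c < 0x80 && c != '"' && c != '\\')

  // Hypothetical parsed format: literal segments with one value slot between
  // each adjacent pair, so literals.length == number of slots + 1.
  // The literal scan runs once, when the format is parsed.
  final class ParsedFormat(val literals: Vector[String]) {
    val literalsAsciiSafe: Boolean = literals.forall(isAsciiJsonSafe)
  }

  def render(fmt: ParsedFormat, args: Vector[Str]): Str = {
    require(args.length == fmt.literals.length - 1, "one argument per slot")
    val sb = new java.lang.StringBuilder
    var safe = fmt.literalsAsciiSafe
    var i = 0
    while (i < args.length) {
      sb.append(fmt.literals(i)).append(args(i).value)
      safe = safe && args(i).asciiSafe // forward the cached flag, no rescan
      i += 1
    }
    sb.append(fmt.literals.last)
    Str(sb.toString, safe)
  }
}
```

In this sketch a single non-ASCII argument clears the flag for the whole result, while an all-ASCII call never rescans the output string.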
Collaborator
Minor: in both […] Empty strings are trivially ASCII-safe, and keeping the ByteRenderer on the fast path for […] `if (!added) return Val.Str.asciiSafe(pos, "")` in both helpers, for consistency with the fallback path.
871748e to 7d8ae16
Motivation:
The string-separator branch of std.join was building the result with an unsized java.lang.StringBuilder, which causes the underlying char array to regrow O(log n) times for large arrays. Re-evaluating each arr.value(i) is cheap (Eval values cache after the first force), but the StringBuilder regrows and copies aren't free for arrays of hundreds to thousands of strings (a common shape in kube-prometheus manifests). Independently, the resulting Val.Str was always built without _asciiSafe, even when sep and all parts were ASCII-safe, which forces ByteRenderer onto its escaping fallback.

Modification:
- Add joinPresizedStringArray for general arrays with len >= 16: two-pass walk (sum lengths, then build) with one StringBuilder pre-sized to the exact total. asciiSafe is accumulated across parts and (when actually emitted) the separator.
- Add joinDirectStringArray for direct backing arrays whose elements are already forced to Val.Str / Val.Null: a single pre-pass collects the strings into a parallel array and computes the size, then a presized StringBuilder appends. Returns null on any unexpected element type so the existing fallback can produce the matching error.
- Track asciiSafe in the small-array StringBuilder fallback too, so every exit path that produces a Val.Str gets the flag set when applicable. Total length is checked against Int.MaxValue to fail fast instead of overflowing.
- Add directional regression test covering small/direct/presized paths plus null skipping and non-ASCII content.

Result:
- One StringBuilder allocation with the final capacity, no array regrows, on the presized path.
- ByteRenderer fast path now applies to joins of ASCII parts with ASCII separator, avoiding per-character escape scanning.
- Full JVM test suite green; Scala 3 format check green.
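The two-pass presized join described in this commit message can be sketched roughly as below. It is a simplified illustration, not the actual `joinPresizedStringArray`: the `Str` case class, the `Option`-based modelling of null elements, and the exception type are all assumptions made for the example.

```scala
object PresizedJoinSketch {
  // Hypothetical stand-in for Val.Str: a string plus a cached "ASCII-safe" flag.
  final case class Str(value: String, asciiSafe: Boolean)

  // None models a null element, which std.join skips.
  def join(sep: Str, parts: IndexedSeq[Option[Str]]): Str = {
    // Pass 1: exact output length (kept in a Long so overflow is detectable)
    // and the accumulated flag.
    var total = 0L
    var count = 0
    var safe = true
    var i = 0
    while (i < parts.length) {
      parts(i) match {
        case Some(p) =>
          if (count > 0) total += sep.value.length
          total += p.value.length
          safe = safe && p.asciiSafe
          count += 1
        case None =>
      }
      i += 1
    }
    // Empty results are trivially ASCII-safe.
    if (count == 0) return Str("", asciiSafe = true)
    // The separator only affects the flag when it is actually emitted.
    if (count > 1) safe = safe && sep.asciiSafe
    if (total > Int.MaxValue)
      throw new IllegalArgumentException("joined string too large")

    // Pass 2: a single builder with the exact capacity, so it never regrows.
    val sb = new java.lang.StringBuilder(total.toInt)
    var added = false
    i = 0
    while (i < parts.length) {
      parts(i) match {
        case Some(p) =>
          if (added) sb.append(sep.value)
          sb.append(p.value)
          added = true
        case None =>
      }
      i += 1
    }
    Str(sb.toString, safe)
  }
}
```

Note that joining a single element with a non-ASCII separator still yields an ASCII-safe result here, because the separator never reaches the output.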
7d8ae16 to 67c8c28
Contributor
Author
@stephenamar-db I have updated this.
stephenamar-db approved these changes on May 20, 2026
stephenamar-db pushed a commit that referenced this pull request on May 20, 2026
## Motivation

Stacked on #858. Four small, independent perf wins around `std.join` and `asciiSafe` propagation. None changes any user-visible semantics. Together they shave a few percent off real-world manifest workloads on Native, where there is no JIT to recover the missed fast paths.

Splitting from the follow-up structural refactor (separate PR) so each lands on its own merits.

## Modification

Four atomic commits:

1. **SWAR-accelerate `isAsciiJsonSafe`**: replace the per-char range check with a word-at-a-time SWAR scan in `CharSWAR` (JVM/Native/JS variants). Same result, fewer branches per character on the hot ASCII-safety detection path.
2. **Drop `parts[]` allocation in `joinDirectStringArray`**: when every element of `std.join(sep, arr)` is already a `Val.Str`, append directly into the result buffer instead of materializing an intermediate `Array[String]` first.
3. **Cache `asciiSafe` in `joinedRepeatedString`**: when the input array is a single `Val.Str` repeated (the rare-but-hot single-element-join case), reuse its cached `asciiSafe` flag instead of re-scanning the produced string.
4. **Propagate `asciiSafe` through more `StringModule` builtins**: extend `Val.Str.asciiSafe(...)` factory usage to `std.char`, `std.asciiUpper`, `std.asciiLower`, `std.strReplace`, `std.stripChars`, `std.{l,r}stripChars`, `std.split`, `std.splitLimit`, `std.splitLimitR`. These outputs are now ByteRenderer fast-path eligible without re-scanning.

New regression test `new_test_suite/string_asciisafe_propagation.jsonnet`.

## Result

### Hyperfine (Native, end-to-end)

```
kube-prometheus (jrsonnet/tests/realworld/vendor/.../kube-prometheus/example.jsonnet)
  base (PR #858 head):  150.1 ± 11.9 ms
  this PR:              138.8 ±  4.0 ms
  1.08× faster
```

### JMH (JVM, steady-state, 2 forks × 5 iter × 3s)

```
RegressionBenchmark.main        base ms/op       this PR ms/op
large_string_join.jsonnet       0.310 ± 0.045    0.304 ± 0.018    ≈ flat
large_string_template.jsonnet   0.650 ± 0.035    0.734 ± 0.088    noisy (overlap)
repeat_format.jsonnet           0.144 ± 0.019    0.162 ± 0.047    noisy (overlap)
manifestJsonEx.jsonnet          0.056 ± 0.011    0.055 ± 0.013    flat
```

JVM JIT recovers most of the per-character work; the wins are concentrated on Native, which matches PR #858. Per-bench JMH error bars overlap, so JVM steady-state is best read as "no regression".

## Test plan

- `./mill 'sjsonnet.jvm[3.3.7]'.test`: green
- `./mill 'sjsonnet.native[3.3.7]'.test`: green
- New regression test covers all 10 newly-propagated builtins
- `./mill __.checkFormat`: green
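For readers unfamiliar with SWAR ("SIMD within a register"), the idea behind the first commit can be sketched as below. This is not the `CharSWAR` code from the commit: it works on UTF-8 bytes (eight per `Long`) rather than on chars, and the safety predicate (ASCII, no control characters, no `"` or `\`) is an assumption about what `isAsciiJsonSafe` checks. All names here are hypothetical.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object SwarAsciiSketch {
  private final val High = 0x8080808080808080L // top bit of every byte lane
  private final val Low7 = 0x7f7f7f7f7f7f7f7fL

  // Scalar check for one sign-extended byte; bytes >= 0x80 arrive as negatives.
  private def byteUnsafe(b: Int): Boolean = b < 0x20 || b == 0x22 || b == 0x5c

  // Checks eight bytes at once. After the high-bit test every lane is <= 0x7f,
  // so the per-lane additions and subtractions below cannot carry or borrow
  // into a neighbouring lane.
  private def wordUnsafe(w: Long): Boolean = {
    if ((w & High) != 0L) return true // some byte is non-ASCII
    // Lane top bit ends up set iff the byte is < 0x20 (control character).
    val ctrl = ~((w | High) - 0x2020202020202020L) & High
    // Lane top bit ends up set iff the byte equals 0x22 ('"') or 0x5c ('\').
    val quote = ~((w ^ 0x2222222222222222L) + Low7) & High
    val bslash = ~((w ^ 0x5c5c5c5c5c5c5c5cL) + Low7) & High
    (ctrl | quote | bslash) != 0L
  }

  def isAsciiJsonSafe(bytes: Array[Byte]): Boolean = {
    val buf = ByteBuffer.wrap(bytes)
    var i = 0
    while (i + 8 <= bytes.length) {
      if (wordUnsafe(buf.getLong(i))) return false // byte order is irrelevant here
      i += 8
    }
    while (i < bytes.length) { // scalar tail for the last 0..7 bytes
      if (byteUnsafe(bytes(i).toInt)) return false
      i += 1
    }
    true
  }

  // Convenience overload; encoding to UTF-8 allocates, which a production
  // implementation would presumably avoid.
  def isAsciiJsonSafe(s: String): Boolean =
    isAsciiJsonSafe(s.getBytes(StandardCharsets.UTF_8))
}
```

The word-at-a-time path replaces up to four comparisons per byte with a handful of arithmetic operations per eight bytes, which is where the reduction in branches comes from.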
## Motivation

`std.join` produced output strings via `Iterator.foldLeft` + concat, which:

- Used a `StringBuilder` of unknown capacity, forcing it to grow (re-allocate + copy) repeatedly as separators and segments were appended.
- Left `Val.Str._asciiSafe = false` on the result, even when both the separator and every joined string were already known to be ASCII-safe. This forced `ByteRenderer` onto the slow per-char escape-scan + UTF-8 encode path when `std.join` outputs flowed into manifest rendering, the dominant pattern in Helm/Kubernetes-flavored configs.

Manifest workloads emit many `std.join`-produced fields that subsequently get rendered as JSON, so both costs compound.

## Modification

`sjsonnet/src/sjsonnet/stdlib/StringModule.scala`, `Join.evalRhs` and helpers (`joinPresizedStringArray`, `joinDirectStringArray`, `joinRepeatedStringEval`):

- Pre-size: sum the part lengths plus the separators (`(n - 1) × sep.length`), then allocate the `StringBuilder` with that exact capacity. No grow, no copy.
- Propagate the flag: AND each part's `_asciiSafe` flag with the separator's. If all are ASCII-safe, construct the result via `Val.Str.asciiSafe(pos, s)`; otherwise via the regular `Val.Str(pos, s)` constructor.
- The `len == 0` / `!added` / `Val.Null` short-circuit returns `Val.Str.asciiSafe(pos, "")` so empty results stay on the ByteRenderer fast path. (Per @stephenamar-db's review.)
- Same `Val.Str` / `Val.Null` / element-type validation, same error messages.

`sjsonnet/test/resources/new_test_suite/join_string_presized.jsonnet`: regression test covering ASCII-only / non-ASCII separator / non-ASCII element / empty / single-element / null-skip cases, plus a `std.manifestJson` roundtrip exercising the ByteRenderer fast path.

## Result

Benchmarked on Apple Silicon, Zulu JDK 21.0.10. Native binary built via `./mill 'sjsonnet.native[3.3.7]'.nativeLink` (release-full, full LTO).

Hyperfine (Scala Native binary, end-to-end wall time, lower is better), measured on:

- kube-prometheus (`example.jsonnet`, 72 414-line manifest)
- `cpp_suite/realistic2.jsonnet`
- `cpp_suite/large_string_template.jsonnet`
- `cpp_suite/large_string_join.jsonnet`

Real-world manifest workloads (kube-prometheus, realistic2) show a consistent ~5-7% wall-time win. Sub-second cpp_suite workloads have process-startup overhead dominating their variance, so any `std.join` delta is buried, but the JMH steady-state numbers confirm the optimization works.

Variance reduction: master shows a wider σ than #858 on the larger workloads (kube-prometheus master ±29.5 ms vs #858 ±10.9 ms; realistic2 master ±15.9 ms vs #858 ±9.5 ms), consistent with eliminating a noisy escape-scan code path on the renderer side.

JMH (JVM steady-state, `bench.runRegressions`, 2 forks × 3 warmup × 5 measurement iterations, ms/op), measured on:

- `cpp_suite/large_string_template`
- `cpp_suite/large_string_join`
- `jdk17_suite/repeat_format`
- `go_suite/manifestJsonEx`

JMH numbers on these sub-millisecond workloads have high variance on Apple Silicon (CIs frequently span ±20-100%); the directional signal (manifest-heavy workloads improve, neutral-to-slightly-faster on string-only workloads) is consistent with the hyperfine native data.

Output identity: `diff` of the `kube-prometheus` 72 414-line render output between master and #858 binaries returns 0 byte differences.

## References

- Companion PR for `%` / `std.format` outputs (merged)
- Review feedback on the `Val.Str(pos, "")` paths in the join helpers

## Test plan

- `./mill 'sjsonnet.jvm[3.3.7]'.test`: all suites pass
- `./mill 'sjsonnet.native[3.3.7]'.compile`: passes
- `./mill 'sjsonnet.js[3.3.7]'.compile`: passes
- `./mill __.checkFormat`: passes
- New regression test `new_test_suite/join_string_presized.jsonnet`