Skip to content

perf: incremental std.join + asciiSafe coverage improvements#861

Merged
stephenamar-db merged 4 commits into
databricks:masterfrom
He-Pin:perf/asciisafe-roi
May 20, 2026
Merged

perf: incremental std.join + asciiSafe coverage improvements#861
stephenamar-db merged 4 commits into
databricks:masterfrom
He-Pin:perf/asciisafe-roi

Conversation

@He-Pin
Copy link
Copy Markdown
Contributor

@He-Pin He-Pin commented May 16, 2026

Motivation

Stacked on #858. Four small, independent perf wins around std.join and
asciiSafe propagation. None changes any user-visible semantics.

Together they shave a few percent off real-world manifest workloads on Native
where there is no JIT to recover the missed fast paths. Splitting from the
follow-up structural refactor (separate PR) so each lands on its own merits.

Modification

Four atomic commits:

  1. SWAR-accelerate isAsciiJsonSafe — replace the per-char range check with a
    word-at-a-time SWAR scan in CharSWAR (JVM/Native/JS variants). Same result,
    fewer branches per character on the hot ASCII-safety detection path.

  2. Drop parts[] allocation in joinDirectStringArray — when every element of
    std.join(sep, arr) is already a Val.Str, append directly into the result
    buffer instead of materializing an intermediate Array[String] first.

  3. Cache asciiSafe in joinedRepeatedString — when the input array is a
    single Val.Str repeated (the rare-but-hot single-element-join case), reuse
    its cached asciiSafe flag instead of re-scanning the produced string.

  4. Propagate asciiSafe through more StringModule builtins — extend
    Val.Str.asciiSafe(...) factory usage to std.char, std.asciiUpper,
    std.asciiLower, std.strReplace, std.stripChars, std.{l,r}stripChars,
    std.split, std.splitLimit, std.splitLimitR. These outputs are now
    ByteRenderer fast-path eligible without re-scanning. New regression test
    new_test_suite/string_asciisafe_propagation.jsonnet.

Result

Hyperfine (Native, end-to-end)

kube-prometheus (jrsonnet/tests/realworld/vendor/.../kube-prometheus/example.jsonnet)
  base (PR #858 head):  150.1 ± 11.9 ms
  this PR:              138.8 ±  4.0 ms     1.08× faster

JMH (JVM, steady-state, 2 forks × 5 iter × 3s)

RegressionBenchmark.main          base ms/op       this PR ms/op
large_string_join.jsonnet         0.310 ± 0.045    0.304 ± 0.018  ≈ flat
large_string_template.jsonnet     0.650 ± 0.035    0.734 ± 0.088  noisy (overlap)
repeat_format.jsonnet             0.144 ± 0.019    0.162 ± 0.047  noisy (overlap)
manifestJsonEx.jsonnet            0.056 ± 0.011    0.055 ± 0.013  flat

JVM JIT recovers most of the per-character work; the wins are concentrated on
Native, which matches PR #858. Per-bench JMH error bars overlap, so JVM
steady-state is best read as "no regression".

Test plan

  • ./mill 'sjsonnet.jvm[3.3.7]'.test — green
  • ./mill 'sjsonnet.native[3.3.7]'.test — green
  • New regression test covers all 10 newly-propagated builtins
  • ./mill __.checkFormat — green

@He-Pin He-Pin marked this pull request as draft May 16, 2026 11:54
@He-Pin He-Pin force-pushed the perf/asciisafe-roi branch 3 times, most recently from cae7a70 to 28aa687 Compare May 19, 2026 07:39
@He-Pin He-Pin changed the title perf: replace Val.Str._asciiSafe flag with AsciiSafeStr subclass perf: incremental std.join + asciiSafe coverage improvements May 19, 2026
@He-Pin He-Pin marked this pull request as ready for review May 19, 2026 08:36
He-Pin added 4 commits May 21, 2026 01:23
Motivation:
Format.scanFormat repeatedly tests literal windows of the format string
for ASCII-JSON-safety so each cached RuntimeFormat can pre-decide whether
ByteRenderer's fast path applies. The previous scalar loop in Format.scala
duplicated CharSWAR's bytewise check while bypassing the SWAR path that
the full-string variant already enjoys.

Modification:
* Add `isAsciiJsonSafe(s, from, to)` range overloads on CharSWAR for JVM
  (SWAR over 4 chars at a time), Native (same SWAR), and JS (scalar).
* Expose the overload via Platform.isAsciiJsonSafe on all three platforms.
* Replace Format.isAsciiJsonSafeRange's body with a direct delegate to
  Platform.isAsciiJsonSafe so the format-string scanner reuses the SWAR
  path on JVM/Native.

Result:
Long format-string literals get the same 4-char-per-step scan that
ByteRenderer already uses, without changing observable semantics. No
allocations introduced and no API changes outside `Platform`.
Motivation:
joinDirectStringArray allocated an Array[String] equal in length to the
input array purely to skip Val.Null entries between two passes. The
array survives until the join completes, so for arrays with thousands
of elements the temporary buffer was a hot allocation on the std.join
fast path.

Modification:
* Replace the parts[] cache with a single elemCount counter, tracking
  separator overhead as `sepLen * (elemCount - 1)` after pass 1
  completes instead of incrementally toggling on `added`.
* Pass 2 walks the original `direct` array, skipping Val.Null entries
  with isInstanceOf and casting non-null elements to Val.Str (already
  validated by pass 1) to avoid a redundant pattern dispatch.
* Empty-result branch now returns an asciiSafe empty string so
  ByteRenderer keeps the fast path even when the join collapses to "".

Result:
Eliminates one Array[String] allocation per std.join over an array of
strings; pass 1 still does the type-check and total-length accounting,
pass 2 reuses the input array. No observable behavior change.
Motivation:
joinedRepeatedString and its callers received raw `String` separators
and elements, then re-scanned them with `Platform.isAsciiJsonSafe` on
every join call. The Val.Str inputs already cached `_asciiSafe`, so the
re-scan was redundant work on the std.join hot path for repeated-string
arrays (e.g. `[x for ... in ...]` with identical x).

Modification:
* Switch joinedRepeatedString, joinRepeatedStringEval, and
  joinRepeatedDirectString to take `Val.Str` for both `sep` and the
  repeated element instead of `String`. The functions now read
  `_asciiSafe` directly from the inputs.
* When count == 1 the function returns the input Val.Str unchanged,
  preserving its asciiSafe flag with no allocation.
* Empty-result branches return an asciiSafe empty string so
  ByteRenderer keeps the fast path even when the repeat collapses.
* Update Join.evalRhs callers to pass the Val.Str directly.

Result:
Removes two `Platform.isAsciiJsonSafe` scans per repeated-element join
and avoids materializing a fresh Val.Str when count == 1. Behavior is
identical: result strings still carry the same asciiSafe bit they would
have under the old re-scan path.
Motivation:
ByteRenderer's fast path skips JSON-escape scanning when the source
Val.Str is marked asciiSafe. Several StringModule builtins were
discarding the input flag and returning a fresh Val.Str that always
re-scanned at render time, even when the transformation could not
introduce unsafe bytes.

Modification:
* `std.char(n)` marks the result asciiSafe when the codepoint is in
  printable ASCII excluding `"` and `\\`; non-ASCII / control codepoints
  fall through to the regular constructor as before.
* `std.asciiUpper`/`asciiLower` forward the input's asciiSafe flag —
  ASCII case folding cannot introduce unsafe bytes.
* `std.strReplace(src, from, to)` results are asciiSafe when both `src`
  and `to` are; the removed `from` cannot affect output safety.
* `std.stripChars`/`lstripChars`/`rstripChars` forward the input's
  flag — stripping codepoints cannot introduce unsafe bytes.
* `std.split`/`splitLimit`/`splitLimitR` thread an `asciiSafe` parameter
  through `splitLimit`, `splitLimitR`, and `splitLimitRBounded` so each
  resulting element inherits the source string's flag.
* Add `string_asciisafe_propagation.jsonnet` regression covering all
  five paths plus non-ASCII control inputs to verify that the
  ByteRenderer fast path stays correct after each transformation.

Result:
Strings produced by these stdlib calls keep ByteRenderer on the fast
path when their inputs are asciiSafe. No observable Jsonnet semantics
change; the only externally visible effect is fewer rescans during
JSON manifestation.
@He-Pin He-Pin force-pushed the perf/asciisafe-roi branch from 28aa687 to b21b719 Compare May 20, 2026 17:38
@He-Pin
Copy link
Copy Markdown
Contributor Author

He-Pin commented May 20, 2026

@stephenamar-db Rebased on the current master

@stephenamar-db stephenamar-db merged commit b252b18 into databricks:master May 20, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants