perf: speed up manifest JSON rendering #874

Closed
He-Pin wants to merge 7 commits into databricks:master from He-Pin:perf/manifest-json-rendering-fastpath

He-Pin commented May 28, 2026

Motivation

std.manifestJson, std.manifestJsonMinified, and std.manifestJsonEx still routed through StringWriter, paying StringBuffer synchronization per write and per flush on the hot manifestation path. Source-built jrsonnet comparisons showed sjsonnet trailing on object-heavy manifest workloads.

Modification

  • Add StringBuilderWriter: an unsynchronized Writer over a StringBuilder.
  • Add package-private FastMaterializeJsonRenderer backed by StringBuilderWriter; route the three std.manifestJson* builtins through it. Public MaterializeJsonRenderer ABI/shape unchanged.
  • Fix codepoint comparison for raw surrogate prefixes: equal surrogate UTF-16 code units must be decoded before deciding ordering. UnicodeHandlingTests extended for the prefix-ordering case.
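To illustrate the first bullet, here is a minimal sketch of an unsynchronized `Writer` over a `StringBuilder`. The class name `StringBuilderWriterSketch` and its constructor parameter are hypothetical; the PR's actual `StringBuilderWriter` may differ in detail.

```scala
import java.io.Writer

// Hypothetical sketch, not the PR's class: a Writer that appends to a
// java.lang.StringBuilder, so no monitor is taken per write or flush
// (java.io.StringWriter is backed by the synchronized StringBuffer).
final class StringBuilderWriterSketch(initialCapacity: Int = 16) extends Writer {
  private val sb = new java.lang.StringBuilder(initialCapacity)

  override def write(c: Int): Unit = { sb.append(c.toChar); () }

  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
    sb.append(cbuf, off, len); ()
  }

  override def write(str: String): Unit = { sb.append(str); () }

  // StringBuilder.append(CharSequence, start, end) takes an end index, not a length.
  override def write(str: String, off: Int, len: Int): Unit = {
    sb.append(str, off, off + len); ()
  }

  // Everything already lives in memory, so flush and close have nothing to do.
  override def flush(): Unit = ()
  override def close(): Unit = ()

  override def toString: String = sb.toString
}
```

Because it still extends `java.io.Writer`, a class like this can be passed anywhere a `StringWriter` was used, provided the writer is confined to a single thread.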

Result

Scala Native hyperfine on kube-prometheus, -N -w 4 -m 20, jrsonnet HEAD 2d7eed05:

| Workload (native) | Before | After | Δ |
| --- | --- | --- | --- |
| kube-prometheus, sjsonnet | 158.4 ± 16.8 ms | 143.7 ± 3.2 ms | −9.3% |
| kube-prometheus, jrsonnet | 101.2 ± 4.4 ms | 97.4 ± 8.6 ms | reference |
| manifestJsonEx, sjsonnet | | 5.09 ± 1.01 ms | new |
| manifestJsonEx, jrsonnet | | 4.08 ± 1.40 ms | reference |

JMH regression benchmarks, post-PR: manifestJsonEx 0.055 ms/op, realistic2 43.6 ms/op, gen_big_object 0.842 ms/op.

Related: #666.

Test plan

  • ./mill __.reformat
  • ./mill -j 1 __.test — 517/517 pass

Follow-up stacked optimizations

Each commit below was verified for byte-identical output and measured before landing. Perf bar: JVM-positive and Native-non-regressing. Three changes that measured neutral or negative on Native (a YAML-renderer swap, a binary-operator Position deferral, and a first char-deboxing attempt) were dropped.

| Commit | Change | JVM | Native |
| --- | --- | --- | --- |
| skip escape scan for AsciiSafeStr | char renderer emits Val.AsciiSafeStr without the SWAR escape scan | +10% render-only | neutral |
| unsynchronized StringBuilderWriter in TomlRenderer | drop StringBuffer sync on the manifestTomlEx path | +6–14% | 1.11× (≈+10%) |
| capture parse Position without boxing | Parser.Pos writes the Position straight into fastparse successValue instead of Index.map (no Int box/unbox/closure per node) | +5.4% parse | +4.5% parse |
| defer Position alloc in exprSuffix2 | allocate the suffix Position only on a matched suffix, not on every rep-terminating attempt | +1.9% parse | neutral |
| flush FastMaterializeJsonRenderer only at root depth | accumulate in-memory, emit once at depth == 0; 4 KB initial buffer | | |

Methodology: JVM via JMH (ParserBenchmark, plus isolated render benches added under bench/); Native via the binary's --debug-stats phase timing and interleaved hyperfine on kube-prometheus (cooled, min/p25). Render micro-wins (AsciiSafeStr) do not transfer to Native end-to-end because parse+eval dominate there; the parse-side and TOML changes do.

He-Pin marked this pull request as ready for review May 28, 2026 06:53
He-Pin marked this pull request as draft May 28, 2026 06:57
He-Pin marked this pull request as ready for review May 28, 2026 07:00
He-Pin marked this pull request as draft May 28, 2026 07:12
Motivation:
std.manifestJson* still contributed to the local Scala Native gap versus source-built jrsonnet, especially in real-world object-heavy rendering.

Modification:
Add an internal StringBuilder-backed FastMaterializeJsonRenderer for std.manifestJson, std.manifestJsonMinified, and std.manifestJsonEx while preserving the public MaterializeJsonRenderer StringWriter API. Reuse an in-place codepoint key sorter backed by java.util.Arrays.sort, and fix raw-surrogate prefix ordering in compareStringsByCodepoint.

Result:
Full validation passed: ./mill --no-server --ticker false --color false __.reformat and ./mill --no-server --ticker false --color false -j 1 __.test reported 451/451 tests passing. JMH regressions: manifestJsonEx 0.055 ms/op, realistic2 43.596 ms/op, gen_big_object 0.842 ms/op. Direct hyperfine against source-built jrsonnet: manifestJsonEx sjsonnet-native 5.090 ms vs jrsonnet 4.075 ms; kube-prometheus sjsonnet-native 143.738 ms vs jrsonnet 97.385 ms.
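For readers unfamiliar with why codepoint ordering differs from plain `String` ordering, here is a small sketch. It is not the PR's `compareStringsByCodepoint` (which sorts in place and is tuned for speed); the object name `CodepointOrder` is hypothetical, and it simply decodes every position with `String.codePointAt`, so surrogate pairs and lone surrogates are both compared by their decoded value.

```scala
object CodepointOrder {
  // Compare two strings by Unicode code point instead of by UTF-16 code unit.
  // codePointAt returns the full supplementary code point for a valid
  // surrogate pair and the raw unit for a lone surrogate.
  def compare(a: String, b: String): Int = {
    var i = 0
    while (i < a.length && i < b.length) {
      val ca = a.codePointAt(i)
      val cb = b.codePointAt(i)
      if (ca != cb) return Integer.compare(ca, cb)
      // Equal code points occupy the same number of chars in both strings.
      i += Character.charCount(ca)
    }
    // One string is a prefix of the other: the shorter one sorts first.
    Integer.compare(a.length - i, b.length - i)
  }
}
```

With this definition `"\uFFFF"` sorts before `"\uD83D\uDE00"` (U+1F600), whereas `String.compareTo` orders them the other way because it compares the code unit `0xFFFF` against the high surrogate `0xD83D`.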
He-Pin force-pushed the perf/manifest-json-rendering-fastpath branch from da92dd1 to c3581e8 Compare May 28, 2026 07:17
He-Pin marked this pull request as ready for review May 28, 2026 07:17
He-Pin marked this pull request as draft May 29, 2026 20:41
Motivation:
The JVM/char render hot path (BaseCharRenderer.visitNonNullString) ran a
CharSWAR.hasEscapeChar scan on every string, even for Val.AsciiSafeStr which
is statically known to need no JSON escaping (chars 0x20-0x7e, no quote/backslash).
The Native ByteRenderer already had this bypass; the char path did not.

Modification:
- Add BaseCharRenderer.visitAsciiSafeString: quote + bulk getChars + quote,
  correct even under escapeUnicode since all chars are <= 0x7e.
- Route Val.AsciiSafeStr through it via a Materializer.visitStr helper at the
  three value-string sites; ujson.Value AST path falls back to visitString.
- Add AsciiSafeRenderBenchmark to isolate the render path for A/B.

Result:
JMH render-only, 335KB string-heavy output: 1.606 -> 1.441 ms/op (-10.3%,
non-overlapping error bands). 450/450 tests pass.
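The bypass can be sketched roughly as follows. This is an illustration under assumptions, not the code in `BaseCharRenderer`: the object `AsciiSafeRender` and its methods are hypothetical, and the slow path only handles quote, backslash, and control characters.

```scala
object AsciiSafeRender {
  // The property a Val.AsciiSafeStr is described as guaranteeing:
  // every char is in 0x20-0x7e and is neither a quote nor a backslash.
  def isAsciiSafe(s: String): Boolean = {
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c < 0x20 || c > 0x7e || c == '"' || c == '\\') return false
      i += 1
    }
    true
  }

  // Fast path: quote + bulk getChars + quote, with no per-char inspection.
  // Only valid when the caller already knows the string is ASCII-safe.
  def renderAsciiSafe(s: String): String = {
    val buf = new Array[Char](s.length + 2)
    buf(0) = '"'
    s.getChars(0, s.length, buf, 1)
    buf(buf.length - 1) = '"'
    new String(buf)
  }

  // Simplified general path that inspects every char.
  def renderEscaped(s: String): String = {
    val sb = new java.lang.StringBuilder(s.length + 2)
    sb.append('"')
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c == '"' || c == '\\') sb.append('\\').append(c)
      else if (c < 0x20) sb.append(String.format("\\u%04x", Int.box(c.toInt)))
      else sb.append(c)
      i += 1
    }
    sb.append('"').toString
  }
}
```

For any string where `isAsciiSafe` holds, both paths produce the same output, which is why skipping the scan keeps the rendered bytes identical.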
He-Pin marked this pull request as ready for review May 29, 2026 21:25
Motivation:
std.manifestTomlEx routed through java.io.StringWriter, whose backing
StringBuffer pays a monitor enter/exit on every write/flush on the hot TOML
manifestation path. The JSON renderer already switched to the unsynchronized
StringBuilderWriter in databricks#874 (-9.3% on kube-prometheus native); TOML did not.

Modification:
- Switch TomlRenderer and the manifestTomlEx render path in ManifestModule from
  java.io.StringWriter to the package-private StringBuilderWriter. Output is
  byte-identical. std.deepJoin keeps StringWriter (separate concern).
- Add TomlRenderBenchmark to A/B the render path.

Result:
Native hyperfine, TOML-heavy workload (1.79MB output): after ran 1.11 ± 0.07x
faster than before (~10%), output byte-identical. JMH (whole-pipeline) showed
AFTER < BEFORE in two independent rounds. 450/450 tests pass.
He-Pin marked this pull request as draft May 29, 2026 22:50
He-Pin added 4 commits May 30, 2026 15:42
Motivation:
Parser.Pos is invoked for nearly every AST node. It was `Index.map(off => new
Position(...))`: fastparse's `Index` stores the offset as an Int in its
`successValue: Any` field (boxing it), and the `.map` then unboxes it and
allocates a closure — per node. boxToInteger via SharedPackageDefs.Index was a
top self-frame in the parse flamegraph on kube-prometheus.

Modification:
- Rewrite Pos to write the Position object straight into successValue via
  ctx.freshSuccess(new Position(fileScope, ctx.index)), skipping the Int
  box/unbox and the map closure. Parse output (positions/errors) is unchanged.

Result:
JMH ParserBenchmark (parse-only, all test-suite files): 1.669 -> 1.579 ms/op
(+5.4%, non-overlapping bands). Native parse_time on kube-prometheus:
~105.6 -> ~100.9 ms (+4.5%, consistent). Output byte-identical. 450/450 tests pass.
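A toy model of the boxing cost, for illustration only. `Run` below is a hypothetical stand-in for fastparse's parsing context, reduced to an `index` and an untyped `successValue` slot; `Position` is simplified to a file name and an offset. Neither is the real sjsonnet or fastparse type.

```scala
object PosSketch {
  final case class Position(file: String, offset: Int)

  // Toy stand-in for the parser run: results live in an untyped slot.
  final class Run(val index: Int) {
    var successValue: Any = null
    def freshSuccess[T](value: T): Run = { successValue = value; this }
  }

  // Before: store the offset first (an Int placed in an Any slot is boxed),
  // then map over it (closure allocation plus an unbox) to build the Position.
  def posViaIndexMap(run: Run, file: String): Position = {
    run.freshSuccess(run.index)
    val toPosition: Int => Position = off => Position(file, off)
    run.successValue = toPosition(run.successValue.asInstanceOf[Int])
    run.successValue.asInstanceOf[Position]
  }

  // After: write the Position straight into the slot; no Int box, no closure.
  def posDirect(run: Run, file: String): Position = {
    run.freshSuccess(Position(file, run.index))
    run.successValue.asInstanceOf[Position]
  }
}
```

Both functions yield the same `Position`; the difference is only in the intermediate allocations, which matters because `Pos` runs for nearly every AST node.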
Motivation:
exprSuffix2 was `Pos.flatMapX { i => CharIn(".[({")... }`, which allocated a
Position on EVERY attempt — including the failing attempt that terminates
`exprSuffix2.rep` after each expression. Most subexpressions have no suffix, so
that trailing failed attempt (one per expression) allocated a Position that was
immediately discarded.

Modification:
- Match the suffix char first; allocate `new Position(fileScope, ctx.index - 1)`
  only inside the matching branch. No suffix -> CharIn fails fast, no Position.
  Also drops the `.map(_(0))` Char step. Parse output (positions/errors) is
  unchanged.

Result:
JMH ParserBenchmark (-f0, same-session): 1.560 -> 1.530 ms/op (+1.9%). Native
parse_time on kube-prometheus: non-regressing, min/p25 ~2% lower (noise-limited
on a loaded machine). Output byte-identical. 517/517 tests pass.
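The same idea in a self-contained sketch, assuming a simplified `Position` that only carries an offset and a hand-rolled check in place of fastparse's `CharIn`. All names in `SuffixSketch` are hypothetical; the allocation counter exists only to make the difference observable.

```scala
object SuffixSketch {
  // Counts Position allocations so the two strategies can be compared.
  var allocations = 0

  final class Position(val offset: Int) { allocations += 1 }

  private def isSuffixStart(c: Char): Boolean =
    c == '.' || c == '[' || c == '(' || c == '{'

  // Before: allocate the Position first, then test for a suffix character.
  // The failing attempt that ends a repetition still pays for the allocation.
  def suffixEager(input: String, i: Int): Option[Position] = {
    val pos = new Position(i)
    if (i < input.length && isSuffixStart(input.charAt(i))) Some(pos) else None
  }

  // After: test the character first; allocate only inside the matching branch.
  def suffixDeferred(input: String, i: Int): Option[Position] =
    if (i < input.length && isSuffixStart(input.charAt(i))) Some(new Position(i))
    else None
}
```

On input with no suffix at the probed offset, `suffixEager` allocates one `Position` and discards it, while `suffixDeferred` allocates none.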
Motivation:
std.manifestJson* render fully in memory via FastMaterializeJsonRenderer. The
inherited flushCharBuilder spilled the CharBuilder to the output writer at every
sub-tree boundary, adding buffer-to-buffer copies that are pure overhead when the
whole document is built in memory and emitted once.

Modification:
- Override flushCharBuilder to write out only when depth == 0 (root finished);
  accumulate everything in elemBuilder until then.
- Size StringBuilderWriter's initial buffer at 4096 (was 16) to cut early
  reallocations, and mark it private[sjsonnet].

Result:
Fewer intermediate copies on the manifestJson* path; output byte-identical.
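A rough sketch of flushing only at root depth, using a toy tree of integers and arrays instead of the real renderer. `FlushSketch`, `CountingSink`, and `Renderer` are hypothetical names; the sink counts how many times the renderer copies its buffer out, which stands in for the buffer-to-buffer copies described above.

```scala
object FlushSketch {
  sealed trait Node
  final case class Num(n: Int) extends Node
  final case class Arr(items: List[Node]) extends Node

  // Stand-in for the output writer; counts copies from the renderer's buffer.
  final class CountingSink {
    val out = new java.lang.StringBuilder
    var writes = 0
    def write(chars: CharSequence): Unit = { out.append(chars); writes += 1 }
  }

  final class Renderer(sink: CountingSink, flushOnlyAtRoot: Boolean) {
    private val buf = new java.lang.StringBuilder(4096)
    private var depth = 0

    // With flushOnlyAtRoot, nested sub-trees keep accumulating in buf.
    private def flush(): Unit =
      if (!flushOnlyAtRoot || depth == 0) {
        sink.write(buf)
        buf.setLength(0)
      }

    def render(node: Node): Unit = node match {
      case Num(n) =>
        buf.append(n)
        if (depth == 0) flush()
      case Arr(items) =>
        depth += 1
        buf.append('[')
        var first = true
        items.foreach { item =>
          if (!first) buf.append(',')
          first = false
          render(item)
        }
        buf.append(']')
        depth -= 1
        flush() // sub-tree boundary
    }
  }

  // Renders a tree and returns the output together with the number of sink writes.
  def run(root: Node, flushOnlyAtRoot: Boolean): (String, Int) = {
    val sink = new CountingSink
    new Renderer(sink, flushOnlyAtRoot).render(root)
    (sink.out.toString, sink.writes)
  }
}
```

For the tree `[[1,2],[3]]` the eager variant writes to the sink three times and the root-only variant once, with identical output in both cases.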
…Chars ascii mask

Adds regression coverage:
- object_remove_key_directional: objectRemoveKey interaction with super /
  addSuper (`a+:`) merge and inline addSuper asserts.
- strip_chars_ascii_mask_directional: stripChars over the ASCII range.
He-Pin commented May 30, 2026

Superseded — split into focused, independently-measured PRs off current master (each output byte-identical, no benchmark code):

The manifest-JSON rendering work this PR was based on is already in master (da92dd1). Closing in favor of the smaller PRs above.

He-Pin closed this May 30, 2026