perf: speed up manifest JSON rendering by He-Pin · Pull Request #874 · databricks/sjsonnet

He-Pin · 2026-05-28T06:41:23Z

Motivation

std.manifestJson, std.manifestJsonMinified, and std.manifestJsonEx still routed through StringWriter, paying StringBuffer synchronization per write and per flush on the hot manifestation path. Source-built jrsonnet comparisons showed sjsonnet trailing on object-heavy manifest workloads.

Modification

Add StringBuilderWriter: an unsynchronized Writer over a StringBuilder.
Add package-private FastMaterializeJsonRenderer backed by StringBuilderWriter; route the three std.manifestJson* builtins through it. Public MaterializeJsonRenderer ABI/shape unchanged.
Fix codepoint comparison for raw surrogate prefixes: equal surrogate UTF-16 code units must be decoded before deciding ordering. UnicodeHandlingTests extended for the prefix-ordering case.
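
To illustrate the first item, here is a minimal sketch of an unsynchronized Writer over a StringBuilder. The class name `SketchStringBuilderWriter` and its constructor parameter are hypothetical, and the PR's actual StringBuilderWriter may differ in detail. It uses only `java.io.Writer` and `java.lang.StringBuilder` from the JDK.

```scala
import java.io.Writer

// Hypothetical sketch; the PR's actual StringBuilderWriter may differ in detail.
final class SketchStringBuilderWriter(initialCapacity: Int = 16) extends Writer {
  // java.lang.StringBuilder is unsynchronized, unlike the StringBuffer
  // that backs java.io.StringWriter.
  private val sb = new java.lang.StringBuilder(initialCapacity)

  override def write(c: Int): Unit = { sb.append(c.toChar); () }
  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = { sb.append(cbuf, off, len); () }
  override def write(str: String): Unit = { sb.append(str); () }
  override def write(str: String, off: Int, len: Int): Unit = { sb.append(str, off, off + len); () }

  // Nothing is buffered outside the StringBuilder, so these are no-ops.
  override def flush(): Unit = ()
  override def close(): Unit = ()

  override def toString: String = sb.toString
}
```

Because every write appends straight into the StringBuilder, no monitor is taken per call. The trade-off is that an instance must not be shared across threads.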

Result

Scala Native hyperfine on kube-prometheus, -N -w 4 -m 20, jrsonnet HEAD 2d7eed05:

| Workload (native) | Before | After | Δ |
| --- | --- | --- | --- |
| kube-prometheus, sjsonnet | 158.4 ± 16.8 ms | 143.7 ± 3.2 ms | −9.3% |
| kube-prometheus, jrsonnet | 101.2 ± 4.4 ms | 97.4 ± 8.6 ms | reference |
| `manifestJsonEx`, sjsonnet | n/a | 5.09 ± 1.01 ms | new |
| `manifestJsonEx`, jrsonnet | n/a | 4.08 ± 1.40 ms | reference |

JMH regression post-PR: manifestJsonEx 0.055 ms/op, realistic2 43.6 ms/op, gen_big_object 0.842 ms/op.

Related: #666.

Test plan

./mill __.reformat
./mill -j 1 __.test — 517/517 pass

Follow-up stacked optimizations

Each commit below was verified for byte-identical output and measured before landing. Perf bar: JVM-positive and Native-non-regressing. Three changes that measured neutral or negative on Native (a YAML-renderer swap, a binary-operator Position deferral, and a first char-deboxing attempt) were dropped.

| Commit | Change | JVM | Native |
| --- | --- | --- | --- |
| `skip escape scan for AsciiSafeStr` | char renderer emits `Val.AsciiSafeStr` without the SWAR escape scan | +10% render-only | neutral |
| `unsynchronized StringBuilderWriter in TomlRenderer` | drop `StringBuffer` sync on the `manifestTomlEx` path | +6–14% | 1.11× (≈+10%) |
| `capture parse Position without boxing` | `Parser.Pos` writes the `Position` straight into fastparse `successValue` instead of `Index.map` (no Int box/unbox/closure per node) | +5.4% parse | +4.5% parse |
| `defer Position alloc in exprSuffix2` | allocate the suffix `Position` only on a matched suffix, not on every rep-terminating attempt | +1.9% parse | neutral |
| `flush FastMaterializeJsonRenderer only at root depth` | accumulate in-memory, emit once at `depth == 0`; 4 KB initial buffer | n/a | n/a |

Methodology: JVM via JMH (ParserBenchmark, plus isolated render benches added under bench/); Native via the binary's --debug-stats phase timing and interleaved hyperfine on kube-prometheus (cooled, min/p25). Render micro-wins (AsciiSafeStr) do not transfer to Native end-to-end because parse+eval dominate there; the parse-side and TOML changes do.

Motivation: std.manifestJson* still contributed to the local Scala Native gap versus source-built jrsonnet, especially in real-world object-heavy rendering.

Modification: Add an internal StringBuilder-backed FastMaterializeJsonRenderer for std.manifestJson, std.manifestJsonMinified, and std.manifestJsonEx while preserving the public MaterializeJsonRenderer StringWriter API. Reuse an in-place codepoint key sorter backed by java.util.Arrays.sort, and fix raw-surrogate prefix ordering in compareStringsByCodepoint.

Result: Full validation passed: ./mill --no-server --ticker false --color false __.reformat and ./mill --no-server --ticker false --color false -j 1 __.test reported 451/451 tests passing. JMH regression runs: manifestJsonEx 0.055 ms/op, realistic2 43.596 ms/op, gen_big_object 0.842 ms/op. Direct hyperfine against source-built jrsonnet: manifestJsonEx sjsonnet-native 5.090 ms vs jrsonnet 4.075 ms; kube-prometheus sjsonnet-native 143.738 ms vs jrsonnet 97.385 ms.
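
The ordering problem can be shown with a small sketch that compares strings by decoded code point rather than by UTF-16 code unit, then sorts keys with java.util.Arrays.sort. This is not the PR's compareStringsByCodepoint. The object name `CodepointOrderSketch` and its methods are hypothetical, and it relies only on `String.codePointAt` and `Character.charCount` from the JDK.

```scala
import java.util.{Arrays, Comparator}

// Hypothetical sketch; not the PR's compareStringsByCodepoint.
object CodepointOrderSketch {
  def compareByCodepoint(a: String, b: String): Int = {
    var i = 0
    while (i < a.length && i < b.length) {
      // codePointAt decodes a valid surrogate pair into one code point and
      // returns a lone surrogate unchanged, so "raw" surrogates are handled.
      val ca = a.codePointAt(i)
      val cb = b.codePointAt(i)
      if (ca != cb) return Integer.compare(ca, cb)
      // Equal code points occupy the same number of chars in both strings.
      i += Character.charCount(ca)
    }
    // One string is a prefix of the other: the shorter one sorts first.
    Integer.compare(a.length, b.length)
  }

  private val byCodepoint: Comparator[String] =
    (x: String, y: String) => compareByCodepoint(x, y)

  // Sorts the key array in place, in code point order.
  def sortKeys(keys: Array[String]): Unit = Arrays.sort(keys, byCodepoint)
}
```

The case that a plain code-unit comparison gets wrong is a shared high surrogate followed by different continuations: `"\uD83D\uFFFF"` begins with a lone surrogate (0xD83D) while `"\uD83D\uDE00"` decodes to U+1F600, so the first string sorts first even though 0xFFFF is greater than 0xDE00.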

Motivation: The JVM/char render hot path (BaseCharRenderer.visitNonNullString) ran a CharSWAR.hasEscapeChar scan on every string, even for Val.AsciiSafeStr, which is statically known to need no JSON escaping (chars 0x20-0x7e, no quote/backslash). The Native ByteRenderer already had this bypass; the char path did not.

Modification:
- Add BaseCharRenderer.visitAsciiSafeString: quote + bulk getChars + quote, correct even under escapeUnicode since all chars are <= 0x7e.
- Route Val.AsciiSafeStr through it via a Materializer.visitStr helper at the three value-string sites; ujson.Value AST path falls back to visitString.
- Add AsciiSafeRenderBenchmark to isolate the render path for A/B.

Result: JMH render-only, 335KB string-heavy output: 1.606 -> 1.441 ms/op (-10.3%, non-overlapping error bands). 450/450 tests pass.
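
A rough sketch of the "quote + bulk getChars + quote" idea follows. The object `AsciiSafeSketch` and both method names are hypothetical and stand in for the real renderer. The sketch assumes the ASCII-safe property is established once upstream, as Val.AsciiSafeStr does, so the render step can skip the per-character escape scan.

```scala
// Hypothetical sketch; the real BaseCharRenderer writes into its own CharBuilder.
object AsciiSafeSketch {
  // True when every char is printable ASCII (0x20-0x7e) and is neither a
  // double quote nor a backslash, i.e. the string needs no JSON escaping.
  def isAsciiSafe(s: String): Boolean = {
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c < 0x20 || c > 0x7e || c == '"' || c == '\\') return false
      i += 1
    }
    true
  }

  // Fast path for strings already known to be ASCII-safe:
  // opening quote, one bulk copy, closing quote. No escape scan.
  def renderAsciiSafe(s: String): Array[Char] = {
    val out = new Array[Char](s.length + 2)
    out(0) = '"'
    s.getChars(0, s.length, out, 1)
    out(s.length + 1) = '"'
    out
  }
}
```

Since every character is at most 0x7e, the output is the same whether or not non-ASCII escaping is enabled, which is why the bypass stays correct under escapeUnicode.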

Motivation: std.manifestTomlEx routed through java.io.StringWriter, whose backing StringBuffer pays a monitor enter/exit on every write/flush on the hot TOML manifestation path. The JSON renderer already switched to the unsynchronized StringBuilderWriter in databricks#874 (-9.3% on kube-prometheus native); TOML did not.

Modification:
- Switch TomlRenderer and the manifestTomlEx render path in ManifestModule from java.io.StringWriter to the package-private StringBuilderWriter. Output is byte-identical. std.deepJoin keeps StringWriter (separate concern).
- Add TomlRenderBenchmark to A/B the render path.

Result: Native hyperfine, TOML-heavy workload (1.79MB output): after ran 1.11 ± 0.07x faster than before (~10%), output byte-identical. JMH (whole-pipeline) showed AFTER < BEFORE in two independent rounds. 450/450 tests pass.

Motivation: Parser.Pos is invoked for nearly every AST node. It was `Index.map(off => new Position(...))`: fastparse's `Index` stores the offset as an Int in its `successValue: Any` field (boxing it), and the `.map` then unboxes it and allocates a closure, once per node. boxToInteger via SharedPackageDefs.Index was a top self-frame in the parse flamegraph on kube-prometheus.

Modification:
- Rewrite Pos to write the Position object straight into successValue via ctx.freshSuccess(new Position(fileScope, ctx.index)), skipping the Int box/unbox and the map closure. Parse output (positions/errors) is unchanged.

Result: JMH ParserBenchmark (parse-only, all test-suite files): 1.669 -> 1.579 ms/op (+5.4%, non-overlapping bands). Native parse_time on kube-prometheus: ~105.6 -> ~100.9 ms (+4.5%, consistent). Output byte-identical. 450/450 tests pass.

Motivation: exprSuffix2 was `Pos.flatMapX { i => CharIn(".[({")... }`, which allocated a Position on EVERY attempt, including the failing attempt that terminates `exprSuffix2.rep` after each expression. Most subexpressions have no suffix, so that trailing failed attempt (one per expression) allocated a Position that was immediately discarded.

Modification:
- Match the suffix char first; allocate `new Position(fileScope, ctx.index - 1)` only inside the matching branch. No suffix -> CharIn fails fast, no Position. Also drops the `.map(_(0))` Char step. Parse output (positions/errors) is unchanged.

Result: JMH ParserBenchmark (-f0, same-session): 1.560 -> 1.530 ms/op (+1.9%). Native parse_time on kube-prometheus: non-regressing, min/p25 ~2% lower (noise-limited on a loaded machine). Output byte-identical. 517/517 tests pass.
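
The allocation pattern can be modelled without fastparse. The toy below is only an illustration: `DeferredPositionSketch`, its simplified `Position`, and the allocation counter are all hypothetical and do not reflect sjsonnet's parser code. It contrasts allocating before the suffix check with allocating only in the matching branch.

```scala
// Hypothetical toy model; not sjsonnet's parser or fastparse's API.
object DeferredPositionSketch {
  final class Position(val offset: Int)

  // Counts allocations so the two strategies can be compared.
  var allocations = 0
  private def newPosition(offset: Int): Position = {
    allocations += 1
    new Position(offset)
  }

  // The suffix-start characters named in the commit message: . [ ( {
  private def isSuffixStart(c: Char): Boolean =
    c == '.' || c == '[' || c == '(' || c == '{'

  // Eager: allocate first, then test. A failed attempt discards the Position.
  def eager(input: String, index: Int): Option[Position] = {
    val pos = newPosition(index)
    if (index < input.length && isSuffixStart(input.charAt(index))) Some(pos) else None
  }

  // Deferred: test first, allocate only when a suffix actually matches.
  def deferred(input: String, index: Int): Option[Position] =
    if (index < input.length && isSuffixStart(input.charAt(index))) Some(newPosition(index))
    else None
}
```

Both variants return the same result for any input; they differ only in how many Position objects are created on the failing path, which is the path taken once per expression when a repetition terminates.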

Motivation: std.manifestJson* render fully in memory via FastMaterializeJsonRenderer. The inherited flushCharBuilder spilled the CharBuilder to the output writer at every sub-tree boundary, adding buffer-to-buffer copies that are pure overhead when the whole document is built in memory and emitted once.

Modification:
- Override flushCharBuilder to write out only when depth == 0 (root finished); accumulate everything in elemBuilder until then.
- Size StringBuilderWriter's initial buffer at 4096 (was 16) to cut early reallocations, and mark it private[sjsonnet].

Result: Fewer intermediate copies on the manifestJson* path; output byte-identical.
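
The effect of flushing only at root depth can be sketched with a toy renderer for nested sequences. `RootFlushSketch` and `NestedArrayRenderer` are hypothetical and far simpler than the real renderer. The names `elemBuilder`, `depth`, and `flushCharBuilder` are borrowed from the commit message purely for readability.

```scala
import java.io.Writer

// Hypothetical toy renderer; the real FastMaterializeJsonRenderer is more involved.
object RootFlushSketch {
  final class NestedArrayRenderer(out: Writer, flushOnlyAtRoot: Boolean) {
    private val elemBuilder = new java.lang.StringBuilder(4096)
    private var depth = 0
    // Number of buffer-to-writer copies performed so far.
    var flushes = 0

    private def flushCharBuilder(): Unit =
      if ((!flushOnlyAtRoot || depth == 0) && elemBuilder.length() > 0) {
        out.write(elemBuilder.toString)
        elemBuilder.setLength(0)
        flushes += 1
      }

    // Renders nested Seqs as JSON-like arrays; any other value via toString.
    def render(value: Any): Unit = value match {
      case xs: Seq[_] =>
        depth += 1
        elemBuilder.append('[')
        var first = true
        xs.foreach { x =>
          if (!first) elemBuilder.append(',')
          first = false
          render(x)
        }
        elemBuilder.append(']')
        depth -= 1
        flushCharBuilder()
      case other =>
        elemBuilder.append(other.toString)
        flushCharBuilder()
    }
  }
}
```

With `flushOnlyAtRoot = true` the buffer is copied to the writer once, when the root value completes; with `false` it is copied after every element. The text written is identical in both modes.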

…Chars ascii mask

Adds regression coverage:
- object_remove_key_directional: objectRemoveKey interaction with super / addSuper (`a+:`) merge and inline addSuper asserts.
- strip_chars_ascii_mask_directional: stripChars over the ASCII range.

He-Pin · 2026-05-30T11:48:48Z

Superseded — split into focused, independently-measured PRs off current master (each output byte-identical, no benchmark code):

perf: use unsynchronized StringBuilderWriter in TomlRenderer #875 — TomlRenderer → unsynchronized StringBuilderWriter (Native ~1.14×)
perf: capture parse Position without boxing the offset Int #876 — Parser.Pos without boxing the offset Int (Native parse +6–8%, JVM +5.4%)
perf: defer Position alloc in exprSuffix2 to the matching branch #877 — defer Position alloc in exprSuffix2 (JVM +1.9%, Native neutral)

The manifest-JSON rendering work this PR was based on is already in master (da92dd1). Closing in favor of the smaller PRs above.

He-Pin marked this pull request as ready for review May 28, 2026 06:53

He-Pin marked this pull request as draft May 28, 2026 06:57

He-Pin marked this pull request as ready for review May 28, 2026 07:00

He-Pin marked this pull request as draft May 28, 2026 07:12

He-Pin force-pushed the perf/manifest-json-rendering-fastpath branch from da92dd1 to c3581e8 Compare May 28, 2026 07:17

He-Pin marked this pull request as ready for review May 28, 2026 07:17

He-Pin marked this pull request as draft May 29, 2026 20:41

He-Pin marked this pull request as ready for review May 29, 2026 21:25

He-Pin marked this pull request as draft May 29, 2026 22:50

He-Pin added 4 commits May 30, 2026 15:42

He-Pin closed this May 30, 2026

He-Pin wants to merge 7 commits into databricks:master from He-Pin:perf/manifest-json-rendering-fastpath
