perf: cache renderDouble for small integers by He-Pin · Pull Request #763 · databricks/sjsonnet

He-Pin · 2026-04-12T12:21:36Z

Motivation:

Reduce allocation overhead in common numeric rendering paths.

Key Design Decision:

Keep this PR focused on the renderDouble optimization. The original Builtin1.apply1 / Builtin3.apply3 specialization has already landed in current master via #807, so it is intentionally no longer part of this PR diff after the rebase.

Modification:

Materializer delegates number stringification to RenderUtils.renderDouble.
RenderUtils.renderDouble reuses a small integer string cache for exact integer doubles in the range 0 to 255.

Benchmark Results:

Rebased onto latest upstream/master at 8b67cb1e.

JMH, JVM harness, lower is better. Normal/noise-only rows are intentionally omitted.

Benchmark	master ms/op	PR ms/op	Delta
`realistic2`	47.484	41.848	-11.9%

Scala Native hyperfine against latest source-built jrsonnet master 5b43fa8 (jrsonnet 0.5.0-pre98), lower is better:

Benchmark	master-native	PR-native	jrsonnet-source	Result
`realistic2`	82.2 +/- 1.5 ms	85.4 +/- 6.1 ms	143.7 +/- 4.0 ms	Native neutral/noisy vs master; PR-native is 1.68x faster than latest jrsonnet

Analysis:

The current-master refresh leaves one clear JVM signal: realistic2 improves by about 12% through the shared numeric rendering path. Native CLI is not a standout improvement over master on this PR, but latest jrsonnet is substantially slower on the same realistic2 case.

Verification:

./mill -i __.checkFormat
./mill -i 'sjsonnet.jvm[3.3.7].test'
./mill -i bench.runRegressions bench/resources/cpp_suite/large_string_join.jsonnet bench/resources/cpp_suite/large_string_template.jsonnet bench/resources/cpp_suite/realistic2.jsonnet bench/resources/go_suite/parseInt.jsonnet
./mill -i 'sjsonnet.native[3.3.7].nativeLink'
hyperfine --warmup 10 --min-runs 50

References:

PR branch: perf/cached-render-double
Base: upstream/master at 8b67cb1eeffa764f2a7298658e5473d8402a8da1
Head: b21352970b5f19e8f929a259739d88e69df31627
Source-built jrsonnet: 5b43fa88b8c43856dd5a2daa9c5c251153c5e14d

Result:

Ready. The PR now contains only the cached renderDouble work; the Builtin1/3 apply override work is already covered by #807 on master.

stephenamar-db · 2026-04-28T20:11:49Z

I'm questioning the usefulness of this tweak (for numbers) - this seems quite neutral

He-Pin · 2026-04-28T21:01:03Z

It just a bit optimization around realistic2 where many small nunbers.

Split out from #763. Motivation: Reduce allocation and dispatch overhead when one- and three-argument builtins are called through the dynamic `Val.Func.apply1` / `apply3` path. Key Design Decision: Keep the optimization local and semantics-preserving. `Builtin2` already has an exact-arity `apply2` override; this adds matching `Builtin1.apply1` and `Builtin3.apply3` overrides. Exact positional calls directly invoke the structured `evalRhs` overload and skip constructing an intermediate `Array`. Non-exact paths still fall back to the generic parent application path. Correctness: - The direct path matches the existing `Builtin1.apply` / `Builtin3.apply` exact positional behavior: force the supplied `Eval` values, then call the typed `evalRhs`. - Named arguments, missing defaults, too many arguments, and other non-exact calls still use the generic function application logic. - Static `Expr.ApplyBuiltin1` / `Expr.ApplyBuiltin3` paths are unchanged; this only helps dynamic builtin calls such as a builtin stored in a local or returned from another function. Modification: - Add `Builtin1.apply1`. - Add `Builtin3.apply3`. Validation: - `./mill --no-server 'sjsonnet.jvm[3.3.7].compile'` - `./mill --no-server 'sjsonnet.jvm[3.3.7].test'` (`141/141, SUCCESS`) - `./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'` - `./mill --no-server '_.jvm[_].__.test'` (`1104/1104, SUCCESS`) - Dynamic builtin smoke checks: - `local f = std.length; f([1, 2, 3])` -> `3` - `local f = std.substr; f("abcdef", 1, 3)` -> `"bcd"` - `local f = std.substr; f(str="abcdef", from=1, len=3)` -> `"bcd"` Hyperfine: Toolchain: - `hyperfine 1.20.0` - `--warmup 3 --min-runs 25` for targeted dynamic builtin benchmarks - `--warmup 3 --min-runs 20` for `realistic2` - JVM assemblies built with `./mill --no-server show 'sjsonnet.jvm[3.3.7].assembly'` - Base: `upstream/master` at `c04fc804` - Branch: `2067d8b5` Targeted `Builtin1` dynamic call benchmark: ```jsonnet local identity(x) = x; local len = identity(std.length); std.foldl( function(acc, i) acc + len("abcdef"), std.range(1, 5000000), 0 ) ``` | Command | Mean [ms] | Min [ms] | Max [ms] | Relative | |:---|---:|---:|---:|---:| | `master builtin1_dynamic` | 649.8 +/- 48.3 | 557.4 | 726.8 | 1.00 | | `branch builtin1_dynamic` | 661.7 +/- 41.0 | 606.1 | 747.3 | 1.02 +/- 0.10 | Result: statistically neutral in this hyperfine run. Targeted `Builtin3` dynamic call benchmark: ```jsonnet local identity(x) = x; local substr = identity(std.substr); std.foldl( function(acc, i) acc + std.length(substr("abcdef", 1, 3)), std.range(1, 3000000), 0 ) ``` | Command | Mean [ms] | Min [ms] | Max [ms] | Relative | |:---|---:|---:|---:|---:| | `master builtin3_dynamic` | 742.5 +/- 156.1 | 594.1 | 1254.7 | 1.12 +/- 0.30 | | `branch builtin3_dynamic` | 660.4 +/- 110.9 | 534.0 | 962.6 | 1.00 | Result: branch was faster in this run, but variance is high. End-to-end `realistic2`: | Command | Mean [ms] | Min [ms] | Max [ms] | Relative | |:---|---:|---:|---:|---:| | `master realistic2` | 544.9 +/- 95.3 | 414.1 | 706.5 | 1.27 +/- 0.27 | | `branch realistic2` | 428.4 +/- 54.1 | 378.3 | 565.8 | 1.00 | Result: branch was faster in this run; due JVM-startup and system noise, treat this as a non-regression signal rather than a guaranteed 1.27x speedup.

He-Pin · 2026-05-08T05:17:46Z

Closing stale low-priority draft. The branch is clean, but its benchmark base predates the current renderer/base64/TOML work and needs fresh attribution before being worth review.

He-Pin · 2026-05-08T05:21:37Z

Reopened. This is not negative; keep as draft pending fresh benchmarking on current master. The previous benchmark base is stale after later renderer/base64/TOML work.

He-Pin · 2026-05-08T05:25:21Z

Marking ready under the updated review bar: a PR can be ready when it has a clear positive result for at least one benchmark suite, even if it is not the primary docs/jrsonnet-gap case. Keep normal/noise-only benchmark movement out of the description.

Motivation: Reduce allocation overhead in common numeric rendering paths. Modification: 1. RenderUtils.renderDouble reuses pre-cached string representations for exact integer doubles in the range 0-255. 2. Materializer.stringify delegates number stringification to RenderUtils.renderDouble, removing its duplicate integer fast path. Result: Numeric materialization uses the shared renderDouble fast path. The Builtin1.apply1 / Builtin3.apply3 specialization from the original PR is already present in current master via databricks#807, so it is no longer part of this PR diff.

He-Pin force-pushed the perf/cached-render-double branch from 12b870c to 4c8c5ab Compare April 12, 2026 17:32

He-Pin marked this pull request as ready for review April 12, 2026 17:48

He-Pin mentioned this pull request Apr 12, 2026

performance optimization #666

Open

He-Pin marked this pull request as draft April 12, 2026 18:57

He-Pin force-pushed the perf/cached-render-double branch 3 times, most recently from 3ae5a56 to 5e55449 Compare April 26, 2026 10:46

He-Pin marked this pull request as ready for review April 26, 2026 10:48

He-Pin marked this pull request as draft April 28, 2026 21:00

He-Pin mentioned this pull request Apr 30, 2026

perf: specialize Builtin1 and Builtin3 apply paths #807

Merged

He-Pin force-pushed the perf/cached-render-double branch from 5e55449 to eb50f7f Compare April 30, 2026 04:34

He-Pin changed the title ~~perf: cached renderDouble for small integers + Builtin1/3 apply overrides~~ perf: cache renderDouble for small integers Apr 30, 2026

He-Pin closed this May 8, 2026

He-Pin reopened this May 8, 2026

He-Pin marked this pull request as ready for review May 8, 2026 05:25

He-Pin force-pushed the perf/cached-render-double branch from eb50f7f to b213529 Compare May 8, 2026 05:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: cache renderDouble for small integers#763

perf: cache renderDouble for small integers#763
He-Pin wants to merge 1 commit intodatabricks:masterfrom
He-Pin:perf/cached-render-double

He-Pin commented Apr 12, 2026 •

edited

Loading

Uh oh!

stephenamar-db commented Apr 28, 2026

Uh oh!

He-Pin commented Apr 28, 2026

Uh oh!

He-Pin commented May 8, 2026

Uh oh!

He-Pin commented May 8, 2026

Uh oh!

He-Pin commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

He-Pin commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stephenamar-db commented Apr 28, 2026

Uh oh!

He-Pin commented Apr 28, 2026

Uh oh!

He-Pin commented May 8, 2026

Uh oh!

He-Pin commented May 8, 2026

Uh oh!

He-Pin commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

He-Pin commented Apr 12, 2026 •

edited

Loading