perf: cache renderDouble for small integers#763
perf: cache renderDouble for small integers#763He-Pin wants to merge 1 commit intodatabricks:masterfrom
Conversation
12b870c to
4c8c5ab
Compare
3ae5a56 to
5e55449
Compare
|
I'm questioning the usefulness of this tweak (for numbers) - this seems quite neutral |
|
It just a bit optimization around realistic2 where many small nunbers. |
Split out from #763. Motivation: Reduce allocation and dispatch overhead when one- and three-argument builtins are called through the dynamic `Val.Func.apply1` / `apply3` path. Key Design Decision: Keep the optimization local and semantics-preserving. `Builtin2` already has an exact-arity `apply2` override; this adds matching `Builtin1.apply1` and `Builtin3.apply3` overrides. Exact positional calls directly invoke the structured `evalRhs` overload and skip constructing an intermediate `Array`. Non-exact paths still fall back to the generic parent application path. Correctness: - The direct path matches the existing `Builtin1.apply` / `Builtin3.apply` exact positional behavior: force the supplied `Eval` values, then call the typed `evalRhs`. - Named arguments, missing defaults, too many arguments, and other non-exact calls still use the generic function application logic. - Static `Expr.ApplyBuiltin1` / `Expr.ApplyBuiltin3` paths are unchanged; this only helps dynamic builtin calls such as a builtin stored in a local or returned from another function. Modification: - Add `Builtin1.apply1`. - Add `Builtin3.apply3`. Validation: - `./mill --no-server 'sjsonnet.jvm[3.3.7].compile'` - `./mill --no-server 'sjsonnet.jvm[3.3.7].test'` (`141/141, SUCCESS`) - `./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'` - `./mill --no-server '_.jvm[_].__.test'` (`1104/1104, SUCCESS`) - Dynamic builtin smoke checks: - `local f = std.length; f([1, 2, 3])` -> `3` - `local f = std.substr; f("abcdef", 1, 3)` -> `"bcd"` - `local f = std.substr; f(str="abcdef", from=1, len=3)` -> `"bcd"` Hyperfine: Toolchain: - `hyperfine 1.20.0` - `--warmup 3 --min-runs 25` for targeted dynamic builtin benchmarks - `--warmup 3 --min-runs 20` for `realistic2` - JVM assemblies built with `./mill --no-server show 'sjsonnet.jvm[3.3.7].assembly'` - Base: `upstream/master` at `c04fc804` - Branch: `2067d8b5` Targeted `Builtin1` dynamic call benchmark: ```jsonnet local identity(x) = x; local len = identity(std.length); std.foldl( function(acc, i) acc + len("abcdef"), std.range(1, 5000000), 0 ) ``` | Command | Mean [ms] | Min [ms] | Max [ms] | Relative | |:---|---:|---:|---:|---:| | `master builtin1_dynamic` | 649.8 +/- 48.3 | 557.4 | 726.8 | 1.00 | | `branch builtin1_dynamic` | 661.7 +/- 41.0 | 606.1 | 747.3 | 1.02 +/- 0.10 | Result: statistically neutral in this hyperfine run. Targeted `Builtin3` dynamic call benchmark: ```jsonnet local identity(x) = x; local substr = identity(std.substr); std.foldl( function(acc, i) acc + std.length(substr("abcdef", 1, 3)), std.range(1, 3000000), 0 ) ``` | Command | Mean [ms] | Min [ms] | Max [ms] | Relative | |:---|---:|---:|---:|---:| | `master builtin3_dynamic` | 742.5 +/- 156.1 | 594.1 | 1254.7 | 1.12 +/- 0.30 | | `branch builtin3_dynamic` | 660.4 +/- 110.9 | 534.0 | 962.6 | 1.00 | Result: branch was faster in this run, but variance is high. End-to-end `realistic2`: | Command | Mean [ms] | Min [ms] | Max [ms] | Relative | |:---|---:|---:|---:|---:| | `master realistic2` | 544.9 +/- 95.3 | 414.1 | 706.5 | 1.27 +/- 0.27 | | `branch realistic2` | 428.4 +/- 54.1 | 378.3 | 565.8 | 1.00 | Result: branch was faster in this run; due JVM-startup and system noise, treat this as a non-regression signal rather than a guaranteed 1.27x speedup.
5e55449 to
eb50f7f
Compare
|
Closing stale low-priority draft. The branch is clean, but its benchmark base predates the current renderer/base64/TOML work and needs fresh attribution before being worth review. |
|
Reopened. This is not negative; keep as draft pending fresh benchmarking on current master. The previous benchmark base is stale after later renderer/base64/TOML work. |
|
Marking ready under the updated review bar: a PR can be ready when it has a clear positive result for at least one benchmark suite, even if it is not the primary docs/jrsonnet-gap case. Keep normal/noise-only benchmark movement out of the description. |
Motivation: Reduce allocation overhead in common numeric rendering paths. Modification: 1. RenderUtils.renderDouble reuses pre-cached string representations for exact integer doubles in the range 0-255. 2. Materializer.stringify delegates number stringification to RenderUtils.renderDouble, removing its duplicate integer fast path. Result: Numeric materialization uses the shared renderDouble fast path. The Builtin1.apply1 / Builtin3.apply3 specialization from the original PR is already present in current master via databricks#807, so it is no longer part of this PR diff.
eb50f7f to
b213529
Compare
Motivation:
Reduce allocation overhead in common numeric rendering paths.
Key Design Decision:
Keep this PR focused on the
renderDoubleoptimization. The originalBuiltin1.apply1/Builtin3.apply3specialization has already landed in currentmastervia #807, so it is intentionally no longer part of this PR diff after the rebase.Modification:
Materializerdelegates number stringification toRenderUtils.renderDouble.RenderUtils.renderDoublereuses a small integer string cache for exact integer doubles in the range0to255.Benchmark Results:
Rebased onto latest
upstream/masterat8b67cb1e.JMH, JVM harness, lower is better. Normal/noise-only rows are intentionally omitted.
realistic2Scala Native hyperfine against latest source-built jrsonnet master
5b43fa8(jrsonnet 0.5.0-pre98), lower is better:realistic2Analysis:
The current-master refresh leaves one clear JVM signal:
realistic2improves by about 12% through the shared numeric rendering path. Native CLI is not a standout improvement over master on this PR, but latest jrsonnet is substantially slower on the samerealistic2case.Verification:
./mill -i __.checkFormat./mill -i 'sjsonnet.jvm[3.3.7].test'./mill -i bench.runRegressions bench/resources/cpp_suite/large_string_join.jsonnet bench/resources/cpp_suite/large_string_template.jsonnet bench/resources/cpp_suite/realistic2.jsonnet bench/resources/go_suite/parseInt.jsonnet./mill -i 'sjsonnet.native[3.3.7].nativeLink'hyperfine --warmup 10 --min-runs 50References:
perf/cached-render-doubleupstream/masterat8b67cb1eeffa764f2a7298658e5473d8402a8da1b21352970b5f19e8f929a259739d88e69df316275b43fa88b8c43856dd5a2daa9c5c251153c5e14dResult:
Ready. The PR now contains only the cached
renderDoublework; the Builtin1/3 apply override work is already covered by #807 onmaster.