Presize array copy consumers by He-Pin · Pull Request #823 · databricks/sjsonnet

He-Pin · 2026-05-05T08:56:21Z

Motivation

#822 gives consumers a cheap Eval copy API, but std.flattenArrays and array-separator std.join can still pay ArrayBuilder growth/copy costs when the outer array has a modest number of large child arrays.

This PR adds a guarded, two-pass presizing path for those consumers. The goal is to remove avoidable intermediate allocations in workloads with a few large arrays without regressing workloads with many small arrays.

Constraints:

- do not force element values
- avoid many-small-array regressions
- guard total length before allocation
- keep hot paths as straight indexed loops
- keep this PR narrowly stacked on #822 (Add array eval copy API)

Modification

Stacked on #822.

Use Arr.copyEvalTo to presize high-volume array-copy consumers:

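- std.flattenArrays
- array-separator std.join

The presized path makes two linear passes, one to sum the child lengths and one to copy Eval references into a single exactly sized Array[Eval], and is taken only when the outer part count is modest (<= 1024). Larger outer arrays fall back to the one-pass ArrayBuilder + copyEvalTo path from #822.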
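The two-pass path and its fallback can be sketched roughly as follows. This is a minimal sketch, not the PR's code: `Eval`, `Arr`, `FlatArr`, `Const`, and both `copyEvalTo` overloads are simplified stand-ins assumed for illustration; only the 1024 threshold and the `Long` length check come from the description.

```scala
import scala.collection.mutable.ArrayBuilder

// Hypothetical, simplified stand-ins for sjsonnet's Eval and Val.Arr; the real
// classes and copyEvalTo signatures may differ.
object PresizedFlattenSketch {
  trait Eval { def value: Any }
  final class Const(v: Any) extends Eval { def value: Any = v }

  trait Arr {
    def length: Int
    def eval(i: Int): Eval

    // Copy element thunks (not forced values) into a preallocated array.
    def copyEvalTo(dest: Array[Eval], destPos: Int): Unit = {
      var i = 0
      while (i < length) { dest(destPos + i) = eval(i); i += 1 }
    }

    // Append element thunks to a growable builder (one-pass fallback).
    def copyEvalTo(b: ArrayBuilder[Eval]): Unit = {
      var i = 0
      while (i < length) { b += eval(i); i += 1 }
    }
  }

  final class FlatArr(elems: Array[Eval]) extends Arr {
    def length: Int = elems.length
    def eval(i: Int): Eval = elems(i)
  }

  // Threshold value quoted in the PR description.
  val PresizedCopyMaxParts = 1024

  def flattenArrays(parts: Array[Arr]): Array[Eval] =
    if (parts.length <= PresizedCopyMaxParts) {
      // Pass 1: sum child lengths as Long so overflow is detected before allocating.
      var totalLen = 0L
      var i = 0
      while (i < parts.length) { totalLen += parts(i).length; i += 1 }
      if (totalLen > Int.MaxValue)
        throw new IllegalArgumentException(s"flattened array too large: $totalLen")

      // Pass 2: copy Eval references into one exactly sized array.
      val out = new Array[Eval](totalLen.toInt)
      var pos = 0
      i = 0
      while (i < parts.length) {
        val p = parts(i)
        p.copyEvalTo(out, pos)
        pos += p.length
        i += 1
      }
      out
    } else {
      // Fallback for large outer arrays: a single pass through a growable builder.
      val b = new ArrayBuilder.ofRef[Eval]
      var i = 0
      while (i < parts.length) { parts(i).copyEvalTo(b); i += 1 }
      b.result()
    }
}
```

The first pass reads only child lengths, so its cost grows with the outer part count rather than the element count; bounding that count keeps the extra pass cheap relative to the builder growth it avoids.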

Result

Verification passed:

- `./mill --no-server 'sjsonnet.jvm[3.3.7].compile'`
- `./mill --no-server 'sjsonnet.jvm[2.13.18].compile'`
- `./mill --no-server 'sjsonnet.jvm[2.12.21].compile'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.Std0150FunctionsTests sjsonnet.ValArrayViewTests`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'`
- `./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'`
- `git diff --check`

JMH, JVM harness, compared with #822 copy-api baseline:

| Benchmark | Before | After |
| --- | ---: | ---: |
| `array_copy_views` | 13.002 ms/op | 8.454 ms/op |
| `realistic2` | see Native data | see Native data |

Scala Native hyperfine, compared with #822 copy-api baseline, using Scala Native binaries, not JVM jars:

| Benchmark | Before | After |
| --- | ---: | ---: |
| `array_copy_views` | 11.9 ms +/- 1.2 ms | 10.5 ms +/- 1.0 ms |
| many-small fallback | 7.0 ms +/- 0.7 ms | 6.6 ms +/- 0.5 ms |
| `realistic2` | 82.6 ms +/- 0.8 ms | 82.5 ms +/- 0.7 ms |

External performance diff, against jrsonnet built from source at `80cd36a` with `cargo build --release -p jrsonnet` (`jrsonnet 0.5.0-pre98`):

| Benchmark | sjsonnet Scala Native (#823) | source-built jrsonnet | Result |
| --- | ---: | ---: | --- |
| `array_copy_views` | 9.3 ms +/- 0.2 ms | 14.3 ms +/- 0.4 ms | sjsonnet 1.53 +/- 0.06x faster |
| `realistic2` | 79.9 ms +/- 2.2 ms | 92.9 ms +/- 1.9 ms | sjsonnet 1.16 +/- 0.04x faster |

JIT / GC review:

- The second pass copies Eval references into one preallocated Array[Eval]; it does not force element values.
- totalLen is accumulated as a Long and checked before the final Array[Eval] is allocated.
- PresizedCopyMaxParts = 1024 keeps many-small-array workloads from always paying for two passes.
- The fallback path preserves the #822 behavior for large outer arrays.
- The hot path consists of simple counted while-loops plus copyEvalTo calls, which keeps it friendly to JIT inlining and Scala Native codegen.
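To illustrate the first point, here is a hypothetical sketch in which one element would throw if forced. `LazyThunk` and `flattenRefs` are invented names, not sjsonnet classes; the point is only that a presized copy moves references with `System.arraycopy` and never calls `value`, so a failing element stays unevaluated until it is indexed.

```scala
// Hypothetical sketch, not sjsonnet's classes: LazyThunk stands in for a lazy Eval.
object NonForcingCopySketch {
  final class LazyThunk(compute: () => Any) {
    private var done = false
    private var cached: Any = null
    // Runs the computation on first access, then returns the cached result.
    def value: Any = {
      if (!done) { cached = compute(); done = true }
      cached
    }
  }

  // Presized flatten that only moves thunk references; it never calls .value.
  def flattenRefs(parts: Array[Array[LazyThunk]]): Array[LazyThunk] = {
    var totalLen = 0L
    var i = 0
    while (i < parts.length) { totalLen += parts(i).length; i += 1 }
    require(totalLen <= Int.MaxValue, s"result too large: $totalLen")
    val out = new Array[LazyThunk](totalLen.toInt)
    var pos = 0
    i = 0
    while (i < parts.length) {
      System.arraycopy(parts(i), 0, out, pos, parts(i).length)
      pos += parts(i).length
      i += 1
    }
    out
  }
}
```

Forcing happens only when a caller reads an element, so copying a chunk that contains a failing element does not raise that element's error.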

Rollback boundary:

- This PR only changes fully consumed array-copy consumers.
- It does not change string join, the renderer, sort, callback invocation, or global array view semantics.
- If a workload shows a many-small-array regression, the threshold can be lowered, or the affected consumer can fall back to the one-pass path from #822.

References

- Builds on #822 (array eval copy API).

Motivation: Avoid copying large array slices and remove/removeAt intermediates after the lazy-array work. This follows jrsonnet's indexed slice-view idea while keeping JVM retention under control for small sub-slices.

Modifications:
- add Val.Arr.sliced and SliceArr for large or compact-source slices
- route array slicing and std.remove/removeAt through slice/concat views
- let large concat decisions use total length, with overflow protection
- add correctness coverage and a slice/remove benchmark resource

Results:
- ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
- ./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
- ./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
- ./mill --no-server bench.checkFormat
- JMH runRegressions: lazy_array_slice_remove 5.890 -> 1.089 ms/op
- hyperfine macro slice/remove: 498.6 ms -> 335.5 ms
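Routing `std.removeAt` through views amounts to concatenating two slices. The sketch below is hypothetical: `Flat`, `Slice`, `Concat`, and `removeAt` are illustrative names, it omits the concat depth cap and the eager small-slice path described for the real change, and it rejects out-of-range indices rather than modeling `std.removeAt`'s own edge cases.

```scala
// Hypothetical sketch: Arr, Flat, Slice and Concat are illustrative stand-ins;
// apply(i) plays the role of eval(i) and returns a reference without forcing it.
object RemoveAtViewSketch {
  trait Arr {
    def length: Int
    def apply(i: Int): AnyRef
  }

  final class Flat(elems: Array[AnyRef]) extends Arr {
    def length: Int = elems.length
    def apply(i: Int): AnyRef = elems(i)
  }

  // O(1) view over src[start, start + length).
  final class Slice(src: Arr, start: Int, val length: Int) extends Arr {
    def apply(i: Int): AnyRef = {
      if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
      src(start + i)
    }
  }

  // O(1) concat view; the combined length is summed as Long and rejected on overflow.
  final class Concat(left: Arr, right: Arr) extends Arr {
    val length: Int = {
      val total = left.length.toLong + right.length
      if (total > Int.MaxValue) throw new IllegalArgumentException(s"array too large: $total")
      total.toInt
    }
    def apply(i: Int): AnyRef =
      if (i < left.length) left(i) else right(i - left.length)
  }

  // removeAt(arr, idx) == arr[0:idx] + arr[idx+1:], built from views with no element copies.
  def removeAt(arr: Arr, idx: Int): Arr = {
    require(idx >= 0 && idx < arr.length, s"index out of bounds: $idx")
    new Concat(new Slice(arr, 0, idx), new Slice(arr, idx + 1, arr.length - idx - 1))
  }
}
```

Both views are constant-size objects, so removing one element from a large array allocates no new backing array until something fully materializes the result.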

Motivation: Several stdlib consumers fully copy array elements after the lazy-array work. Centralizing that path avoids repeated directBackingArray/range/view branches and lets concat, repeat, slice, range, and byte arrays expose cheap bulk Eval copies without forcing Val values.

Modifications:
- add Arr.copyEvalTo overloads for ArrayBuilder and preallocated Array[Eval]
- teach concat materialization/eager concat to copy through the new API
- add specialized copy implementations for repeat, slice, reversed lazy views, range, and byte arrays
- route std.flattenArrays, array flatMap, and array-separator std.join through the API
- add correctness coverage and an array_copy_views regression benchmark

Results:
- ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
- ./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
- ./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
- ./mill --no-server bench.checkFormat
- ./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'
- JMH runRegressions vs slice baseline: array_copy_views 16.871 -> 13.937 ms/op
- Scala Native hyperfine vs slice baseline: array_copy_views 26.1 ms -> 10.9 ms, 2.39x faster
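A rough sketch of what specialized bulk copies can look like for two of the view kinds named above. `RangeView` and `RepeatView` are hypothetical stand-ins for sjsonnet's range and repeat arrays, and the single `copyEvalTo(dest, destPos)` signature is an assumption rather than the PR's API.

```scala
// Hypothetical stand-ins; names and signatures are illustrative, not sjsonnet's.
object CopyEvalToSketch {
  trait Eval { def value: Any }
  final class Num(n: Int) extends Eval { def value: Any = n }

  trait Arr {
    def length: Int
    def eval(i: Int): Eval
    // Bulk copy of element thunks into dest(destPos until destPos + length).
    def copyEvalTo(dest: Array[Eval], destPos: Int): Unit
  }

  // Range view: no backing array; each copied slot gets a fresh constant Eval.
  final class RangeView(from: Int, len: Int) extends Arr {
    def length: Int = len
    def eval(i: Int): Eval = {
      if (i < 0 || i >= len) throw new IndexOutOfBoundsException(i.toString)
      new Num(from + i)
    }
    def copyEvalTo(dest: Array[Eval], destPos: Int): Unit = {
      var i = 0
      while (i < len) { dest(destPos + i) = new Num(from + i); i += 1 }
    }
  }

  // Repeat view: `times` copies of `base`, copied block by block with arraycopy.
  final class RepeatView(base: Array[Eval], times: Int) extends Arr {
    require(base.length.toLong * times <= Int.MaxValue, "repeated array too large")
    def length: Int = base.length * times
    def eval(i: Int): Eval = {
      if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
      base(i % base.length)
    }
    def copyEvalTo(dest: Array[Eval], destPos: Int): Unit = {
      var k = 0
      while (k < times) {
        System.arraycopy(base, 0, dest, destPos + k * base.length, base.length)
        k += 1
      }
    }
  }
}
```

Each view copies in whatever way suits its representation, so a consumer only needs to call `copyEvalTo` instead of branching on the concrete array type.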

Motivation: After adding Arr.copyEvalTo, high-volume consumers can avoid ArrayBuilder growth by counting output length first and copying into a single Array[Eval]. This targets small outer arrays that contain large view-backed subarrays, while preserving the one-pass builder path for many-small-array workloads.

Modifications:
- presize std.flattenArrays when the outer part count is modest
- presize array-separator std.join when the outer part count is modest
- keep the one-pass ArrayBuilder + copyEvalTo fallback for large part counts

Results:
- ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
- ./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
- ./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.Std0150FunctionsTests sjsonnet.ValArrayViewTests
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
- ./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'
- JMH runRegressions vs copy-api baseline: array_copy_views 13.002 -> 8.454 ms/op
- Scala Native hyperfine vs copy-api baseline: array_copy_views 11.9 ms -> 10.5 ms
- Scala Native hyperfine many-small fallback: 7.0 ms -> 6.6 ms
- Scala Native hyperfine realistic2: 82.6 ms -> 82.5 ms
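The array-separator `std.join` case also has to count separator slots. A minimal sketch, assuming elements are plain `AnyRef` references standing in for `Eval` thunks and that a `null` part models a Jsonnet `null`, which Jsonnet's `std.join` skips; `PresizedJoinSketch` and its helpers are hypothetical names, not sjsonnet code.

```scala
import scala.collection.mutable.ArrayBuilder

// Hypothetical sketch: elements are AnyRef references standing in for Eval thunks,
// and a null part models a Jsonnet null, which std.join skips.
object PresizedJoinSketch {
  val PresizedCopyMaxParts = 1024 // threshold quoted in the PR description

  def join(sep: Array[AnyRef], parts: Array[Array[AnyRef]]): Array[AnyRef] =
    if (parts.length <= PresizedCopyMaxParts) presized(sep, parts)
    else onePass(sep, parts)

  private def presized(sep: Array[AnyRef], parts: Array[Array[AnyRef]]): Array[AnyRef] = {
    // Pass 1: count output slots, with one separator between consecutive non-null parts.
    var total = 0L
    var seen = 0
    var i = 0
    while (i < parts.length) {
      val p = parts(i)
      if (p != null) {
        if (seen > 0) total += sep.length
        total += p.length
        seen += 1
      }
      i += 1
    }
    if (total > Int.MaxValue) throw new IllegalArgumentException(s"joined array too large: $total")

    // Pass 2: copy separators and parts into one exactly sized array.
    val out = new Array[AnyRef](total.toInt)
    var pos = 0
    var first = true
    i = 0
    while (i < parts.length) {
      val p = parts(i)
      if (p != null) {
        if (!first) {
          System.arraycopy(sep, 0, out, pos, sep.length)
          pos += sep.length
        }
        System.arraycopy(p, 0, out, pos, p.length)
        pos += p.length
        first = false
      }
      i += 1
    }
    out
  }

  // Fallback for many parts: one pass, letting the builder grow as needed.
  private def onePass(sep: Array[AnyRef], parts: Array[Array[AnyRef]]): Array[AnyRef] = {
    val b = new ArrayBuilder.ofRef[AnyRef]
    var first = true
    var i = 0
    while (i < parts.length) {
      val p = parts(i)
      if (p != null) {
        if (!first) b ++= sep
        b ++= p
        first = false
      }
      i += 1
    }
    b.result()
  }
}
```

Both branches produce the same output; the threshold only decides whether an extra counting pass is worth avoiding builder growth.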

## Motivation

After #814, sjsonnet has lazy array views for large stdlib arrays, but array slicing and `std.remove` / `std.removeAt` still had paths that eagerly allocated new `Array[Eval]` backing arrays. This PR adds a focused slice view so large slices avoid copying unless fully materialized, while keeping JVM/Scala Native behavior conservative.

Constraints:

- keep Jsonnet indexed laziness: `length`, `eval(i)`, and `value(i)` stay O(1)
- do not force elements while slicing
- keep small slices eager to avoid retaining large sources
- prevent deep concat trees
- keep hot paths JIT/GC friendly

## Modification

Add a lazy `SliceArr` view and route `Arr.sliced(...)` through it when the slice is large enough or the source is already compact/view-backed.

Changed behavior:

- array slicing can return `SliceArr` instead of eagerly copying `Array[Eval]`
- `std.remove` and `std.removeAt` reuse slice + concat views
- compact sources (`RangeArr`, `ByteArr`, lazy indexed arrays, repeat, slice) can slice as O(1) views
- flat/reversed/concat arrays only use a view when the slice is large enough to justify source retention
- concat still caps tree depth by flattening when either side is already a concat view

## Result

Verification passed:

- `./mill --no-server 'sjsonnet.jvm[3.3.7].compile'`
- `./mill --no-server 'sjsonnet.jvm[2.13.18].compile'`
- `./mill --no-server 'sjsonnet.jvm[2.12.21].compile'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'`
- `./mill --no-server bench.checkFormat`
- `./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'`

JMH, full JVM regression sweep, compared with `upstream/master` (lower is better, notable result only):

- Baseline: `upstream/master` at `b4c667d5`
- PR head: `3bde215b`
- Command: `./mill --no-server bench.runRegressions`
- Full sweep covered 45 regression inputs; non-target movement was noise-level, so only the targeted slice/remove case is listed.

| Benchmark | Before | After | Result |
| --- | ---: | ---: | ---: |
| `lazy_array_slice_remove` | 5.805 ms/op | 1.088 ms/op | 5.34x faster |

Scala Native hyperfine, full regression-input sweep, compared with `upstream/master` (lower is better, notable result only):

- Binaries: `./mill --no-server show 'sjsonnet.native[3.3.7].nativeLink'`
- Command shape: `hyperfine --warmup 5 --min-runs 20 -N --output=null ...`
- Full sweep covered the same 45 regression inputs; `bench.07` was run with `ulimit -s 65520` for both sides because the native binary needs a larger process stack for that input.

| Benchmark | Before | After | Result |
| --- | ---: | ---: | ---: |
| `lazy_array_slice_remove` | 13.2 +/- 0.4 ms | 5.89 +/- 0.32 ms | 2.24x faster |

External performance diff, against jrsonnet built from source at `80cd36a` with `cargo build --release -p jrsonnet` (`jrsonnet 0.5.0-pre98`):

| Benchmark | sjsonnet Scala Native (#821) | source-built jrsonnet | Result |
| --- | ---: | ---: | --- |
| `lazy_array_slice_remove` | 5.8 ms +/- 0.2 ms | 7.0 ms +/- 0.2 ms | sjsonnet 1.21 +/- 0.05x faster |

JIT / GC review:

- `SliceArr` preserves indexed laziness: `eval(i)` returns an `Eval`; `value(i)` forces only the requested element.
- Materializing a slice releases the source reference, so long-lived fully-consumed slices do not keep the original array alive.
- Large slices avoid allocating and copying `Array[Eval]`; small slices still copy to avoid source-retention overhead.
- `std.remove` / `std.removeAt` reuse slice and concat views, avoiding large intermediate arrays.
- Concat depth remains bounded.

Rollback boundary:

- This PR only changes slice/remove array representation.
- If a retained-source workload regresses, the slice threshold is the rollback lever without changing the public API.

## References

- Builds on #814 lazy array architecture.
- Follow-up stack: #822 adds shared `Arr.copyEvalTo`; #823 presizes selected copy consumers.
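A condensed sketch of the slice-view idea described above. `SliceArr` here is a hypothetical reimplementation, not the PR's class; `MinViewLength = 64` is an invented threshold, and the real change also treats compact sources (range, byte, repeat, slice) as always viewable, which this sketch omits.

```scala
// Hypothetical sketch: Eval, Arr, FlatArr, SliceArr and MinViewLength are
// illustrative stand-ins, not sjsonnet's actual classes or threshold.
object SliceViewSketch {
  trait Eval { def value: Any }
  final class Const(v: Any) extends Eval { def value: Any = v }

  trait Arr {
    def length: Int
    def eval(i: Int): Eval
  }

  final class FlatArr(elems: Array[Eval]) extends Arr {
    def length: Int = elems.length
    def eval(i: Int): Eval = elems(i)
  }

  // O(1) view over source[start, start + len); slicing never forces elements.
  final class SliceArr(src: Arr, start: Int, len: Int) extends Arr {
    // `src` is read only by this initializer, so it is not kept as a separate field.
    private var source: Arr = src
    private var copied: Array[Eval] = null
    def length: Int = len
    def eval(i: Int): Eval = {
      if (i < 0 || i >= len) throw new IndexOutOfBoundsException(i.toString)
      if (copied != null) copied(i) else source.eval(start + i)
    }
    // Full materialization copies the Eval references once and drops the source,
    // so a long-lived slice no longer keeps the original array reachable.
    def materialize(): Array[Eval] = {
      if (copied == null) {
        val out = new Array[Eval](len)
        var i = 0
        while (i < len) { out(i) = source.eval(start + i); i += 1 }
        copied = out
        source = null
      }
      copied
    }
  }

  val MinViewLength = 64 // hypothetical cut-over between eager copy and view

  def sliced(src: Arr, from: Int, until: Int): Arr = {
    require(0 <= from && from <= until && until <= src.length, s"bad slice [$from, $until)")
    val len = until - from
    if (len >= MinViewLength) new SliceArr(src, from, len)
    else {
      // Small slices are copied eagerly so they do not retain a large source.
      val out = new Array[Eval](len)
      var i = 0
      while (i < len) { out(i) = src.eval(from + i); i += 1 }
      new FlatArr(out)
    }
  }
}
```

The threshold trades copy cost against retention: a tiny view over a huge array would pin the whole source in memory, so small slices pay a cheap copy instead.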

He-Pin · 2026-05-08T05:17:39Z

Closing obsolete/conflicting stack PR. It builds on #822 and mainly targets array_copy_views; not a current high-priority docs/jrsonnet-gap PR.

He-Pin added 3 commits May 5, 2026 16:16

He-Pin marked this pull request as draft May 5, 2026 09:15

This was referenced May 5, 2026:

- Add lazy slice array view #821 (Merged)
- Add array eval copy API #822 (Draft)

He-Pin closed this May 8, 2026

He-Pin reopened this May 8, 2026

He-Pin wants to merge 3 commits into databricks:master from He-Pin:perf/presized-array-copy-consumers
