Presize array copy consumers #823
Draft
He-Pin wants to merge 3 commits into databricks:master from
Motivation: Avoid copying large array slices and allocating intermediate arrays in std.remove/std.removeAt after the lazy-array work. This follows jrsonnet's indexed slice-view idea while keeping JVM retention under control for small sub-slices.

Modifications:
- add Val.Arr.sliced and SliceArr for large or compact-source slices
- route array slicing and std.remove/removeAt through slice/concat views
- let large concat decisions use total length, with overflow protection
- add correctness coverage and a slice/remove benchmark resource

Results:
- ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
- ./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
- ./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
- ./mill --no-server bench.checkFormat
- JMH runRegressions: lazy_array_slice_remove 5.890 -> 1.089 ms/op
- hyperfine macro slice/remove: 498.6 ms -> 335.5 ms
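The slice-view idea can be illustrated with a small Scala sketch. This is a simplified model, not sjsonnet's implementation. `Eval`, `Arr`, `FlatArr`, `RangeArr`, `SliceArr`, `Slicing.sliced`, and the `MinViewLength` cutoff of 64 are all hypothetical, and Jsonnet's slice step is left out. The sketch follows the rules this PR describes:

- large slices and slices of compact sources become O(1) views;
- small slices of flat arrays are copied so they do not pin the source;
- materializing a view drops its source reference.

```scala
// Hypothetical stand-in for sjsonnet's Eval: the element is computed on first use.
final class Eval(compute: () => Int) {
  lazy val value: Int = compute()
}

// Simplified array model; these are not sjsonnet's real Val.Arr classes.
sealed abstract class Arr {
  def length: Int
  def eval(i: Int): Eval                  // returns the element without forcing it
  def value(i: Int): Int = eval(i).value  // forces only element i
  def isCompact: Boolean                  // cheap to keep alive as a slice source
}

final class FlatArr(items: Array[Eval]) extends Arr {
  def length: Int = items.length
  def eval(i: Int): Eval = items(i)
  def isCompact: Boolean = false
}

final class RangeArr(from: Int, val length: Int) extends Arr {
  def eval(i: Int): Eval = { val v = from + i; new Eval(() => v) }
  def isCompact: Boolean = true
}

final class SliceArr(src: Arr, val offset: Int, val length: Int) extends Arr {
  private var source: Arr = src
  private var copy: Array[Eval] = null

  def underlying: Arr = source
  def retainsSource: Boolean = source != null
  def isCompact: Boolean = true

  def eval(i: Int): Eval = {
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(s"index $i, length $length")
    if (copy != null) copy(i) else source.eval(offset + i)
  }

  /** Copies the Evals once (forcing none), then drops the source reference. */
  def materialize(): Array[Eval] = {
    if (copy == null) {
      val out = new Array[Eval](length)
      var i = 0
      while (i < length) { out(i) = source.eval(offset + i); i += 1 }
      copy = out
      source = null
    }
    copy
  }
}

object Slicing {
  // Hypothetical cutoff: the PR text does not state the real threshold.
  val MinViewLength = 64

  /** Contiguous slice [from, until), clamped to bounds (Jsonnet's step is omitted). */
  def sliced(arr: Arr, from: Int, until: Int): Arr = {
    val lo = math.max(0, math.min(from, arr.length))
    val hi = math.max(lo, math.min(until, arr.length))
    val len = hi - lo
    // Resolve a slice of a live slice against the original source so views never nest.
    val (src, base) = arr match {
      case s: SliceArr if s.retainsSource => (s.underlying, s.offset)
      case _                              => (arr, 0)
    }
    if (src.isCompact || len >= MinViewLength) new SliceArr(src, base + lo, len)
    else {
      // Small slice of a flat source: copy so the large source is not retained.
      val out = new Array[Eval](len)
      var i = 0
      while (i < len) { out(i) = src.eval(base + lo + i); i += 1 }
      new FlatArr(out)
    }
  }
}
```

In this model, a slice of a live slice is re-pointed at the original source, so `eval(i)` stays a single indirection.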
Motivation: Several stdlib consumers fully copy array elements after the lazy-array work. Centralizing that path avoids repeated directBackingArray/range/view branches and lets concat, repeat, slice, range, and byte arrays expose cheap bulk Eval copies without forcing Val values.

Modifications:
- add Arr.copyEvalTo overloads for ArrayBuilder and preallocated Array[Eval]
- teach concat materialization/eager concat to copy through the new API
- add specialized copy implementations for repeat, slice, reversed lazy views, range, and byte arrays
- route std.flattenArrays, array flatMap, and array-separator std.join through the API
- add correctness coverage and an array_copy_views regression benchmark

Results:
- ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
- ./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
- ./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
- ./mill --no-server bench.checkFormat
- ./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'
- JMH runRegressions vs slice baseline: array_copy_views 16.871 -> 13.937 ms/op
- Scala Native hyperfine vs slice baseline: array_copy_views 26.1 ms -> 10.9 ms, 2.39x faster
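Here is a minimal sketch of a bulk `copyEvalTo` API with specialized implementations, assuming a toy model. The `Eval`, `Arr`, `FlatArr`, `RangeArr`, `RepeatArr`, and `ConcatArr` classes and their signatures are hypothetical. They only mirror the shape the commit message describes: one overload for a preallocated `Array[Eval]` and one for `ArrayBuilder`. Each copy moves `Eval` references without forcing them.

```scala
import scala.collection.mutable.ArrayBuilder

// Hypothetical stand-in for sjsonnet's Eval: the element is computed on first use.
final class Eval(compute: () => Int) { lazy val value: Int = compute() }

// Illustrative model of a bulk copy API; names and signatures are assumptions.
abstract class Arr {
  def length: Int
  def eval(i: Int): Eval

  /** Generic fallback: writes `length` Evals into dest at destPos without forcing them. */
  def copyEvalTo(dest: Array[Eval], destPos: Int): Unit = {
    var i = 0
    while (i < length) { dest(destPos + i) = eval(i); i += 1 }
  }

  /** Builder overload for consumers that do not know their total size up front. */
  def copyEvalTo(b: ArrayBuilder[Eval]): Unit = {
    var i = 0
    while (i < length) { b += eval(i); i += 1 }
  }
}

final class FlatArr(items: Array[Eval]) extends Arr {
  def length: Int = items.length
  def eval(i: Int): Eval = items(i)
  // Flat storage can use a single bulk copy.
  override def copyEvalTo(dest: Array[Eval], destPos: Int): Unit =
    System.arraycopy(items, 0, dest, destPos, items.length)
  override def copyEvalTo(b: ArrayBuilder[Eval]): Unit = { b ++= items; () }
}

final class RangeArr(from: Int, val length: Int) extends Arr {
  def eval(i: Int): Eval = { val v = from + i; new Eval(() => v) }
}

final class RepeatArr(inner: Arr, times: Int) extends Arr {
  val length: Int = Math.multiplyExact(inner.length, times)
  def eval(i: Int): Eval = inner.eval(i % inner.length)
  // Copy the inner array once, then duplicate that block with System.arraycopy.
  override def copyEvalTo(dest: Array[Eval], destPos: Int): Unit =
    if (length > 0) {
      val n = inner.length
      inner.copyEvalTo(dest, destPos)
      var k = 1
      while (k < times) { System.arraycopy(dest, destPos, dest, destPos + k * n, n); k += 1 }
    }
}

final class ConcatArr(left: Arr, right: Arr) extends Arr {
  val length: Int = Math.addExact(left.length, right.length)
  def eval(i: Int): Eval = if (i < left.length) left.eval(i) else right.eval(i - left.length)
  // Delegate to each side so their specialized copies are reused.
  override def copyEvalTo(dest: Array[Eval], destPos: Int): Unit = {
    left.copyEvalTo(dest, destPos)
    right.copyEvalTo(dest, destPos + left.length)
  }
  override def copyEvalTo(b: ArrayBuilder[Eval]): Unit = {
    left.copyEvalTo(b)
    right.copyEvalTo(b)
  }
}
```

Because concat delegates to each side, a concat of repeats still benefits from the repeat's block copy. In this model, reversed views and byte arrays would get their own overrides in the same way.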
Motivation: After adding Arr.copyEvalTo, high-volume consumers can avoid ArrayBuilder growth by counting output length first and copying into a single Array[Eval]. This targets small outer arrays that contain large view-backed subarrays, while preserving the one-pass builder path for many-small-array workloads.

Modifications:
- presize std.flattenArrays when the outer part count is modest
- presize array-separator std.join when the outer part count is modest
- keep the one-pass ArrayBuilder + copyEvalTo fallback for large part counts

Results:
- ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
- ./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
- ./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.Std0150FunctionsTests sjsonnet.ValArrayViewTests
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
- ./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'
- JMH runRegressions vs copy-api baseline: array_copy_views 13.002 -> 8.454 ms/op
- Scala Native hyperfine vs copy-api baseline: array_copy_views 11.9 ms -> 10.5 ms
- Scala Native hyperfine many-small fallback: 7.0 ms -> 6.6 ms
- Scala Native hyperfine realistic2: 82.6 ms -> 82.5 ms
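The guarded two-pass path for `std.flattenArrays` might look like the following sketch. Only the `PresizedCopyMaxParts = 1024` guard and the idea of summing lengths as a `Long` before allocating come from the PR text. `Arr`, `Eval`, `Flatten`, and the `MaxArrayLength` cap (`Int.MaxValue - 8`, a common conservative JVM limit) are assumptions for illustration.

```scala
import scala.collection.mutable.ArrayBuilder

// Hypothetical stand-in for sjsonnet's Eval: the element is computed on first use.
final class Eval(compute: () => Int) { lazy val value: Int = compute() }

// Minimal stand-in for an array that can bulk-copy its Evals (hypothetical API shape).
final class Arr(val items: Array[Eval]) {
  def length: Int = items.length
  def copyEvalTo(dest: Array[Eval], destPos: Int): Unit =
    System.arraycopy(items, 0, dest, destPos, items.length)
  def copyEvalTo(b: ArrayBuilder[Eval]): Unit = { b ++= items; () }
}

object Flatten {
  // Guard value taken from the PR description; everything else is a sketch.
  val PresizedCopyMaxParts = 1024
  // Assumed safe upper bound for a JVM array length.
  val MaxArrayLength: Long = Int.MaxValue - 8

  def flatten(parts: Array[Arr]): Array[Eval] =
    if (parts.length <= PresizedCopyMaxParts) presized(parts) else onePass(parts)

  /** Two linear scans: sum lengths as a Long, then copy into a single allocation. */
  private def presized(parts: Array[Arr]): Array[Eval] = {
    var total = 0L
    var i = 0
    while (i < parts.length) { total += parts(i).length; i += 1 }
    if (total > MaxArrayLength)
      throw new IllegalArgumentException(s"flattened array too large: $total elements")
    val out = new Array[Eval](total.toInt)
    var pos = 0
    i = 0
    while (i < parts.length) {
      parts(i).copyEvalTo(out, pos)
      pos += parts(i).length
      i += 1
    }
    out
  }

  /** Fallback for many parts: one pass, letting ArrayBuilder grow as needed. */
  private def onePass(parts: Array[Arr]): Array[Eval] = {
    val b = ArrayBuilder.make[Eval]
    var i = 0
    while (i < parts.length) { parts(i).copyEvalTo(b); i += 1 }
    b.result()
  }
}
```

Above the guard, the one-pass builder path avoids a second scan over thousands of tiny parts. That is the trade-off the PR's many-small fallback measurement is meant to check.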
This was referenced May 5, 2026
stephenamar-db pushed a commit that referenced this pull request on May 7, 2026
## Motivation

After #814, sjsonnet has lazy array views for large stdlib arrays, but array slicing and `std.remove` / `std.removeAt` still had paths that eagerly allocated new `Array[Eval]` backing arrays. This PR adds a focused slice view so large slices avoid copying unless fully materialized, while keeping JVM/Scala Native behavior conservative.

Constraints:

- keep Jsonnet indexed laziness: `length`, `eval(i)`, and `value(i)` stay O(1)
- do not force elements while slicing
- keep small slices eager to avoid retaining large sources
- prevent deep concat trees
- keep hot paths JIT/GC friendly

## Modification

Add a lazy `SliceArr` view and route `Arr.sliced(...)` through it when the slice is large enough or the source is already compact/view-backed.

Changed behavior:

- array slicing can return `SliceArr` instead of eagerly copying `Array[Eval]`
- `std.remove` and `std.removeAt` reuse slice + concat views
- compact sources (`RangeArr`, `ByteArr`, lazy indexed arrays, repeat, slice) can slice as O(1) views
- flat/reversed/concat arrays only use a view when the slice is large enough to justify source retention
- concat still caps tree depth by flattening when either side is already a concat view

## Result

Verification passed:

- `./mill --no-server 'sjsonnet.jvm[3.3.7].compile'`
- `./mill --no-server 'sjsonnet.jvm[2.13.18].compile'`
- `./mill --no-server 'sjsonnet.jvm[2.12.21].compile'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'`
- `./mill --no-server bench.checkFormat`
- `./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'`

JMH, full JVM regression sweep, compared with `upstream/master` (lower is better, notable result only):

- Baseline: `upstream/master` at `b4c667d5`
- PR head: `3bde215b`
- Command: `./mill --no-server bench.runRegressions`
- Full sweep covered 45 regression inputs; non-target movement was noise-level, so only the targeted slice/remove case is listed.

| Benchmark | Before | After | Result |
| --- | ---: | ---: | ---: |
| `lazy_array_slice_remove` | 5.805 ms/op | 1.088 ms/op | 5.34x faster |

Scala Native hyperfine, full regression-input sweep, compared with `upstream/master` (lower is better, notable result only):

- Binaries: `./mill --no-server show 'sjsonnet.native[3.3.7].nativeLink'`
- Command shape: `hyperfine --warmup 5 --min-runs 20 -N --output=null ...`
- Full sweep covered the same 45 regression inputs; `bench.07` was run with `ulimit -s 65520` for both sides because the native binary needs a larger process stack for that input.

| Benchmark | Before | After | Result |
| --- | ---: | ---: | ---: |
| `lazy_array_slice_remove` | 13.2 +/- 0.4 ms | 5.89 +/- 0.32 ms | 2.24x faster |

External performance diff, against jrsonnet built from source at `80cd36a` with `cargo build --release -p jrsonnet` (`jrsonnet 0.5.0-pre98`):

| Benchmark | sjsonnet Scala Native (#821) | source-built jrsonnet | Result |
| --- | ---: | ---: | --- |
| `lazy_array_slice_remove` | 5.8 ms +/- 0.2 ms | 7.0 ms +/- 0.2 ms | sjsonnet 1.21 +/- 0.05x faster |

JIT / GC review:

- `SliceArr` preserves indexed laziness: `eval(i)` returns an `Eval`; `value(i)` forces only the requested element.
- Materializing a slice releases the source reference, so long-lived fully-consumed slices do not keep the original array alive.
- Large slices avoid allocating and copying `Array[Eval]`; small slices still copy to avoid source-retention overhead.
- `std.remove` / `std.removeAt` reuse slice and concat views, avoiding large intermediate arrays.
- Concat depth remains bounded.

Rollback boundary:

- This PR only changes slice/remove array representation.
- If a retained-source workload regresses, the slice threshold is the rollback lever without changing the public API.

## References

- Builds on #814 lazy array architecture.
- Follow-up stack: #822 adds shared `Arr.copyEvalTo`; #823 presizes selected copy consumers.
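Routing `std.remove` / `std.removeAt` through slice and concat views, while keeping concat depth bounded, can be sketched as follows. This is a hypothetical model: `Arr`, `FlatArr`, `SliceArr`, `ConcatArr`, and `ArrOps` are illustrative names, and the small-slice copy threshold is omitted. In this sketch a slice over a concat view counts as concat-backed, so repeated removals flatten instead of building a deep tree.

```scala
// Hypothetical stand-in for sjsonnet's Eval: the element is computed on first use.
final class Eval(compute: () => Int) { lazy val value: Int = compute() }

// Simplified view model; not sjsonnet's actual classes.
abstract class Arr {
  def length: Int
  def eval(i: Int): Eval
  def value(i: Int): Int = eval(i).value
  def copyTo(dest: Array[Eval], pos: Int): Unit = {
    var i = 0
    while (i < length) { dest(pos + i) = eval(i); i += 1 }
  }
}
final class FlatArr(items: Array[Eval]) extends Arr {
  def length: Int = items.length
  def eval(i: Int): Eval = items(i)
}
final class SliceArr(val src: Arr, val offset: Int, val length: Int) extends Arr {
  def eval(i: Int): Eval = src.eval(offset + i)
}
final class ConcatArr(val left: Arr, val right: Arr) extends Arr {
  val length: Int = left.length + right.length
  def eval(i: Int): Eval = if (i < left.length) left.eval(i) else right.eval(i - left.length)
}

object ArrOps {
  /** Slices re-point at their source instead of nesting. */
  def slice(a: Arr, from: Int, len: Int): Arr = a match {
    case s: SliceArr => new SliceArr(s.src, s.offset + from, len)
    case _           => new SliceArr(a, from, len)
  }

  private def concatBacked(a: Arr): Boolean = a match {
    case _: ConcatArr => true
    case s: SliceArr  => s.src.isInstanceOf[ConcatArr]
    case _            => false
  }

  /** Concat view whose depth stays bounded: concat-backed inputs are flattened first. */
  def concat(l: Arr, r: Arr): Arr = {
    val total = l.length.toLong + r.length.toLong // Long sum cannot overflow
    require(total <= Int.MaxValue, s"array too large: $total elements")
    if (l.length == 0) r
    else if (r.length == 0) l
    else if (concatBacked(l) || concatBacked(r)) {
      val out = new Array[Eval](total.toInt)
      l.copyTo(out, 0)
      r.copyTo(out, l.length)
      new FlatArr(out)
    } else new ConcatArr(l, r)
  }

  /** std.removeAt-style: everything before `at` followed by everything after it. */
  def removeAt(a: Arr, at: Int): Arr = {
    require(at >= 0 && at < a.length, s"index $at out of bounds for length ${a.length}")
    concat(slice(a, 0, at), slice(a, at + 1, a.length - at - 1))
  }

  /** std.remove-style: drops the first element equal to x, forcing elements only up to the match. */
  def remove(a: Arr, x: Int): Arr = {
    var i = 0
    while (i < a.length && a.value(i) != x) i += 1
    if (i == a.length) a else removeAt(a, i)
  }
}
```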
Comment by the PR author (Contributor):

Closing obsolete/conflicting stack PR. It builds on #822 and mainly targets array_copy_views; it is not a current high-priority docs/jrsonnet-gap PR.
## Motivation

#822 gives consumers a cheap `Eval` copy API, but `std.flattenArrays` and array-separator `std.join` can still pay `ArrayBuilder` growth/copy costs when the outer array has a modest number of large child arrays.

This PR adds a guarded two-pass presize path for those consumers. The goal is to remove avoidable intermediate allocation in few-large-array workloads without regressing many-small-array workloads.
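For the array-separator `std.join` consumer, the two-pass presize idea can be sketched like this. `Arr`, `Eval`, `JoinSketch`, and `MaxArrayLength` are hypothetical. The 1024-part guard comes from the PR, and null parts are skipped as in Jsonnet's standard-library `std.join`. The first pass counts non-null parts and their total length, plus one separator between each adjacent pair, as a `Long`. The second pass copies into one preallocated `Array[Eval]`.

```scala
import scala.collection.mutable.ArrayBuilder

// Hypothetical stand-in for sjsonnet's Eval: the element is computed on first use.
final class Eval(compute: () => Int) { lazy val value: Int = compute() }

// Minimal stand-in for an array that can bulk-copy its Evals (hypothetical API shape).
final class Arr(val items: Array[Eval]) {
  def length: Int = items.length
  def copyEvalTo(dest: Array[Eval], destPos: Int): Unit =
    System.arraycopy(items, 0, dest, destPos, items.length)
  def copyEvalTo(b: ArrayBuilder[Eval]): Unit = { b ++= items; () }
}

object JoinSketch {
  val PresizedCopyMaxParts = 1024             // guard value stated in the PR
  val MaxArrayLength: Long = Int.MaxValue - 8 // assumed safe JVM array cap

  /** Array-separator join: non-null parts separated by sep; null parts are skipped. */
  def join(sep: Arr, parts: Array[Arr]): Array[Eval] =
    if (parts.length <= PresizedCopyMaxParts) presized(sep, parts) else onePass(sep, parts)

  private def presized(sep: Arr, parts: Array[Arr]): Array[Eval] = {
    // Pass 1: count non-null parts and their total length as a Long.
    var total = 0L
    var nonNull = 0
    var i = 0
    while (i < parts.length) {
      if (parts(i) != null) { total += parts(i).length; nonNull += 1 }
      i += 1
    }
    if (nonNull > 1) total += (nonNull - 1).toLong * sep.length
    if (total > MaxArrayLength)
      throw new IllegalArgumentException(s"joined array too large: $total elements")
    // Pass 2: copy parts and separators into one preallocated array.
    val out = new Array[Eval](total.toInt)
    var pos = 0
    var first = true
    i = 0
    while (i < parts.length) {
      val p = parts(i)
      if (p != null) {
        if (!first) { sep.copyEvalTo(out, pos); pos += sep.length }
        p.copyEvalTo(out, pos)
        pos += p.length
        first = false
      }
      i += 1
    }
    out
  }

  /** One-pass fallback for large part counts, letting ArrayBuilder grow. */
  private def onePass(sep: Arr, parts: Array[Arr]): Array[Eval] = {
    val b = ArrayBuilder.make[Eval]
    var first = true
    var i = 0
    while (i < parts.length) {
      val p = parts(i)
      if (p != null) {
        if (!first) sep.copyEvalTo(b)
        p.copyEvalTo(b)
        first = false
      }
      i += 1
    }
    b.result()
  }
}
```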
Constraints:
## Modification

Stacked on #822.

Use `Arr.copyEvalTo` to presize high-volume array-copy consumers:

- `std.flattenArrays`
- `std.join`

The presized path uses two linear scans only when the outer part count is modest (`<= 1024`). Large outer arrays fall back to the one-pass `ArrayBuilder + copyEvalTo` path from #822.

## Result
Verification passed:
- `./mill --no-server 'sjsonnet.jvm[3.3.7].compile'`
- `./mill --no-server 'sjsonnet.jvm[2.13.18].compile'`
- `./mill --no-server 'sjsonnet.jvm[2.12.21].compile'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.Std0150FunctionsTests sjsonnet.ValArrayViewTests`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'`
- `./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'`
- `git diff --check`

JMH, JVM harness, compared with #822 copy-api baseline:

- `array_copy_views`
- `realistic2`

Scala Native hyperfine, compared with #822 copy-api baseline, using Scala Native binaries, not JVM jars:

- `array_copy_views`
- `realistic2`

External performance diff, against jrsonnet built from source at `80cd36a` with `cargo build --release -p jrsonnet` (`jrsonnet 0.5.0-pre98`):

- `array_copy_views`
- `realistic2`

JIT / GC review:

- `Eval` references into one preallocated `Array[Eval]`; it does not force element values.
- `totalLen` is accumulated as `Long` and checked before allocating the final `Array[Eval]`.
- `PresizedCopyMaxParts = 1024` avoids turning many-small arrays into an always-two-pass workload.
- `copyEvalTo`, so it stays friendly to JIT inlining and Scala Native codegen.

Rollback boundary:

## References