Add array eval copy API by He-Pin · Pull Request #822 · databricks/sjsonnet

He-Pin · 2026-05-05T08:38:27Z

Motivation

#821 adds slice views, but stdlib consumers still need one narrow way to bulk-copy arrays without each call site reimplementing directBackingArray / range / view branches.

This PR adds a shared Eval copy API that preserves laziness and gives fully-consumed array paths a simple JVM/Scala Native friendly implementation.

Constraints:

copy references, not values
do not force thunks
keep loops simple enough for HotSpot and Scala Native
avoid widening the API beyond current consumers
keep the PR stacked and reviewable on top of Add lazy slice array view #821

Modification

Add a narrow Arr.copyEvalTo API for bulk copying Eval references without forcing Val values.

This PR routes these consumers through the API:

std.flattenArrays
array std.flatMap
array-separator std.join

Specialized copy paths cover concat, repeat, slice, reversed lazy views, range, and byte arrays.

Result

Rebased onto latest upstream/master at 8b67cb1e.

JMH, JVM harness, lower is better:

Benchmark	master	this PR	Delta
`sjsonnet_suite/array_copy_views`	18.027 ms/op	13.972 ms/op	-22.5%

Scala Native hyperfine against latest source-built jrsonnet master 5b43fa8 (jrsonnet 0.5.0-pre98), lower is better:

Benchmark	master-native	this PR	latest jrsonnet	Result
`sjsonnet_suite/array_copy_views`	23.0 +/- 1.6 ms	9.1 +/- 3.1 ms	9.4 +/- 3.0 ms	2.53x faster than master; roughly even with latest jrsonnet under noisy CLI timing

JIT / GC review:

The API copies Eval, not Val, so laziness and callback/error retry behavior stay intact.
copyEvalTo(Array[Eval], offset) lets hot consumers write into one preallocated array with indexed while-loops.
Concat materialization now copies through the same API, so concat children can provide cheap copy paths without first materializing themselves.
Consumer code no longer repeats polymorphic branches at every call site.

Verification:

git diff --check
./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
./mill --no-server 'sjsonnet.jvm[3.3.7].test'
./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
./mill -i 'sjsonnet.native[3.3.7].nativeLink'
./mill -i bench.runRegressions bench/resources/sjsonnet_suite/array_copy_views.jsonnet
hyperfine --warmup 10 --min-runs 50

Rollback boundary:

This PR is infrastructure plus three consumers.
It does not change string join sizing, callback invocation, sort, or renderer behavior.
If a consumer regresses, it can move back to the previous local copy path while keeping the API for other consumers.

References

Builds on Add lazy slice array view #821 lazy slice array view.
Follow-up stack: Presize array copy consumers #823 presizes selected copy consumers.
Head: 4f65897fff2ea8f736cc569de356d963ae78ed7e
Latest source-built jrsonnet: 5b43fa88b8c43856dd5a2daa9c5c251153c5e14d

## Motivation After #814, sjsonnet has lazy array views for large stdlib arrays, but array slicing and `std.remove` / `std.removeAt` still had paths that eagerly allocated new `Array[Eval]` backing arrays. This PR adds a focused slice view so large slices avoid copying unless fully materialized, while keeping JVM/Scala Native behavior conservative. Constraints: - keep Jsonnet indexed laziness: `length`, `eval(i)`, and `value(i)` stay O(1) - do not force elements while slicing - keep small slices eager to avoid retaining large sources - prevent deep concat trees - keep hot paths JIT/GC friendly ## Modification Add a lazy `SliceArr` view and route `Arr.sliced(...)` through it when the slice is large enough or the source is already compact/view-backed. Changed behavior: - array slicing can return `SliceArr` instead of eagerly copying `Array[Eval]` - `std.remove` and `std.removeAt` reuse slice + concat views - compact sources (`RangeArr`, `ByteArr`, lazy indexed arrays, repeat, slice) can slice as O(1) views - flat/reversed/concat arrays only use a view when the slice is large enough to justify source retention - concat still caps tree depth by flattening when either side is already a concat view ## Result Verification passed: - `./mill --no-server 'sjsonnet.jvm[3.3.7].compile'` - `./mill --no-server 'sjsonnet.jvm[2.13.18].compile'` - `./mill --no-server 'sjsonnet.jvm[2.12.21].compile'` - `./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests` - `./mill --no-server 'sjsonnet.jvm[3.3.7].test'` - `./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'` - `./mill --no-server bench.checkFormat` - `./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'` JMH, full JVM regression sweep, compared with `upstream/master` (lower is better, notable result only): - Baseline: `upstream/master` at `b4c667d5` - PR head: `3bde215b` - Command: `./mill --no-server bench.runRegressions` - Full sweep covered 45 regression inputs; non-target movement was noise-level, so only the targeted slice/remove case is listed. | Benchmark | Before | After | Result | | --- | ---: | ---: | ---: | | `lazy_array_slice_remove` | 5.805 ms/op | 1.088 ms/op | 5.34x faster | Scala Native hyperfine, full regression-input sweep, compared with `upstream/master` (lower is better, notable result only): - Binaries: `./mill --no-server show 'sjsonnet.native[3.3.7].nativeLink'` - Command shape: `hyperfine --warmup 5 --min-runs 20 -N --output=null ...` - Full sweep covered the same 45 regression inputs; `bench.07` was run with `ulimit -s 65520` for both sides because the native binary needs a larger process stack for that input. | Benchmark | Before | After | Result | | --- | ---: | ---: | ---: | | `lazy_array_slice_remove` | 13.2 +/- 0.4 ms | 5.89 +/- 0.32 ms | 2.24x faster | External performance diff, against jrsonnet built from source at `80cd36a` with `cargo build --release -p jrsonnet` (`jrsonnet 0.5.0-pre98`): | Benchmark | sjsonnet Scala Native (#821) | source-built jrsonnet | Result | | --- | ---: | ---: | --- | | `lazy_array_slice_remove` | 5.8 ms +/- 0.2 ms | 7.0 ms +/- 0.2 ms | sjsonnet 1.21 +/- 0.05x faster | JIT / GC review: - `SliceArr` preserves indexed laziness: `eval(i)` returns an `Eval`; `value(i)` forces only the requested element. - Materializing a slice releases the source reference, so long-lived fully-consumed slices do not keep the original array alive. - Large slices avoid allocating and copying `Array[Eval]`; small slices still copy to avoid source-retention overhead. - `std.remove` / `std.removeAt` reuse slice and concat views, avoiding large intermediate arrays. - Concat depth remains bounded. Rollback boundary: - This PR only changes slice/remove array representation. - If a retained-source workload regresses, the slice threshold is the rollback lever without changing the public API. ## References - Builds on #814 lazy array architecture. - Follow-up stack: #822 adds shared `Arr.copyEvalTo`; #823 presizes selected copy consumers.

He-Pin · 2026-05-08T05:17:35Z

Closing obsolete/conflicting stack PR. This was infrastructure for array-copy synthetic/sjsonnet_suite workloads and is not a current high-priority docs/jrsonnet-gap PR.

Motivation: Several stdlib consumers fully copy array elements after the lazy-array work. Centralizing that path avoids repeated directBackingArray/range/view branches and lets concat, repeat, slice, range, and byte arrays expose cheap bulk Eval copies without forcing Val values. Modifications: - add Arr.copyEvalTo overloads for ArrayBuilder and preallocated Array[Eval] - teach concat materialization/eager concat to copy through the new API - add specialized copy implementations for repeat, slice, reversed lazy views, range, and byte arrays - route std.flattenArrays, array flatMap, and array-separator std.join through the API - add correctness coverage and an array_copy_views regression benchmark Results: - ./mill --no-server 'sjsonnet.jvm[3.3.7].compile' - ./mill --no-server 'sjsonnet.jvm[2.13.18].compile' - ./mill --no-server 'sjsonnet.jvm[2.12.21].compile' - ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests - ./mill --no-server 'sjsonnet.jvm[3.3.7].test' - ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat' - ./mill --no-server bench.checkFormat - ./mill --no-server 'sjsonnet.native[3.3.7].nativeLink' - JMH runRegressions vs slice baseline: array_copy_views 16.871 -> 13.937 ms/op - Scala Native hyperfine vs slice baseline: array_copy_views 26.1 ms -> 10.9 ms, 2.39x faster

He-Pin · 2026-05-08T05:21:34Z

Reopened and rebased onto current upstream/master. This remains draft: the Arr.copyEvalTo idea is useful infrastructure, but it needs fresh current benchmark attribution before ready.

He-Pin · 2026-05-08T05:25:17Z

Marking ready under the updated review bar: a PR can be ready when it has a clear positive result for at least one benchmark suite, even if it is not the primary docs/jrsonnet-gap case. Keep normal/noise-only benchmark movement out of the description.

He-Pin mentioned this pull request May 5, 2026

Presize array copy consumers #823

Open

He-Pin marked this pull request as draft May 5, 2026 09:14

He-Pin mentioned this pull request May 5, 2026

Add lazy slice array view #821

Merged

He-Pin closed this May 8, 2026

He-Pin reopened this May 8, 2026

He-Pin force-pushed the perf/arr-copy-eval-api branch from d6abb40 to 4f65897 Compare May 8, 2026 05:20

He-Pin marked this pull request as ready for review May 8, 2026 05:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add array eval copy API#822

Add array eval copy API#822
He-Pin wants to merge 1 commit intodatabricks:masterfrom
He-Pin:perf/arr-copy-eval-api

He-Pin commented May 5, 2026 •

edited

Loading

Uh oh!

He-Pin commented May 8, 2026

Uh oh!

He-Pin commented May 8, 2026

Uh oh!

He-Pin commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

He-Pin commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modification

Result

References

Uh oh!

He-Pin commented May 8, 2026

Uh oh!

He-Pin commented May 8, 2026

Uh oh!

He-Pin commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

He-Pin commented May 5, 2026 •

edited

Loading