
Add lazy slice array view #821

Merged
stephenamar-db merged 1 commit into databricks:master from He-Pin:perf/slice-array-view
May 7, 2026


He-Pin commented May 5, 2026

Motivation

After #814, sjsonnet has lazy array views for large stdlib arrays, but array slicing and std.remove / std.removeAt still had paths that eagerly allocated new Array[Eval] backing arrays.

This PR adds a focused slice view so large slices avoid copying unless fully materialized, while keeping JVM/Scala Native behavior conservative.

Constraints:

  • keep Jsonnet indexed laziness: length, eval(i), and value(i) stay O(1)
  • do not force elements while slicing
  • keep small slices eager to avoid retaining large sources
  • prevent deep concat trees
  • keep hot paths JIT/GC friendly
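
To make the first two constraints concrete, here is a minimal sketch of an indexed slice view. It is an illustration under stated assumptions, not sjsonnet's actual code: `Eval`, `Arr`, `FlatArr`, and this `SliceArr` are simplified stand-ins with `Int` elements, and `demo` exists only to show that slicing forces nothing.

```scala
// Hypothetical, simplified model; the real sjsonnet classes differ.
object SliceViewSketch {
  // A lazily evaluated element: the thunk runs at most once, on first access.
  final class Eval(thunk: => Int) { lazy val value: Int = thunk }

  trait Arr {
    def length: Int
    def eval(i: Int): Eval                 // returns the element unevaluated
    def value(i: Int): Int = eval(i).value // forces only element i
  }

  final class FlatArr(elems: Array[Eval]) extends Arr {
    def length: Int = elems.length
    def eval(i: Int): Eval = elems(i)
  }

  // A view over [from, until) of the source: no copy, O(1) length and lookup.
  final class SliceArr(source: Arr, from: Int, until: Int) extends Arr {
    require(0 <= from && from <= until && until <= source.length, "bad slice bounds")
    val length: Int = until - from
    def eval(i: Int): Eval = {
      if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
      source.eval(from + i) // index translation only; nothing is forced here
    }
  }

  // Returns (slice length, elements forced by slicing, value at 5, total forced).
  def demo(): (Int, Int, Int, Int) = {
    var forced = 0
    val base = new FlatArr(Array.tabulate(1000)(i => new Eval({ forced += 1; i * 2 })))
    val slice = new SliceArr(base, 100, 900)
    val forcedAfterSlicing = forced
    val fifth = slice.value(5) // forces source element 105 only
    slice.value(5)             // cached: does not run the thunk again
    (slice.length, forcedAfterSlicing, fifth, forced)
  }
}
```

In this model the view holds a reference to the whole source, which is the retention cost the "keep small slices eager" constraint is meant to bound.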

Modification

Add a lazy SliceArr view and route Arr.sliced(...) through it when the slice is large enough or the source is already compact/view-backed.

Changed behavior:

  • array slicing can return SliceArr instead of eagerly copying Array[Eval]
  • std.remove and std.removeAt reuse slice + concat views
  • compact sources (RangeArr, ByteArr, lazy indexed arrays, repeat, slice) can slice as O(1) views
  • flat/reversed/concat arrays only use a view when the slice is large enough to justify source retention
  • concat still caps tree depth by flattening when either side is already a concat view
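
The routing rule in the list above can be sketched as a small decision function. This is only a model of the described behavior: the `Source` and `Plan` types are hypothetical, and `ViewThreshold` is a made-up value because the description does not state the real cutoff.

```scala
// Hypothetical model of the slice routing described above.
object SliceRoutingSketch {
  sealed trait Source
  case object Compact extends Source   // e.g. range, byte, lazy indexed, repeat, slice
  case object Retaining extends Source // e.g. flat, reversed, or concat arrays

  sealed trait Plan
  case object EagerCopy extends Plan // allocate a new backing array
  case object LazyView extends Plan  // O(1) view over the source

  // Assumed cutoff for illustration; the actual threshold is not given in this PR text.
  val ViewThreshold: Int = 1024

  def plan(source: Source, sliceLength: Int): Plan = source match {
    // Compact sources hold little memory, so a view is always cheap to retain.
    case Compact => LazyView
    // Other sources use a view only when the slice is large enough to justify
    // keeping the whole source alive; small slices are copied instead.
    case Retaining => if (sliceLength >= ViewThreshold) LazyView else EagerCopy
  }
}
```

For example, under these assumptions `plan(Retaining, 3)` picks `EagerCopy`, while `plan(Compact, 3)` and `plan(Retaining, 5000)` both pick `LazyView`.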

Result

Verification passed:

  • ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
  • ./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
  • ./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
  • ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests
  • ./mill --no-server 'sjsonnet.jvm[3.3.7].test'
  • ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
  • ./mill --no-server bench.checkFormat
  • ./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'

JMH, full JVM regression sweep, compared with upstream/master (lower is better, notable result only):

  • Baseline: upstream/master at b4c667d5
  • PR head: 3bde215b
  • Command: ./mill --no-server bench.runRegressions
  • Full sweep covered 45 regression inputs; non-target movement was noise-level, so only the targeted slice/remove case is listed.

| Benchmark | Before | After | Result |
| --- | --- | --- | --- |
| lazy_array_slice_remove | 5.805 ms/op | 1.088 ms/op | 5.34x faster |

Scala Native hyperfine, full regression-input sweep, compared with upstream/master (lower is better, notable result only):

  • Binaries: ./mill --no-server show 'sjsonnet.native[3.3.7].nativeLink'
  • Command shape: hyperfine --warmup 5 --min-runs 20 -N --output=null ...
  • Full sweep covered the same 45 regression inputs; bench.07 was run with ulimit -s 65520 for both sides because the native binary needs a larger process stack for that input.

| Benchmark | Before | After | Result |
| --- | --- | --- | --- |
| lazy_array_slice_remove | 13.2 +/- 0.4 ms | 5.89 +/- 0.32 ms | 2.24x faster |

External performance diff, against jrsonnet built from source at 80cd36a with cargo build --release -p jrsonnet (jrsonnet 0.5.0-pre98):

| Benchmark | sjsonnet Scala Native (#821) | source-built jrsonnet | Result |
| --- | --- | --- | --- |
| lazy_array_slice_remove | 5.8 ms +/- 0.2 ms | 7.0 ms +/- 0.2 ms | sjsonnet 1.21 +/- 0.05x faster |
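
For readers checking the Result columns, the speedup figures are the ratio of the slower time to the faster time. The helper below is a hypothetical convenience for reproducing them from the reported means; it ignores the +/- uncertainty.

```scala
// Hypothetical helper for reproducing the reported speedup ratios.
object SpeedupSketch {
  // Speedup is the baseline time divided by the new time (both in the same unit).
  def speedup(beforeMs: Double, afterMs: Double): Double = beforeMs / afterMs

  // Format with two decimals, locale-independent.
  def fmt(x: Double): String =
    String.format(java.util.Locale.ROOT, "%.2fx", Double.box(x))

  val jmh: String = fmt(speedup(5.805, 1.088))      // JVM JMH: before vs after
  val native: String = fmt(speedup(13.2, 5.89))     // Scala Native hyperfine: before vs after
  val vsJrsonnet: String = fmt(speedup(7.0, 5.8))   // jrsonnet vs sjsonnet native
}
```

These evaluate to `5.34x`, `2.24x`, and `1.21x`, matching the three results reported above.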

JIT / GC review:

  • SliceArr preserves indexed laziness: eval(i) returns an Eval; value(i) forces only the requested element.
  • Materializing a slice releases the source reference, so long-lived fully-consumed slices do not keep the original array alive.
  • Large slices avoid allocating and copying Array[Eval]; small slices still copy to avoid source-retention overhead.
  • std.remove / std.removeAt reuse slice and concat views, avoiding large intermediate arrays.
  • Concat depth remains bounded.
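
The source-release point in the review above can be illustrated with a small sketch. It assumes a single-threaded view over a plain `Array[String]`; the class name, fields, and `retainsSource` probe are hypothetical and only show the idea of dropping the source reference once the slice has been copied out.

```scala
// Hypothetical sketch: a slice view that stops retaining its source once materialized.
object MaterializeSketch {
  final class SliceView(private var source: Array[String], from: Int, until: Int) {
    private var flat: Array[String] = null // filled in on first materialize()
    val length: Int = until - from

    def apply(i: Int): String = {
      if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
      if (flat != null) flat(i) else source(from + i)
    }

    // Copy the slice once, then drop the source so the GC can reclaim it.
    def materialize(): Array[String] = {
      if (flat == null) {
        flat = source.slice(from, until)
        source = null
      }
      flat
    }

    def retainsSource: Boolean = source != null
  }

  // Returns (retains before, element 0, retains after materialize, element 2).
  def demo(): (Boolean, String, Boolean, String) = {
    val view = new SliceView(Array("a", "b", "c", "d", "e"), 1, 4)
    val before = view.retainsSource
    val first = view(0)
    view.materialize()
    (before, first, view.retainsSource, view(2))
  }
}
```

After `materialize()` the view answers lookups from its own copy, so a long-lived slice in this model costs only its own length rather than the length of the original array.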

Rollback boundary:

  • This PR only changes slice/remove array representation.
  • If a retained-source workload regresses, the slice threshold is the rollback lever without changing the public API.

References

Motivation:

Avoid copying large array slices and remove/removeAt intermediates after the lazy-array work. This follows jrsonnet's indexed slice-view idea while keeping JVM retention under control for small sub-slices.

Modifications:

- add Val.Arr.sliced and SliceArr for large or compact-source slices
- route array slicing and std.remove/removeAt through slice/concat views
- let large concat decisions use total length, with overflow protection
- add correctness coverage and a slice/remove benchmark resource
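
As an illustration of the second and third modifications, removing index `i` can be expressed as the concatenation of two slices, and the combined length can be summed in `Long` so it cannot wrap. This is one plausible reading of "overflow protection", not the actual implementation. `Span`, `ViewThreshold`, and the in-bounds requirement are assumptions made for the sketch.

```scala
// Hypothetical sketch of removeAt as slice + concat, with an overflow-safe length check.
object RemoveAtSketch {
  // Half-open index range [from, until) into the source array.
  final case class Span(from: Int, until: Int) { def length: Int = until - from }

  // Removing index i from a length-n array keeps [0, i) followed by [i + 1, n).
  // Requiring an in-bounds index is a simplification for this sketch.
  def removeAtSpans(n: Int, i: Int): (Span, Span) = {
    require(i >= 0 && i < n, s"index $i out of bounds for length $n")
    (Span(0, i), Span(i + 1, n))
  }

  // Sum in Long: Int addition could wrap to a negative value for huge inputs.
  def totalLength(leftLen: Int, rightLen: Int): Long = leftLen.toLong + rightLen.toLong

  // Assumed cutoff for illustration; the real value is not stated in this PR text.
  val ViewThreshold: Long = 1024L

  def useConcatView(leftLen: Int, rightLen: Int): Boolean = {
    val total = totalLength(leftLen, rightLen)
    require(total <= Int.MaxValue, s"array too large: $total elements")
    total >= ViewThreshold
  }
}
```

With these definitions, `removeAtSpans(10, 3)` yields `(Span(0, 3), Span(4, 10))`, and `totalLength(Int.MaxValue, Int.MaxValue)` is `4294967294L` rather than a wrapped negative `Int`.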

Results:

- ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
- ./mill --no-server 'sjsonnet.jvm[2.13.18].compile'
- ./mill --no-server 'sjsonnet.jvm[2.12.21].compile'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.ValArrayViewTests sjsonnet.Std0150FunctionsTests
- ./mill --no-server 'sjsonnet.jvm[3.3.7].test'
- ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
- ./mill --no-server bench.checkFormat
- JMH runRegressions: lazy_array_slice_remove 5.890 -> 1.089 ms/op
- hyperfine macro slice/remove: 498.6 ms -> 335.5 ms
He-Pin mentioned this pull request May 5, 2026
stephenamar-db merged commit 04e5707 into databricks:master May 7, 2026
4 of 5 checks passed