
perf: optimize identity function composition #826

Draft
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/native-lazy-array-stack


@He-Pin He-Pin commented May 7, 2026

Motivation

bench/resources/cpp_suite/bench.07.jsonnet builds a lazy array of composed functions:

local f2(f) = function(x) f(f(x));
local id(x) = x;
local slowId = std.makeArray(20, function(i) if i == 0 then id else f2(slowId[i - 1]));
slowId[15](42)

On upstream master, Scala Native overflows the stack for this case even with --max-stack 100000. On the JVM, the same identity-equivalent chain still materializes a deeply nested path of function calls and lazy evaluations.
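To see why the chain is expensive, the benchmark can be modelled outside Jsonnet. The following is a minimal Python sketch, not sjsonnet code: it evaluates eagerly, so it counts function calls only and does not reproduce the lazy thunks or the stack depth of the real interpreter. The names `identity`, `f2`, `slow_id`, and `calls` are illustrative.

```python
# Illustrative model of bench.07: count every function call in the chain.
calls = {"total": 0}


def identity(x):
    calls["total"] += 1
    return x


def f2(f):
    # Mirrors `function(x) f(f(x))`: one call to the wrapper, two to `f`.
    def composed(x):
        calls["total"] += 1
        return f(f(x))
    return composed


# Mirrors std.makeArray(20, ...): element i wraps element i - 1.
slow_id = [identity]
for i in range(1, 20):
    slow_id.append(f2(slow_id[i - 1]))

result = slow_id[15](42)
print(result, calls["total"])  # 42 65535
```

Each layer doubles the work of the one below it, so index n costs 2^(n+1) - 1 calls in this model. That is the same order of magnitude as the function_calls=65550 counter quoted for upstream master in the commit message of this PR.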

Modification

  • Add a unary identity fast path in Val.Func.apply1.
  • Detect the exact ordinary, non-tailstrict function(x) f(f(x)) shape and return a lightweight identity-composition wrapper.
  • Keep the identity-chain probe iterative, so it is friendly to Scala Native stacks.
  • Use an allocation-free byte state machine for the base-identity probe: BaseIdentityProbeInProgress marks active probes, while BaseIdentityKnownIdentity / BaseIdentityKnownNonIdentity cache the final predicate result. Recursive composition cycles are treated as non-identity for this optimization and fall back to normal application/max-stack handling.
  • Preserve laziness and tailstrict semantics: constructing f2(error "...") stays lazy, calling it still forces the original error, and f2(error "...") tailstrict still forces eagerly.
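The iterative probe and its cached states can be pictured with a small model. This is a hedged Python sketch rather than the PR's Scala code: the `Func` class, its `kind` and `base` fields, and `is_identity` are hypothetical, and bases are stored eagerly here whereas sjsonnet holds them lazily. Only the state names mirror the ones in the description above.

```python
# Probe states; the names follow the PR description, the values are arbitrary.
UNKNOWN = 0
PROBE_IN_PROGRESS = 1      # BaseIdentityProbeInProgress
KNOWN_IDENTITY = 2         # BaseIdentityKnownIdentity
KNOWN_NON_IDENTITY = 3     # BaseIdentityKnownNonIdentity


class Func:
    """Hypothetical function value: 'identity', 'compose', or 'other'."""

    def __init__(self, kind, base=None):
        self.kind = kind
        self.base = base       # only meaningful for kind == "compose"
        self.state = UNKNOWN


def is_identity(func):
    """Walk the base chain with a loop, never with recursion."""
    node = func
    while True:
        if node.state == KNOWN_IDENTITY or node.kind == "identity":
            verdict = True
            break
        if (node.state in (KNOWN_NON_IDENTITY, PROBE_IN_PROGRESS)
                or node.kind != "compose"):
            # Meeting a node that is already being probed means a cycle.
            verdict = False
            break
        node.state = PROBE_IN_PROGRESS
        node = node.base

    # Second pass: cache the verdict on every node the probe marked.
    final = KNOWN_IDENTITY if verdict else KNOWN_NON_IDENTITY
    node = func
    while node is not None and node.state == PROBE_IN_PROGRESS:
        node.state = final
        node = node.base
    return verdict


chain = Func("identity")
for _ in range(100_000):
    chain = Func("compose", chain)
print(is_identity(chain))   # True, with no deep recursion

loop = Func("compose")
loop.base = loop            # self-recursive composition
print(is_identity(loop))    # False
```

Treating an in-progress node as non-identity is what lets a cycle fall back to ordinary application instead of spinning inside the probe, and reusing the state field for the second pass avoids allocating a visited set.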

Result

Compared revisions:

  • upstream/master: 8b67cb1
  • This PR: 515691b3
  • jrsonnet: 5b43fa8

JMH, bench/resources/cpp_suite/bench.07.jsonnet:

./mill -i bench.runRegressions bench/resources/cpp_suite/bench.07.jsonnet

| Target | Result |
| --- | --- |
| upstream/master | 3.013 ms/op |
| this PR | 0.037 ms/op |
| delta | 81x faster |

Scala Native CLI, same input:

hyperfine --shell=none --warmup 10 --runs 100 \
  --command-name 'sjsonnet native #826' \
  './out/sjsonnet/native/3.3.7/nativeLink.dest/out --max-stack 100000 bench/resources/cpp_suite/bench.07.jsonnet' \
  --command-name 'jrsonnet 5b43fa8' \
  '/path/to/jrsonnet -s 100000 bench/resources/cpp_suite/bench.07.jsonnet'

| Target | Result |
| --- | --- |
| upstream/master native | StackOverflowError |
| this PR native | 4.2 +/- 0.9 ms |
| jrsonnet 5b43fa8 | 13.1 +/- 1.3 ms |
| this PR vs jrsonnet | 3.09x faster |

The native CLI case is tiny and its timing includes process startup, so the stronger signals are the JMH delta and the fact that the Native build goes from failing on master to passing with this PR.
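The reported ratios can be checked against the quoted measurements. A small Python calculation, using only the numbers above; the rounded means give about 3.12x for the jrsonnet comparison, so the reported 3.09x presumably comes from hyperfine's unrounded measurements (an assumption, not something the PR states).

```python
# Values copied from the JMH and hyperfine results above.
jmh_master_ms = 3.013
jmh_pr_ms = 0.037
native_pr_ms = 4.2
jrsonnet_ms = 13.1

jmh_speedup = jmh_master_ms / jmh_pr_ms
native_vs_jrsonnet = jrsonnet_ms / native_pr_ms

print(f"JMH speedup: {jmh_speedup:.1f}x")                # JMH speedup: 81.4x
print(f"native vs jrsonnet: {native_vs_jrsonnet:.2f}x")  # native vs jrsonnet: 3.12x
```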

Verification

Local verification after the semantic state-name update:

git diff --check
./mill -i 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.EvaluatorTests
./mill -i 'sjsonnet.native[3.3.7].test.testOnly' sjsonnet.EvaluatorTests
./mill -i bench.runRegressions bench/resources/cpp_suite/bench.07.jsonnet
./mill -i 'sjsonnet.native[3.3.7].nativeLink'
./mill -i __.checkFormat

All passed.

Boundary Checks

  • The optimization only matches the exact ordinary apply shape function(x) f(f(x)).
  • tailstrict calls are not folded into the lazy identity-composition path.
  • Call-time forcing still preserves the original lazy error: local g = f2(error "..."); g(1).
  • Non-function bases still report the normal type error: f2(1)(1).
  • Non-identity composed functions still fall back to normal evaluation; covered by f2(function(x) x + 1)(1) == 3 and a multi-layer non-identity chain.
  • Recursive identity-composition cycles are covered by self-recursive and object-mutual-recursive tests; they now hit the normal max-stack path instead of hanging in the optimization probe.
  • TailCall-related implementation changes are intentionally not included in this PR.
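Several of these boundary cases describe observable behaviour that any fast path has to keep. As an illustration only, here is a Python model of ordinary (unoptimized) `f2` semantics with a lazily evaluated argument; `Thunk`, `f2`, and `fail` are hypothetical helpers, and Python exceptions stand in for Jsonnet errors.

```python
class Thunk:
    """A lazily evaluated argument: computed on first force, then cached."""

    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value


def f2(base):
    """Model of `function(x) f(f(x))`: building it never forces `base`."""
    def composed(x):
        f = base.force()                  # forced only at call time
        if not callable(f):
            raise TypeError("expected a function")
        return f(f(x))
    return composed


def fail():
    raise RuntimeError("boom")


g = f2(Thunk(fail))                       # construction stays lazy: no error yet
try:
    g(1)
except RuntimeError as e:
    print("call-time error:", e)          # call-time error: boom

print(f2(Thunk(lambda: (lambda x: x + 1)))(1))  # 3
```

The last line corresponds to the f2(function(x) x + 1)(1) == 3 check: a non-identity base must take the normal evaluation path and apply the function twice.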

@He-Pin He-Pin force-pushed the perf/native-lazy-array-stack branch from e3e480f to 19fb7ae Compare May 7, 2026 20:50
@He-Pin He-Pin marked this pull request as draft May 7, 2026 20:52
@He-Pin He-Pin force-pushed the perf/native-lazy-array-stack branch from 19fb7ae to 26bedd2 Compare May 8, 2026 05:17
@He-Pin He-Pin marked this pull request as ready for review May 8, 2026 05:17
@He-Pin He-Pin marked this pull request as draft May 8, 2026 06:08
@He-Pin He-Pin force-pushed the perf/native-lazy-array-stack branch from 26bedd2 to 53556cf Compare May 8, 2026 06:09

He-Pin commented May 8, 2026

Safety update pushed in 53556cfd:

  • Added an IdentityChecking state so identity-composition probing detects recursive composition cycles and falls back to normal application instead of spinning in the optimization path.
  • Added self-recursive and object-mutual-recursive regression tests.

Validation:

  • ./mill -i 'sjsonnet.jvm[3.3.7].test.testOnly' sjsonnet.EvaluatorTests
  • ./mill -i 'sjsonnet.native[3.3.7].test.testOnly' sjsonnet.EvaluatorTests
  • ./mill -i bench.runRegressions bench/resources/cpp_suite/bench.07.jsonnet -> 0.039 ms/op
  • ./mill -i __.checkFormat

@He-Pin He-Pin marked this pull request as ready for review May 8, 2026 06:15
@He-Pin He-Pin force-pushed the perf/native-lazy-array-stack branch 2 times, most recently from 24dc899 to 8390d28 Compare May 8, 2026 06:32
Motivation:
bench.07 builds a deep chain of function(x) f(f(x)) over identity functions. Scala Native overflows the stack on this case with --max-stack 100000, and the JVM path creates tens of thousands of lazy values and function calls.

Modification:
Add an apply1 fast path for unary identity functions and recognize the exact non-tailstrict function(x) f(f(x)) shape. The wrapper preserves laziness, keeps explicit tailstrict eager semantics, and checks identity-composition chains iteratively instead of recursively.

Result:
bench.07 now passes on Scala Native. On the JVM, the debug counters drop from lazy_created=32786/function_calls=65550 to lazy_created=19/function_calls=16, and the single-case JMH run reports 0.036 ms/op.
@He-Pin He-Pin force-pushed the perf/native-lazy-array-stack branch from 8390d28 to 515691b Compare May 8, 2026 06:39
@He-Pin He-Pin marked this pull request as draft May 8, 2026 06:48