perf: speed up simple integer parsing#830

Open
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/simple-integer-parse

@He-Pin He-Pin commented May 8, 2026

Motivation

The jrsonnet docs std.base64 (byte array) benchmark is a large literal array of small integer byte values. In sjsonnet that path spends meaningful time in the parser, and Parser.number currently routes every numeric literal through String.toDouble.

Plain unsigned integer literals do not need the full decimal/exponent parser.

Modification

  • Add a parser fast path for simple unsigned integer literals up to 18 digits.
  • Keep the existing path for literals with underscores, decimal points, exponents, or larger digit counts.
  • Preserve the existing leading-zero, underscore-placement, and finite-number checks.

The implementation is allocation-free beyond the existing parsed token string and uses a tight Long accumulation loop, so it is GC- and JIT-friendly.
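
As a rough illustration of the fast path described above, here is a minimal sketch. It is not the code from this PR: `SimpleIntParse`, `parseSimpleUnsignedInt`, `parseNumber`, and `MaxFastDigits` are hypothetical names, and `s.toDouble` only stands in for the existing `Parser.number` slow path (the real parser also validates underscores and leading zeros, which this sketch does not reproduce).

```scala
object SimpleIntParse {
  // Hypothetical constant: the PR limits the fast path to 18 digits.
  val MaxFastDigits = 18

  // Returns Some(value) for a plain unsigned integer literal, or None
  // when the caller should fall back to the general number path.
  def parseSimpleUnsignedInt(s: String): Option[Double] = {
    val n = s.length
    if (n == 0 || n > MaxFastDigits) return None
    // Assumption: literals with a leading zero (e.g. "007") are left
    // to the existing validation instead of being handled here.
    if (n > 1 && s.charAt(0) == '0') return None
    var acc = 0L
    var i = 0
    while (i < n) {
      val d = s.charAt(i) - '0'
      // Any non-digit ('.', 'e', '_', '-', ...) disqualifies the fast path.
      if (d < 0 || d > 9) return None
      acc = acc * 10 + d
      i += 1
    }
    Some(acc.toDouble)
  }

  // Stand-in for the combined path: fast path first, then String.toDouble.
  def parseNumber(s: String): Double =
    parseSimpleUnsignedInt(s).getOrElse(s.toDouble)
}
```

With this sketch, `SimpleIntParse.parseNumber("255")` takes the `Long` loop, while `SimpleIntParse.parseNumber("1.5")` and `SimpleIntParse.parseNumber("1e3")` fall through to `String.toDouble`. The sketch wraps the result in `Option` for readability; an allocation-free version would signal the fallback some other way.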

Result

JMH, same machine/JDK, docs-aligned case:

| benchmark | upstream/master | this PR | delta |
| --- | --- | --- | --- |
| bench/resources/go_suite/base64_byte_array.jsonnet | 0.801 ms/op | 0.719 ms/op | -10.2% |

Native hyperfine was attempted but not reported: the local machine was under heavy system load and produced unusable outlier-heavy data (55.8 ± 51.0 ms for sjsonnet on this same docs case). I am intentionally excluding that result rather than making a noisy claim.

Verification

  • ./mill -i 'sjsonnet.jvm[3.3.7].test'
  • ./mill -i 'sjsonnet.jvm[3.3.7].reformat'
  • ./mill -i __.checkFormat
  • git diff --check
  • ./mill -i bench.runRegressions bench/resources/go_suite/base64_byte_array.jsonnet
  • ./mill --no-server 'sjsonnet.native[3.3.7]'.nativeLink

Boundary Checks

  • This targets docs byte-array parsing, not std.base64 encoding itself.
  • Decimal/exponent/underscore number forms still use the original String.toDouble path.
  • Integer literals longer than 18 digits still use the original String.toDouble path to avoid changing large-number rounding behavior.
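
The 18-digit limit can be sanity-checked with a small sketch. The largest 18-digit value, 999999999999999999, is below `Long.MaxValue` (9223372036854775807), so the accumulation loop cannot overflow, whereas some 19-digit values would. The object and helper names below are hypothetical, and the samples are only a spot check that accumulating in a `Long` and widening to `Double` agrees with `String.toDouble`, not a proof.

```scala
object FastPathBoundary {
  // Hypothetical stand-in for the fast path: exact Long value, then widen.
  def viaLong(s: String): Double = s.toLong.toDouble

  // True when the Long-based result matches the String.toDouble result.
  def agrees(s: String): Boolean = viaLong(s) == s.toDouble

  // Includes 2^53 + 1, which is not exactly representable as a Double,
  // and the largest 18-digit literal.
  val samples: Seq[String] =
    Seq("0", "255", "9007199254740993", "999999999999999999")

  // The largest 18-digit value fits in a Long; the largest 19-digit one does not.
  val max18FitsInLong: Boolean = BigInt("9" * 18) <= BigInt(Long.MaxValue)
  val max19FitsInLong: Boolean = BigInt("9" * 19) <= BigInt(Long.MaxValue)
}
```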

Motivation:
The jrsonnet docs std.base64 byte-array benchmark is dominated by parsing a large literal array of small integer byte values. Parser.number currently sends every numeric literal through String.toDouble, which is unnecessary for plain unsigned integer literals.

Modification:
Add a tight parser fast path for simple unsigned integer literals up to 18 digits. Literals with underscores, decimal points, exponents, or larger digit counts keep the existing validation and String.toDouble path.

Result:
bench/resources/go_suite/base64_byte_array.jsonnet: upstream/master 0.801 ms/op, this change 0.719 ms/op (-10.2%).

Verification:
./mill -i 'sjsonnet.jvm[3.3.7].test'
./mill -i 'sjsonnet.jvm[3.3.7].reformat'
./mill -i __.checkFormat
git diff --check
./mill -i bench.runRegressions bench/resources/go_suite/base64_byte_array.jsonnet
./mill --no-server 'sjsonnet.native[3.3.7]'.nativeLink