perf: speed up simple integer parsing by He-Pin · Pull Request #830 · databricks/sjsonnet

He-Pin · 2026-05-08T03:09:52Z

Motivation

The jrsonnet docs std.base64 (byte array) benchmark is a large literal array of small integer byte values. In sjsonnet that path spends meaningful time in the parser, and Parser.number currently routes every numeric literal through String.toDouble.

Plain unsigned integer literals do not need the full decimal/exponent parser.

Modification

- Add a parser fast path for simple unsigned integer literals up to 18 digits.
- Keep the existing path for literals with underscores, decimal points, exponents, or larger digit counts.
- Preserve the existing leading-zero, underscore-placement, and finite-number checks.
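
As a rough illustration of the fast path described above (not the code in this PR), the sketch below accumulates digits into a `Long` and signals "not eligible" with a sentinel so no wrapper object is allocated. The names `SimpleIntFastPath`, `fastUnsignedLong`, and `parseNumber` are hypothetical, and the plain `toDouble` fallback merely stands in for the existing validation and `String.toDouble` path.

```scala
// Hypothetical sketch; names and structure are assumptions, not sjsonnet's Parser code.
object SimpleIntFastPath {
  // Any 18-digit decimal value fits in a Long (Long.MaxValue has 19 digits).
  private val MaxFastDigits = 18

  /** Returns the parsed value, or -1 if the literal is not eligible for the fast path. */
  def fastUnsignedLong(s: String): Long = {
    val len = s.length
    if (len == 0 || len > MaxFastDigits) -1L
    // Leave leading-zero forms such as "01" to the existing checks.
    else if (len > 1 && s.charAt(0) == '0') -1L
    else {
      var acc = 0L
      var i = 0
      while (i < len && acc >= 0) {
        val c = s.charAt(i)
        if (c >= '0' && c <= '9') {
          acc = acc * 10 + (c - '0')
          i += 1
        } else {
          // '.', 'e', '_' or anything else: fall back to the slow path.
          acc = -1L
        }
      }
      acc
    }
  }

  /** Fast path first; the fallback here is a stand-in for the existing path. */
  def parseNumber(s: String): Double = {
    val fast = fastUnsignedLong(s)
    if (fast >= 0) fast.toDouble else s.toDouble
  }
}
```

With this shape, a literal such as `255` never reaches the general decimal parser, while `1.5` or `1e3` still does.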

The implementation is allocation-free beyond the existing parsed token string and uses a tight Long accumulation loop, so it is GC- and JIT-friendly.

Result

JMH, same machine/JDK, docs-aligned case:

| benchmark | upstream/master | this PR | delta |
| --- | --- | --- | --- |
| `bench/resources/go_suite/base64_byte_array.jsonnet` | `0.801 ms/op` | `0.719 ms/op` | `-10.2%` |

Native hyperfine was attempted but not reported: the local machine was under heavy system load and produced unusable outlier-heavy data (55.8 ± 51.0 ms for sjsonnet on this same docs case). I am intentionally excluding that result rather than making a noisy claim.

Verification

./mill -i 'sjsonnet.jvm[3.3.7].test'
./mill -i 'sjsonnet.jvm[3.3.7].reformat'
./mill -i __.checkFormat
git diff --check
./mill -i bench.runRegressions bench/resources/go_suite/base64_byte_array.jsonnet
./mill --no-server 'sjsonnet.native[3.3.7]'.nativeLink

Boundary Checks

- This targets docs byte-array parsing, not std.base64 encoding itself.
- Decimal/exponent/underscore number forms still use the original String.toDouble path.
- Integer literals longer than 18 digits still use the original String.toDouble path to avoid changing large-number rounding behavior.
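
The 18-digit cutoff can be sanity-checked with a small sketch like the one below. It assumes a hypothetical `accumulate` helper (not part of the PR) that mirrors a `Long` accumulation loop, and compares its result against `String.toDouble`. Every 18-digit value fits in a `Long`, and both the `Long`-to-`Double` conversion and `String.toDouble` round to the nearest representable double, so the two paths should agree within that range. A 19-digit literal can overflow the accumulator.

```scala
// Hypothetical check; `accumulate` is an illustrative helper, not sjsonnet code.
object FastPathBoundary {
  /** Accumulates an all-digit string into a Long with no overflow checking. */
  def accumulate(digits: String): Long = {
    var acc = 0L
    var i = 0
    while (i < digits.length) {
      acc = acc * 10 + (digits.charAt(i) - '0')
      i += 1
    }
    acc
  }

  /** True when the Long-based path yields the same double as String.toDouble. */
  def agrees(digits: String): Boolean =
    accumulate(digits).toDouble == digits.toDouble

  // Largest literal that an 18-digit fast path would accept.
  val largest18: String = "9" * 18
}
```

For example, `FastPathBoundary.agrees(FastPathBoundary.largest18)` exercises the upper boundary, and `FastPathBoundary.agrees("9007199254740993")` exercises a value just above 2^53 where rounding is needed.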


He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/simple-integer-parse
