perf: use hybrid sort for inline object order by He-Pin · Pull Request #855 · databricks/sjsonnet

He-Pin · 2026-05-13T11:58:32Z

Motivation:

computeSortedInlineOrder was originally optimized for evaluated inline objects with only a few fields. After strict JSON imports began constructing byte-parsed JSON objects as inline Val.Objs, kube-prometheus exposed a wider shape: imported JSON objects can have many visible fields, and the old insertion sort becomes quadratic.

Sampling a repeated kube-prometheus materialization showed Materializer.computeSortedInlineOrder as a real Scala Native hotspot. This PR keeps the small-object fast path but avoids quadratic sorting for larger inline objects.

Key Design Decision:

Use a hybrid in-place sort:

<= 16 visible fields keep insertion sort, preserving the current small-object path.
> 16 visible fields use in-place quicksort with insertion-sort cleanup for small partitions.
Sorting still uses Util.compareStringsByCodepoint, so Jsonnet key ordering semantics are unchanged.
The implementation sorts only a fresh Array[Int] index array; it does not mutate shared parsed object keys or members.

Modification:

Refactor Materializer.computeSortedInlineOrder to delegate sorting to sortInlineOrder.
Add insertion-sort and quicksort helpers over the index array.
Update local performance ledgers with accepted and rejected candidates from this optimization round.

Benchmark Results:

All numbers are local, single benchmark process, no concurrent benchmark agents.

Benchmark	Baseline	Candidate	Result
Native kube forward, vs #854 binary	`145.3 +/- 3.6 ms`	`140.0 +/- 3.2 ms`	`1.04x` faster
Native kube reverse, vs #854 binary	`151.6 +/- 10.2 ms`	`148.9 +/- 3.7 ms`	`1.02x` faster
Native `large_string_template` guard	`6.02 +/- 3.67 ms`	`4.52 +/- 1.27 ms`	no regression observed
JVM JMH `large_string_template` guard	candidate `0.770 ms/op`, repeat `0.723 ms/op`	guard only	no relevant regression signal
jrsonnet kube reference	`93.3 +/- 3.9 ms`	sjsonnet candidate `146.8 +/- 12.8 ms`	jrsonnet still faster

Additional profiling:

Repeated kube materialization sample before: computeSortedInlineOrder top-stack samples 164.
Repeated kube materialization sample after: computeSortedInlineOrder top-stack samples 63; sort-specific samples 75.

Analysis:

The change targets a structural mismatch introduced by wider inline objects from strict JSON imports. Insertion sort is optimal for 2-8 fields, but it is the wrong default for imported JSON objects with larger key sets. The hybrid threshold keeps existing small-object behavior and reduces wider-object ordering from O(n²) toward O(n log n) without boxing.

Correctness checks:

Output equality: kube-prometheus and large_string_template outputs are byte-identical to the perf: parse strict JSON imports from bytes #854 binary.
Focused tests: RendererTests and JsonImportFastPathTests passed.
Formatting: ./mill --no-server --ticker false --color false -j 1 __.checkFormat passed.
Full tests: ./mill --no-server --ticker false --color false -j 1 __.test passed (440/440).

References:

Stacked on perf: parse strict JSON imports from bytes #854: perf: parse strict JSON imports from bytes #854
Upstream base: cedc083
Local commit: 90eba49e perf: use hybrid sort for inline object order

Result:

Draft stacked PR. This should be reviewed after, or together with, the strict JSON byte import PR because the main win appears on the wider inline objects enabled by that prior optimization.

Motivation: PR databricks#840 introduced a strict JSON fast path for .json imports but still forces a full UTF-8 string decode for every cached file before handing the text to ujson.StringParser. Real-world workloads (e.g. kube-prometheus) import many .json files; decoding each one twice (once into String for parsing, again as cache content) is pure overhead. Key Design Decision: ujson 4.4.3 ships ByteArrayParser, which parses UTF-8 JSON directly from a byte array without an intermediate String. Cache small resolved files as raw bytes (already what we read from disk) and lazily decode text only when the importstr/parser-input path actually needs it. Preserve parse-cache content identity by hashing the cached bytes with SHA-256 (length + hex digest) so external ParseCache implementations keep the same collision resistance as the old full-string key. Modification: * Importer.scala: CachedResolver.parseJsonImport now calls ujson.ByteArrayParser.transform(content.readRawBytes(), visitor) instead of decoding the whole file to String first. * CachedResolvedFile.scala (JVM/Native): small files are cached as Array[Byte]; getParserInput / readString materialize the String lazily; readRawBytes returns the cached bytes directly; contentHash is length + SHA-256 over the cached bytes; binary imports still use StaticBinaryResolvedFile. * PreloaderTests.scala: tighten the strict-JSON fast-path coverage so it fails if the fast path ever falls back to readString(). Result: * Output equality vs upstream sjsonnet and jrsonnet preserved on kube-prometheus and large_string_template. * Native kube-prometheus hyperfine A/B (forward & reverse): clean 139.4 +/- 2.8 ms -> candidate 132.7 +/- 1.9 ms (forward) candidate 132.1 +/- 1.9 ms vs clean 140.3 +/- 2.6 ms (reverse) * Full ./mill __.test green. References: Follow-up to databricks#840

Motivation: Large inline objects produced by strict JSON imports can exceed the small-object shape that computeSortedInlineOrder was originally tuned for. Native sampling on kube-prometheus showed sorted inline-order computation as a materialization hotspot, and insertion sort becomes quadratic on those wider objects. Modification: Keep insertion sort for small inline objects, and use an in-place quicksort with insertion-sort cleanup for larger visible field sets. Record the accepted benchmark result and rejected parser/key-render micro-routes in the performance ledgers. Result: Kube-prometheus Native A/B improved on top of strict JSON byte imports, with forward mean 145.3ms -> 140.0ms and reverse mean 151.6ms -> 148.9ms. Formatting and the full test suite pass. References: Upstream-base: databricks/sjsonnet@cedc083 Prior optimization: 883fca5 perf: parse strict JSON imports from bytes

He-Pin added 2 commits May 13, 2026 15:43

He-Pin marked this pull request as ready for review May 13, 2026 12:26

He-Pin marked this pull request as draft May 13, 2026 12:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: use hybrid sort for inline object order#855

perf: use hybrid sort for inline object order#855
He-Pin wants to merge 2 commits into
databricks:masterfrom
He-Pin:perf/hybrid-inline-sort-order

He-Pin commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

He-Pin commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant