perf: use hybrid sort for inline object order#855
Draft
He-Pin wants to merge 2 commits into
Draft
Conversation
Motivation: PR databricks#840 introduced a strict JSON fast path for .json imports but still forces a full UTF-8 string decode for every cached file before handing the text to ujson.StringParser. Real-world workloads (e.g. kube-prometheus) import many .json files; decoding each one twice (once into String for parsing, again as cache content) is pure overhead. Key Design Decision: ujson 4.4.3 ships ByteArrayParser, which parses UTF-8 JSON directly from a byte array without an intermediate String. Cache small resolved files as raw bytes (already what we read from disk) and lazily decode text only when the importstr/parser-input path actually needs it. Preserve parse-cache content identity by hashing the cached bytes with SHA-256 (length + hex digest) so external ParseCache implementations keep the same collision resistance as the old full-string key. Modification: * Importer.scala: CachedResolver.parseJsonImport now calls ujson.ByteArrayParser.transform(content.readRawBytes(), visitor) instead of decoding the whole file to String first. * CachedResolvedFile.scala (JVM/Native): small files are cached as Array[Byte]; getParserInput / readString materialize the String lazily; readRawBytes returns the cached bytes directly; contentHash is length + SHA-256 over the cached bytes; binary imports still use StaticBinaryResolvedFile. * PreloaderTests.scala: tighten the strict-JSON fast-path coverage so it fails if the fast path ever falls back to readString(). Result: * Output equality vs upstream sjsonnet and jrsonnet preserved on kube-prometheus and large_string_template. * Native kube-prometheus hyperfine A/B (forward & reverse): clean 139.4 +/- 2.8 ms -> candidate 132.7 +/- 1.9 ms (forward) candidate 132.1 +/- 1.9 ms vs clean 140.3 +/- 2.6 ms (reverse) * Full ./mill __.test green. References: Follow-up to databricks#840
Motivation: Large inline objects produced by strict JSON imports can exceed the small-object shape that computeSortedInlineOrder was originally tuned for. Native sampling on kube-prometheus showed sorted inline-order computation as a materialization hotspot, and insertion sort becomes quadratic on those wider objects. Modification: Keep insertion sort for small inline objects, and use an in-place quicksort with insertion-sort cleanup for larger visible field sets. Record the accepted benchmark result and rejected parser/key-render micro-routes in the performance ledgers. Result: Kube-prometheus Native A/B improved on top of strict JSON byte imports, with forward mean 145.3ms -> 140.0ms and reverse mean 151.6ms -> 148.9ms. Formatting and the full test suite pass. References: Upstream-base: databricks/sjsonnet@cedc083 Prior optimization: 883fca5 perf: parse strict JSON imports from bytes
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation:
computeSortedInlineOrderwas originally optimized for evaluated inline objects with only a few fields. After strict JSON imports began constructing byte-parsed JSON objects as inlineVal.Objs, kube-prometheus exposed a wider shape: imported JSON objects can have many visible fields, and the old insertion sort becomes quadratic.Sampling a repeated kube-prometheus materialization showed
Materializer.computeSortedInlineOrderas a real Scala Native hotspot. This PR keeps the small-object fast path but avoids quadratic sorting for larger inline objects.Key Design Decision:
Use a hybrid in-place sort:
<= 16visible fields keep insertion sort, preserving the current small-object path.> 16visible fields use in-place quicksort with insertion-sort cleanup for small partitions.Util.compareStringsByCodepoint, so Jsonnet key ordering semantics are unchanged.Array[Int]index array; it does not mutate shared parsed object keys or members.Modification:
Materializer.computeSortedInlineOrderto delegate sorting tosortInlineOrder.Benchmark Results:
All numbers are local, single benchmark process, no concurrent benchmark agents.
145.3 +/- 3.6 ms140.0 +/- 3.2 ms1.04xfaster151.6 +/- 10.2 ms148.9 +/- 3.7 ms1.02xfasterlarge_string_templateguard6.02 +/- 3.67 ms4.52 +/- 1.27 mslarge_string_templateguard0.770 ms/op, repeat0.723 ms/op93.3 +/- 3.9 ms146.8 +/- 12.8 msAdditional profiling:
computeSortedInlineOrdertop-stack samples164.computeSortedInlineOrdertop-stack samples63; sort-specific samples75.Analysis:
The change targets a structural mismatch introduced by wider inline objects from strict JSON imports. Insertion sort is optimal for 2-8 fields, but it is the wrong default for imported JSON objects with larger key sets. The hybrid threshold keeps existing small-object behavior and reduces wider-object ordering from O(n²) toward O(n log n) without boxing.
Correctness checks:
large_string_templateoutputs are byte-identical to the perf: parse strict JSON imports from bytes #854 binary.RendererTestsandJsonImportFastPathTestspassed../mill --no-server --ticker false --color false -j 1 __.checkFormatpassed../mill --no-server --ticker false --color false -j 1 __.testpassed (440/440).References:
90eba49e perf: use hybrid sort for inline object orderResult:
Draft stacked PR. This should be reviewed after, or together with, the strict JSON byte import PR because the main win appears on the wider inline objects enabled by that prior optimization.