
perf: preserve ASCII-safe simple format results #856

Draft

He-Pin wants to merge 4 commits into databricks:master from He-Pin:perf/simple-format-ascii-safe

Conversation

He-Pin (Contributor) commented May 13, 2026

Motivation:
large_string_template still spent time re-encoding and re-scanning the huge string produced by simple named format interpolation, even when the static format literals and dynamic values made the final string ASCII-safe as a JSON string. This targets the largest remaining local gap identified by #666-style benchmarking.

Key Design Decision:
Track ASCII safety as metadata on the compiled simple named format path instead of adding another renderer scan. The optimization is deliberately limited to all-simple %(key)s formats, where every emitted dynamic value goes through simpleStringValue and can be conservatively classified.

Modification:

  • Add staticAsciiSafe metadata to compiled format strings.
  • Return Val.Str.asciiSafe from Format.PartialApplyFmt when static literals and all simple named dynamic values are JSON-string ASCII-safe.
  • Keep unsafe strings, unsafe static literals, and mixed-key unsafe values conservative.
  • Add focused regression tests for safe numeric values, unsafe dynamic strings, unsafe static literals, and mixed-key safety.
  • Update the gap and sync ledgers.
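
The classification step behind this can be sketched roughly as follows. This is an illustrative simplification, not the PR's code: the helper name and the exact accepted character set are assumptions. A string counts as JSON-string ASCII-safe when rendering it inside a JSON string needs neither escapes nor multi-byte UTF-8:

```scala
// Hypothetical sketch: a string is JSON-string ASCII-safe when every character
// is printable ASCII and needs no JSON escaping. Control characters need
// \uXXXX escapes, '"' and '\\' need backslash escapes, and anything above
// 0x7E is not guaranteed to be a single ASCII byte. Conservative by design.
object AsciiSafe {
  def isJsonStringAsciiSafe(s: String): Boolean = {
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c < 0x20 || c > 0x7e || c == '"' || c == '\\') return false
      i += 1
    }
    true
  }
}
```

Anything that fails this predicate simply skips the fast-path marker, so a false negative only costs the pre-existing scan, never correctness.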

Benchmark Results:
JVM JMH (bench.runRegressions bench/resources/cpp_suite/large_string_template.jsonnet):

  • Before: 0.683 ms/op
  • After: 0.677 ms/op

Scala Native hyperfine, large_string_template, before vs after:

  • Forward: 8.64 +/- 0.75 ms -> 8.01 +/- 0.52 ms
  • Reverse: 8.65 +/- 0.50 ms -> 8.17 +/- 0.48 ms

Scala Native hyperfine, source-built jrsonnet comparison:

  • sjsonnet after: 8.01-8.17 ms
  • jrsonnet: 6.0 +/- 1.2 ms
  • Remaining gap: about 1.34x

Guard benchmark, kube-prometheus:

  • Forward: 131.74 ms -> 129.14 ms
  • Reverse: 129.11 ms baseline vs 130.94 ms candidate
  • Interpreted as neutral/noisy, not a target regression.

Analysis:
The existing renderer already has an ASCII-safe direct byte path, but formatted strings lost that metadata and fell back to UTF-8 encoding plus escape scanning. This change preserves the metadata at the producer where the safety condition is known, avoiding an extra whole-string encoding/scan on large simple format outputs. The safety predicate is conservative: unknown complex values, unsafe strings, or unsafe literal text do not get the fast-path marker.
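A minimal sketch of that producer-side decision, using a hypothetical stand-in for sjsonnet's Val.Str (the PR itself returns Val.Str.asciiSafe from Format.PartialApplyFmt; the type and function here are simplified illustrations):

```scala
// Hypothetical stand-in for sjsonnet's Val.Str; only the safety flag matters here.
final case class Str(value: String, asciiSafe: Boolean)

// Conservative producer-side marking: the fast-path flag is set only when both
// the compiled static literals and every simple named dynamic value are known
// safe; otherwise the renderer scans the string exactly as before.
def markFormatted(rendered: String, staticAsciiSafe: Boolean, dynamicSafe: Boolean): Str =
  Str(rendered, asciiSafe = staticAsciiSafe && dynamicSafe)
```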

References:

Result:
large_string_template improves in both Native command orders, JVM JMH does not regress, output equality holds for large_string_template and kube-prometheus, and ./mill --no-server --ticker false --color false -j 1 __.test plus __.checkFormat pass.

He-Pin and others added 4 commits May 13, 2026 15:43
Motivation:
PR databricks#840 introduced a strict JSON fast path for .json imports but still
forces a full UTF-8 string decode for every cached file before handing
the text to ujson.StringParser. Real-world workloads (e.g. kube-prometheus)
import many .json files; decoding each one twice (once into String for
parsing, again as cache content) is pure overhead.

Key Design Decision:
ujson 4.4.3 ships ByteArrayParser, which parses UTF-8 JSON directly from
a byte array without an intermediate String. Cache small resolved files
as raw bytes (already what we read from disk) and lazily decode text
only when the importstr/parser-input path actually needs it. Preserve
parse-cache content identity by hashing the cached bytes with SHA-256
(length + hex digest) so external ParseCache implementations keep the
same collision resistance as the old full-string key.

Modification:
* Importer.scala: CachedResolver.parseJsonImport now calls
  ujson.ByteArrayParser.transform(content.readRawBytes(), visitor)
  instead of decoding the whole file to String first.
* CachedResolvedFile.scala (JVM/Native): small files are cached as
  Array[Byte]; getParserInput / readString materialize the String
  lazily; readRawBytes returns the cached bytes directly; contentHash
  is length + SHA-256 over the cached bytes; binary imports still use
  StaticBinaryResolvedFile.
* PreloaderTests.scala: tighten the strict-JSON fast-path coverage so
  it fails if the fast path ever falls back to readString().
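
The byte-level path and the cache key can be sketched like this. ujson.ByteArrayParser and ujson.Value are the real ujson API named in the commit; the surrounding function names are illustrative assumptions:

```scala
import java.security.MessageDigest

// Parse UTF-8 JSON straight from the cached bytes, skipping the intermediate
// String decode that StringParser would have required.
def parseJsonFromBytes(bytes: Array[Byte]): ujson.Value =
  ujson.ByteArrayParser.transform(bytes, ujson.Value)

// Cache key: byte length plus SHA-256 hex digest, keeping the collision
// resistance of the old full-string key without materializing the String.
def contentHash(bytes: Array[Byte]): String = {
  val digest = MessageDigest.getInstance("SHA-256").digest(bytes)
  bytes.length.toString + "-" + digest.map(b => "%02x".format(b & 0xff)).mkString
}
```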

Result:
* Output equality vs upstream sjsonnet and jrsonnet preserved on
  kube-prometheus and large_string_template.
* Native kube-prometheus hyperfine A/B (forward & reverse):
  clean 139.4 +/- 2.8 ms -> candidate 132.7 +/- 1.9 ms (forward)
  candidate 132.1 +/- 1.9 ms vs clean 140.3 +/- 2.6 ms (reverse)
* Full ./mill __.test green.

References:
Follow-up to databricks#840

Motivation:
Large inline objects produced by strict JSON imports can exceed the small-object shape that computeSortedInlineOrder was originally tuned for. Native sampling on kube-prometheus showed sorted inline-order computation as a materialization hotspot, and insertion sort becomes quadratic on those wider objects.

Modification:
Keep insertion sort for small inline objects, and use an in-place quicksort with insertion-sort cleanup for larger visible field sets. Record the accepted benchmark result and rejected parser/key-render micro-routes in the performance ledgers.
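
The hybrid strategy can be sketched as below; the cutoff and pivot choice are illustrative placeholders, not the tuned values from the PR:

```scala
// Sketch of a hybrid sort: insertion sort on small ranges (cheap, cache-friendly,
// no recursion), in-place quicksort partitioning on larger ones so wide visible
// field sets avoid insertion sort's quadratic worst case.
def hybridSort(a: Array[String], lo: Int, hi: Int, cutoff: Int = 16): Unit = {
  if (hi - lo <= cutoff) {
    // Insertion sort over a(lo..hi).
    var i = lo + 1
    while (i <= hi) {
      val v = a(i); var j = i - 1
      while (j >= lo && a(j).compareTo(v) > 0) { a(j + 1) = a(j); j -= 1 }
      a(j + 1) = v; i += 1
    }
  } else {
    // Hoare-style partition around the middle element, then recurse.
    val pivot = a((lo + hi) >>> 1)
    var i = lo; var j = hi
    while (i <= j) {
      while (a(i).compareTo(pivot) < 0) i += 1
      while (a(j).compareTo(pivot) > 0) j -= 1
      if (i <= j) { val t = a(i); a(i) = a(j); a(j) = t; i += 1; j -= 1 }
    }
    if (lo < j) hybridSort(a, lo, j, cutoff)
    if (i < hi) hybridSort(a, i, hi, cutoff)
  }
}
```

Calling hybridSort(keys, 0, keys.length - 1) sorts the visible field names in place; small objects never leave the insertion-sort branch, so the previously tuned fast case is preserved.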

Result:
Kube-prometheus Native A/B improved on top of strict JSON byte imports, with forward mean 145.3 ms -> 140.0 ms and reverse mean 151.6 ms -> 148.9 ms. Formatting and the full test suite pass.

References:
Upstream-base: databricks/sjsonnet@cedc083
Prior optimization: 883fca5 perf: parse strict JSON imports from bytes

Motivation:
Keep the performance exploration ledger current so future optimization work does not repeat Native-negative or build-invalid routes.

Modification:
Record rejected short-string, ASCII-safe, inline sort-cache, path-only parse-cache, and Native GC configuration probes with the validation evidence that ruled them out.

Result:
No runtime code changes are retained; the branch documents the failed hypotheses and preserves the current accepted optimization stack.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Motivation:
large_string_template still spent time re-encoding and re-scanning the huge string produced by simple named format interpolation, even when the final result was known to be JSON-string ASCII-safe.

Modification:
Track whether compiled format literals are ASCII-safe and return Val.Str.asciiSafe from PartialApplyFmt when every simple named dynamic value is also safe. Add regression coverage for safe numeric values, unsafe string values, unsafe static literals, and mixed-key safety.

Result:
Native large_string_template improved in both command orders (8.64 -> 8.01 ms forward, 8.65 -> 8.17 ms reverse); JVM JMH stayed neutral-positive (0.683 -> 0.677 ms/op); full __.test and checkFormat pass.

References:
bench/reports/sjsonnet-vs-jrsonnet-gaps.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>