perf: chunk long string byte escaping #809
Merged
stephenamar-db merged 2 commits into databricks:master on May 7, 2026
Motivation:
Split the JMH-positive long-string rendering piece out of databricks#776 without carrying over the broader Scala Native render-pipeline experiment.

Modification:
- Add CharSWAR.findFirstEscapeChar for byte arrays on JVM, JS, and Native.
- Keep the existing UTF-8 byte array for long strings, but locate escape bytes and copy clean chunks with System.arraycopy.
- Escape only the matching bytes inline.
- Precompute the exact escaped output length before writing dirty strings so ByteBuilder does not grow repeatedly.

Result:
This keeps the change JDK17/JIT/GC friendly: straight byte-array loops, no internal JDK APIs, no extra temporary arrays beyond the existing UTF-8 encoding, and no regression on clean long strings.
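The chunked escaping described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the class `ChunkedEscapeSketch` and all of its method names are hypothetical, the escape set (`"`, `\`, and control bytes below 0x20) is assumed from the JSON string grammar, surrounding quotes are not written, and the 8-byte word is packed by hand rather than read the way `CharSWAR` may actually read it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names are illustrative and not taken from sjsonnet.
final class ChunkedEscapeSketch {
    private static final long ONES = 0x0101010101010101L;
    private static final long HIGHS = 0x8080808080808080L;
    private static final byte[] HEX = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

    // Assumed escape set: '"', '\\', and control bytes below 0x20.
    // UTF-8 continuation and lead bytes are negative in Java and pass through.
    static boolean needsEscape(byte b) {
        return b == '"' || b == '\\' || (b >= 0 && b < 0x20);
    }

    // SWAR pre-filter over 8 packed bytes. It never misses an escape byte,
    // but it may report a false positive, so callers confirm byte by byte.
    static boolean wordMayNeedEscape(long w) {
        long q = w ^ (ONES * 0x22);            // zero byte where w has '"'
        long s = w ^ (ONES * 0x5C);            // zero byte where w has '\\'
        long quote = (q - ONES) & ~q;
        long slash = (s - ONES) & ~s;
        long ctrl = (w - ONES * 0x20) & ~w;    // high bit set where byte < 0x20
        return ((quote | slash | ctrl) & HIGHS) != 0;
    }

    // Returns the index of the first byte in [from, to) that needs escaping, or -1.
    static int findFirstEscapeByte(byte[] a, int from, int to) {
        int i = from;
        while (i + 8 <= to) {
            long w = 0;
            for (int k = 0; k < 8; k++) w |= (a[i + k] & 0xFFL) << (8 * k);
            if (wordMayNeedEscape(w)) {
                for (int k = i; k < i + 8; k++) if (needsEscape(a[k])) return k;
            }
            i += 8;
        }
        while (i < to) {
            if (needsEscape(a[i])) return i;
            i++;
        }
        return -1;
    }

    // Number of output bytes one input byte expands to.
    static int escapedSize(byte b) {
        int c = b;
        switch (c) {
            case '"': case '\\': case '\b': case '\f': case '\n': case '\r': case '\t':
                return 2;
            default:
                return (c >= 0 && c < 0x20) ? 6 : 1; // \u00XX for other control bytes
        }
    }

    // Writes the escape for b at out[pos] and returns the new position.
    static int writeEscape(byte[] out, int pos, byte b) {
        int c = b;
        out[pos++] = '\\';
        switch (c) {
            case '"': out[pos++] = '"'; break;
            case '\\': out[pos++] = '\\'; break;
            case '\b': out[pos++] = 'b'; break;
            case '\f': out[pos++] = 'f'; break;
            case '\n': out[pos++] = 'n'; break;
            case '\r': out[pos++] = 'r'; break;
            case '\t': out[pos++] = 't'; break;
            default:
                out[pos++] = 'u'; out[pos++] = '0'; out[pos++] = '0';
                out[pos++] = HEX[(c >> 4) & 0xF];
                out[pos++] = HEX[c & 0xF];
        }
        return pos;
    }

    // Clean input is returned as-is; dirty input is sized exactly once,
    // then filled with bulk copies of clean chunks plus inline escapes.
    static byte[] escape(byte[] src) {
        int hit = findFirstEscapeByte(src, 0, src.length);
        if (hit < 0) return src;
        int outLen = hit;
        for (int i = hit; i < src.length; i++) outLen += escapedSize(src[i]);
        byte[] out = new byte[outLen];
        int pos = 0;
        int chunkStart = 0;
        while (hit >= 0) {
            System.arraycopy(src, chunkStart, out, pos, hit - chunkStart);
            pos += hit - chunkStart;
            pos = writeEscape(out, pos, src[hit]);
            chunkStart = hit + 1;
            hit = findFirstEscapeByte(src, chunkStart, src.length);
        }
        System.arraycopy(src, chunkStart, out, pos, src.length - chunkStart);
        return out;
    }
}
```

Calling `ChunkedEscapeSketch.escape` on a clean byte array returns the same array without copying, which mirrors the "no regression on clean long strings" goal. On dirty input the output array is allocated once at its final size, standing in for the single `ByteBuilder` growth the description mentions.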
Motivation:
Split the JMH-positive, JDK17/JIT/GC-friendly long-string rendering piece out of #776. Keep this PR focused on byte rendering for long strings that contain JSON escapes; this does not include the broader format, stdlib, compareStrings, or Scala Native experiments from #776.

Modification:
- Add `CharSWAR.findFirstEscapeChar(byte[], from, to)` on JVM, Scala.js, and Scala Native.
- In `BaseByteRenderer`, keep the existing UTF-8 byte array for long strings, locate escape bytes, bulk-copy clean chunks with `System.arraycopy`, and escape only matching bytes inline.
- Precompute the exact escaped output length and grow `ByteBuilder` once, then write directly to the backing byte array. This removes repeated `ensureLength`/`appendUnsafeC` calls from the dirty long-string loop.
- `\u00XX` control escapes.

JIT / GC shape:
- `while` loops, `System.arraycopy`, and small private helpers.
- `large_string_template` and `large_string_join` JMH.

Notable results only:
JMH target run, same machine, same command shape on `upstream/master` and this branch:

`./mill -i bench.runRegressions bench/resources/cpp_suite/large_string_template.jsonnet bench/resources/cpp_suite/large_string_join.jsonnet`

`large_string_template`

Scala Native hyperfine, release-full native binary, 20 runs:

`large_string_template`

`large_string_join` was rechecked as a guardrail and stayed neutral, so it is intentionally omitted from the result tables.

Verification:
- `./mill -i 'sjsonnet.jvm[3.3.7].compile'`
- `./mill -i 'sjsonnet.jvm[3.3.7].test'`
- `./mill -i 'sjsonnet.js[3.3.7].compile' 'sjsonnet.native[3.3.7].compile'`
- `./mill -i 'sjsonnet.native[3.3.7].nativeLink'`
- `./mill -i __.checkFormat`
- `git diff --check`

References:
b4c667d55d82d7c50c2103db967c33bebb0c2c98ff70b63e