perf: Large string template (% operator) is 2.90× slower than jrsonnet #847

@He-Pin

Description

Tracking issue for a specific perf gap found while comparing sjsonnet (native, master) against jrsonnet (master). Parent comparison: #666. Biggest single gap in the comparison — worth prioritizing.

Observation

Large string template (% format operator on a multi-KB text block) is 2.90× slower than jrsonnet.

Scenario: bench/resources/cpp_suite/large_string_template.jsonnet — applies |||...||| % { x: 3 } on a ~7.8k-line text block of mostly ASCII.

                   mean           min
sjsonnet (native)  11.3 ± 0.7 ms  10.5 ms
jrsonnet            3.9 ± 0.7 ms   3.0 ms

Repro:

hyperfine --warmup 2 --runs 10 -N \
  "sjsonnet bench/resources/cpp_suite/large_string_template.jsonnet" \
  "jrsonnet bench/resources/cpp_suite/large_string_template.jsonnet"

Code

Two hot paths:

  1. sjsonnet/src/sjsonnet/Format.scala: the % operator builds the formatted string char-by-char into a StringBuilder.
  2. sjsonnet/src/sjsonnet/BaseByteRenderer.scala:309-348: visitLongString renders the final string into JSON. It calls str.getBytes(UTF_8), runs the SWAR findFirstEscapeChar, then copies chunks between escapes.

Since x occurs only once and the template is mostly literal text with sparse \n, the format step should reduce to a giant memcpy; jrsonnet manages it with roughly zero extra copies.
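To make the bulk-copy idea concrete, here is a minimal sketch (the helper name appendLiteralRuns is hypothetical, not sjsonnet's actual Format.scala code): instead of appending each template char individually, scan to the next % with indexOf and bulk-append the whole literal run in one call.

```scala
// Sketch: bulk-copy literal runs between '%' specifiers instead of
// appending one char at a time. Hypothetical helper, not Format.scala's API.
// Returns the number of real (non-"%%") specifiers seen; for simplicity the
// sketch assumes each specifier is a single conversion char like %s.
def appendLiteralRuns(template: String, sb: java.lang.StringBuilder): Int = {
  var i = 0
  var specifiers = 0
  while (i < template.length) {
    val pct = template.indexOf('%', i)        // intrinsified scan on the JVM
    if (pct < 0) {
      sb.append(template, i, template.length) // bulk-copy the tail
      i = template.length
    } else {
      sb.append(template, i, pct)             // bulk-copy the literal run
      if (pct + 1 < template.length && template.charAt(pct + 1) == '%') {
        sb.append('%')                        // "%%" escapes to a literal '%'
        i = pct + 2
      } else {
        specifiers += 1                       // real specifier: caller formats the value here
        i = pct + 2
      }
    }
  }
  specifiers
}
```

On modern JDKs, StringBuilder.append(CharSequence, int, int) with a String argument compiles down to a bulk char copy via getChars, so a template with one specifier in ~600 KB of literal text becomes two large copies instead of ~600k per-char appends.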

Hypothesis

  • Double conversion: a jsonnet string is a UTF-16 java.lang.String. Format.scala builds the result into a StringBuilder (also UTF-16), and the JSON render then does str.getBytes(UTF_8), a full UTF-8 encode pass over the output. That is the conversion cost described in #779 (base64 encode/decode is ~6x slower than jrsonnet on large payloads), paid once on an ~N KB output.
  • Format engine scans every character, even when a long literal run contains no format specifiers.
  • Large string literal parse/alloc: the |||...||| block is a ~600 KB literal. The parser allocates it once, but if the format engine then concatenates the unchanged literal text into a new StringBuilder, that is an extra ~600 KB allocation and copy.
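To make the double-conversion point concrete: for ASCII content, the UTF-8 encode pass is a full O(n) scan plus a fresh n-byte allocation that reproduces the chars' low bytes verbatim. A minimal check (plain Scala, not sjsonnet code):

```scala
import java.nio.charset.StandardCharsets.UTF_8

// For ASCII text, getBytes(UTF_8) walks all N chars and allocates a new
// N-byte array whose contents are just the narrowed chars: pure overhead
// if the bytes could have been produced (or kept) directly.
val ascii = "x" * 1024 + "\n"
val utf8  = ascii.getBytes(UTF_8)
assert(utf8.length == ascii.length)            // 1 byte per char
assert(utf8.sameElements(ascii.map(_.toByte))) // identity narrowing
```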

Directions

  • Short-term: In Format.scala, detect long literal runs between format specifiers and use StringBuilder.append(String, start, end) (which avoids per-char virtual dispatch) or bulk arraycopy.
  • Medium-term: When Val.Str is asciiSafe (tracked via Val.Str.asciiSafe), skip the getBytes(UTF_8) in BaseByteRenderer.visitLongString and reuse the char-to-byte fast path already used by renderAsciiSafeString. This is the single biggest lever against the real-world kube-prometheus gap (which also emits large manifests of mostly-ASCII strings).
  • Longer-term: Consider a byte-backed Val.Str variant for pre-decoded strings read from disk or already known to be ASCII/UTF-8 bytes; this avoids the UTF-16 round-trip entirely. Overlaps with #779 (base64 encode/decode is ~6x slower than jrsonnet on large payloads).
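A sketch of the medium-term asciiSafe fast path (illustrative name, assuming the flag guarantees every char is < 0x80): for ASCII-only strings the UTF-8 bytes are exactly the narrowed UTF-16 chars, so the renderer can write chars straight into the byte buffer and skip both the getBytes(UTF_8) pass and its allocation.

```scala
// Sketch only: narrow each char to a byte directly. Valid solely because the
// caller has established (e.g. via a tracked asciiSafe flag) that every char
// is < 0x80, the range where UTF-8 and UTF-16 code units coincide.
def asciiCharsToBytes(s: String): Array[Byte] = {
  val out = new Array[Byte](s.length)
  var i = 0
  while (i < s.length) {
    out(i) = s.charAt(i).toByte
    i += 1
  }
  out
}
```

In the real renderer this loop would write into the existing output buffer rather than a fresh array, and could be fused with the escape scan so the ~600 KB string is traversed once instead of three times (encode, escape-scan, copy).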

Part of the jrsonnet-parity effort tracked in #666.
