perf: Large string template (% operator) is 2.90× slower than jrsonnet #847

@He-Pin

Description

Tracking issue for a specific perf gap found while comparing sjsonnet (native, master) against jrsonnet (master). Parent comparison: #666. Biggest single gap in the comparison — worth prioritizing.

Observation

Large string template (% format operator on a multi-KB text block) is 2.90× slower than jrsonnet.

Scenario: bench/resources/cpp_suite/large_string_template.jsonnet — applies |||...||| % { x: 3 } on a ~7.8k-line text block of mostly ASCII.

                   mean           min
sjsonnet (native)  11.3 ± 0.7 ms  10.5 ms
jrsonnet            3.9 ± 0.7 ms   3.0 ms

Repro:

hyperfine --warmup 2 --runs 10 -N \
  "sjsonnet bench/resources/cpp_suite/large_string_template.jsonnet" \
  "jrsonnet bench/resources/cpp_suite/large_string_template.jsonnet"

Code

Two hot paths:

  1. sjsonnet/src/sjsonnet/Format.scala: the % operator builds the formatted string char-by-char into a StringBuilder.
  2. sjsonnet/src/sjsonnet/BaseByteRenderer.scala:309-348: visitLongString renders the final string into JSON. It calls str.getBytes(UTF_8), runs the SWAR findFirstEscapeChar, then copies chunks between escapes.

Since x occurs only once and the template is mostly literal text with sparse \n, the format step should reduce to a giant memcpy; jrsonnet manages it with roughly zero extra copies.
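To make the bulk-copy idea concrete, here is a minimal sketch (the helper name appendLiteralRuns is hypothetical, not sjsonnet's actual Format.scala code): instead of appending each template char individually, scan to the next % with indexOf and bulk-append the whole literal run in one call.

```scala
// Sketch: bulk-copy literal runs between '%' specifiers instead of
// appending one char at a time. Hypothetical helper, not Format.scala's API.
// Returns the number of real (non-"%%") specifiers seen; for simplicity the
// sketch assumes each specifier is a single conversion char like %s.
def appendLiteralRuns(template: String, sb: java.lang.StringBuilder): Int = {
  var i = 0
  var specifiers = 0
  while (i < template.length) {
    val pct = template.indexOf('%', i)        // intrinsified scan on the JVM
    if (pct < 0) {
      sb.append(template, i, template.length) // bulk-copy the tail
      i = template.length
    } else {
      sb.append(template, i, pct)             // bulk-copy the literal run
      if (pct + 1 < template.length && template.charAt(pct + 1) == '%') {
        sb.append('%')                        // "%%" escapes to a literal '%'
        i = pct + 2
      } else {
        specifiers += 1                       // real specifier: caller formats the value here
        i = pct + 2
      }
    }
  }
  specifiers
}
```

On modern JDKs, StringBuilder.append(CharSequence, int, int) with a String argument compiles down to a bulk char copy via getChars, so a template with one specifier in ~600 KB of literal text becomes two large copies instead of ~600k per-char appends.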

Hypothesis

  • Double conversion: a jsonnet string is a UTF-16 java.lang.String. Format.scala builds the result into a StringBuilder (also UTF-16), and the JSON render then does str.getBytes(UTF_8), a full UTF-8 encode pass over the output. That is the conversion cost described in #779 (base64 encode/decode is ~6x slower than jrsonnet on large payloads), paid once on an ~N KB output.
  • Format engine scans every character, even when a long literal run contains no format specifiers.
  • Large string literal parse/alloc: the |||...||| block is a ~600 KB literal. The parser allocates it once, but if the format engine then concatenates the unchanged literal text into a new StringBuilder, that is an extra ~600 KB allocation and copy.
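To make the double-conversion point concrete: for ASCII content, the UTF-8 encode pass is a full O(n) scan plus a fresh n-byte allocation that reproduces the chars' low bytes verbatim. A minimal check (plain Scala, not sjsonnet code):

```scala
import java.nio.charset.StandardCharsets.UTF_8

// For ASCII text, getBytes(UTF_8) walks all N chars and allocates a new
// N-byte array whose contents are just the narrowed chars: pure overhead
// if the bytes could have been produced (or kept) directly.
val ascii = "x" * 1024 + "\n"
val utf8  = ascii.getBytes(UTF_8)
assert(utf8.length == ascii.length)            // 1 byte per char
assert(utf8.sameElements(ascii.map(_.toByte))) // identity narrowing
```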

Directions

  • Short-term: In Format.scala, detect long literal runs between format specifiers and use StringBuilder.append(String, start, end) (which avoids per-char virtual dispatch) or bulk arraycopy.
  • Medium-term: When Val.Str is asciiSafe (tracked via Val.Str.asciiSafe), skip the getBytes(UTF_8) in BaseByteRenderer.visitLongString and reuse the char-to-byte fast path already used by renderAsciiSafeString. This is the single biggest lever against the real-world kube-prometheus gap (which also emits large manifests of mostly-ASCII strings).
  • Longer-term: Consider a byte-backed Val.Str variant for pre-decoded strings read from disk or already known to be ASCII/UTF-8 bytes; this avoids the UTF-16 round-trip entirely. Overlaps with #779 (base64 encode/decode is ~6x slower than jrsonnet on large payloads).
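A sketch of the medium-term asciiSafe fast path (illustrative name, assuming the flag guarantees every char is < 0x80): for ASCII-only strings the UTF-8 bytes are exactly the narrowed UTF-16 chars, so the renderer can write chars straight into the byte buffer and skip both the getBytes(UTF_8) pass and its allocation.

```scala
// Sketch only: narrow each char to a byte directly. Valid solely because the
// caller has established (e.g. via a tracked asciiSafe flag) that every char
// is < 0x80, the range where UTF-8 and UTF-16 code units coincide.
def asciiCharsToBytes(s: String): Array[Byte] = {
  val out = new Array[Byte](s.length)
  var i = 0
  while (i < s.length) {
    out(i) = s.charAt(i).toByte
    i += 1
  }
  out
}
```

In the real renderer this loop would write into the existing output buffer rather than a fresh array, and could be fused with the escape scan so the ~600 KB string is traversed once instead of three times (encode, escape-scan, copy).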

Part of the jrsonnet-parity effort tracked in #666.
