perf: std.stripChars/lstripChars/rstripChars are 1.58× slower than jrsonnet #851

@He-Pin

Tracking issue for a specific perf gap found while comparing sjsonnet (native, master) against jrsonnet (master). Parent comparison: #666.

Observation

std.stripChars / std.lstripChars / std.rstripChars are collectively about 1.58× slower than jrsonnet by mean wall time (6.0 ms vs 3.8 ms) on the scenario below.

Scenario: bench/resources/cpp_suite/bench.09.jsonnet — strips a long ASCII string (~1 KB of 'e' + 'ok') with all three strip variants.

| | mean | min |
|---|---|---|
| sjsonnet (native) | 6.0 ± 0.6 ms | 5.4 ms |
| jrsonnet | 3.8 ± 0.7 ms | 2.8 ms |

Repro:

hyperfine --warmup 2 --runs 10 -N \
  "sjsonnet bench/resources/cpp_suite/bench.09.jsonnet" \
  "jrsonnet bench/resources/cpp_suite/bench.09.jsonnet"

Code

sjsonnet/src/sjsonnet/stdlib/StringModule.scala:270-420 — strip implementations. Scans char-by-char, checking each against the strip set (typically a String of chars to strip).

Hypothesis

  • If the strip-set membership check is a linear scan of the strip set, it costs O(|strip_set|) per input char, which adds up on long inputs.
  • Even with an O(1) membership check, char-by-char iteration over a 1 KB input is ~1000 iterations for each of the 3 variants.
  • jrsonnet works on UTF-8 &[u8] bytes with a bitmap-style ASCII check.
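To make the cost model in the hypothesis concrete, here is a minimal sketch of what a linear-scan strip might look like. It is not the code in StringModule.scala: the object name `NaiveStrip` and all method bodies are assumptions for illustration, and the sketch works on UTF-16 code units, ignoring surrogate pairs.

```scala
// Hypothetical baseline, NOT the actual sjsonnet implementation.
object NaiveStrip {
  def lstripChars(s: String, chars: String): String = {
    var i = 0
    // chars.indexOf scans the strip set linearly: O(|chars|) per input char.
    while (i < s.length && chars.indexOf(s.charAt(i)) >= 0) i += 1
    s.substring(i)
  }

  def rstripChars(s: String, chars: String): String = {
    var end = s.length
    while (end > 0 && chars.indexOf(s.charAt(end - 1)) >= 0) end -= 1
    s.substring(0, end)
  }

  // Composing the one-sided strips costs two calls and an intermediate String.
  def stripChars(s: String, chars: String): String =
    rstripChars(lstripChars(s, chars), chars)
}
```

With a one-character strip set, as in the benchmark, the `indexOf` scan is cheap, so in this sketch the per-char loop overhead would dominate rather than the membership check.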

Directions

  • For ASCII-only strip sets (which cover the vast majority of real usage), build a 256-entry Array[Boolean] mask once and index it by byte. Should collapse the inner loop to a single load + compare.
  • If the input itself is asciiSafe (tracked on Val.Str), work on the byte array directly without String.charAt virtual dispatch.
  • Combine lstrip + rstrip in stripChars into a single pass from both ends instead of two full scans.
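The first and third directions could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed patch: `MaskedStrip`, `buildMask`, and `inSet` are hypothetical names, the sketch operates on a `String` rather than on the byte array behind `Val.Str` (whose internals are not shown in this issue), and it treats strings as UTF-16 code units.

```scala
// Hypothetical sketch: lookup-table membership plus a two-ended scan.
object MaskedStrip {
  /** 256-entry lookup table; None if the strip set contains a non-ASCII char. */
  def buildMask(chars: String): Option[Array[Boolean]] = {
    val mask = new Array[Boolean](256)
    var i = 0
    while (i < chars.length) {
      val c = chars.charAt(i)
      if (c >= 128) return None // not ASCII-only: caller falls back
      mask(c) = true
      i += 1
    }
    Some(mask)
  }

  // Input chars above 255 can never be in an ASCII strip set.
  private def inSet(mask: Array[Boolean], c: Char): Boolean =
    c < 256 && mask(c)

  /** Narrows [start, end) from both ends, then copies the result once. */
  def stripChars(s: String, chars: String): String = {
    var start = 0
    var end = s.length
    buildMask(chars) match {
      case Some(mask) =>
        while (start < end && inSet(mask, s.charAt(start))) start += 1
        while (end > start && inSet(mask, s.charAt(end - 1))) end -= 1
      case None =>
        // Generic fallback: linear scan of the strip set per input char.
        while (start < end && chars.indexOf(s.charAt(start)) >= 0) start += 1
        while (end > start && chars.indexOf(s.charAt(end - 1)) >= 0) end -= 1
    }
    s.substring(start, end)
  }
}
```

Each input char is inspected at most once and no intermediate string is allocated. Whether this closes the gap with jrsonnet would need to be confirmed with the hyperfine repro above.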

Part of the jrsonnet-parity effort tracked in #666.
