perf: skip builders for single-spec formats #833
Merged
stephenamar-db merged 1 commit into databricks:master, May 11, 2026
Motivation: Short format strings such as %08d or %20s have exactly one dynamic value and no static literal text, so the generic format path was allocating a StringBuilder and appending to it only to return that single formatted value.
Modification: Detect the single-spec/no-static-literal case in Format.format and return the computed formatted value directly, after all existing arity checks have run.
Result: The repeat_format regression benchmark improves from 0.190 ms/op on upstream master to 0.133 ms/op locally, while large_string_template remains effectively neutral and the full Mill test matrix passes.
References: Source idea: databricks#776
stephenamar-db approved these changes on May 11, 2026
Motivation:
PR #776 showed that format-heavy workloads benefit when the format path avoids unnecessary intermediate assembly. This split keeps only the smallest safe idea: short format strings with exactly one specifier and no static literal text (for example %08d, %010x, %-20s, %20s) do not need a StringBuilder once the formatted value has been computed.
Key Design Decision:
Keep the existing generic formatting implementation and all validation/arity checks. The optimization only bypasses appending to the StringBuilder after the single formatted value is known, so format semantics and error behavior stay unchanged.
Modification:
- Detect specBits.length == 1 && parsed.staticChars == 0 in Format.format.
- Skip the StringBuilder for that case.
Benchmark Results:
JMH (./mill -j 1 bench.runRegressions ..., ms/op lower is better; ops/ms higher is better): repeat_format, large_string_template guard
Scala Native hyperfine (hyperfine --warmup 10 --min-runs 50 -N, ms lower is better): repeat_format, large_string_template guard
Analysis:
The target case is dominated by many short format expressions. Returning the already computed formatted string removes a redundant builder allocation/append path on the JVM. The guard case does not use this single-spec/no-static-literal path and remains effectively unchanged within benchmark noise.
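The shape of the fast path can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: Spec, Parsed, FormatSketch, and formatOne are hypothetical stand-ins, parsed.specs.length plays the role of specBits.length, and the arity check is simplified to a single up-front comparison. Only a few conversions (d, x, s) are modelled.

```scala
// Hypothetical, simplified model of a parsed format string (not sjsonnet's real types).
// Each spec carries the literal text that follows it; `leading` is the text before the first spec.
final case class Spec(width: Int, zeroPad: Boolean, leftAlign: Boolean, conversion: Char)

final case class Parsed(leading: String, specs: Vector[(Spec, String)]) {
  // Total number of static (non-specifier) characters in the format string.
  val staticChars: Int = leading.length + specs.iterator.map(_._2.length).sum
}

object FormatSketch {
  // Formats one value according to one spec. Assumption: only %d, %x and %s are supported here.
  private def formatOne(spec: Spec, value: Any): String = {
    val raw = (spec.conversion, value) match {
      case ('d', n: Int) => n.toString
      case ('x', n: Int) => Integer.toHexString(n)
      case ('s', v)      => String.valueOf(v)
      case (c, v)        => throw new IllegalArgumentException(s"unsupported: %$c with $v")
    }
    val pad = spec.width - raw.length
    if (pad <= 0) raw
    else if (spec.leftAlign) raw + (" " * pad)
    else if (spec.zeroPad && raw.startsWith("-")) "-" + ("0" * pad) + raw.drop(1)
    else if (spec.zeroPad) ("0" * pad) + raw
    else (" " * pad) + raw
  }

  def format(parsed: Parsed, values: Vector[Any]): String = {
    // The arity check runs before either path, so error behavior is the same for both.
    if (values.length != parsed.specs.length)
      throw new IllegalArgumentException(
        s"expected ${parsed.specs.length} values, got ${values.length}")

    if (parsed.specs.length == 1 && parsed.staticChars == 0) {
      // Fast path: one spec, no literal text. The formatted value is the whole result,
      // so no builder is allocated.
      formatOne(parsed.specs.head._1, values.head)
    } else {
      // Generic path: assemble literals and formatted values through a builder.
      val sb = new java.lang.StringBuilder(parsed.leading)
      var i = 0
      while (i < parsed.specs.length) {
        val (spec, literal) = parsed.specs(i)
        sb.append(formatOne(spec, values(i))).append(literal)
        i += 1
      }
      sb.toString
    }
  }
}
```

In this sketch a format such as %08d takes the fast path, while anything with literal text (for example id=%04d!) or more than one specifier falls through to the builder loop, which is why a template-heavy guard benchmark would not be expected to change.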
References:
Result:
./mill -j 1 __.reformat && ./mill -j 1 __.test passed locally.