[SPARK-56983][SQL] DecimalAggregates widened-Cast peel must preserve query semantics#56036
Open
cloud-fan wants to merge 3 commits into
Open
[SPARK-56983][SQL] DecimalAggregates widened-Cast peel must preserve query semantics#56036cloud-fan wants to merge 3 commits into
cloud-fan wants to merge 3 commits into
Conversation
…query semantics
The SPARK-56627 widened-Cast peel changed observable query semantics in two
ways. Fix both so the rewrite never diverges from the un-rewritten form
beyond what the existing un-widened arms already accept.
* SUM arm: replace
Cast(MakeDecimal(_, p + 10, s), bounded(pPrime + 10, s))
with
MakeDecimal(_, min(pPrime + 10, MAX_PRECISION), s).
The merged form narrowed MakeDecimal's overflow check to 10^(p+10),
rejecting long-fit sums in (10^(p+10), 10^min(pPrime+10, 38)) that the
un-rewritten expression would have accepted. For target precision > 18
Decimal.setOrNull falls into the BigDecimal branch and never rejects a
Long, so the cleaner form preserves the original overflow boundary.
* AVG arm: change the guard from `p <= 7` to `pPrime + 4 <= 15`. The merged
guard converted exact-Decimal AVG into Double-regime for pPrime > 11,
introducing rounding divergences. The new guard only fires inside the
existing AVG fast-path envelope, so cases the un-widened arm wouldn't
optimize stay on the slow-but-exact path.
TPC-DS q18 AVG aggregations (pPrime = 12) revert to the un-peeled form
under the new bound; plan-stability files regenerated accordingly.
Co-authored-by: Isaac
- DecimalAggregatesBenchmark.scala header: replace AVG_PEEL_MAX_INNER_PRECISION
reference with the actual `pPrime + 4 <= MAX_DOUBLE_DIGITS` guard; split the
case-design rationale per section so Section B is described accurately
("p+1 boundary plus pPrime upper bound", not "p+10-class wider").
- DecimalAggregatesSuite.scala: rename the (F1 guard) test and reword its
comment block so it stands on its own without the F-prefixed shorthand
carried over from the original SPARK-56627 PR.
Co-authored-by: Isaac
…peel - Couple SUM MakeDecimal target type directly to `DecimalType.bounded(pPrime + 10, s_scale)` instead of restating its precision-clamp formula, so the link to `Sum.resultType` is explicit. - Drop the now-tautological `retryUntil(...)` from the SUM PBT `peelGen`; the `Gen.choose` bounds already enforce the peel-firing predicate. - Refresh the stale `DecimalType.bounded(pPrime+4, s+4)` comment in the AVG numerical-equivalence PBT to match the rewrite's plain `DecimalType(...)`. Co-authored-by: Isaac
Contributor
Author
yaooqinn
approved these changes
May 21, 2026
Member
yaooqinn
left a comment
There was a problem hiding this comment.
Both semantics claims ground out:
- SUM arm: old
Cast(MakeDecimal(_, p+10, s), bounded(pPrime+10, s))narrows the long-fit acceptance window viaDecimal.setOrNull(unscaled, p+10, s)(Decimal.scala L96, long branch). ForpPrime+10 > 18un-rewritten accepts up to10^min(pPrime+10, 38); the newMakeDecimal(_, min(pPrime+10, 38), s)falls into the BigDecimal branch and matches that boundary. - AVG arm: old guard
p <= 7could fire when un-rewrittenpPrime + 4 > MAX_DOUBLE_DIGITSwas Decimal-exact, switching the result regime. New guardpPrime + 4 <= MAX_DOUBLE_DIGITSkeeps the rewrite inside the un-rewritten Double envelope. q18 plan-stability diff confirms the fourdecimal(7,2)→(12,2)AVG sites correctly revert to the un-peeled form.
Pattern ordering preserved (widened arm before un-widened, both Sum and Average); MakeDecimal nullOnOverflow path unchanged.
dongjoon-hyun
approved these changes
May 21, 2026
peter-toth
approved these changes
May 21, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Two semantic-preserving fixes to the SPARK-56627 widened-Cast peel in
DecimalAggregates:SUM arm: Replace
Cast(MakeDecimal(_, p + 10, s), bounded(pPrime + 10, s))with
MakeDecimal(_, min(pPrime + 10, 38), s). The merged form's innerMakeDecimalnarrowed the overflow check to10^(p+10), where theun-rewritten
SUM(Cast(x, dec(pPrime, s)))accepted up to10^min(pPrime+10, 38). ForpPrime + 10 > 18,Decimal.setOrNullfallsinto the BigDecimal branch and never rejects a Long, so the cleaner form
preserves the original overflow boundary for all Long-fit sums.
AVG arm: Change the guard from
p <= 7(
AVG_PEEL_MAX_INNER_PRECISION) topPrime + 4 <= 15(
MAX_DOUBLE_DIGITS). The merged guard switchedAVG(CAST(x AS DECIMAL(pPrime, s)))from full Decimal arithmetic to Double-regime whenever
p <= 7, includingthe
pPrime > 11band where un-rewritten was Decimal-exact. The new guardensures the rewrite only fires inside the existing AVG fast-path's Double
envelope.
Why are the changes needed?
A query rewrite should preserve the query's observable semantics. The
SPARK-56627 rule changed semantics in two ways:
MakeDecimal(_, p + 10, s)could return null (non-ANSI) orthrow (ANSI) for long-fit sums in `(10^(p+10), 10^min(pPrime+10, 38))`.
Example: `SUM(CAST(x AS DECIMAL(15, 2)))` where `x: DECIMAL(5, 2)`
rejected at `10^15` instead of `10^25` -- reachable around `~10^10`
rows of small-precision input.
computation into Double-regime, visible as last-digit rounding differences
at any input size. TPC-DS q18 (`p=7, pPrime=12`) was affected.
Does this PR introduce any user-facing change?
Yes -- restores the un-rewritten semantics for affected queries:
that the un-rewritten expression would have accepted.
is no longer peeled -- the un-rewritten Decimal-exact path is preserved.
TPC-DS q18 AVG aggregations revert to the un-peeled form (visible in the
regenerated `q18` plan-stability files).
How was this patch tested?
assertions, the property-test invariants for the new MakeDecimal-only
form, and the new AVG bound boundary tests.
under the new AVG generator domain (`pPrime in [p+1, 11]`).
(the new bound). Result files left stale relative to the code change --
the `benchmark.yml` GHA workflow can regenerate them.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7