chore(audit): audit math expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1#4486
Open
andygrove wants to merge 1 commit into
Open
chore(audit): audit math expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1#4486andygrove wants to merge 1 commit into
andygrove wants to merge 1 commit into
Conversation
…, 4.1.1 Add per-version audit sub-bullets to all supported math SQL function names (around 60). Also add the 4.1.1 audit line to the already-audited `factorial`. `rand`/`randn` cross-reference the misc audit (apache#4474), and `shiftleft` cross-references the bitwise audit (apache#4479). Highlights from the cross-version review: - All trig and exp/log helpers are byte-for-byte stable across the four Spark versions; the only systemic change in 4.0 is the `NullIntolerant` trait -> `nullIntolerant: Boolean` field refactor. - 4.0 also adds the `DefaultStringProducingExpression` trait on `Bin` / `Hex` / `Unhex` and widens their `StringType` inputs to `StringTypeWithCollation`. - `UnaryPositive` becomes `RuntimeReplaceable` in 4.0; in 3.4/3.5 `+col` silently disables Comet for the enclosing projection because there is no `CometUnaryPositive` serde. - All arithmetic ops thread `EvalMode` through, but `CometRemainder` rejects `EvalMode.TRY`, so `try_mod` on Spark 4.0+ falls back (filed as apache#4484). - `Round` falls back for Float/Double because BigDecimal-via-toString rounding cannot be precisely matched. - `width_bucket` (Spark 3.5+) is wired via per-version `CometExprShim` rather than a `CometExpressionSerde`, bypassing the support-level framework (filed as apache#4485). - `hex` / `unhex` do not propagate Spark 4.x collation; covered by the umbrella apache#2190. The audit was driven by 4 parallel agents covering arithmetic ops, trig + constants, log/exp/power/rounding, and misc (bin/hex/etc). Tracking issues filed for the gaps found: - apache#4484 `try_mod` falls back because `CometRemainder` rejects `EvalMode.TRY`. - apache#4485 `width_bucket` bypasses `CometExpressionSerde` framework. No serde changes are included; the support-level antipattern fixes called out in the audit (mainly lifting convert-time `withInfo` fallbacks in `arithmetic.scala` into `getSupportLevel`) are deliberately out of scope for this doc-focused PR and can ride in follow-ups.
This was referenced May 28, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #.
Rationale for this change
Continuation of the per-category expression audit. Same pattern as #4483 (array), #4480 (predicate), #4479 (bitwise), #4478 (map), #4476 (hash), #4475 (conditional), #4474 (misc), #4473 (collection), #4470 (json), #4469 (struct), using the updated
audit-comet-expressionskill in #4468.What changes are included in this PR?
Support-doc audit notes
Add per-version audit sub-bullets to all 60 supported math SQL function names (
%,*,+,-,/,abs, all trig and inverse-trig, all log/exp/pow/sqrt/cbrt,bin,hex,unhex,greatest,least,ceil/ceiling/floor/round/rint/sign/signum,div/mod/negative/positive/pi,try_add/try_divide/try_multiply/try_subtract,width_bucket). Add 4.1.1 audit line to the already-auditedfactorial.rand/randncross-reference the misc audit (#4474);shiftleftcross-references the bitwise audit (#4479).Highlights from the cross-version review:
NullIntoleranttrait ->nullIntolerant: Booleanfield refactor.DefaultStringProducingExpressiontrait onBin/Hex/Unhexand widens theirStringTypeinputs toStringTypeWithCollation.UnaryPositivebecomesRuntimeReplaceablein 4.0; in 3.4/3.5+colsilently disables Comet for the enclosing projection because there is noCometUnaryPositiveserde.CometRemainderrejectsEvalMode.TRY, sotry_modon Spark 4.0+ falls back (filed as [Bug] try_mod falls back to Spark because CometRemainder rejects EvalMode.TRY #4484).Roundfalls back for Float / Double because BigDecimal-via-toString rounding cannot be precisely matched.width_bucket(Spark 3.5+) is wired via per-versionCometExprShimrather than aCometExpressionSerde, bypassing the support-level framework (filed as [Bug] width_bucket bypasses CometExpressionSerde framework #4485).hex/unhexdo not propagate Spark 4.x collation; covered by the umbrella [Spark 4.0] Add string collation support #2190.Support-level consistency fixes
None in this PR. The audit surfaced several
convert-timewithInfofallbacks inarithmetic.scala(CometAdd/CometSubtract/CometMultiply/CometDivide/CometIntegralDivide/CometRemainder/CometUnaryMinus/CometRoundrely on the defaultCompatible(None)and bail out fromconvertfor several cases). Lifting them intogetSupportLevel/getUnsupportedReasonsis mechanical but touches an unusually load-bearing file; deliberately deferred to follow-ups to keep this audit PR low-risk.Tracking issues filed for follow-up
try_modfalls back to Spark becauseCometRemainderrejectsEvalMode.TRY.width_bucketbypasses theCometExpressionSerdeframework.Audit process
Audited using the
audit-comet-expressionskill (4 Spark versions per #4468), driven by 4 parallel agents covering arithmetic ops, trig + constants, log/exp/power/rounding, and misc (bin/hex/etc).How are these changes tested?
make coresucceeds (no code changes; doc only).CometExpressionSuite,CometMathExpressionSuite,CometSqlFileTestSuite expressions/math/, andexpressions/string/{hex,unhex}.sqlremains unchanged.