[SPARK-56909][SQL] Simplify Cast to int/long codegen under ANSI mode #55934
Open
gengliangwang wants to merge 4 commits into
Stack overview (SPARK-56908 umbrella)
This PR is part of a stack of 8 PRs against SPARK-56908. Order:
PRs 1-4 are linearly stacked on each other (each branch is based on the previous one). PR 5 (decimal arithmetic) is stacked on top of PR 3 (cast decimal) since it uses |
### What changes were proposed in this pull request?

Introduce `CastUtils.java` and use it from `Cast.scala` to collapse the multi-line ANSI overflow-check codegen for casts that target `int` and `long` into one-line static-method calls. Source and target `DataType` constants used in the overflow error message live as `private static final` fields on the helper class, so the happy path performs no per-row `references[]` lookups.

Helpers added:
* `longToIntExact(long)` for narrowing `long -> int`.
* `floatToIntExact(float)`, `doubleToIntExact(double)` for fractional -> int.
* `floatToLongExact(float)`, `doubleToLongExact(double)` for fractional -> long.

`Cast.scala` changes:
* `castIntegralTypeToIntegralTypeExactCode` and `castFractionToIntegralTypeCode` dispatch on the target type: `int` (and `long` for the fraction case) emit a `CastUtils.<...>Exact` call; byte/short targets keep the inline body (refactored in SPARK-56910).
* Eval paths for `castToInt` add ANSI `LongType` / `FloatType` / `DoubleType` cases, and `castToLong` adds `FloatType` / `DoubleType` cases, both delegating to the new helpers.

### Why are the changes needed?

Part of SPARK-56908. The current ANSI cast codegen emits 5-line inline overflow blocks per call site. Multiplied across the many cast paths in a TPC-DS plan, this contributes meaningfully to the generated source size and to Janino compile time, and pushes whole-stage methods closer to the 64KB JVM method limit.

### Does this PR introduce _any_ user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

### How was this patch tested?

`build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite *ExpressionClassIdentitySuite"` (312/312 pass).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x
…Numeric/DoubleExactNumeric directly, remove CastUtils.java
…bleExactNumeric in codegen
gengliangwang
commented
May 22, 2026
What changes were proposed in this pull request?
In `Cast.scala`, the ANSI codegen for narrowing casts to `int`/`long` previously emitted a 5-line inline body per call site (bounds check + cast + throw). After this PR it emits a single static call into the existing `LongExactNumeric` / `FloatExactNumeric` / `DoubleExactNumeric` objects in `numerics.scala`, which already implement the same overflow check + `castingCauseOverflowError` throw that this codegen needs.

The rewrite uses the same `getClass.getCanonicalName.stripSuffix("$")` pattern as the adjacent `MathUtils` / `IntervalMathUtils` calls. The Scala compiler emits `public static` forwarders on the companion class of top-level objects, so generated Java code can call e.g. `org.apache.spark.sql.types.LongExactNumeric.toInt(v)` directly.

Touched `Cast.scala` helpers:
* `castIntegralTypeToIntegralTypeExactCode`: the `int` target branch now emits `LongExactNumeric.toInt($c)` (byte/short narrowing stays inline; refactored in SPARK-56910).
* `castFractionToIntegralTypeCode`: the `int`/`long` target branches now emit `FloatExactNumeric` / `DoubleExactNumeric` `toInt` / `toLong` (byte/short narrowing stays inline; refactored in SPARK-56910).

Primitive widening branches and the non-ANSI paths are untouched.
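To make the "inline body vs. single static call" contrast concrete, here is a minimal, self-contained Java sketch. It is not Spark's generated code or its actual helpers: the class `ExactCastSketch`, its method names, and the plain `ArithmeticException` (standing in for `castingCauseOverflowError`) are all hypothetical, and the fractional range check is an assumption about what such a helper might do.

```java
// Illustrative sketch only; every name in this class is hypothetical.
public final class ExactCastSketch {
    private ExactCastSketch() {}

    // Stand-in for Spark's castingCauseOverflowError (assumption: any
    // ArithmeticException is enough to show the control flow).
    private static ArithmeticException overflow(Object v, String from, String to) {
        return new ArithmeticException("cast overflow: " + v + " from " + from + " to " + to);
    }

    // "Before" shape: bounds check + cast + throw spelled out at the call site.
    public static int longToIntInline(long v) {
        int result;
        if (v == (int) v) {
            result = (int) v;
        } else {
            throw overflow(v, "BIGINT", "INT");
        }
        return result;
    }

    // Shared helper that holds the same check exactly once.
    public static int longToIntExact(long v) {
        if (v == (int) v) {
            return (int) v;
        }
        throw overflow(v, "BIGINT", "INT");
    }

    // "After" shape: the call site collapses to a single static call.
    public static int longToIntViaHelper(long v) {
        return longToIntExact(v);
    }

    // Fractional -> int: accept x only if truncation toward zero stays within
    // the int range. NaN fails both comparisons and is rejected.
    public static int doubleToIntExact(double x) {
        if (Math.floor(x) <= Integer.MAX_VALUE && Math.ceil(x) >= Integer.MIN_VALUE) {
            return (int) x;
        }
        throw overflow(x, "DOUBLE", "INT");
    }

    public static void main(String[] args) {
        System.out.println(longToIntViaHelper(42L));
        System.out.println(doubleToIntExact(-3.9));
        try {
            longToIntViaHelper(1L << 31);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Both `longToIntInline` and `longToIntViaHelper` behave identically for every input; the only difference is how much source text each call site carries, which is the quantity this PR aims to reduce.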
Why are the changes needed?
Part of SPARK-56908 (umbrella). The narrow-cast ANSI branches in `Cast.doGenCode` are some of the longer inline bodies still emitted per call site. Multiplied across the many cast paths in a TPC-DS plan, they contribute meaningfully to the generated source size and Janino compile time, and push whole-stage methods closer to the 64KB JVM method limit.

Compared to v1 of this PR (which added a new `CastUtils.java` with `longToIntExact` / `floatToIntExact` / etc.), this version calls the existing `LongExactNumeric.toInt` / `FloatExactNumeric.toInt` / `toLong` / `DoubleExactNumeric.toInt` / `toLong` directly. Those are public static forwarders on top-level Scala objects that already implement the same `castingCauseOverflowError(v, FROM, TO)` throw, so no new helper class is needed. (Applying the same lesson cloud-fan called out on #55938.)
Does this PR introduce any user-facing change?
No.
How was this patch tested?
307/307 pass.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.x