[SPARK-56745][SQL] Cache foldable ZoneId in ConvertTimezone to avoid per-row lookup#55714
Open
TongWei1105 wants to merge 1 commit intoapache:masterfrom
Open
[SPARK-56745][SQL] Cache foldable ZoneId in ConvertTimezone to avoid per-row lookup#55714TongWei1105 wants to merge 1 commit intoapache:masterfrom
TongWei1105 wants to merge 1 commit intoapache:masterfrom
Conversation
…per-row lookup
### What changes were proposed in this pull request?
`ConvertTimezone` now resolves foldable source/target timezone arguments once
(in `@transient lazy val` for the interpreted path, in `addMutableState` for
the codegen path) instead of calling `DateTimeUtils.getZoneId` on every row.
A new `ZoneId`-typed overload of `DateTimeUtils.convertTimestampNtzToAnotherTz`
is added so the cached zones can be passed through directly.
### Why are the changes needed?
The typical call site is `convert_timezone('UTC', 'America/Los_Angeles', ts_col)`
where both timezone arguments are constant literals. Previously every row went
through `getZoneId` twice -- zone-id normalization plus a `ZoneRulesProvider`
map lookup, which is not free even when the resulting `ZoneId` is cached. The
codegen paths of sibling expressions `FromUTCTimestamp` / `ToUTCTimestamp`
already cache the foldable `ZoneId` via `addMutableState`; this brings
`ConvertTimezone` in line.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
`DateExpressionsSuite` (which exercises both interpreted and codegen paths via
`checkEvaluation`, plus `checkConsistencyBetweenInterpretedAndCodegen` across
all foldable timezone pairs) -- all 76 tests pass. A new test
`SPARK-56745: convert_timezone with non-foldable timezone arguments` covers
the fallback path where source/target timezones are resolved per row,
complementing the foldable-cache path covered by SPARK-37552.
### Was this patch authored or co-authored using generative AI tooling?
Yes, drafted with assistance from Claude Code (Anthropic).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
ConvertTimezonenow resolves foldable source/target timezone arguments once (in@transient lazy valfor the interpreted path, inaddMutableStatefor the codegen path) instead of callingDateTimeUtils.getZoneIdon every row. A newZoneId-typed overload ofDateTimeUtils.convertTimestampNtzToAnotherTzis added so the cached zones can be passed through directly.Why are the changes needed?
The typical call site is
convert_timezone('UTC', 'America/Los_Angeles', ts_col)where both timezone arguments are constant literals. Previously every row went throughgetZoneIdtwice -- zone-id normalization plus aZoneRulesProvidermap lookup, which is not free even when the resultingZoneIdis cached. The codegen paths of sibling expressionsFromUTCTimestamp/ToUTCTimestampalready cache the foldableZoneIdviaaddMutableState; this bringsConvertTimezonein line.Does this PR introduce any user-facing change?
No.
How was this patch tested?
DateExpressionsSuite(which exercises both interpreted and codegen paths viacheckEvaluation, pluscheckConsistencyBetweenInterpretedAndCodegenacross all foldable timezone pairs) -- all 76 tests pass. A new testSPARK-56745: convert_timezone with non-foldable timezone argumentscovers the fallback path where source/target timezones are resolved per row, complementing the foldable-cache path covered by SPARK-37552.Was this patch authored or co-authored using generative AI tooling?
Yes, drafted with assistance from Claude Code (Anthropic).