chore(audit): audit misc expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1#4474
Open
andygrove wants to merge 1 commit into
Open
chore(audit): audit misc expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1#4474andygrove wants to merge 1 commit into
andygrove wants to merge 1 commit into
Conversation
…, 4.1.1 Add per-version audit sub-bullets to `monotonically_increasing_id`, `rand`, `randn`, `spark_partition_id`, and `user` in `docs/source/contributor-guide/spark_expressions_support.md`. `MonotonicallyIncreasingID`, `SparkPartitionID`, and `CurrentUser` are byte-for-byte identical across all four versions (`CurrentUser` is `Unevaluable` and resolved to a string literal by the analyzer, so no Comet serde is needed). `Rand` and `Randn` are refactored in Spark 4.0 (the `RDG` abstract class becomes a trait, with a new `NondeterministicUnaryRDG` base, and `expressionToSeed` rejects non-literal seeds at analysis time) with no runtime behaviour change. Apply one support-level consistency fix surfaced by the audit: - `CometRand` / `CometRandn`: lift the non-literal-seed fallback out of `convert` (where it was silently returning `None`) and into `getSupportLevel`, via a shared `nonLiteralSeedReason` constant on the new `CometRandCommonSerde` base. `getUnsupportedReasons` now documents the restriction. No correctness divergences were found, so no tracking issues are filed for this category.
This was referenced May 27, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #.
Rationale for this change
Continuation of the per-category expression audit. Same pattern as #4473 (collection), #4470 (json), #4469 (struct), #4461 (string), using the updated
audit-comet-expressionskill in #4468 (now also covers Spark 4.1.1).What changes are included in this PR?
Support-doc audit notes
Add per-version audit sub-bullets to
monotonically_increasing_id,rand,randn,spark_partition_id, anduserindocs/source/contributor-guide/spark_expressions_support.md.MonotonicallyIncreasingID,SparkPartitionID, andCurrentUserare byte-for-byte identical across all four versions.CurrentUserisUnevaluableand resolved to a string literal by the analyzer'sResolveCurrentLikerule before Comet sees the plan, so no Comet serde is needed foruser.RandandRandnare refactored in Spark 4.0 (theRDGabstract class becomes a trait, with a newNondeterministicUnaryRDGbase, andExpressionWithRandomSeed.expressionToSeedrejects non-literal seeds at analysis time withQueryCompilationErrors.invalidRandomSeedParameter) with no runtime behaviour change. 4.1.1 is identical to 4.0.Support-level consistency fix (in
nondetermenistic.scala)CometRand/CometRandn: lift the non-literal-seed fallback out ofconvert(where it was silently returningNone) and intogetSupportLevel, via a sharednonLiteralSeedReasonconstant on the newseedExprOfhook onCometRandCommonSerde.getUnsupportedReasonsnow documents the restriction. Pre-4.0 Spark would silently fail at runtime for a column-reference seed; 4.0+ rejects at analysis time.Tracking issues filed for follow-up
None. No correctness divergences were found.
Audit process
Audited directly using the
audit-comet-expressionskill (4 Spark versions per #4468). Five small/identical serdes, no parallel subagents needed.How are these changes tested?
./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite rand expression" -Dtest=none(1 test passes)./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite randn expression" -Dtest=none(1 test passes)./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite spark_partition_id" -Dtest=none(1 test passes)./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite monotonically_increasing_id" -Dtest=none(1 test passes)make coresucceeds with the serde change.