chore(audit): audit misc expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 by andygrove · Pull Request #4474 · apache/datafusion-comet

andygrove · 2026-05-27T22:57:53Z

Which issue does this PR close?

Closes #.

Rationale for this change

Continuation of the per-category expression audit. Same pattern as #4473 (collection), #4470 (json), #4469 (struct), #4461 (string), using the updated audit-comet-expression skill in #4468 (now also covers Spark 4.1.1).

What changes are included in this PR?

Support-doc audit notes

Add per-version audit sub-bullets to monotonically_increasing_id, rand, randn, spark_partition_id, and user in docs/source/contributor-guide/spark_expressions_support.md.

MonotonicallyIncreasingID, SparkPartitionID, and CurrentUser are byte-for-byte identical across all four versions. CurrentUser is Unevaluable and resolved to a string literal by the analyzer's ResolveCurrentLike rule before Comet sees the plan, so no Comet serde is needed for user.
Rand and Randn are refactored in Spark 4.0 (the RDG abstract class becomes a trait, with a new NondeterministicUnaryRDG base, and ExpressionWithRandomSeed.expressionToSeed rejects non-literal seeds at analysis time with QueryCompilationErrors.invalidRandomSeedParameter) with no runtime behaviour change. 4.1.1 is identical to 4.0.

Support-level consistency fix (in `nondetermenistic.scala`)

CometRand / CometRandn: lift the non-literal-seed fallback out of convert (where it was silently returning None) and into getSupportLevel, via a shared nonLiteralSeedReason constant on the new seedExprOf hook on CometRandCommonSerde. getUnsupportedReasons now documents the restriction. Pre-4.0 Spark would silently fail at runtime for a column-reference seed; 4.0+ rejects at analysis time.

Tracking issues filed for follow-up

None. No correctness divergences were found.

Audit process

Audited directly using the audit-comet-expression skill (4 Spark versions per #4468). Five small/identical serdes, no parallel subagents needed.

How are these changes tested?

./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite rand expression" -Dtest=none (1 test passes)
./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite randn expression" -Dtest=none (1 test passes)
./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite spark_partition_id" -Dtest=none (1 test passes)
./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite monotonically_increasing_id" -Dtest=none (1 test passes)
make core succeeds with the serde change.

…, 4.1.1 Add per-version audit sub-bullets to `monotonically_increasing_id`, `rand`, `randn`, `spark_partition_id`, and `user` in `docs/source/contributor-guide/spark_expressions_support.md`. `MonotonicallyIncreasingID`, `SparkPartitionID`, and `CurrentUser` are byte-for-byte identical across all four versions (`CurrentUser` is `Unevaluable` and resolved to a string literal by the analyzer, so no Comet serde is needed). `Rand` and `Randn` are refactored in Spark 4.0 (the `RDG` abstract class becomes a trait, with a new `NondeterministicUnaryRDG` base, and `expressionToSeed` rejects non-literal seeds at analysis time) with no runtime behaviour change. Apply one support-level consistency fix surfaced by the audit: - `CometRand` / `CometRandn`: lift the non-literal-seed fallback out of `convert` (where it was silently returning `None`) and into `getSupportLevel`, via a shared `nonLiteralSeedReason` constant on the new `CometRandCommonSerde` base. `getUnsupportedReasons` now documents the restriction. No correctness divergences were found, so no tracking issues are filed for this category.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(audit): audit misc expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1#4474

chore(audit): audit misc expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1#4474
andygrove wants to merge 1 commit into
apache:mainfrom
andygrove:worktree-audit-misc-funcs

andygrove commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

andygrove commented May 27, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Support-doc audit notes

Support-level consistency fix (in nondetermenistic.scala)

Tracking issues filed for follow-up

Audit process

How are these changes tested?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Support-level consistency fix (in `nondetermenistic.scala`)