chore(audit): audit collection expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1#4473
Open
andygrove wants to merge 1 commit into
… 4.0.1, 4.1.1

Add per-version audit sub-bullets to `concat`, `reverse`, and `size` in `docs/source/contributor-guide/spark_expressions_support.md`. Spark `Concat` and `Reverse` widen StringType inputs to `StringTypeWithCollation` in 4.0; `Size` is byte-for-byte identical across all four versions.

Apply one support-level consistency fix surfaced by the audit:

- `CometConcat`: relabel the non-`StringType` branch from `Incompatible(Some(...))` to `Unsupported(Some(...))`. Spark accepts `BinaryType` and `ArrayType`, but Comet has no native path for either, so the user-observable effect is a fallback, not a wrong result. The reason string is now exposed via `getUnsupportedReasons` (rather than `getIncompatibleReasons`) and the constant is now `private val` for parity with other serdes.

Tracking issues filed for the gaps found:

- apache#4471 concat for BinaryType and ArrayType inputs (feature)
- apache#4472 size for MapType inputs (feature)

Existing apache#2763 covers `reverse` on array<binary>.
This was referenced May 27, 2026
Which issue does this PR close?
Closes #.
Rationale for this change
Continuation of the per-category expression audit. Same pattern as #4470 (json), #4469 (struct), #4461 (string), and earlier audits, using the updated `audit-comet-expression` skill in #4468 (now also covers Spark 4.1.1).
What changes are included in this PR?
Support-doc audit notes
Add per-version audit sub-bullets to `concat`, `reverse`, and `size` in `docs/source/contributor-guide/spark_expressions_support.md`. Spark `Concat` and `Reverse` widen `StringType` inputs to `StringTypeWithCollation(supportsTrimCollation = true)` in 4.0; `Size` is byte-for-byte identical across all four versions.
Support-level consistency fix (in `strings.scala`)
`CometConcat`: relabel the non-`StringType` branch from `Incompatible(Some(...))` to `Unsupported(Some(...))`. Spark accepts `BinaryType` and `ArrayType`, but Comet has no native path for either, so the user-observable effect is a fallback, not a wrong result. The reason string is now exposed via `getUnsupportedReasons` (rather than `getIncompatibleReasons`), and the constant is now `private val` for parity with other serdes.
Tracking issues filed for follow-up
- [Feature] support concat() for BinaryType and ArrayType inputs: Spark accepts both; Comet only supports `StringType` natively.
- [Feature] support size() for MapType inputs: Spark accepts both `ArrayType` and `MapType`; Comet only supports `ArrayType`.
The existing #2763 already covers `reverse` on `array<binary>` and is referenced from the doc.
Audit process
Audited directly using the `audit-comet-expression` skill (4 Spark versions per #4468). One backing serde per function, no parallel subagents needed.
How are these changes tested?
- `./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite string/concat" -Dtest=none` (2 tests pass)
- `./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite array/array_concat" -Dtest=none` (1 test passes; uses the existing `expect_fallback(CONCAT supports only string input parameters)` directive; the reason text is preserved by the relabel, so substring matching still holds)
- `./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite array/size" -Dtest=none` (1 test passes)
- `./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite string/reverse" -Dtest=none` (1 test passes)
- `make core` succeeds with the serde change.
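For reviewers less familiar with the support levels, the `CometConcat` relabel described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in `strings.scala`: the `DataType` and `SupportLevel` types, the `ConcatSupportSketch` object, and its helper methods are hypothetical stand-ins written for this example, and the reason text is only the substring quoted in the `expect_fallback` directive.

```scala
// Simplified stand-ins for illustration only; NOT Spark's or Comet's definitions.
sealed trait DataType
case object StringType extends DataType
case object BinaryType extends DataType
final case class ArrayType(elementType: DataType) extends DataType

sealed trait SupportLevel
case object Compatible extends SupportLevel
// Incompatible: a native path exists but may produce results that differ from Spark.
final case class Incompatible(reason: Option[String]) extends SupportLevel
// Unsupported: no native path exists, so the query falls back to Spark.
final case class Unsupported(reason: Option[String]) extends SupportLevel

object ConcatSupportSketch {
  // Assumed reason text, taken from the expect_fallback directive in the test notes.
  private val unsupportedReason = "CONCAT supports only string input parameters"

  // Non-string inputs are reported as Unsupported (fallback), not Incompatible.
  def supportLevel(childTypes: Seq[DataType]): SupportLevel =
    if (childTypes.forall(_ == StringType)) Compatible
    else Unsupported(Some(unsupportedReason))

  // Hypothetical analogue of a "get unsupported reasons" accessor.
  def unsupportedReasons(level: SupportLevel): Seq[String] = level match {
    case Unsupported(reason) => reason.toSeq
    case _ => Seq.empty
  }

  // Hypothetical analogue of a "get incompatible reasons" accessor.
  def incompatibleReasons(level: SupportLevel): Seq[String] = level match {
    case Incompatible(reason) => reason.toSeq
    case _ => Seq.empty
  }

  def main(args: Array[String]): Unit = {
    val level = supportLevel(Seq(StringType, BinaryType))
    println(level)
    println(unsupportedReasons(level))
  }
}
```

In this sketch, a mixed `StringType`/`BinaryType` input yields an `Unsupported` level whose reason still contains the text the `expect_fallback` directive matches on, which is why the relabel would not be expected to disturb the existing fallback test.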