Skip to content

chore(audit): audit collection expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1#4473

Open
andygrove wants to merge 1 commit into
apache:mainfrom
andygrove:worktree-audit-collection-funcs
Open

chore(audit): audit collection expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1#4473
andygrove wants to merge 1 commit into
apache:mainfrom
andygrove:worktree-audit-collection-funcs

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #.

Rationale for this change

Continuation of the per-category expression audit. Same pattern as #4470 (json), #4469 (struct), #4461 (string), and earlier audits, using the updated audit-comet-expression skill in #4468 (now also covers Spark 4.1.1).

What changes are included in this PR?

Support-doc audit notes

Add per-version audit sub-bullets to concat, reverse, and size in docs/source/contributor-guide/spark_expressions_support.md. Spark Concat and Reverse widen StringType inputs to StringTypeWithCollation(supportsTrimCollation = true) in 4.0; Size is byte-for-byte identical across all four versions.

Support-level consistency fix (in strings.scala)

  • CometConcat: relabel the non-StringType branch from Incompatible(Some(...)) to Unsupported(Some(...)). Spark accepts BinaryType and ArrayType, but Comet has no native path for either, so the user-observable effect is a fallback, not a wrong result. The reason string is now exposed via getUnsupportedReasons (rather than getIncompatibleReasons), and the constant is now private val for parity with other serdes.

Tracking issues filed for follow-up

The existing #2763 already covers reverse on array<binary> and is referenced from the doc.

Audit process

Audited directly using the audit-comet-expression skill (4 Spark versions per #4468). One backing serde per function, no parallel subagents needed.

How are these changes tested?

  • ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite string/concat" -Dtest=none (2 tests pass)
  • ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite array/array_concat" -Dtest=none (1 test passes; uses the existing expect_fallback(CONCAT supports only string input parameters) directive — the reason text is preserved by the relabel, so substring matching still holds)
  • ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite array/size" -Dtest=none (1 test passes)
  • ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite string/reverse" -Dtest=none (1 test passes)
  • make core succeeds with the serde change.

… 4.0.1, 4.1.1

Add per-version audit sub-bullets to `concat`, `reverse`, and `size`
in `docs/source/contributor-guide/spark_expressions_support.md`. Spark
`Concat` and `Reverse` widen StringType inputs to
`StringTypeWithCollation` in 4.0; `Size` is byte-for-byte identical
across all four versions.

Apply one support-level consistency fix surfaced by the audit:

- `CometConcat`: relabel the non-`StringType` branch from
  `Incompatible(Some(...))` to `Unsupported(Some(...))`. Spark
  accepts `BinaryType` and `ArrayType`, but Comet has no native
  path for either, so the user-observable effect is a fallback, not a
  wrong result. The reason string is now exposed via
  `getUnsupportedReasons` (rather than `getIncompatibleReasons`)
  and the constant is now `private val` for parity with other serdes.

Tracking issues filed for the gaps found:

- apache#4471 concat for BinaryType and ArrayType inputs (feature)
- apache#4472 size for MapType inputs (feature)

Existing apache#2763 covers `reverse` on array<binary>.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant