
chore(audit): audit map expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 #4478

Open
andygrove wants to merge 1 commit into apache:main from andygrove:worktree-audit-map-funcs


@andygrove
Member

Which issue does this PR close?

Closes #.

Rationale for this change

Continuation of the per-category expression audit. It follows the same pattern as #4476 (hash), #4475 (conditional), #4474 (misc), #4473 (collection), #4470 (json), and #4469 (struct), and uses the updated `audit-comet-expression` skill from #4468.

What changes are included in this PR?

Support-doc audit notes

Add per-version audit sub-bullets to `element_at`, `map_contains_key`, `map_entries`, `map_from_arrays`, `map_from_entries`, `map_keys`, `map_values`, and `str_to_map`. Most of the category is byte-for-byte identical across the four versions. Spark 4.0 adds the usual collation widening (`StringTypeNonCSAICollation` on `StringToMap`) and the `nullIntolerant: Boolean` field refactor. Spark 4.1 adds a `legacySplitTruncate` flag to `StringToMap`, driven by `spark.sql.legacy.truncateForEmptyRegexSplit`, which the Comet native implementation does not honour.

Support-level consistency fixes (in maps.scala)

  • `CometMapFromEntries`: rephrase the `BinaryType` fallback reasons so they read consistently (`BinaryType` is not supported as a map key/value in `map_from_entries`) and backtick the type name and function name.
  • `CometStrToMap`: drop the no-op `getSupportLevel` override (it returned `Compatible(None)`, which is also the default).

Update the affected test references in `CometMapExpressionSuite` (via the shared constants) and the fallback substring matchers in `expressions/map/map_from_entries.sql`.
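The two support-level fixes can be sketched with a small, self-contained Scala model. Everything in it is hypothetical. `SupportLevel`, `Compatible`, `Unsupported`, `ExpressionSerde`, `MapFromEntriesReasons`, and the toy `DataType` objects are simplified stand-ins rather than Comet's or Spark's actual classes, and the exact reason strings are assumed from the key/value phrasing above. The sketch shows why an override that returns the trait's default `Compatible(None)` is redundant. It also shows how keeping the `map_from_entries` reasons in shared constants lets a serde and its tests match on the same text.

```scala
// Hypothetical, simplified model: these types stand in for Spark/Comet classes
// and are not the real APIs.
sealed trait SupportLevel
final case class Compatible(notes: Option[String] = None) extends SupportLevel
final case class Unsupported(notes: Option[String] = None) extends SupportLevel

// Toy data types, standing in for Spark's DataType hierarchy.
sealed trait DataType
case object BinaryType extends DataType
case object StringType extends DataType
case object IntegerType extends DataType

// Hypothetical stand-in for a map_from_entries input: the entry key and value types.
final case class MapFromEntriesInput(keyType: DataType, valueType: DataType)

trait ExpressionSerde[T] {
  // Default support level: compatible, with no notes.
  def getSupportLevel(expr: T): SupportLevel = Compatible(None)
}

// Shared reason strings (assumed wording), referenced by both the serde and its tests.
object MapFromEntriesReasons {
  val BinaryKey: String =
    "`BinaryType` is not supported as a map key in `map_from_entries`"
  val BinaryValue: String =
    "`BinaryType` is not supported as a map value in `map_from_entries`"
}

object MapFromEntriesSerde extends ExpressionSerde[MapFromEntriesInput] {
  // Report the first unsupported type found, checking the key before the value.
  override def getSupportLevel(expr: MapFromEntriesInput): SupportLevel =
    if (expr.keyType == BinaryType) Unsupported(Some(MapFromEntriesReasons.BinaryKey))
    else if (expr.valueType == BinaryType) Unsupported(Some(MapFromEntriesReasons.BinaryValue))
    else Compatible()
}

// Before: an override that only restates the trait default.
object StrToMapWithOverride extends ExpressionSerde[String] {
  override def getSupportLevel(expr: String): SupportLevel = Compatible(None)
}

// After: no override, so the trait default applies and the result is identical.
object StrToMapWithoutOverride extends ExpressionSerde[String]
```

Because a test would reference `MapFromEntriesReasons` rather than copying the strings, rephrasing a reason in this model only touches one place. Substring matchers in SQL test files are plain text, so they still have to be edited by hand.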

Tracking issues filed for follow-up

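#4477 (`str_to_map` does not honour the Spark 4.1.1 `spark.sql.legacy.truncateForEmptyRegexSplit` flag) is filed for follow-up and referenced from the `str_to_map` sub-bullet. The previously closed #3327 (`map_from_arrays` null-input crash) is referenced from the `map_from_arrays` sub-bullet.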

Audit process

Audited directly using the `audit-comet-expression` skill across the four Spark versions introduced in #4468. The audit covered eight serdes plus their existing test coverage.

How are these changes tested?

  • `./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite expressions/map/" -Dtest=none` (9 tests pass after updating the substring matchers)
  • `./mvnw test -Dsuites="org.apache.comet.CometMapExpressionSuite" -Dtest=none` (8 succeeded, 1 already ignored)
  • `make core` succeeds.

… 4.1.1

Add per-version audit sub-bullets to `element_at`, `map_contains_key`,
`map_entries`, `map_from_arrays`, `map_from_entries`, `map_keys`,
`map_values`, and `str_to_map` in
`docs/source/contributor-guide/spark_expressions_support.md`.

Most of the category is byte-for-byte identical across the four
versions. Spark 4.0 adds the usual collation widening
(`StringTypeNonCSAICollation` on `StringToMap`) and the
`nullIntolerant` field refactor. Spark 4.1 adds a
`legacySplitTruncate` flag to `StringToMap`, driven by
`spark.sql.legacy.truncateForEmptyRegexSplit`, which the Comet
native implementation does not honour.

Apply support-level consistency fixes surfaced by the audit:

- `CometMapFromEntries`: rephrase the `BinaryType` reasons to read
  consistently (`BinaryType` is not supported as a map key/value in
  `map_from_entries`) and backtick the type name and function name.
- `CometStrToMap`: drop the no-op `getSupportLevel` override (it
  returned `Compatible(None)`, which is also the default).

Update the affected test references in
`CometMapExpressionSuite` (via the shared constants) and the
`expressions/map/map_from_entries.sql` fallback substring matchers.

Tracking issue filed for the gap found:

- apache#4477 `str_to_map` does not honour the Spark 4.1.1
  `spark.sql.legacy.truncateForEmptyRegexSplit` flag (referenced
  from the `str_to_map` sub-bullet).


Development

Successfully merging this pull request may close these issues.

map_from_arrays() with NULL inputs causes native crash
