Skip to content

chore(audit): audit cast across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1#4493

Open
andygrove wants to merge 1 commit into
apache:mainfrom
andygrove:worktree-audit-cast
Open

chore(audit): audit cast across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1#4493
andygrove wants to merge 1 commit into
apache:mainfrom
andygrove:worktree-audit-cast

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #.

Rationale for this change

Completes the per-category expression audit started in earlier PRs (#4469 struct, #4470 json, #4473 collection, #4474 misc, #4475 conditional, #4476 hash, #4478 map, #4479 bitwise, #4480 predicate, #4483 array, #4486 math), using the updated audit-comet-expression skill in #4468.

cast is the only expression in conversion_funcs; the other entries (bigint, binary, boolean, ...) are SQL-syntactic shorthands for CAST(... AS X) and resolve to Cast.

What changes are included in this PR?

Support-doc audit notes

Add a per-version audit sub-bullet to cast in docs/source/contributor-guide/spark_expressions_support.md. Highlights:

  • Cast(child, dataType, timeZoneId, evalMode) signature is stable across 3.4 / 3.5 / 4.0 / 4.1; per-version EvalMode resolution lives in CometExprShim.
  • Comet uses a per-source-type support matrix in CometCast.scala that returns Compatible / Incompatible(reason) / Unsupported(reason) for each (from, to) pair.
  • Spark's numeric-to-numeric canCast matrix is unchanged across all four versions; Spark 4.0 adds VariantType and collated StringType; Spark 4.1 adds TimeType and geospatial types with their own arms. None of those new arms are in CometCast, so they fall back to Spark.

Support-level consistency fixes

None in this PR. The audit surfaced cosmetic inconsistencies (e.g. canCastFromFloat/Double/Decimal ignore evalMode; the static getUnsupportedReasons / getIncompatibleReasons don't enumerate the per-pair reasons returned by isSupported; one case _: Incompatible() with no reason in the array-of-binary branch) but these touch a load-bearing file and are deferred to follow-ups.

Tracking issues filed for follow-up

Existing #1371 (CAST(float|double AS decimal) rounding) and #2190 (Spark 4.0 collation umbrella) are also referenced from the doc sub-bullet.

Audit process

Audited using the audit-comet-expression skill (4 Spark versions per #4468), driven by 3 parallel agents covering high-level + numeric, string conversions, and datetime + complex types.

How are these changes tested?

  • make core succeeds (no code changes; doc only).
  • Existing test coverage in CometCastSuite and the cast-related SQL fixtures remains unchanged.

Add the per-version audit sub-bullet to `cast` in
`docs/source/contributor-guide/spark_expressions_support.md`.

Cast is wired through `CometCast`
(`spark/src/main/scala/org/apache/comet/expressions/CometCast.scala`)
with a per-source-type support matrix and per-eval-mode
(`LEGACY`/`ANSI`/`TRY`) handling. The native side implements
explicit narrowing-numeric arms with Spark-shaped error messages and
falls through to DataFusion `cast_with_options` for the rest.

Spark's `Cast.canCast` numeric-to-numeric matrix is unchanged across
3.4 / 3.5 / 4.0 / 4.1. Spark 4.0 adds `VariantType` and collation-
aware `StringType`; Spark 4.1 adds `TimeType` and geospatial types
with their own arms. None of those new arms are in `CometCast`, so
they fall back to Spark.

Tracking issues filed for the gaps found:

- apache#4488 `CAST(binary AS string)` uses unsafe `from_utf8_unchecked`.
- apache#4489 Spark 4.0 collated-string casts are implicitly unhandled, no test.
- apache#4490 Spark 4.1 `TimeType` casts have no explicit `Unsupported` arm.
- apache#4491 `CAST(map AS map)` falls back even though native `cast_map_to_map` exists.
- apache#4492 `spark.sql.legacy.castComplexTypesToString.enabled` is not honoured.

Existing apache#1371 (`CAST(float|double AS decimal)` rounding) and apache#2190
(Spark 4.0 collation umbrella) are also referenced.

The audit was driven by 3 parallel agents covering high-level + numeric,
string conversions, and datetime + complex.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant