[Bug] width_bucket bypasses CometExpressionSerde framework #4485

@andygrove

Describe the bug

width_bucket (Spark 3.5+) is wired through CometExprShim.versionSpecificExprToProtoInternal rather than through a CometExpressionSerde registered in QueryPlanSerde.exprSerdeMap. As a result:

  • It bypasses the normal getSupportLevel / getUnsupportedReasons / getIncompatibleReasons hooks, so it cannot report any incompatible or unsupported branches.
  • It is invisible to the auto-generated compatibility doc (docs/source/user-guide/compatibility.md).
  • It is invisible to the per-expression spark.comet.expression.<Name>.{enabled,allowIncompatible} configs.
  • The wiring is duplicated across four shim files (spark-3.5, spark-4.0, spark-4.1, spark-4.2), so any future change has to be applied four times.

width_bucket also supports Spark's YearMonthIntervalType and DayTimeIntervalType, but Comet's tests only cover DoubleType. The native SparkWidthBucket declares the interval signatures, but because width_bucket has no serde, there is no way to mark those types as Unsupported if a bug is found later.

Surfaced by the math-expressions audit (collection PR queue).

Expected behavior

Move width_bucket to a CometExpressionSerde[WidthBucket] registered in QueryPlanSerde.mathExpressions, matching the pattern used by every other math expression. The serde can either accept all input types and forward to the native UDF, or branch on input types and return Unsupported for the cases it does not handle.
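
To illustrate the second option (branching on input types), here is a minimal, self-contained sketch. Everything in it is a simplified stand-in written for this issue, not Comet's or Spark's real code: the ExpressionSerde trait, the SupportLevel hierarchy, the DataType objects, the one-field WidthBucket, and the Planner helper are all hypothetical, and the real CometExpressionSerde signatures differ. Marking the interval types as Unsupported is only an example of what the hook makes possible; it is not a claim that those types are broken today.

```scala
// Simplified stand-ins for illustration only; NOT Comet's or Spark's real types.
sealed trait DataType
case object DoubleType extends DataType
case object YearMonthIntervalType extends DataType
case object DayTimeIntervalType extends DataType

sealed trait SupportLevel
case object Compatible extends SupportLevel
final case class Unsupported(reason: String) extends SupportLevel

// Stand-in for Spark's WidthBucket expression, reduced to its value type.
final case class WidthBucket(valueType: DataType)

// Stand-in for the serde trait: a support check plus a conversion step.
trait ExpressionSerde[T] {
  def getSupportLevel(expr: T): SupportLevel = Compatible
  def convert(expr: T): Option[String]
}

// Hypothetical serde that branches on the input type.
object CometWidthBucket extends ExpressionSerde[WidthBucket] {
  override def getSupportLevel(expr: WidthBucket): SupportLevel =
    expr.valueType match {
      case DoubleType => Compatible
      // Illustrative choice: reject types that have no test coverage.
      case other => Unsupported(s"width_bucket on $other is not covered by tests")
    }

  // The real serde would build a protobuf expression; a string stands in here.
  def convert(expr: WidthBucket): Option[String] = Some("width_bucket")
}

// Hypothetical caller: consult the support level before converting.
object Planner {
  def serialize(expr: WidthBucket): Either[String, String] =
    CometWidthBucket.getSupportLevel(expr) match {
      case Compatible => CometWidthBucket.convert(expr).toRight("conversion failed")
      case Unsupported(reason) => Left(reason)
    }
}
```

With a single serde object like this, a change to the type check is made once rather than in each of the four shim files, and the unsupported reason becomes available to whatever consumes the support level.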

Additional context

  • Shim location: spark/src/main/spark-3.5/org/apache/comet/shims/CometExprShim.scala (plus spark-4.0, spark-4.1, spark-4.2)
  • Native UDF: datafusion-spark SparkWidthBucket, registered in native/core/src/execution/jni_api.rs
  • width_bucket is unsupported on Spark 3.4.3 (the function was added in 3.5).
