Describe the bug
width_bucket (Spark 3.5+) is wired through CometExprShim.versionSpecificExprToProtoInternal rather than through a CometExpressionSerde registered in QueryPlanSerde.exprSerdeMap. As a result:
- It bypasses the normal getSupportLevel / getUnsupportedReasons / getIncompatibleReasons hooks, so it cannot signal any incompatible or unsupported branches.
- It is invisible to the auto-generated compatibility doc (docs/source/user-guide/compatibility.md).
- It is invisible to the per-expression spark.comet.expression.<Name>.{enabled,allowIncompatible} configs.
- The wiring is duplicated across four shim files (spark-3.5, spark-4.0, spark-4.1, spark-4.2), so any future change has to be applied four times.
width_bucket also supports Spark's YearMonthIntervalType and DayTimeIntervalType, but Comet's tests only cover DoubleType. The native SparkWidthBucket declares the interval signatures, but the wiring gap means there is no way to mark them as Unsupported if a future bug is found.
Surfaced by the math-expressions audit (collection PR queue).
Expected behavior
Move width_bucket to a CometExpressionSerde[WidthBucket] registered in QueryPlanSerde.mathExpressions, matching the pattern used by every other math expression. The serde can either accept all types and forward to the native UDF, or branch on input types and call Unsupported for unsupported cases.
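To make the second option concrete, here is a minimal, self-contained sketch of a serde that branches on the input type. Everything in it is a simplified stand-in written for illustration: the DataType objects, SupportLevel, ExpressionSerde and the one-field WidthBucket case class are hypothetical models, not Comet's or Spark's real classes, and marking the interval types as Unsupported is only an example policy, not a proposal.

```scala
object WidthBucketSerdeSketch {
  // Stand-in for Spark's data types (hypothetical, heavily simplified).
  sealed trait DataType
  case object DoubleType extends DataType
  case object YearMonthIntervalType extends DataType
  case object DayTimeIntervalType extends DataType
  case object StringType extends DataType

  // Stand-in for the support levels a serde can report (hypothetical).
  sealed trait SupportLevel
  case object Compatible extends SupportLevel
  final case class Unsupported(reason: String) extends SupportLevel

  // Stand-in for the expression: only the type of the value argument is modelled.
  final case class WidthBucket(valueType: DataType)

  // Stand-in for the serde interface: one hook that reports the support level.
  trait ExpressionSerde[T] {
    def getSupportLevel(expr: T): SupportLevel
  }

  // One serde in one place, instead of one copy of the wiring per shim.
  object WidthBucketSerde extends ExpressionSerde[WidthBucket] {
    override def getSupportLevel(expr: WidthBucket): SupportLevel =
      expr.valueType match {
        case DoubleType =>
          Compatible
        case YearMonthIntervalType | DayTimeIntervalType =>
          // Example policy only: reject inputs that the tests do not cover yet.
          Unsupported("interval inputs are not covered by tests")
        case other =>
          Unsupported(s"unexpected input type: $other")
      }
  }

  def main(args: Array[String]): Unit = {
    println(WidthBucketSerde.getSupportLevel(WidthBucket(DoubleType)))
    println(WidthBucketSerde.getSupportLevel(WidthBucket(DayTimeIntervalType)))
  }
}
```

With a hook like this in place, a bug found later in one input type could be fenced off with a single-line change in one file, and the reason string would be available to whatever reports unsupported expressions.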
Additional context
- Shim location: spark/src/main/spark-3.5/org/apache/comet/shims/CometExprShim.scala (plus spark-4.0, spark-4.1, spark-4.2)
- Native UDF: datafusion-spark SparkWidthBucket, registered in native/core/src/execution/jni_api.rs
width_bucket is unsupported on Spark 3.4.3 (the function was added in 3.5).