Describe the bug
Spark's concat(...) accepts StringType, BinaryType, and ArrayType arguments (Concat.allowedTypes = Seq(StringType, BinaryType, ArrayType) in collectionOperations.scala, widened to StringTypeWithCollation in Spark 4.0+). Comet's CometConcat only natively supports StringType children; for BinaryType or ArrayType it falls back to Spark.
Surfaced by the collection-expressions audit PR. The audit relabels the getSupportLevel branch from Incompatible to Unsupported (the fallback is a genuine "Comet does not support this" case, not a wrong-result case), but the underlying coverage gap remains.
Steps to reproduce
-- BinaryType
SELECT concat(unhex('CAFE'), unhex('BEEF'));
-- ArrayType
CREATE TABLE t(a array<int>, b array<int>) USING parquet;
INSERT INTO t VALUES (array(1, 2), array(3, 4));
SELECT concat(a, b) FROM t;
Both queries currently fall back to Spark.
Expected behavior
Native support for concat over BinaryType (concatenate byte arrays) and ArrayType (concatenate arrays, equivalent to array_concat).
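To pin down the expected results, here is a minimal reference model of the two missing cases. It is a sketch, not Comet or Spark code: the function name concat_reference is hypothetical, Python bytes and list stand in for BinaryType and ArrayType values, and it assumes Spark's behaviour of returning NULL when any argument is NULL. A native implementation would need to be checked against Spark itself.

```python
from typing import List, Optional, Union

# bytes models a BinaryType value, list models an ArrayType value,
# None models SQL NULL. These mappings are assumptions of this sketch.
Value = Union[bytes, List[object]]


def concat_reference(*args: Optional[Value]) -> Optional[Value]:
    """Hypothetical reference model for concat over binary or array inputs."""
    if not args:
        raise ValueError("this sketch expects at least one argument")
    # Assumed semantics: any NULL argument makes the whole result NULL.
    if any(a is None for a in args):
        return None
    if all(isinstance(a, bytes) for a in args):
        # BinaryType: join the byte arrays end to end.
        return b"".join(args)
    if all(isinstance(a, list) for a in args):
        # ArrayType: append the elements of each array in argument order.
        out: List[object] = []
        for a in args:
            out.extend(a)
        return out
    raise TypeError("arguments must be all binary or all array")
```

Applied to the reproduction queries, the model gives the bytes CAFEBEEF for the binary case and [1, 2, 3, 4] for the array case.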
Additional context
- Serde: CometConcat in spark/src/main/scala/org/apache/comet/serde/strings.scala
- DataFusion has array_concat for the array case; Comet already wires it for the array_concat function. The work is to route Concat(<array>...) through the same native path.
- For BinaryType, DataFusion's concat UDF is Utf8-only, so a Comet-side helper or upstream patch would be needed.
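The routing described in the points above can be summarised as a small decision table. This is an illustrative sketch only: the names ChildType and choose_native_path are hypothetical and the returned strings are labels, not real Comet or DataFusion identifiers. The actual change would live in the Scala serde.

```python
from enum import Enum
from typing import Optional, Sequence


class ChildType(Enum):
    # Stand-ins for the Spark data types of Concat's children.
    STRING = "StringType"
    BINARY = "BinaryType"
    ARRAY = "ArrayType"


def choose_native_path(children: Sequence[ChildType]) -> Optional[str]:
    """Return a label for the native path, or None to fall back to Spark."""
    kinds = set(children)
    if kinds == {ChildType.STRING}:
        # Already supported today.
        return "datafusion concat"
    if kinds == {ChildType.ARRAY}:
        # Proposed: reuse the path already wired for array_concat.
        return "datafusion array_concat"
    if kinds == {ChildType.BINARY}:
        # Proposed: needs a Comet-side helper or an upstream patch,
        # because the existing concat UDF is Utf8-only.
        return "binary concat helper (to be written)"
    # Empty or mixed children: keep falling back to Spark.
    return None
```

Keeping the fallback as the default branch means any combination not explicitly handled still runs in Spark rather than producing a wrong result.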