[Feature] support concat() for BinaryType and ArrayType inputs #4471

@andygrove

Describe the bug

Spark's concat(...) accepts StringType, BinaryType, and ArrayType arguments (Concat.allowedTypes = Seq(StringType, BinaryType, ArrayType) in collectionOperations.scala, widened to StringTypeWithCollation in Spark 4.0+). Comet's CometConcat only natively supports StringType children; for BinaryType or ArrayType it falls back to Spark.

Surfaced by the collection-expressions audit PR. That PR relabels the getSupportLevel branch from Incompatible to Unsupported (the fallback is a genuine "Comet does not support" case, not a wrong-result case), but the underlying coverage gap remains.

Steps to reproduce

-- BinaryType
SELECT concat(unhex('CAFE'), unhex('BEEF'));

-- ArrayType
CREATE TABLE t(a array<int>, b array<int>) USING parquet;
INSERT INTO t VALUES (array(1, 2), array(3, 4));
SELECT concat(a, b) FROM t;

Both queries currently fall back to Spark.

Expected behavior

Native support for concat over BinaryType (concatenate byte arrays) and ArrayType (concatenate arrays, equivalent to array_concat).
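
To make the expected semantics concrete, here is a minimal std-only Rust model of row-wise concat. It is a sketch, not Comet or DataFusion code: concat_rows and the nested Vec representation are hypothetical, and it assumes Spark's null rule for concat (a row is NULL if any input is NULL for that row). With T = u8 it models BinaryType; with T = Option<i32> it models array<int> with nullable elements.

```rust
/// Row-wise concatenation over several input columns (illustrative model only).
/// Each column holds one nullable value per row, and each value is a Vec<T>.
/// Assumption: every column has the same number of rows.
fn concat_rows<T: Clone>(columns: &[Vec<Option<Vec<T>>>]) -> Vec<Option<Vec<T>>> {
    // With no input columns there are no rows to produce.
    let num_rows = columns.first().map_or(0, |c| c.len());
    (0..num_rows)
        .map(|row| {
            let mut out = Vec::new();
            for col in columns {
                match &col[row] {
                    // Append this column's value for the current row.
                    Some(v) => out.extend_from_slice(v),
                    // Assumed Spark rule: any NULL input makes the whole row NULL.
                    None => return None,
                }
            }
            Some(out)
        })
        .collect()
}
```

Under this model, the BinaryType query above corresponds to concatenating [0xCA, 0xFE] with [0xBE, 0xEF], and the ArrayType query to concatenating [1, 2] with [3, 4]. The two cases differ only in element type.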

Additional context

  • Serde: CometConcat in spark/src/main/scala/org/apache/comet/serde/strings.scala
  • DataFusion has array_concat for the array case; Comet already wires it for the array_concat function. The work is to route Concat(<array>...) through the same native path.
  • For BinaryType, DataFusion's concat UDF is Utf8-only, so a Comet-side helper or upstream patch would be needed.
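
For the BinaryType bullet, a Comet-side helper could look roughly like the sketch below. It is written against a hypothetical BinaryColumn struct (offsets, values, validity) that only mimics the shape of an Arrow variable-length binary array. Real code would use the arrow crate's array and builder types instead, and the names BinaryColumn and concat_binary are assumptions made for illustration.

```rust
/// Stand-in for a variable-length binary column (hypothetical, not the arrow crate):
/// row i's bytes are values[offsets[i]..offsets[i + 1]]; validity[i] == false marks NULL.
struct BinaryColumn {
    offsets: Vec<usize>,
    values: Vec<u8>,
    validity: Vec<bool>,
}

impl BinaryColumn {
    /// Build a column from per-row optional byte slices.
    fn from_rows(rows: &[Option<&[u8]>]) -> Self {
        let mut col = BinaryColumn { offsets: vec![0], values: Vec::new(), validity: Vec::new() };
        for row in rows.iter().copied() {
            if let Some(bytes) = row {
                col.values.extend_from_slice(bytes);
            }
            // A NULL row contributes no bytes, so its start and end offsets are equal.
            col.offsets.push(col.values.len());
            col.validity.push(row.is_some());
        }
        col
    }

    fn len(&self) -> usize {
        self.validity.len()
    }

    /// Bytes for row i, or None if the row is NULL.
    fn get(&self, i: usize) -> Option<&[u8]> {
        if self.validity[i] {
            Some(&self.values[self.offsets[i]..self.offsets[i + 1]])
        } else {
            None
        }
    }
}

/// Row-wise concat of binary columns. Assumes all inputs have the same row count
/// and that a NULL in any input makes the output row NULL.
fn concat_binary(inputs: &[&BinaryColumn]) -> BinaryColumn {
    let num_rows = inputs.first().map_or(0, |c| c.len());
    let mut out = BinaryColumn {
        offsets: vec![0],
        values: Vec::new(),
        validity: Vec::with_capacity(num_rows),
    };
    for row in 0..num_rows {
        let valid = inputs.iter().all(|c| c.validity[row]);
        if valid {
            // Copy each input's bytes for this row into the shared output buffer.
            for c in inputs {
                out.values.extend_from_slice(&c.values[c.offsets[row]..c.offsets[row + 1]]);
            }
        }
        out.offsets.push(out.values.len());
        out.validity.push(valid);
    }
    out
}
```

The point of the sketch is that the binary case needs no UTF-8 handling at all: it is a byte copy plus offset and validity bookkeeping, so a small dedicated kernel may be simpler than adapting the Utf8 path.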
