Skip to content

[Feature] support size() for MapType inputs #4472

@andygrove

Description

@andygrove

Describe the bug

Spark's size(expr) accepts both ArrayType and MapType inputs (Size.inputTypes = Seq(TypeCollection(ArrayType, MapType)) in collectionOperations.scala, identical across 3.4.3 / 3.5.8 / 4.0.1 / 4.1.1). Comet's CometSize only supports ArrayType; for MapType it returns Unsupported(Some("size does not support map inputs")) and falls back to Spark.

Surfaced by the collection-expressions audit in the collection-expressions audit PR.

Steps to reproduce

CREATE TABLE t(m map<string, int>) USING parquet;
INSERT INTO t VALUES (map('a', 1, 'b', 2));
SELECT size(m) FROM t;

Spark returns 2. Comet falls back to Spark for the entire plan node.

Expected behavior

Native support for size(<map>). Arrow's MapArray carries a length per row that can drive the same numElements semantics Spark uses, with the existing legacySizeOfNull config-driven null handling that CometSize already implements for arrays.

Additional context

  • Serde: CometSize in spark/src/main/scala/org/apache/comet/serde/arrays.scala (line ~640)
  • Native: routes through size scalar function in comet_scalar_funcs.rs; the size UDF would need a MapType branch.
  • Related: cardinality is an alias for size in Spark and would benefit from the same fix.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions