Describe the bug
Spark's size(expr) accepts both ArrayType and MapType inputs (Size.inputTypes = Seq(TypeCollection(ArrayType, MapType)) in collectionOperations.scala, identical across 3.4.3 / 3.5.8 / 4.0.1 / 4.1.1). Comet's CometSize only supports ArrayType; for MapType it returns Unsupported(Some("size does not support map inputs")) and falls back to Spark.
Surfaced by the collection-expressions audit PR.
Steps to reproduce
CREATE TABLE t(m map<string, int>) USING parquet;
INSERT INTO t VALUES (map('a', 1, 'b', 2));
SELECT size(m) FROM t;
Spark returns 2. Comet falls back to Spark for the entire plan node.
Expected behavior
Native support for size(<map>). Arrow's MapArray carries a length per row that can drive the same numElements semantics Spark uses, with the existing legacySizeOfNull config-driven null handling that CometSize already implements for arrays.
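The intended semantics can be sketched in plain Rust, without the arrow crate. This is an illustration only: `MapColumn` and `map_size` are hypothetical names, not Comet or arrow-rs APIs. The sketch assumes a list-like layout in which row i owns the entries between offsets[i] and offsets[i + 1], and it assumes the legacy behaviour returns -1 for a null map while the non-legacy behaviour returns null.

```rust
/// Hypothetical, simplified stand-in for an Arrow map column.
/// `offsets` has one more entry than there are rows; row `i` owns
/// the map entries in `offsets[i]..offsets[i + 1]`.
struct MapColumn {
    offsets: Vec<i32>,
    /// `false` marks a row whose map value is null.
    validity: Vec<bool>,
}

/// Sketch of size(<map>): the entry count for non-null rows.
/// For null rows, returns -1 when `legacy_size_of_null` is set
/// (assumed legacy behaviour), otherwise null (`None`).
fn map_size(col: &MapColumn, legacy_size_of_null: bool) -> Vec<Option<i32>> {
    col.validity
        .iter()
        .enumerate()
        .map(|(i, &valid)| {
            if valid {
                // Per-row length derived from adjacent offsets.
                Some(col.offsets[i + 1] - col.offsets[i])
            } else if legacy_size_of_null {
                Some(-1)
            } else {
                None
            }
        })
        .collect()
}
```

A real implementation would read the offsets and null buffer from the Arrow array rather than from owned vectors, but the per-row arithmetic should be the same as in the array case.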
Additional context
- Serde: CometSize in spark/src/main/scala/org/apache/comet/serde/arrays.scala (line ~640)
- Native: routes through size scalar function in comet_scalar_funcs.rs; the size UDF would need a MapType branch.
- Related: cardinality is an alias for size in Spark and would benefit from the same fix.