[Bug] array_max and array_min disagree with Spark on NaN ordering #4482

@andygrove

Describe the bug

Spark documents that array_max and array_min treat NaN as greater than any non-NaN value for Float/Double element arrays (Spark uses SQLOrderingUtil.compareFloats/compareDoubles). DataFusion's array_max/array_min go through Arrow's partial_cmp-based kernels, which follow IEEE 754 semantics: any comparison involving NaN is unordered, so the result can depend on where the NaN sits in the array.

For arrays containing NaN, the two implementations produce different results:

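  • array_max(array(double('NaN'), 1.0, 2.0)) returns NaN in Spark but may return 2.0 or NULL in Comet, depending on kernel behaviour.
  • array_min(array(double('NaN'), 1.0, 2.0)) returns 1.0 in Spark. Comet is expected to agree for this input, but only because of how the kernel happens to handle the unordered comparison, so the path is fragile.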
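
To illustrate why the result can depend on kernel behaviour, here is a minimal standalone sketch of a maximum built on the IEEE `>` operator. The function name `naive_max` is hypothetical and this is not Arrow's or DataFusion's actual kernel; it only shows how an unordered NaN comparison makes the answer depend on element position.

```rust
/// Hypothetical naive maximum using the IEEE `>` operator.
/// This is an illustration only, not the Arrow/DataFusion implementation.
fn naive_max(values: &[f64]) -> Option<f64> {
    let mut iter = values.iter().copied();
    // An empty slice has no maximum.
    let first = iter.next()?;
    // `x > acc` is false whenever either side is NaN, so a NaN
    // accumulator is never replaced and a NaN element is never picked up.
    Some(iter.fold(first, |acc, x| if x > acc { x } else { acc }))
}
```

With this sketch, `naive_max(&[f64::NAN, 1.0, 2.0])` keeps the leading NaN, while `naive_max(&[1.0, f64::NAN, 2.0])` silently skips the NaN and yields 2.0. Spark's ordering would give NaN for both inputs.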

Surfaced by the array-expressions audit (collection PR queue). The single covering literal test in CometArrayExpressionSuite uses array(double('-Infinity'), 0.0, double('Infinity')) and does not contain a NaN, so the divergence is currently uncaught by CI.

Steps to reproduce

SELECT array_max(array(CAST('NaN' AS DOUBLE), 1.0, 2.0));
-- Spark:  NaN
-- Comet:  varies (likely 2.0 or NULL)

SELECT array_min(array(CAST('NaN' AS DOUBLE), 1.0, 2.0));
-- Spark:  1.0
-- Comet:  varies

Expected behavior

Either implement Spark's NaN ordering on the Comet side, or downgrade array_max / array_min to Incompatible(Some(...)) for FloatType / DoubleType element arrays so that they only run with spark.comet.expression.ArrayMax.allowIncompatible=true (and the matching ArrayMin flag).
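
For the first option, the ordering itself is small. Below is a rough scalar sketch in Rust, assuming the Spark semantics described in this issue (NaN equals NaN and sorts above every non-NaN value) plus skipping of NULL elements. The names `spark_cmp`, `spark_array_max` and `spark_array_min` are hypothetical, and a real fix would have to operate on Arrow arrays rather than slices of `Option<f64>`.

```rust
use std::cmp::Ordering;

/// Hypothetical comparator mirroring the NaN ordering described above:
/// NaN == NaN, and NaN is greater than any non-NaN value.
fn spark_cmp(a: f64, b: f64) -> Ordering {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        // Neither side is NaN, so partial_cmp always returns Some.
        (false, false) => a.partial_cmp(&b).unwrap(),
    }
}

/// Maximum over non-null elements; None if there are no non-null elements.
fn spark_array_max(values: &[Option<f64>]) -> Option<f64> {
    values.iter().flatten().copied().max_by(|a, b| spark_cmp(*a, *b))
}

/// Minimum over non-null elements; None if there are no non-null elements.
fn spark_array_min(values: &[Option<f64>]) -> Option<f64> {
    values.iter().flatten().copied().min_by(|a, b| spark_cmp(*a, *b))
}
```

Under these assumptions, `spark_array_max(&[Some(f64::NAN), Some(1.0), Some(2.0)])` yields NaN and `spark_array_min` on the same input yields 1.0, regardless of where the NaN appears. The same comparator would apply to Float after widening or with an `f32` variant.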

Additional context

  • Comet serdes: CometArrayMax, CometArrayMin in spark/src/main/scala/org/apache/comet/serde/arrays.scala.
  • Spark reference: ArrayMax.evalInternal / ArrayMin.evalInternal in collectionOperations.scala; uses getInterpretedOrdering which routes through SQLOrderingUtil for floats and doubles.
