feat: support sort_array expression by grorge123 · Pull Request #3706 · apache/datafusion-comet

grorge123 · 2026-03-15T03:50:57Z

Which issue does this PR close?

Closes #3953
Closes #3159

Rationale for this change

Comet does not currently support the sort_array expression, so any query that uses sort_array(...) falls back to Spark. This PR adds sort_array support so that it can run natively.

The SortArray expression sorts the elements of an array in either ascending or descending order.

What changes are included in this PR?

Add CometSortArray in arrays.scala to serialize Spark SortArray as DataFusion array_sort.
Register SortArray in QueryPlanSerde.scala.
Preserve Spark sort semantics:
- sort_array(arr) / sort_array(arr, true) -> ascending with NULLS FIRST
- sort_array(arr, false) -> descending with NULLS LAST
Mark floating-point array sorting as Incompatible only when spark.comet.exec.strictFloatingPoint=true.
Explicitly reject unsupported nested complex cases such as array<array<struct<...>>> at planning time so they cleanly fall back to Spark instead of failing at runtime.
Update the supported-expression documentation in spark_expressions_support.md.
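
The null-placement rules listed above can be illustrated with a small, Spark-free sketch. `SortArraySemantics` and `sortArray` are hypothetical names used only for illustration (they are not part of Comet or Spark), and `None` stands in for a SQL NULL element.

```scala
object SortArraySemantics {
  // Hypothetical helper, not Comet or Spark code: models the ordering rules
  // described in this PR for an array of nullable ints.
  def sortArray(arr: Seq[Option[Int]], ascending: Boolean = true): Seq[Option[Int]] = {
    // Split NULLs from concrete values so they can be placed explicitly.
    val (nulls, values) = arr.partition(_.isEmpty)
    val sorted: Seq[Int] = values.flatten.sorted
    if (ascending) nulls ++ sorted.map(Option(_)) // ascending: NULLS FIRST
    else sorted.reverse.map(Option(_)) ++ nulls   // descending: NULLS LAST
  }
}
```

For example, `SortArraySemantics.sortArray(Seq(Some(3), None, Some(1)))` puts the null first, while passing `ascending = false` puts it last.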

How are these changes tested?

Added SQL-file coverage in sort_array.sql for:
- array
- array
- array including NaN, -0.0, and 0.0
- array<decimal(10,0)>
- array
- array<struct<...>>
- array<array>
- array literal case
- empty arrays
- null arrays
- explicit ascending / descending paths
- literal and table-column inputs
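
As background on why NaN, -0.0, and 0.0 are typical edge cases for sorting, here is a minimal sketch that uses only the JDK's `java.lang.Double.compare`. It does not reproduce Spark's or DataFusion's comparison logic, and the PR description does not say which specific difference motivates the Incompatible marking; `FloatEdgeCases` is a hypothetical name.

```scala
object FloatEdgeCases {
  val input: Seq[Double] = Seq(0.0, Double.NaN, 1.0, -0.0)

  // java.lang.Double.compare defines a total order in which -0.0 sorts before 0.0
  // and NaN sorts after every other value, even though -0.0 == 0.0 under IEEE
  // equality and NaN is not equal to itself.
  val totalOrder: Seq[Double] =
    input.sortWith((a, b) => java.lang.Double.compare(a, b) < 0)

  // Distinguishes -0.0 from 0.0, which plain == cannot do.
  def isNegativeZero(d: Double): Boolean =
    d == 0.0 && 1.0 / d == Double.NegativeInfinity
}
```

An engine that compares with IEEE equality instead may treat -0.0 and 0.0 as ties, so their relative order in the output can differ from the total order shown here.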

Reference: https://github.com/apache/spark/blob/04b821c69e85be5f51a1270b3a9a4155afdb5334/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala#L706-L760

andygrove · 2026-03-16T13:16:49Z

+        true
+      case ArrayType(elementType, _) =>
+        canRank(elementType, nestedInArray = true)
+      case StructType(fields) if !nestedInArray =>


Could you add a comment explaining why there is a restriction around structs in arrays?

grorge123 · 2026-03-16

Sure, I have added it. I also found that NullType has a similar problem, and I have fixed it.
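
For readers following this thread, the recursive check in the diff excerpt can be sketched without Spark on the classpath. The `DType` hierarchy below is a hypothetical stand-in for Spark's `DataType`. The handling of struct fields and of the null type is an assumption, based on the PR description and the commit title "fix: null type in nested array"; it is not the actual Comet code.

```scala
object RankCheckSketch {
  // Hypothetical stand-ins for Spark's DataType classes.
  sealed trait DType
  case object IntT extends DType
  case object StringT extends DType
  case object NullT extends DType
  final case class ArrayT(element: DType) extends DType
  final case class StructT(fields: Seq[DType]) extends DType

  // `dt` is the element type of the array passed to sort_array.
  def canRank(dt: DType, nestedInArray: Boolean = false): Boolean = dt match {
    case IntT | StringT => true
    // Assumption: a null element type is rejected only inside a nested array.
    case NullT => !nestedInArray
    // Anything below this point sits inside a nested array.
    case ArrayT(element) => canRank(element, nestedInArray = true)
    // Structs are accepted only as direct elements, not inside a nested array.
    // Assumption: every field must itself be rankable.
    case StructT(fields) if !nestedInArray =>
      fields.forall(f => canRank(f, nestedInArray))
    case _ => false
  }
}
```

Under these assumptions, a struct element type (the `array<struct<...>>` case) passes, while an array-of-struct element type (the `array<array<struct<...>>>` case) fails the check and would fall back to Spark at planning time.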

andygrove

LGTM, with one question. Thanks @grorge123!

andygrove · 2026-03-20T14:15:58Z

@grorge123 Could you add a microbenchmark for this expression so that we can see how it performs relative to Spark? This could be a separate PR. See https://github.com/apache/datafusion-comet/tree/main/spark/src/test/scala/org/apache/spark/sql/benchmark for current benchmarks.

grorge123 · 2026-03-22T02:40:20Z

Ok, I will raise another PR to add it.

grorge123 · 2026-03-29T01:05:00Z

Hi @andygrove, just a follow-up on this PR.
Please let me know if there is anything else I should add or revise here. Thanks!

0lai0 · 2026-03-29T14:18:32Z

Thanks @grorge123! LGTM

hsiang-c · 2026-04-13T21:54:50Z

+SELECT sort_array(arr, true) FROM test_sort_array_int
+
+query
+SELECT sort_array(arr, false) FROM test_sort_array_int


👍 This covers both cases mentioned in Spark's comment:

Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.

comphead · 2026-04-16T23:47:42Z

Hi @grorge123, thanks for the PR. I didn't notice this sort_array PR and created another one. Let me go through it quickly.

comphead · 2026-04-16T23:51:49Z

-      }
+  def supportedScalarSortElementType(dt: DataType): Boolean = {
+    dt match {
+      case _: ByteType | _: ShortType | _: IntegerType | _: LongType | _: FloatType |


Can we please combine all the true branches?
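
A minimal sketch of the kind of refactor being requested, using a hypothetical `DType` hierarchy rather than Spark's types. The two functions below are illustrative only and are not the code under review.

```scala
object BranchCombineSketch {
  // Hypothetical stand-ins for a few Spark data types.
  sealed trait DType
  case object ByteT extends DType
  case object ShortT extends DType
  case object StringT extends DType
  case object MapT extends DType

  // Before: several branches that all return true.
  def supportedVerbose(dt: DType): Boolean = dt match {
    case ByteT => true
    case ShortT => true
    case StringT => true
    case _ => false
  }

  // After: one branch using pattern alternatives, with identical behavior.
  def supportedCombined(dt: DType): Boolean = dt match {
    case ByteT | ShortT | StringT => true
    case _ => false
  }
}
```

Both versions return the same result for every input; the combined form simply keeps the list of supported types in one place.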

comphead

Thanks @grorge123 just a small nit on branches, the PR looks good to me.

Please also add the incompatibility information to compatibility.md under Array Expressions.

feat: support sort_array expression

adb4e57

andygrove reviewed Mar 16, 2026

fix: null type in nested array

0c3a13d

grorge123 mentioned this pull request Mar 22, 2026

feat: add sort_array benchmark #3758

Merged

0lai0 reviewed Mar 29, 2026

Comment thread spark/src/main/scala/org/apache/comet/serde/arrays.scala Outdated

fix: reduce redundant match

d47f856

grorge123 requested a review from andygrove April 1, 2026 12:24

hsiang-c reviewed Apr 10, 2026

Comment thread spark/src/test/resources/sql-tests/expressions/array/sort_array.sql Outdated

hsiang-c reviewed Apr 10, 2026

Comment thread spark/src/test/resources/sql-tests/expressions/array/sort_array.sql Outdated

hsiang-c reviewed Apr 10, 2026

Comment thread spark/src/main/scala/org/apache/comet/serde/arrays.scala

refactor: reuse sort checker in supportedSortType

2a08799

hsiang-c reviewed Apr 13, 2026

Comment thread spark/src/main/scala/org/apache/comet/serde/arrays.scala

hsiang-c reviewed Apr 13, 2026

test: add non-boolean case

3721201

parthchandra reviewed Apr 15, 2026

Comment thread spark/src/test/resources/sql-tests/expressions/array/sort_array.sql

test: add date, timestamp, and binary case

8fea480

comphead mentioned this pull request Apr 16, 2026

feat: support sort_array #3962

Draft

comphead reviewed Apr 16, 2026

comphead approved these changes Apr 16, 2026

6 participants
