Conversation
| true | ||
| case ArrayType(elementType, _) => | ||
| canRank(elementType, nestedInArray = true) | ||
| case StructType(fields) if !nestedInArray => |
There was a problem hiding this comment.
could you add a comment explaining why there is a restriction around structs in arrays?
There was a problem hiding this comment.
Sure, I have added it. Besides, I found nulltype has a similar problem, I have fixed it.
andygrove
left a comment
There was a problem hiding this comment.
LGTM, with one question. Thanks @grorge123!
|
@grorge123 Could you add a microbenchmark for this expression so that we can see how it performs relative to Spark? This could be a separate PR. See https://github.com/apache/datafusion-comet/tree/main/spark/src/test/scala/org/apache/spark/sql/benchmark for current benchmarks. |
|
Ok, I will raise another PR to add it. |
|
Hi @andygrove, just a follow-up on this PR. |
|
Thank @grorge123 ! LGTM |
| SELECT sort_array(arr, true) FROM test_sort_array_int | ||
|
|
||
| query | ||
| SELECT sort_array(arr, false) FROM test_sort_array_int |
There was a problem hiding this comment.
👍 This covers both cases mentioned in Spark's comment:
Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.
|
HI @grorge123 Thanks for the PR, I didn't notice this |
| } | ||
| def supportedScalarSortElementType(dt: DataType): Boolean = { | ||
| dt match { | ||
| case _: ByteType | _: ShortType | _: IntegerType | _: LongType | _: FloatType | |
There was a problem hiding this comment.
can we please combine all true branches?
comphead
left a comment
There was a problem hiding this comment.
Thanks @grorge123 just a small nit on branches, the PR looks good to me.
Please add also incompatibility information to compatibility.md for Array Expressions
Which issue does this PR close?
Closes #3953
Closes #3159
Rationale for this change
Currently, comet does not support
sort_arrayexpression, so usingsort_array(...)would fall back to Spark. This PR adds sort_array support to achieve native acceleration.The SortArray expression sorts the elements of an array in either ascending or descending order.
What changes are included in this PR?
How are these changes tested?
- array
- array
- array including NaN, -0.0, and 0.0
- array<decimal(10,0)>
- array
- array<struct<...>>
- array<array>
- array literal case
- empty arrays
- null arrays
- explicit ascending / descending paths
- literal and table-column inputs
Reference: https://github.com/apache/spark/blob/04b821c69e85be5f51a1270b3a9a4155afdb5334/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala#L706-L760