feat(python/sedonadb): add DataFrame.sort with composable SortExpr by jiayuasu · Pull Request #859 · apache/sedona-db

jiayuasu · 2026-05-19T05:01:55Z

Redesigned per @paleolimbot's review on the earlier draft — replacing pandas-style sort_values(by=, ascending=) with the composable API pattern from DataFusion-python / DuckDB / Ibis.

Continues Phase P2 of #791.

API

from sedonadb.expr import col, sort_expr

# Common cases — Expr.asc()/.desc() return SortExpr
df.sort("x")                          # str auto-promotes to ascending
df.sort(col("x").desc())
df.sort(col("x"), col("y").desc())    # varargs, multi-key
df.sort(col("x") + col("y"))          # arbitrary Expr key

# Full control via the factory
df.sort(sort_expr(col("x"), asc=False, nulls_first=True))

DataFrame.sort accepts each key as str | Expr | SortExpr. Strings and bare Expr auto-promote to ascending keys with nulls last; Expr.asc() / Expr.desc() cover the common direction switch; sedonadb.expr.sort_expr(expr, asc=..., nulls_first=...) exposes the full DataFusion SortExpr knobs.
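The promotion rule can be illustrated without SedonaDB installed. The sketch below uses hypothetical stand-in `Expr` and `SortExpr` dataclasses (not the real `sedonadb.expr` classes) and a hypothetical `coerce_sort_keys` helper. Only the rule itself is taken from the description above: strings and bare expressions become ascending, nulls-last keys, and `SortExpr` values pass through unchanged.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for sedonadb.expr.Expr (holds only a column name)."""

    name: str

    def asc(self, nulls_first: bool = False) -> "SortExpr":
        return SortExpr(self, asc=True, nulls_first=nulls_first)

    def desc(self, nulls_first: bool = False) -> "SortExpr":
        return SortExpr(self, asc=False, nulls_first=nulls_first)


@dataclass(frozen=True)
class SortExpr:
    """Hypothetical stand-in for sedonadb.expr.SortExpr."""

    expr: Expr
    asc: bool = True
    nulls_first: bool = False


def col(name: str) -> Expr:
    """Stand-in for sedonadb.expr.col."""
    return Expr(name)


def coerce_sort_keys(*keys: Union[str, Expr, SortExpr]) -> List[SortExpr]:
    """Promote each key to a SortExpr: str -> col(str).asc(), Expr -> expr.asc()."""
    coerced: List[SortExpr] = []
    for k in keys:
        if isinstance(k, SortExpr):
            coerced.append(k)  # already fully specified
        elif isinstance(k, Expr):
            coerced.append(k.asc())  # ascending, nulls last
        elif isinstance(k, str):
            coerced.append(col(k).asc())  # column name, ascending, nulls last
        else:
            raise TypeError(
                "sort() expects str, Expr, or SortExpr arguments, "
                f"got {type(k).__name__}"
            )
    return coerced


# Mixed key types in one varargs call.
for key in coerce_sort_keys("x", col("y"), col("z").desc()):
    print(key)
```

Keeping the promotion in one place means every call form reduces to a list of fully specified sort keys before anything reaches the Rust layer.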

What's gone vs. the earlier draft

sort_values() — removed entirely. Per the discussion: no pandas-compat duplicate at this layer; that can come later in a dedicated GeoPandas-compat surface.
by= / ascending= keyword args — replaced by varargs (formats better with Ruff and matches the three reference interfaces).

Null placement

Expr.asc() and Expr.desc() default to nulls_first=False (nulls last) regardless of direction — matches the previous draft's pandas-style default and what the rest of the SedonaDB Python API ships. SQL-style nulls-first-on-descending is available via sort_expr(col("x"), asc=False, nulls_first=True).
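The difference between the two null-placement conventions is easy to see on a plain Python list. The helper below, `sort_with_nulls`, is a hypothetical reference model written for illustration only; it is not part of SedonaDB or DataFusion, and it treats `None` as the null value.

```python
from typing import List, Optional


def sort_with_nulls(
    values: List[Optional[int]], asc: bool = True, nulls_first: bool = False
) -> List[Optional[int]]:
    """Sort non-null values in the requested direction, then place nulls
    at the front or the back independently of that direction."""
    non_null = sorted((v for v in values if v is not None), reverse=not asc)
    nulls = [v for v in values if v is None]
    return nulls + non_null if nulls_first else non_null + nulls


data = [3, None, 1, 2]

print(sort_with_nulls(data))                               # [1, 2, 3, None]
print(sort_with_nulls(data, asc=False))                    # [3, 2, 1, None]
print(sort_with_nulls(data, asc=False, nulls_first=True))  # [None, 3, 2, 1]
```

The second call models the `Expr.desc()` default described above (nulls stay last), and the third models the SQL-style result requested with `nulls_first=True`.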

Implementation

| File | Change |
| --- | --- |
| `python/sedonadb/src/expr.rs` | New `PySortExpr` PyO3 class wrapping `datafusion_expr::SortExpr`. `Expr.asc(nulls_first)` / `.desc(nulls_first)` methods. `expr_sort_expr(expr, asc, nulls_first)` factory. |
| `python/sedonadb/src/dataframe.rs` | `InternalDataFrame::sort(Vec<PySortExpr>)`; old `sort_by_keys` gone. |
| `python/sedonadb/src/lib.rs` | Register `PySortExpr` + `expr_sort_expr`. |
| `python/sedonadb/python/sedonadb/expr/expression.py` | Python `SortExpr` class, `sort_expr()` factory, `Expr.asc/.desc` methods. |
| `python/sedonadb/python/sedonadb/expr/__init__.py` | Re-export `SortExpr`, `sort_expr`. |
| `python/sedonadb/python/sedonadb/dataframe.py` | `DataFrame.sort(*keys)`. Module-level `SortExpr` import (no lazy imports, per the policy locked on #852). |

Tests

tests/expr/test_dataframe_sort.py — 14 tests covering string/Expr/SortExpr keys, mixed-direction multi-key, computed-Expr keys, nulls-last default in both directions, sort_expr(nulls_first=True), lazy return type (isinstance(out, DataFrame)), and the empty / bad-type / unknown-column error paths.
tests/expr/test_expression.py — 9 new tests pinning the exact repr() of SortExpr for asc / desc / both null placements / the factory / both type-guard rejections.

Local: 97 unit + 29 doctests + ruff format + ruff check all clean.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

paleolimbot

Since Pandas invented sort_values() there have been more elegant/composable ways to handle this that have evolved.

For our purposes, I think we should:

Expose all the bells and whistles of DataFusion's SortExpr via sedonadb.expr.sort_expr(expr, <options like asc/desc/nulls>)
Add methods to Expr (.asc(nulls), .desc(nulls)) that return SortExpr
Accept str, Expr, or SortExpr in sort_values
Name it either sort() (DataFusion python, DuckDB) or order_by() (Ibis)
Accept multiple arguments instead of a list (auto formats with Ruff into multiple lines better, all three newer interfaces do this)

Some more elegant interfaces for reference:

jiayuasu · 2026-05-20T18:44:35Z

Should we keep both sort_values and sort to maintain some compatibility with Pandas?

paleolimbot · 2026-05-21T19:50:27Z

> Should we keep both sort_values and sort to maintain some compatibility with Pandas?

I don't think so. The GeoPandas compatibility layer is separate where we can do that thing and do that thing well. We can always add later...I'd prefer to start without duplicate "convenience" until we're sure they're actually convenient.

Replace the pandas-style `sort_values(by=, ascending=)` proposed in the earlier draft with the composable API pattern preferred in DataFusion-python / DuckDB / Ibis: a `sort(*keys)` varargs method plus a first-class `SortExpr` type users build via `Expr.asc/desc()` or `sedonadb.expr.sort_expr()`.

API:

    df.sort("x")
    df.sort(col("x").desc())
    df.sort(col("x"), col("y").desc())  # varargs
    df.sort(sort_expr(col("x") + col("y"), asc=False, nulls_first=False))  # full knobs

`DataFrame.sort` accepts each key as `str`, `Expr`, or `SortExpr`. Strings and bare `Expr` auto-promote to ascending keys with nulls placed last. Use `Expr.asc()` / `Expr.desc()` for the common direction switch, and `sedonadb.expr.sort_expr(expr, asc=..., nulls_first=...)` for full control (e.g. SQL-style nulls-first on descending).

Rust side:

- New `PySortExpr` PyO3 class wrapping `datafusion_expr::SortExpr`, exposed to Python as `_lib.InternalSortExpr`.
- `PyExpr.asc(nulls_first=false)` / `PyExpr.desc(nulls_first=false)` return `PySortExpr`.
- `expr_sort_expr(expr, asc=true, nulls_first=false)` factory.
- `InternalDataFrame::sort_by_keys` removed in favor of `InternalDataFrame::sort(Vec<PySortExpr>)`.

Python side:

- New `SortExpr` user-facing class in `sedonadb.expr.expression`.
- Top-level `sort_expr()` factory plus `Expr.asc/desc` methods.
- `sedonadb.expr.SortExpr` and `sedonadb.expr.sort_expr` re-exported.
- `DataFrame.sort_values` removed; `DataFrame.sort(*keys)` added. Module-level import of `SortExpr` keeps the lazy-import policy consistent with the rest of `dataframe.py`.

Tests: 14 in `test_dataframe_sort.py` covering string/Expr/SortExpr keys, mixed-direction multi-key, computed-Expr keys, nulls-last default in both directions, `sort_expr(nulls_first=True)`, lazy return type, and the empty/bad-type/unknown-column error paths. 9 new tests in `test_expression.py` lock the exact `repr()` of `SortExpr` for asc/desc, both null placements, the factory, and the type-guard rejections.

Closes the design discussion on apache#859.

jiayuasu · 2026-05-22T06:34:26Z

@paleolimbot — redesigned per your review. Force-pushed d88b91f6 over the earlier sort_values commit. Branch name is stale (kept it to preserve this PR thread); the PR title and description are updated.

All five of your asks landed:

sedonadb.expr.sort_expr(expr, asc=True, nulls_first=False) — full-knob factory.
Expr.asc(nulls_first=False) / Expr.desc(nulls_first=False) — common-case sugar, return SortExpr.
DataFrame.sort accepts str | Expr | SortExpr — strings and bare Expr auto-promote to ascending keys.
Named sort() (matching DataFusion-python and DuckDB) rather than order_by().
Varargs — df.sort(col("x"), col("y").desc()), no list.

sort_values() is gone — no pandas-compat duplicate at this layer, per the thread.

The Expr.asc/.desc shortcuts default to nulls_first=False (nulls last) in both directions, matching what the rest of the SedonaDB Python API already ships. SQL-style nulls-first-on-descending is reachable via sort_expr(col("x"), asc=False, nulls_first=True).
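For readers unfamiliar with multi-key sorts, the ordering that a call such as `df.sort(col("x"), col("y").desc())` is expected to produce can be modeled on plain dictionaries. This is a reference sketch only: `sort_rows` and its `(column, asc)` tuples are hypothetical, it ignores nulls, and it says nothing about how DataFusion executes the sort.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def sort_rows(rows: List[Row], *keys: Tuple[str, bool]) -> List[Row]:
    """Order rows by (column, asc) pairs, highest-priority key first.

    Relies on Python's stable sort: sorting by the lowest-priority key
    first and the highest-priority key last yields a lexicographic order.
    """
    out = list(rows)
    for column, asc in reversed(keys):
        out.sort(key=lambda row: row[column], reverse=not asc)
    return out


rows = [
    {"x": 1, "y": "a"},
    {"x": 2, "y": "b"},
    {"x": 1, "y": "c"},
]

# x ascending, then y descending within equal x.
for row in sort_rows(rows, ("x", True), ("y", False)):
    print(row)
# {'x': 1, 'y': 'c'}
# {'x': 1, 'y': 'a'}
# {'x': 2, 'y': 'b'}
```

The first key decides the overall order, and later keys only break ties among rows that compare equal on every earlier key.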

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

paleolimbot · 2026-05-22T14:35:21Z

+        coerced: List = []
+        for k in keys:
+            if isinstance(k, SortExpr):
+                coerced.append(k._impl)
+            elif isinstance(k, Expr):
+                # Default direction is ascending, nulls last.
+                coerced.append(k.asc()._impl)
+            elif isinstance(k, str):
+                coerced.append(_col(k).asc()._impl)
+            else:
+                raise TypeError(
+                    f"sort() expects str, Expr, or SortExpr arguments, "
+                    f"got {type(k).__name__}"
+                )


I think you're fine here 🙂

paleolimbot

Thank you!

paleolimbot · 2026-05-22T14:22:16Z

 from sedonadb._lib import InternalExpr as _InternalExpr
+from sedonadb._lib import InternalSortExpr as _InternalSortExpr
 from sedonadb._lib import expr_binary as _expr_binary
 from sedonadb._lib import expr_col as _expr_col
 from sedonadb._lib import expr_lit as _expr_lit
 from sedonadb._lib import expr_not as _expr_not
+from sedonadb._lib import expr_sort_expr as _expr_sort_expr


Can you condense these into a single import? Ruff's "format imports" might do this for you nicely.

Combined in d17eccc — single from sedonadb._lib import (..., ...) block. Didn't enable a workspace-wide isort rule in this PR; happy to do it separately if you'd like.

jiayuasu requested a review from Copilot May 19, 2026 05:15

Copilot started reviewing on behalf of jiayuasu May 19, 2026 05:15

Copilot AI reviewed May 19, 2026

github-actions Bot requested a review from prantogg May 19, 2026 05:51

paleolimbot reviewed May 19, 2026

jiayuasu force-pushed the feature/df-sort-values branch from 5176de9 to d88b91f Compare May 22, 2026 06:33

jiayuasu changed the title ~~feat(python/sedonadb): add DataFrame.sort_values~~ feat(python/sedonadb): add DataFrame.sort with composable SortExpr May 22, 2026

jiayuasu requested a review from Copilot May 22, 2026 06:35

Copilot started reviewing on behalf of jiayuasu May 22, 2026 06:35

Copilot AI reviewed May 22, 2026

paleolimbot approved these changes May 22, 2026

jiayuasu force-pushed the feature/df-sort-values branch from d88b91f to d17eccc Compare May 22, 2026 19:47

jiayuasu marked this pull request as ready for review May 22, 2026 19:47

jiayuasu merged commit fdd0d84 into apache:main May 23, 2026
5 checks passed

jiayuasu mentioned this pull request May 23, 2026

feat(python/sedonadb): add DataFrame.drop #871

Draft

jiayuasu merged 1 commit into apache:main from jiayuasu:feature/df-sort-values

3 participants
