Commit 8ba06e4
Update datafusion dependency to latest in preparation for DF54 (#1532)
* feat: upgrade upstream DataFusion 53 → main (pre-54)
Bump workspace deps to apache/datafusion@3d06bedc (git pin) in
preparation for the 54.0.0 release. Workspace package version moves
to 54.0.0 to track the upstream major convention.
Compile fixes:
- Drop as_any impls (trait now has Any as supertrait) and use the
upstream-provided downcast_ref helper on dyn trait objects.
- Reconcile FFI provider From conversions to drop redundant `+ Send`
on Arc<dyn ...> bounds.
- Cast/TryCast: data_type → field.data_type() (FieldRef rename).
- Stub match arms for new Expr::HigherOrderFunction / Lambda /
LambdaVariable and ScalarValue::ListView / LargeListView variants;
proper exposure deferred to PR 3 audit.
- DatasetExec: partition_statistics returns Arc<Statistics>; add
required apply_expressions trait method.
- Suppress TableFunctionImpl::call deprecation pending call_with_args
refactor that needs Session plumbing.
User-facing test updates for upstream behavior changes:
- median / approx_median / approx_percentile_cont now return Float64.
- String functions (concat_ws, lower, upper, repeat, reverse,
split_part, translate) return StringView when given StringView.
- overlay appends past end-of-string rather than replacing the input.
- arrays_zip / list_zip struct field names "c0"/"c1" → "1"/"2".
- Filtering on mismatched cast types now raises an error (previously
returned 0 matches).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: expose DataFrame.alias and tidy public API after DF53→54 audit
Companion to the upstream DataFusion 53 → main bump. The
check-upstream audit (PR 3 of dev/release/upstream-sync.md) surfaced a
small set of trivial wins; this commit ships them.
Trivial wins:
- DataFrame.alias(name) — wraps the logical plan in a SubqueryAlias.
- functions.__all__: add `instr` and `position` (both were defined as
public defs but missing from `__all__`, so they didn't show up in
`from datafusion.functions import *` or generated docs).
- top-level `datafusion.__all__`: re-export `TableProviderFactory` and
`TableProviderFactoryExportable` (previously only reachable via the
`datafusion.catalog` submodule).
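
The `__all__` point above can be reproduced with a small standalone sketch. It uses a throwaway module named `demo_functions` (a hypothetical stand-in, not the real `datafusion.functions`) to show that a public def left out of `__all__` is skipped by a star import until the list is extended:

```python
import sys
import types

# Hypothetical stand-in module; it only mimics the shape of the problem.
mod = types.ModuleType("demo_functions")
exec(
    "def instr(): return 'instr'\n"
    "def position(): return 'position'\n"
    "def lower(): return 'lower'\n"
    "__all__ = ['lower']\n",
    mod.__dict__,
)
sys.modules["demo_functions"] = mod


def star_import_names(module_name):
    """Return the names that `from <module_name> import *` would bind."""
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    # Drop dunder entries such as __builtins__ that exec() adds itself.
    return sorted(n for n in namespace if not n.startswith("__"))


# instr and position are defined but not listed, so they are not exported.
before = star_import_names("demo_functions")

# Listing them in __all__ is enough to make the star import pick them up.
mod.__all__ = ["instr", "lower", "position"]
after = star_import_names("demo_functions")

print(before, after)
```

Documentation generators that honor `__all__` typically behave the same way, which would explain why the two functions were also missing from the generated docs.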
Non-trivial gaps surfaced by the audit (DataFrame.registry,
into_*/task_ctx, SessionContext extensibility surface, distinct-aware
aggregate variants, TableFunctionImpl::call_with_args migration, FFI
Protocol pipeline gaps) are deferred — each warrants its own design
and PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* taplo fmt
* Update unit test to go along with apache/datafusion#22133
* docs: demonstrate alias via self-join in DataFrame.alias example
Prior example called alias("t") then to_pydict(), which did not show
the qualifier effect. Replace with a self-join that uses col("l.val")
and col("r.val") so the disambiguation behavior is visible.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: wrap higher-order, lambda, and lambda-variable Expr variants
DataFusion 54 introduces Expr::HigherOrderFunction, Expr::Lambda, and
Expr::LambdaVariable. PyExpr::to_variant previously errored on each
with py_unsupported_variant_err. Add PyHigherOrderFunction, PyLambda,
and PyLambdaVariable wrappers, register them in the expr pymodule and
re-export from python/datafusion/expr.py, and dispatch to_variant to
the new wrappers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: wire rex_type and rex_call_operands for new Expr variants
Map HigherOrderFunction and Lambda to RexType::Call; LambdaVariable to
RexType::Reference. In rex_call_operands return the args for
HigherOrderFunction, the body for Lambda, and self for LambdaVariable
(mirroring Column). In rex_call_operator return the underlying UDF
name for HigherOrderFunction and the literal "lambda" for Lambda.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
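
The mapping described above can be sketched as a toy model in plain Python. This is not the Rust implementation: the dataclasses, their field names, and the `my_hof` function name are all assumptions made for illustration, and the `None` fallback in `rex_call_operator` covers a case the commit message does not describe.

```python
from dataclasses import dataclass
from enum import Enum


class RexType(Enum):
    CALL = "Call"
    REFERENCE = "Reference"


# Toy stand-ins for the new Expr variants; field names are assumptions.
@dataclass
class LambdaVariable:
    name: str


@dataclass
class Lambda:
    params: list
    body: object


@dataclass
class HigherOrderFunction:
    func_name: str
    args: list


def rex_type(expr):
    """HigherOrderFunction and Lambda are calls; LambdaVariable is a reference."""
    if isinstance(expr, (HigherOrderFunction, Lambda)):
        return RexType.CALL
    if isinstance(expr, LambdaVariable):
        return RexType.REFERENCE
    raise NotImplementedError(type(expr).__name__)


def rex_call_operands(expr):
    """Args for a higher-order function, the body for a lambda, self for a variable."""
    if isinstance(expr, HigherOrderFunction):
        return list(expr.args)
    if isinstance(expr, Lambda):
        return [expr.body]
    if isinstance(expr, LambdaVariable):
        return [expr]
    raise NotImplementedError(type(expr).__name__)


def rex_call_operator(expr):
    """Function name for a higher-order function, the literal "lambda" for a lambda."""
    if isinstance(expr, HigherOrderFunction):
        return expr.func_name
    if isinstance(expr, Lambda):
        return "lambda"
    return None  # Assumption: other variants are outside this sketch.


x = LambdaVariable("x")
lam = Lambda(params=["x"], body=x)
hof = HigherOrderFunction(func_name="my_hof", args=[lam])  # hypothetical name
print(rex_type(hof), rex_call_operands(lam), rex_call_operator(lam))
```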
* feat: support LargeList/ListView/LargeListView in map_from_scalar_to_arrow
These ScalarValue variants all wrap Arc<...Array>, exposing the outer
DataType via Array::data_type(), so we can mirror the existing
ScalarValue::List arm instead of returning PyNotImplementedError. This
makes Expr.types() work for plans that round-trip through SQL or proto
where these scalar variants surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: switch PyTableFunction to non-deprecated call_with_args
DataFusion 53.0.0 deprecated TableFunctionImpl::call in favor of
call_with_args(args: TableFunctionArgs), which threads a Session
reference alongside the exprs. Implement call_with_args on
PyTableFunction (delegating to the FFI variant's call_with_args, or
ignoring the session for the pure-Python variant, which doesn't use it)
and have __call__ build a TableFunctionArgs from the global session.
Drops both #[allow(deprecated)] attributes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* build: revert workspace version to 53.0.0 and move DF overrides to [patch.crates-io]
The workspace version was prematurely bumped to 54.0.0 in the
DF53→pre-54 upgrade. Restore it to 53.0.0 until we are actually
ready to cut the 54 release.
The same change had moved every datafusion-* dependency from a
crates.io version constraint to a direct git dep in
[workspace.dependencies]. Switch them back to `version = "53"` and
move the git rev overrides into [patch.crates-io] so the published
manifest will be patch-free.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* taplo format
* test: sort FFI test results by partition key before equality compare
Multi-partition `collect()` returns batches in execution-scheduling
order, which is non-deterministic and differs between local and CI
runners. Sort by the first value of column 0 (unique per partition in
each affected test) so the expected/actual comparison is stable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
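
The sorting step above can be illustrated without Arrow. In this minimal sketch each batch is modeled as a plain list of columns (an assumption; the real tests operate on record batches), every batch is assumed non-empty, and the helper name `sort_batches_by_partition_key` is hypothetical:

```python
def sort_batches_by_partition_key(batches):
    """Order batches by the first value of column 0.

    Assumes every batch is non-empty and that this value is unique
    per partition, as the commit message states for the affected tests.
    """
    return sorted(batches, key=lambda batch: batch[0][0])


# Each batch is [column_0, column_1]; column 0 carries the partition key.
expected = [
    [[1, 1], ["a", "b"]],
    [[2, 2], ["c", "d"]],
    [[3, 3], ["e", "f"]],
]

# Simulated arrival order from a multi-partition collect().
actual = [expected[2], expected[0], expected[1]]

# A direct comparison depends on scheduling; the sorted one does not.
stable_actual = sort_batches_by_partition_key(actual)
stable_expected = sort_batches_by_partition_key(expected)
```

Sorting both sides keeps the assertion independent of how the expected batches happen to be listed.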
* Bump datafusion main commit
* test: cover new DF54 expr wrappers, catalog factories, and DataFrame.alias
Add module-metadata checks for HigherOrderFunction, Lambda, LambdaVariable
and the top-level TableProviderFactory / TableProviderFactoryExportable
re-exports, plus a self-join regression test exercising the new
DataFrame.alias() qualifier-based selection path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent baef8f0 commit 8ba06e4
35 files changed
Lines changed: 865 additions & 622 deletions
File tree
- .ai/skills/check-upstream
- crates/core/src
  - common
  - expr
- examples/datafusion-ffi-example
  - python/tests
  - src
- python
  - datafusion
  - tests