Skip to content

feat: custom Rust UDFs via arrow-ffi (alternative to #4283) [experimental]#4459

Draft
andygrove wants to merge 1 commit into
apache:mainfrom
andygrove:feat/rust-udfs-arrow-ffi
Draft

feat: custom Rust UDFs via arrow-ffi (alternative to #4283) [experimental]#4459
andygrove wants to merge 1 commit into
apache:mainfrom
andygrove:feat/rust-udfs-arrow-ffi

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

Companion / alternative to #4283. Both PRs are drafts intended for comparison, not for merging.

Rationale for this change

This PR adds custom Rust scalar UDF support using only arrow-stable FFI surfaces, as suggested in feedback on #4283 by @paleolimbot and @timsaucer. The goal is to compare two ABI strategies side-by-side:

  • feat: custom Rust UDFs [experimental] [skip ci] #4283 — bespoke C ABI with custom type tags + IPC bytes + comet_udf_describe

  • this PR — arrow-only FFI, with two flavors offered side-by-side:

    1. C ABI (sedona-style) — pure C-callable struct of function pointers, parameterized only by Arrow's C Data Interface (FFI_ArrowSchema / FFI_ArrowArray). No DataFusion types appear in the FFI surface, so user libraries are decoupled from datafusion versions and the ABI is portable to C/C++. Modeled on apache/sedona-db's SedonaCScalarKernel.

    2. datafusion-ffi (FFI_ScalarUDF) — wraps the user's ScalarUDFImpl as FFI_ScalarUDF. The user's library inherits the full ScalarUDFImpl surface (variadic signatures, type coercion, metadata-aware return types) for free, at the cost of a major-version pin against datafusion-ffi.

A single library may export either or both flavors; the host loader walks both discovery functions. The test cdylib exposes add_one_c via the C ABI and add_one_df via datafusion-ffi, and the end-to-end Spark suite drives both.

The scope is intentionally minimal vs #4283 — scalar-only, happy-path e2e tests only — to keep the diff focused on the ABI strategy comparison. JVM API surface mirrors #4283 exactly so the JVM side is apples-to-apples.

What changes are included in this PR?

  • native/comet-udf-sdk — public SDK with both ABI flavors behind cargo features (c-abi, df-abi, both default-on). Provides CometCScalarUdf trait + comet_c_udf_export! macro for the C flavor; comet_df_udf_export! for datafusion-ffi.
  • native/comet-test-udfs — test cdylib exposing one UDF per ABI.
  • native/core/src/execution/rust_udfloader (libloading + ABI version probe + dual-discovery), process-wide cache, ImportedCScalarUdf adapter wrapping the C ABI as ScalarUDFImpl. The datafusion-ffi side needs no adapter — Arc<dyn ScalarUDFImpl>::from(&FFI_ScalarUDF) is built-in.
  • RustUdfCall proto in expr.proto (field 71) + planner branch in create_expr resolving (library_path, name) against the cache and dispatching through whichever ABI registered the kernel.
  • JNI bridge (CometRustUdfBridge / comet_rust_udf_bridge.rs) for driver-side validateLibrary / listUdfs.
  • Scala API (CometRustUDF.register, CometRustUdfRegistry, typed exception classes). The QueryPlanSerde path: CometScalaUDF.convert first checks the registry — if the udfName is registered, it emits RustUdfCall; otherwise it falls back to the existing JVM codegen dispatcher.

The Scala/Java files live under spark/.../udf/ (rather than common/.../udf/ as in #4283) to avoid the NativeBase-relocation churn. This is the only intentional structural divergence from #4283.

How are these changes tested?

  • SDK unit tests — 3 tests covering ABI version, C ABI adapter roundtrip (export → import via FFI), and datafusion-ffi list build + round-trip via From<&FFI_ScalarUDF>.
  • Native loader tests — 5 tests covering library load, missing-path error, and verifying both ABI flavors are discovered correctly.
  • End-to-end Spark suite (CometRustUdfSuite) — 2 tests, one per ABI: add_one_c (C ABI) and add_one_df (datafusion-ffi). Both pass, gated on -Dcomet.test.udfs.lib=<path>.
$ DYLD_LIBRARY_PATH=$JAVA_HOME/lib/server cargo test -p datafusion-comet --lib rust_udf
running 5 tests
test execution::rust_udf::loader::tests::missing_path_errors_open ... ok
test execution::rust_udf::loader::tests::load_test_udfs_succeeds ... ok
test execution::rust_udf::loader::tests::c_and_df_abi_have_expected_flavors ... ok
test execution::rust_udf::cache::tests::same_path_returns_same_arc ... ok
test execution::rust_udf::cache::tests::missing_path_propagates_error ... ok
test result: ok. 5 passed; 0 failed

$ ./mvnw -q -Pspark-3.5 -pl spark test -DwildcardSuites=org.apache.comet.CometRustUdfSuite \
    -Dtest=none -Denforcer.skip=true \
    -Dcomet.test.udfs.lib=$PWD/native/target/debug/libcomet_test_udfs.dylib
- C ABI: add_one_c returns id + 1 for a range (6 seconds, 465 milliseconds)
- datafusion-ffi: add_one_df returns id + 1 for a range (69 milliseconds)
Run completed in 9 seconds, 43 milliseconds.
All tests passed.

What this PR is not

  • Not a full replacement for feat: custom Rust UDFs [experimental] [skip ci] #4283 — half the test coverage, no user guide, no struct-typed/registerAll/error/panic paths. The point is to make the ABI surfaces comparable, not to ship a finished feature. Once a direction is picked, the loser of the comparison can be deleted and the winner extended.

…ip ci]

Adds custom Rust scalar UDF support using arrow-only FFI surfaces, as
suggested by paleolimbot and timsaucer in feedback on apache#4283. This is an
alternative implementation for comparison; see apache#4283 for the bespoke-ABI
version.

Two ABI flavors are provided side-by-side so reviewers can compare:

  1. C ABI (sedona-style): pure C-callable struct of function pointers,
     parameterized only by Arrow C Data Interface (FFI_ArrowSchema /
     FFI_ArrowArray). Decoupled from datafusion versions; future-portable
     to C/C++. Modeled on apache/sedona-db's SedonaCScalarKernel header.

  2. datafusion-ffi (FFI_ScalarUDF): wraps user's ScalarUDFImpl as
     FFI_ScalarUDF. Inherits full ScalarUDFImpl surface (variadic
     signatures, type coercion, metadata-aware return types) for free,
     at the cost of a major-version pin against datafusion-ffi.

A single library may export either or both; loader walks both discovery
functions. `comet-test-udfs` exposes `add_one_c` (C ABI) and `add_one_df`
(datafusion-ffi) and the e2e suite drives both through Spark.

Scope is intentionally minimal for comparison: scalar-only, happy-path
e2e tests only (no panic/error/signature-mismatch coverage). The Scala
JVM API mirrors apache#4283 exactly so the comparison is apples-to-apples.

Pieces:
- native/comet-udf-sdk: SDK with both ABI flavors + export macros
- native/comet-test-udfs: cdylib exposing one UDF per ABI
- native/core/src/execution/rust_udf: loader, cache, ImportedCScalarUdf
- native/core/src/comet_rust_udf_bridge.rs: JNI for validateLibrary/listUdfs
- native/proto: RustUdfCall message
- spark/.../udf: CometRustUDF.register, registry, exception classes,
  JNI bridge stub
- spark/.../serde/CometScalaUDF: dispatch ScalaUDF to RustUdfCall
  when udfName is in the registry
@andygrove andygrove changed the title feat: custom Rust UDFs via arrow-ffi (alternative to #4283) [experimental] [skip ci] feat: custom Rust UDFs via arrow-ffi (alternative to #4283) [experimental] May 27, 2026
Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for putting this together!

Comment on lines +544 to +545
// Filesystem path of the cdylib.
string library_path = 2;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the motivation for having this primarily live as shared objects! I am not sure if there's an opportunity / whether it's any easier to bundle these UDFs as .jars or Python packages but that could be implemented as a future field of this struct.

Comment on lines +548 to +549
// Expected return type, declared at register time on the JVM side.
DataType return_type = 4;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a fixed return type a requirement? (DataFusion lets you compute this on demand but perhaps there's a limiation here)

Comment on lines +130 to +135
/// Per-execution instance produced by [`CometCScalarKernel::new_impl`].
///
/// Not thread-safe; the caller must serialize access. Typically used on
/// one thread for one batch then dropped.
#[repr(C)]
pub struct CometCScalarKernelImpl {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

Comment on lines +18 to +23
//! datafusion-ffi flavor.
//!
//! Discovery returns a list of `FFI_ScalarUDF` values produced by
//! `datafusion_ffi`. The host imports each via
//! `ForeignScalarUDF::try_from`, yielding a `ScalarUDFImpl` it can plug
//! straight into its existing planner — no further adaptation needed.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also cool! (And hard to argue with the compact implementation!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants