Commit 0879309

timsaucer and claude committed
docs: strip implementation jargon from Expr pickle docstrings
Previous wording for `Expr.to_bytes`, `Expr.__reduce__`, and the `datafusion.ipc` module header referenced ``PythonLogicalCodec`` and ``cloudpickle`` to explain what survives the wire. Neither name is importable from Python and the mechanism is irrelevant to the end user — only the resulting contract matters.

Reword each docstring to describe the user-facing guarantee directly:

* Python scalar UDFs travel inside the pickle / serialized blob, no pre-registration needed on the receiver.
* Aggregate UDFs, window UDFs, and FFI-capsule UDFs travel by name only and require the receiver to have them registered (typically via `set_worker_ctx`).

The implementation can change underneath without invalidating these docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent fd46c94 commit 0879309

2 files changed

Lines changed: 16 additions & 15 deletions

python/datafusion/expr.py

Lines changed: 10 additions & 9 deletions
@@ -436,9 +436,9 @@ def variant_name(self) -> str:
     def to_bytes(self, ctx: SessionContext | None = None) -> bytes:
         """Serialize this expression to protobuf bytes.

-        Python scalar UDFs are cloudpickled inline by
-        :class:`PythonLogicalCodec`, so the returned blob is
-        self-contained for scalar UDFs. Aggregate / window / FFI UDFs
+        Python scalar UDFs are inlined into the returned bytes — the
+        receiver does not need to pre-register them. Aggregate UDFs,
+        window UDFs, and UDFs imported via the FFI capsule protocol
         are stored by name only; the receiver must have them
         registered.
444444
@@ -467,13 +467,14 @@ def from_bytes(cls, buf: bytes, ctx: SessionContext | None = None) -> Expr:
     def __reduce__(self) -> tuple:
         """Pickle protocol hook.

-        :class:`PythonLogicalCodec` cloudpickles referenced Python
-        scalar UDFs directly into the proto wire format, so the
-        returned blob is self-contained. On unpickle the bytes are
-        decoded against the worker context set via
+        Python scalar UDFs referenced by the expression are inlined
+        into the pickle blob, so the receiver does not need to
+        pre-register them. On unpickle the bytes are decoded against
+        the worker context set via
         :func:`datafusion.ipc.set_worker_ctx` (or a fresh
-        :class:`SessionContext` if none) for any remaining
-        registry-resolved references.
+        :class:`SessionContext` if none) for any registry-resolved
+        references — aggregate UDFs, window UDFs, UDFs imported via
+        the FFI capsule protocol.
         """
         return (Expr._reconstruct, (self.to_bytes(),))
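The `return (Expr._reconstruct, (self.to_bytes(),))` line follows the standard `__reduce__` convention: pickle stores a callable plus the arguments to call it with on load. A minimal stand-alone sketch of that pattern, using a hypothetical `ToyExpr` class with an assumed UTF-8 encoding in place of the real protobuf bytes, might look like this:

```python
import pickle


class ToyExpr:
    """Hypothetical stand-in for an expression; not the DataFusion class."""

    def __init__(self, text: str) -> None:
        self.text = text

    def to_bytes(self) -> bytes:
        # Assumed encoding for the toy: plain UTF-8 text.
        return self.text.encode("utf-8")

    @staticmethod
    def _reconstruct(buf: bytes) -> "ToyExpr":
        # pickle calls this on load with the bytes saved at dump time.
        return ToyExpr(buf.decode("utf-8"))

    def __reduce__(self) -> tuple:
        # pickle records the callable by reference and its arguments by value.
        return (ToyExpr._reconstruct, (self.to_bytes(),))


if __name__ == "__main__":
    original = ToyExpr("a + 1")
    clone = pickle.loads(pickle.dumps(original))
    print(clone.text)  # a + 1
```

Because the callable is stored by reference, the receiving process must be able to import the class. Only the bytes returned by `to_bytes()` travel by value.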

python/datafusion/ipc.py

Lines changed: 6 additions & 6 deletions
@@ -31,12 +31,12 @@
 ... # register Rust-backed UDFs / aggregates / window functions here
 ... set_worker_ctx(ctx)

-Python scalar UDFs do not need pre-registration: their definitions are
-cloudpickled into the proto wire format by ``PythonLogicalCodec`` and
-reconstructed on the receiver automatically. The worker context is only
-needed when the expression references aggregate / window UDFs, table
-providers, or Rust-side function registrations the receiver wouldn't
-otherwise have.
+Python scalar UDFs do not need pre-registration: their definitions
+travel inside the pickled expression and are reconstructed on the
+receiver automatically. The worker context is only needed when the
+expression references aggregate UDFs, window UDFs, table providers,
+or UDFs imported via the FFI capsule protocol — anything the
+receiver would otherwise resolve from its registered functions.
 """

 from __future__ import annotations
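The worker-context behaviour described in this module docstring can be sketched with a small standard-library model. Everything here is hypothetical: `ToyContext`, `resolve`, and the module-level `_state` holder only mimic the documented contract (use the context set via `set_worker_ctx`, else a fresh empty one) and are not the `datafusion.ipc` implementation.

```python
class ToyContext:
    """Hypothetical stand-in for a session context with a function registry."""

    def __init__(self):
        self.functions = {}

    def register(self, name, func):
        self.functions[name] = func


# Per-process holder for the worker context (assumption for this sketch).
_state = {"ctx": None}


def set_worker_ctx(ctx):
    """Install the context that name-only references resolve against."""
    _state["ctx"] = ctx


def resolve(name):
    """Look up a name-only reference; fall back to a fresh, empty context."""
    ctx = _state["ctx"] if _state["ctx"] is not None else ToyContext()
    try:
        return ctx.functions[name]
    except KeyError:
        raise LookupError(f"{name!r} is not registered on this worker") from None


def init_worker():
    # Typically run once per worker process, before any expression is decoded.
    ctx = ToyContext()
    ctx.register("my_max", max)  # stands in for an aggregate UDF
    set_worker_ctx(ctx)


init_worker()
print(resolve("my_max")([1, 5, 3]))  # 5
```

Without the `init_worker()` call, `resolve("my_max")` falls back to an empty context and raises `LookupError`, which is the situation the docstring asks callers to avoid by registering name-only functions up front.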
