Commit 9b4bbd6

timsaucer and claude committed
docs(distributing-expressions): link to pickle docs, generalize UDF kinds
Two fixes in the intro paragraph:

* Link to the standard library pickle docs rather than relying on the reader's familiarity with `pickle.dumps` / `pickle.loads`.
* "Python scalar UDFs ride along" only covered scalar UDFs. With aggregate and window UDFs now also traveling inline, the line is reworded to call out all three kinds.

Also updates the inline code comment in the worker-pool example.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 71760c5 commit 9b4bbd6

1 file changed

Lines changed: 5 additions & 3 deletions

docs/source/user-guide/io/distributing_expressions.rst

@@ -25,8 +25,10 @@ framework with a per-worker initialization hook), and have each worker
 evaluate the expression against its own slice of data.
 
 DataFusion expressions support this directly: they can be sent through
-:py:mod:`pickle` like any other Python object. Python scalar UDFs ride along
-inside the pickled bytes — the receiver does not need to pre-register them.
+Python's standard `pickle <https://docs.python.org/3/library/pickle.html>`_
+module like any other Python object. Python UDFs — scalar, aggregate, and
+window — travel inside the pickled bytes; the receiver does not need to
+pre-register them.
 
 Basic worker-pool example
 -------------------------
@@ -42,7 +44,7 @@ Basic worker-pool example
 
     def evaluate(blob_and_batch):
         blob, batch = blob_and_batch
-        expr = pickle.loads(blob)  # Python scalar UDFs ride along inline.
+        expr = pickle.loads(blob)  # Python UDFs travel inside the bytes.
         ctx = SessionContext()
         df = ctx.from_pydict({"a": batch})
         return df.with_column("result", expr).select("result").to_pydict()["result"]
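
The round trip that the reworded paragraph describes can be sketched with the standard library alone. This is a minimal illustration, not DataFusion's API. The `functools.partial` object is a hypothetical stand-in for a pickled expression, the `run` helper is invented for this sketch, and the workers are simulated with a plain loop so that it runs anywhere.

```python
import functools
import operator
import pickle


def evaluate(blob_and_batch):
    """Worker side: rebuild the callable from bytes and apply it to one batch."""
    blob, batch = blob_and_batch
    # Stand-in for unpickling an expression; nothing is pre-registered here.
    expr = pickle.loads(blob)
    return [expr(value) for value in batch]


def run(batches):
    """Driver side: pickle once, then pair the same bytes with every batch."""
    # Hypothetical "expression": multiply each value by 10.
    blob = pickle.dumps(functools.partial(operator.mul, 10))
    # A real pool would map `evaluate` over these pairs in separate processes.
    return [evaluate((blob, batch)) for batch in batches]


results = run([[1, 2], [3, 4, 5]])
print(results)
```

In an actual worker pool the `(blob, batch)` pairs would be handed to something like `multiprocessing.Pool.map`, and the receiving side would still need only `pickle.loads`.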
