Commit d7a1ff4

timsaucer and claude committed
docs: pickle module security warning link + move user-facing prose to Python wrapper

Two changes:

* Reference link to the pickle module's official security warning in
  `https://docs.python.org/3/library/pickle.html#module-pickle`. Added in
  the user guide ("Disabling Python UDF inlining" note and the Security
  warning block) and in the Python `SessionContext.with_python_udf_inlining`
  docstring. The unqualified phrase "pickle is unsafe on untrusted input"
  assumed reader background that not every datafusion-python user has.
* Strip the user-facing prose docstring from the Rust
  `PySessionContext::with_python_udf_inlining` method. Python wrappers are
  what users see via `help()` and Sphinx; the Rust doc-comment duplicated
  the same text and risked drifting from the Python version. Matches the
  surrounding methods (`with_logical_extension_codec`,
  `with_physical_extension_codec`) which carry no Rust doc-comment for the
  same reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 977e88c commit d7a1ff4

3 files changed

Lines changed: 16 additions & 15 deletions

crates/core/src/context.rs

Lines changed: 0 additions & 6 deletions
@@ -1408,12 +1408,6 @@ impl PySessionContext {
         })
     }
 
-    /// Toggle inline encoding of Python-defined UDFs on this session's
-    /// codec stack. Disable when producing bytes that must round-trip
-    /// through a non-Python decoder, or when reconstructing bytes from
-    /// an untrusted source via `Expr.from_bytes` (cloudpickle.loads
-    /// will not be invoked on the receiver). Pickle remains unsafe on
-    /// untrusted input regardless of this flag.
     pub fn with_python_udf_inlining(&self, enabled: bool) -> Self {
         let logical_codec = Arc::new(
             PythonLogicalCodec::new(Arc::clone(self.logical_codec.inner()))

docs/source/user-guide/io/distributing_work.rst

Lines changed: 12 additions & 7 deletions
@@ -201,7 +201,10 @@ into ``cloudpickle.loads``.
 Note that :py:func:`pickle.loads` itself remains unsafe on untrusted
 input regardless of this setting — an attacker producing the outer
 pickle envelope can execute arbitrary code before the codec ever
-sees the bytes. The toggle only protects the
+sees the bytes (see the
+`pickle module security warning
+<https://docs.python.org/3/library/pickle.html#module-pickle>`_ in
+the Python standard library docs). The toggle only protects the
 :py:meth:`Expr.from_bytes` API surface.
 
 Security
@@ -211,12 +214,14 @@ Security
 
 Reconstructing an expression containing a Python UDF executes
 arbitrary Python code on the receiver — pickle is doing the work
-under the hood and pickle is unsafe on untrusted input. Only
-accept expressions from trusted sources. For untrusted-source
-workflows, disable Python UDF inlining (see above), restrict
-senders to built-in functions and pre-registered Rust-side UDFs,
-and avoid :py:func:`pickle.loads` on externally supplied bytes
-entirely.
+under the hood and pickle is unsafe on untrusted input (see the
+`pickle module security warning
+<https://docs.python.org/3/library/pickle.html#module-pickle>`_
+in the Python standard library docs). Only accept expressions
+from trusted sources. For untrusted-source workflows, disable
+Python UDF inlining (see above), restrict senders to built-in
+functions and pre-registered Rust-side UDFs, and avoid
+:py:func:`pickle.loads` on externally supplied bytes entirely.
 
 Query-level distribution via datafusion-distributed
 ---------------------------------------------------
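
The warning that pickle is unsafe on untrusted input can be illustrated with a small, harmless sketch that uses only the Python standard library. The `Payload` class is a hypothetical name for illustration and is not part of datafusion-python. An object's `__reduce__` method lets whoever produced the bytes name a callable and its arguments, and `pickle.loads` invokes that callable on the receiver.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle form names a callable to run on load."""

    def __reduce__(self):
        # pickle stores "call eval with this argument" instead of object state.
        # The expression here is harmless; a hostile sender could name any
        # importable callable and any arguments instead.
        return (eval, ("6 * 7",))


# Bytes as they might arrive from an untrusted sender.
untrusted_bytes = pickle.dumps(Payload())

# The receiver only deserializes, yet the sender-chosen call is executed
# and its return value replaces the object.
result = pickle.loads(untrusted_bytes)
print(result)
```

The receiver never references `Payload` or `eval` directly; the call is driven entirely by the byte stream, which is why the documentation advises accepting such bytes only from trusted sources.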

python/datafusion/context.py

Lines changed: 4 additions & 2 deletions
@@ -1787,8 +1787,10 @@ def with_python_udf_inlining(self, enabled: bool) -> SessionContext:
         source — ``cloudpickle.loads`` will not be invoked.
 
         ``pickle.loads`` on untrusted bytes remains unsafe regardless of
-        this setting; only the ``to_bytes`` / ``from_bytes`` API is
-        affected.
+        this setting (see the `pickle module security warning
+        <https://docs.python.org/3/library/pickle.html#module-pickle>`_
+        in the Python standard library docs). Only the
+        ``to_bytes`` / ``from_bytes`` API is affected.
         """
         new_internal = self.ctx.with_python_udf_inlining(enabled)
         new = SessionContext.__new__(SessionContext)
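
The context lines of this hunk suggest that `with_python_udf_inlining` builds a fresh wrapper instead of mutating the receiver. A minimal stand-in sketch of that pattern follows. `SketchContext` and its `inline_python_udfs` attribute are hypothetical names that only mimic the shape of the wrapper; they are not datafusion-python APIs, and the default value is an assumption made for the sketch.

```python
from __future__ import annotations


class SketchContext:
    """Hypothetical stand-in for a session wrapper (not the real class)."""

    def __init__(self) -> None:
        # Assumed default, chosen for this sketch only.
        self.inline_python_udfs = True

    def with_python_udf_inlining(self, enabled: bool) -> SketchContext:
        # Allocate a new instance without running __init__, then set the
        # toggled state on it, leaving the original object unchanged.
        new = SketchContext.__new__(SketchContext)
        new.inline_python_udfs = enabled
        return new


sender_side = SketchContext()
receiver_side = sender_side.with_python_udf_inlining(False)
```

Under this pattern a caller keeps one context with inlining enabled for trusted work and derives a second one with inlining disabled for bytes from less trusted sources.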
