Commit 08312bb (parent 8c23587)

Updated the Arrow streaming documentation to describe incremental execution, remove the note block, and highlight lazy batch retrieval when using __arrow_c_stream__

File tree

1 file changed

+5
-10
lines changed


docs/source/user-guide/io/arrow.rst

@@ -60,16 +60,11 @@ Exporting from DataFusion
 DataFusion DataFrames implement ``__arrow_c_stream__`` PyCapsule interface, so any
 Python library that accepts these can import a DataFusion DataFrame directly.
 
-.. note::
-   Invoking ``__arrow_c_stream__`` still triggers execution of the underlying
-   query, but batches are yielded incrementally rather than materialized all at
-   once in memory. Consumers can process the stream as it arrives, avoiding the
-   memory overhead of a full
-   :py:func:`datafusion.dataframe.DataFrame.collect`.
-
-   For an example of this streamed execution and its memory safety, see the
-   ``test_arrow_c_stream_large_dataset`` unit test in
-   :mod:`python.tests.test_io`.
+Invoking ``__arrow_c_stream__`` triggers execution of the underlying query, but
+batches are yielded incrementally rather than materialized all at once in memory.
+Consumers can process the stream as it arrives, avoiding the memory overhead of a
+full :py:func:`datafusion.dataframe.DataFrame.collect`. The stream executes lazily,
+letting downstream readers pull batches on demand.
 
 
 .. ipython:: python
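The lazy, pull-based behavior the new wording describes can be sketched with a plain Python generator. This is only a conceptual illustration, not the real Arrow C stream interface: the `execute_query_stream` function and its batch contents are made up for this sketch, standing in for a query pipeline that produces Arrow record batches.

```python
def execute_query_stream(num_batches):
    """Yield result batches one at a time instead of collecting them all.

    Each batch is produced only when the consumer asks for it, so at most
    one batch needs to be resident in memory at a time -- the same idea as
    pulling batches through ``__arrow_c_stream__`` rather than calling
    ``collect()`` on the whole result set.
    """
    for i in range(num_batches):
        # In a real engine, the next chunk of the query pipeline would
        # execute here (the "incremental execution" described above).
        batch = [i * 10 + j for j in range(10)]  # stand-in for a RecordBatch
        yield batch


# The consumer pulls batches on demand; nothing executes until iteration
# begins, and each batch can be processed and discarded before the next
# one is produced.
stream = execute_query_stream(num_batches=3)
totals = [sum(batch) for batch in stream]
print(totals)  # [45, 145, 245]
```

The design point is that the producer and consumer run in lockstep: the consumer's iteration drives execution, so memory usage stays proportional to one batch rather than to the full result.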
