
[SPARK-56854][PYTHON] Filter None values in DataFrame[Stream]Reader/Writer .option(s) #55867

Open

lavanv11 wants to merge 3 commits into apache:master from lavanv11:pyspark_reader_inconsistencies

Conversation

@lavanv11

What changes were proposed in this pull request?

Filter None values in Classic PySpark's DataFrameReader, DataFrameWriter, DataFrameWriterV2, DataStreamReader, and DataStreamWriter .option(key, value) and .options(**kwargs) methods. After this change, option(key, None) is a no-op and options(**{key: None, ...}) drops the None entries before forwarding to the JVM. The loop-style methods mirror the shape of OptionUtils._set_opts at python/pyspark/sql/readwriter.py:41-53: for k, v in options.items(): if v is not None: ....
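For illustration, a minimal sketch of that filtering shape, using a local stand-in for PySpark's option-value stringifier; the actual diff touches the real reader/writer classes and their JVM handles:

```python
# Sketch only: illustrates the None-filtering shape, not the literal PR diff.
def _to_str(value):
    # minimal stand-in for PySpark's option-value stringifier
    return str(value).lower() if isinstance(value, bool) else str(value)


class DataFrameReaderSketch:
    def __init__(self, jreader):
        self._jreader = jreader  # JVM-side reader handle

    def option(self, key, value):
        if value is not None:  # None now means "leave this option unset"
            self._jreader = self._jreader.option(key, _to_str(value))
        return self

    def options(self, **options):
        for k, v in options.items():
            if v is not None:  # drop None entries before forwarding to the JVM
                self._jreader = self._jreader.option(k, _to_str(v))
        return self
```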

Why are the changes needed?

Classic and Spark Connect Python currently disagree on what option(key, None) means. Classic forwards Python None to the JVM as Java null, which several data sources interpret differently from "unset". For example, with spark.read.options(nullValue=None).schema("a STRING, b STRING").csv(path) and a row "",val, Classic produces [Row(a='', b='val')] while Connect produces [Row(a=None, b='val')] because Connect drops the None, the default nullValue of "" stays in effect, and the quoted empty cell matches it. This PR aligns Classic with Connect (which has filtered None since SPARK-49263) and with the long-standing OptionUtils._set_opts convention.
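For concreteness, a hedged reproduction of that divergence (the path and data are illustrative):

```python
# `path` points to a CSV file whose single line is: "",val
df = spark.read.options(nullValue=None).schema("a STRING, b STRING").csv(path)
df.collect()
# Classic before this PR:           [Row(a='', b='val')]   (None forwarded as JVM null)
# Connect / Classic after this PR:  [Row(a=None, b='val')] (None dropped; default nullValue ""
#                                                           matches the quoted empty cell)
```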

Does this PR introduce any user-facing change?

Yes. option(k, None) and options(**{k: None}) were previously forwarded to the JVM as null; they are now no-ops. A migration-guide entry under "Upgrading from PySpark 4.1 to 4.2" documents the change. To set an option to its default, omit it or pass None; to set it to an empty string, pass "" explicitly.
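Illustrative migration examples (the option name is an example only):

```python
# Passing None now leaves the option unset (a no-op), same as omitting it:
spark.read.option("nullValue", None).csv(path)

# Previously this forwarded a JVM null; to get an actual empty-string value,
# pass "" explicitly instead of None:
spark.read.option("nullValue", "").csv(path)
```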

How was this patch tested?

New parity test test_option_none_is_filtered in ReadwriterTestsMixin pins the CSV nullValue=None case to [Row(a=None, b="val")] for both .option and .options. Because ReadwriterParityTests inherits the mixin, the regression test runs on Classic and on Spark Connect, giving cross-backend coverage automatically.
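A hedged sketch of what that parity test asserts (the method name comes from the description above; fixture details are assumptions, and self.spark is provided by the test harness):

```python
import os
import tempfile

from pyspark.sql import Row


def test_option_none_is_filtered(self):
    # Runs inside ReadwriterTestsMixin, so it executes on both Classic and Connect.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.csv")
        with open(path, "w") as f:
            f.write('"",val\n')
        expected = [Row(a=None, b="val")]
        # .option form: None is dropped, so the default nullValue "" applies
        df = self.spark.read.option("nullValue", None).schema("a STRING, b STRING").csv(path)
        self.assertEqual(df.collect(), expected)
        # .options form: None entries are filtered the same way
        df = self.spark.read.options(nullValue=None).schema("a STRING, b STRING").csv(path)
        self.assertEqual(df.collect(), expected)
```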

Additional defensive smoke tests guard the writer / V2 writer / streaming reader / streaming writer API contracts (a rough sketch of the writer case follows the list):

  • test_writer_option_none_chains_safely
  • test_v2_writer_option_none_chains_safely
  • test_stream_reader_option_none_chains_safely
  • test_stream_writer_option_none_chains_safely
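
For instance, the writer smoke test could look roughly like this (a sketch; the actual assertions may be stricter):

```python
from pyspark.sql.readwriter import DataFrameWriter


def test_writer_option_none_chains_safely(self):
    # option(key, None) should be a no-op that still returns the writer,
    # so chaining keeps working after a None-valued option.
    writer = (
        self.spark.range(1).write
        .option("compression", None)
        .option("header", True)
    )
    self.assertIsInstance(writer, DataFrameWriter)
```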

Was this patch authored or co-authored using generative AI tooling?

Partially. Generated-by: Claude Opus 4.7

…riter .option(s)

Aligns Classic PySpark with the Spark Connect Python client (SPARK-49263)
and OptionUtils._set_opts.
@HyukjinKwon
Member

Seems like one test is related. Could we take a quick look and fix it?
