Core, Data, Spark: Moving Spark to use the new FormatModel API by pvary · Pull Request #15328 · apache/iceberg

pvary · 2026-02-15T15:48:34Z

Part of: #12298
Implementation of the new API: #12774

SparkFormatModel and related changes

kevinjqliu · 2026-02-16T14:27:57Z

spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkFileWriterFactory.java

+      return super.newPositionDeleteWriter(file, spec, partition);
+    } else {
+      LOG.info(
+          "Deprecated feature used. Position delete row schema is used to create the position delete writer.");


nit: should we mark this as @deprecated?
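
For illustration, marking the legacy path as deprecated could look roughly like the sketch below. It is not the code from this PR: `PositionDeleteWriterSketch`, its methods, and the string return values are hypothetical stand-ins for the real writer factory, and the warning level is an assumption that follows the later "Log message to WARN level" commit. Only the log message text is taken from the diff above.

```java
// Hypothetical stand-in for the writer factory; none of these names are Iceberg APIs.
class PositionDeleteWriterSketch {
  // System.Logger lives in java.lang (Java 9+), so no imports are needed.
  private static final System.Logger LOG =
      System.getLogger(PositionDeleteWriterSketch.class.getName());

  /** Preferred path: create the writer without a position delete row schema. */
  static String newWriter(String file) {
    return "writer(" + file + ")";
  }

  /**
   * Legacy path: create the writer using a position delete row schema.
   *
   * @deprecated illustrative only; callers should move to {@link #newWriter(String)}.
   */
  @Deprecated
  static String newWriterWithRowSchema(String file, String rowSchema) {
    // Warn at runtime as well, so users who never see compiler warnings still notice.
    LOG.log(
        System.Logger.Level.WARNING,
        "Deprecated feature used. Position delete row schema is used to create the position delete writer.");
    return "writer(" + file + ", rowSchema=" + rowSchema + ")";
  }

  /** Mirrors the if/else in the diff: use the new path unless a row schema is set. */
  static String create(String file, String rowSchema) {
    if (rowSchema == null) {
      return newWriter(file);
    } else {
      return newWriterWithRowSchema(file, rowSchema);
    }
  }

  public static void main(String[] args) {
    System.out.println(create("deletes.parquet", null));
    System.out.println(create("deletes.parquet", "row"));
  }
}
```

The annotation gives compile-time feedback to callers of the legacy method, while the log line covers users who only reach it through configuration.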

singhpk234 · 2026-02-16T16:26:31Z

Can we run the Spark benchmarks to see how they turn out after these changes: https://github.com/apache/iceberg/tree/main/spark/v4.1/spark/src/jmh/java/org/apache/iceberg/spark ?

pvary · 2026-02-16T23:07:40Z

Can we run the Spark benchmarks to see how they turn out after these changes: https://github.com/apache/iceberg/tree/main/spark/v4.1/spark/src/jmh/java/org/apache/iceberg/spark ?

Added some new benchmarks for Parquet (readUsingRegistryReader and readWithProjectionUsingRegistryReader in both the flat and nested reader benchmarks, writeUsingRegistryWriter in both the flat and nested writer benchmarks):

Benchmark                                                                          Mode  Cnt  Score   Error  Units
SparkParquetReadersFlatDataBenchmark.readUsingIcebergReader                          ss    5  0.311 ± 0.005   s/op
SparkParquetReadersFlatDataBenchmark.readUsingIcebergReaderUnsafe                    ss    5  0.396 ± 0.018   s/op
SparkParquetReadersFlatDataBenchmark.readUsingRegistryReader                         ss    5  0.326 ± 0.049   s/op
SparkParquetReadersFlatDataBenchmark.readUsingSparkReader                            ss    5  0.408 ± 0.008   s/op
SparkParquetReadersFlatDataBenchmark.readWithProjectionUsingIcebergReader            ss    5  0.185 ± 0.018   s/op
SparkParquetReadersFlatDataBenchmark.readWithProjectionUsingIcebergReaderUnsafe      ss    5  0.363 ± 0.018   s/op
SparkParquetReadersFlatDataBenchmark.readWithProjectionUsingRegistryReader           ss    5  0.213 ± 0.026   s/op
SparkParquetReadersFlatDataBenchmark.readWithProjectionUsingSparkReader              ss    5  0.273 ± 0.019   s/op
SparkParquetReadersNestedDataBenchmark.readUsingIcebergReader                        ss    5  0.184 ± 0.018   s/op
SparkParquetReadersNestedDataBenchmark.readUsingIcebergReaderUnsafe                  ss    5  0.219 ± 0.026   s/op
SparkParquetReadersNestedDataBenchmark.readUsingRegistryReader                       ss    5  0.179 ± 0.035   s/op
SparkParquetReadersNestedDataBenchmark.readUsingSparkReader                          ss    5  0.223 ± 0.015   s/op
SparkParquetReadersNestedDataBenchmark.readWithProjectionUsingIcebergReader          ss    5  0.077 ± 0.010   s/op
SparkParquetReadersNestedDataBenchmark.readWithProjectionUsingIcebergReaderUnsafe    ss    5  0.137 ± 0.007   s/op
SparkParquetReadersNestedDataBenchmark.readWithProjectionUsingRegistryReader         ss    5  0.080 ± 0.006   s/op
SparkParquetReadersNestedDataBenchmark.readWithProjectionUsingSparkReader            ss    5  0.103 ± 0.003   s/op
SparkParquetWritersFlatDataBenchmark.writeUsingIcebergWriter                         ss    5  2.602 ± 0.064   s/op
SparkParquetWritersFlatDataBenchmark.writeUsingRegistryWriter                        ss    5  2.593 ± 0.074   s/op
SparkParquetWritersFlatDataBenchmark.writeUsingSparkWriter                           ss    5  2.594 ± 0.054   s/op
SparkParquetWritersNestedDataBenchmark.writeUsingIcebergWriter                       ss    5  1.559 ± 0.022   s/op
SparkParquetWritersNestedDataBenchmark.writeUsingRegistryWriter                      ss    5  1.569 ± 0.043   s/op
SparkParquetWritersNestedDataBenchmark.writeUsingSparkWriter                         ss    5  1.595 ± 0.046   s/op

The differences between the Iceberg and registry variants are barely noticeable in either direction and fall within the reported errors. There should not be any real difference, as the resulting readers and writers use the same code.
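
As a rough sanity check of that reading, the sketch below treats each `Score ± Error` pair from the table as an interval and tests whether the Iceberg and registry intervals intersect. The class and method names are made up for illustration, and interval overlap is only a quick heuristic, not a formal significance test.

```java
// Hypothetical helper; the scores and errors are copied from the benchmark table above.
class BenchmarkOverlapSketch {
  /** True if [a - ea, a + ea] and [b - eb, b + eb] intersect. */
  static boolean overlaps(double a, double ea, double b, double eb) {
    return Math.abs(a - b) <= ea + eb;
  }

  public static void main(String[] args) {
    // Each row: {Iceberg score, Iceberg error, registry score, registry error} in s/op.
    double[][] pairs = {
      {0.311, 0.005, 0.326, 0.049}, // flat read
      {0.185, 0.018, 0.213, 0.026}, // flat read with projection
      {0.184, 0.018, 0.179, 0.035}, // nested read
      {0.077, 0.010, 0.080, 0.006}, // nested read with projection
      {2.602, 0.064, 2.593, 0.074}, // flat write
      {1.559, 0.022, 1.569, 0.043}, // nested write
    };
    for (double[] p : pairs) {
      System.out.println(overlaps(p[0], p[1], p[2], p[3]));
    }
  }
}
```

All six Iceberg/registry pairs overlap under this check.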

github-actions bot added spark core data labels Feb 15, 2026

kevinjqliu approved these changes Feb 16, 2026

pvary added 2 commits February 16, 2026 21:14

Core, Data, Spark: Moving Spark to use the new FormatModel API

0b97f29

Log message to WARN level

98b316f

pvary force-pushed the spark_model branch from 4ec4270 to 98b316f Compare February 16, 2026 20:17

JHM tests

03f35f1

pvary wants to merge 3 commits into apache:main from pvary:spark_model
