feat: Native Parquet Iceberg Data File Writes In Comet#4487

Open
jordepic wants to merge 3 commits into apache:main from jordepic:main

@jordepic
Contributor

Which issue does this PR close?

Closes #4322.

Rationale for this change

Comet has so far mainly focused on accelerating reads from iceberg tables. However, significant resources are being spent across various companies on rewriting iceberg data. Large tables need to be compacted to maintain their sort/Z order, and general data pipelines may write significant amounts of iceberg data. Having to do a large transpose between columnar and row-wise data is inefficient, and we'd much prefer to go directly from arrow-based column batches to parquet on disk.

What changes are included in this PR?

This change is split into three parts.

  1. Splitting the existing iceberg V2 spark write command into two: a "writer" spark operator and a "committer" spark operator.
  • This allows the iceberg data file write to be treated identically to a spark V1 parquet write command, so similar code can handle both operators
  • Without the split, both the data file writing and the committing would live outside the scope of spark's adaptive query execution (AQE) operator wrappers. We instead want the write operator to sit inside AQE, so that as the plan feeding into it gets re-planned we can determine whether the write's upstream data is columnar
  2. Determining which operators should be converted to native code
  • This follows a simple philosophy: only convert a write to native code if it produces an outcome "identical" to the Java path
  • Disclaimer: nothing produces a truly identical result, because parquet row group flushing differs between the two paths
  • Beyond that, we can generally convert "normal" writes: parquet format, default settings (with some flexibility to change other settings, as outlined in iceberg-writes.md), and no delete files, since iceberg-rust doesn't support positional deletes/DVs
  3. Native iceberg write operator
  • The JVM is responsible for computing all iceberg write settings and passes them to rust in a protobuf object
  • Rust uses iceberg-rust to write the file and returns avro-encoded dummy manifest bytes to the JVM
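
The writer/committer split from item 1 can be pictured with a toy model. This is only a sketch under assumptions: `DataFileInfo`, `write_partition`, and `commit` are hypothetical names rather than Comet or Iceberg APIs, and no files are actually written. It shows per-partition writer tasks that only report file metadata, followed by a single committer step that consumes those reports.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DataFileInfo:
    """Hypothetical stand-in for the metadata a writer task reports back."""
    path: str
    record_count: int


def write_partition(partition_id: int, rows: List[dict]) -> List[DataFileInfo]:
    """'Writer' step: runs once per partition and only produces data files.

    A real writer would encode the rows to Parquet; here we only describe
    the file that would have been produced.
    """
    if not rows:
        return []  # empty partitions produce no files
    return [DataFileInfo(path=f"data/part-{partition_id:05d}.parquet",
                         record_count=len(rows))]


def commit(results: Iterable[List[DataFileInfo]]) -> dict:
    """'Committer' step: runs once, after every writer task has finished."""
    files = [f for task_files in results for f in task_files]
    return {
        "added_files": sorted(f.path for f in files),
        "added_records": sum(f.record_count for f in files),
    }


# Three partitions, one of them empty.
partitions = [[{"id": 1}, {"id": 2}], [], [{"id": 3}]]
snapshot = commit(write_partition(i, rows) for i, rows in enumerate(partitions))
```

Because the writer step knows nothing about committing, it can be swapped for a different implementation (for example a native one) without touching the commit logic, which is the property the split is meant to provide.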

How are these changes tested?

This change is tested extensively.

  1. Tests to ensure that iceberg writes are replaced by our "two-operator" structure
  2. Tests to ensure that the comet JVM properly serializes relevant data to protocol buffer form to go to the native layer
  3. Tests to ensure that native writes are only performed under very specific iceberg table properties
  4. Tests to ensure that native writes actually function as expected
  5. Tests to ensure that compaction/sorting/z-ordering can now be fully accelerated with native writing
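
The kind of condition exercised by item 3 can be sketched as a predicate over table properties. This is an illustration, not Comet's actual rule set: `can_write_natively` and the `writes_delete_files` flag are hypothetical, and only the two conditions named in the description are modeled (Parquet output, no delete files). `write.format.default` is the Iceberg table property for the default file format, which falls back to parquet when unset.

```python
from typing import Mapping


def can_write_natively(table_props: Mapping[str, str],
                       writes_delete_files: bool) -> bool:
    """Hypothetical eligibility check; the full rules are in iceberg-writes.md."""
    # Iceberg falls back to Parquet when the property is unset.
    file_format = table_props.get("write.format.default", "parquet").lower()
    if file_format != "parquet":
        return False  # only Parquet data files are written natively
    if writes_delete_files:
        return False  # positional deletes / DVs are not supported natively
    return True


# Default table, plain append: eligible.
default_ok = can_write_natively({}, writes_delete_files=False)
# Non-Parquet format: falls back to the Java path.
orc_ok = can_write_natively({"write.format.default": "orc"}, writes_delete_files=False)
# Write that would emit delete files: falls back to the Java path.
deletes_ok = can_write_natively({}, writes_delete_files=True)
```

Any write that fails the predicate would stay on the existing Java path, so the check only ever narrows what goes native.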

Development

Successfully merging this pull request may close these issues.

Writes to Apache Iceberg Tables
