875 changes: 774 additions & 101 deletions Cargo.lock

Large diffs are not rendered by default.

12 changes: 12 additions & 0 deletions TASK.md
@@ -0,0 +1,12 @@
❯ I have spark (a stripped down version for reference), iceberg (meant as a reference, don't change code here), iceberg-rust (a work branch prepared) and datafusion (a work branch prepared) checked out here in ~/code.

Consider the spark program in @/Users/mattias.johansson/code/spark/programs/src/MaintenanceSingle.scala. I would like to replicate that in the datafusion repo.

- as far as I know, there is no datafusion-iceberg 'glue or adapter' (akin to @~/code/iceberg/spark/)
- support for 'maintenance primitives' in the rust ecosystem is rudimentary (at least compared to spark / iceberg java)
- add an integration test that does the same thing as the MaintenanceSingle job to the datafusion repo
- port whatever iceberg primitives are needed from iceberg to iceberg-rust
- add whatever glue/adapter code necessary to express the maintenance operations succinctly
- if some part of it cannot be done for whatever reason, document why in a README in the datafusion repo
- the integration test does NOT have to be SQL based; in fact, I prefer that the entry points be typed Rust APIs
- you can use the local polaris (docker compose) catalog for testing and development. It already contains some tables, but feel free to create new ones as you see fit. (see @~/code/transform for how to integrate with it. Both the table updater and the polaris-query components talk to the local @~/code/transform/
6 changes: 6 additions & 0 deletions datafusion/core/Cargo.toml
@@ -182,6 +182,12 @@ serde_json = { workspace = true }
sysinfo = "0.38.2"
test-utils = { path = "../../test-utils" }
tokio = { workspace = true, features = ["rt-multi-thread", "parking_lot", "fs"] }
iceberg = { path = "../../../iceberg-rust/crates/iceberg" }
iceberg-actions = { path = "../../../iceberg-rust/crates/actions" }
# arrow/parquet versions matching iceberg-rust's deps (v57) for integration test compatibility
arrow-array-iceberg = { package = "arrow-array", version = "57.1" }
arrow-schema-iceberg = { package = "arrow-schema", version = "57.1" }
parquet-iceberg = { package = "parquet", version = "57.1", features = ["async"] }

[package.metadata.cargo-machete]
ignored = ["datafusion-doc", "datafusion-macros", "dashmap"]