
add support for GCS warehouses in iceberg #19137

Open
bsmithgall wants to merge 1 commit into apache:master from bsmithgall:gcs-iceberg


@bsmithgall
Contributor

Description

Add `GoogleCloudStorageInputSourceFactory` to allow reading Iceberg data files from `gs://` paths with `"warehouseSource": "google"`. Add the `iceberg-gcp` and `google-cloud-storage` dependencies to the Iceberg extension.

Also adds relevant tests and documentation updates.
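For illustration, here is a minimal sketch of an Iceberg input source that selects the new warehouse source. It assumes the `google` warehouse source takes the same `{"type": ...}` object shape as the existing `hdfs` and `s3` warehouse sources in the Druid Iceberg docs; the table name, namespace, and REST catalog URI are hypothetical placeholders, not values from this PR.

```json
{
  "type": "iceberg",
  "tableName": "example_table",
  "namespace": "example_namespace",
  "icebergCatalog": {
    "type": "rest",
    "catalogUri": "http://localhost:8181"
  },
  "warehouseSource": {
    "type": "google"
  }
}
```

The catalog block is independent of the warehouse source here: the catalog resolves which data files belong to the table, and the `google` warehouse source is only responsible for reading those `gs://` files.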

Release note

new: allow Druid to read Iceberg data files from GCS


Key changed/added classes in this PR
  • GoogleCloudStorageInputSourceFactory

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests. (I would love to tackle #18017, "Add Integration tests for Iceberg connector", but am not entirely sure how you'd like me to get that started. Testcontainers?)

@a2l007
Contributor

a2l007 commented Mar 19, 2026

Thanks for the PR, @bsmithgall! Is this compatible with all the supported Iceberg catalogs?

@bsmithgall
Contributor Author

As far as I understand, it should be. This change only wires up GCS as a `warehouseSource` and doesn't touch the `icebergCatalog` code. In `IcebergInputSource.retrieveIcebergDatafiles()`, the warehouse input source is created from the data file paths returned by the catalog and then referenced. So, by catalog:

  • Hive: should be supported, as outlined in the documentation, by setting the `io-impl` catalog property to `org.apache.iceberg.gcp.gcs.GCSFileIO`.
  • REST: should be supported, since the REST catalog abstracts away storage.
  • Glue: probably not relevant since it's an AWS product, but it would likely still work the same way as the Hive catalog.
  • Local: not relevant for our use case, I don't think.
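As a sketch of the Hive setup mentioned above: `io-impl` is a standard Iceberg catalog property, and `org.apache.iceberg.gcp.gcs.GCSFileIO` ships in the `iceberg-gcp` module. The field layout assumes the existing Druid Hive catalog options (`warehousePath`, `catalogUri`, `catalogProperties`) and the `{"type": "google"}` warehouse source shape; the bucket, metastore host, table, and namespace are hypothetical placeholders.

```json
{
  "type": "iceberg",
  "tableName": "example_table",
  "namespace": "example_namespace",
  "icebergCatalog": {
    "type": "hive",
    "warehousePath": "gs://example-bucket/warehouse",
    "catalogUri": "thrift://hive-metastore.example:9083",
    "catalogProperties": {
      "io-impl": "org.apache.iceberg.gcp.gcs.GCSFileIO"
    }
  },
  "warehouseSource": {
    "type": "google"
  }
}
```

Setting `io-impl` lets the Hive catalog read table metadata from GCS, while the `google` warehouse source handles the data files themselves; both are needed when the whole warehouse lives under `gs://`.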

