refactor(amber): stop hardcoding S3 in REST catalog init#4988
mengw15 wants to merge 3 commits into apache:main from
Conversation
Codecov Report ❌ Patch coverage is

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##               main    #4988      +/-   ##
============================================
- Coverage     42.72%   42.72%    -0.01%
+ Complexity     2185     2184        -1
============================================
  Files          1031     1031
  Lines         38152    38146        -6
  Branches       4004     4004
============================================
- Hits          16301    16298        -3
+ Misses        20831    20827        -4
- Partials       1020     1021        +1
```

This pull request uses carry forward flags. View full report in Codecov by Sentry.
Thanks @mengw15, can you make the description a bit more concise about why this change is needed? Also, please add tests to confirm this change works.
Thanks for the questions.

**Why the change is needed.** When a Lakekeeper warehouse is created, the S3 settings (endpoint, region, credentials, path-style, etc.) are already registered against that warehouse on the server side. At REST-catalog init the client only needs the `warehouse` identifier and `uri` — Lakekeeper resolves and serves the S3 config from the warehouse record. The previously hardcoded `s3.*` properties from `StorageConfig` were therefore redundant on the client; deleting them lets each warehouse own its own storage settings instead of all warehouses being forced onto the system bucket. I'll tighten the PR description to say just this.

**About tests.** End-to-end verification needs a running Lakekeeper, which CI doesn't have yet. #4276 (draft) adds Lakekeeper to CI; once that lands I'll layer an integration test on top of it that creates a warehouse with its own S3 settings, opens a REST catalog with only `warehouse` + `uri`, and round-trips a table.
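Conceptually, the slimmed-down client-side property set can be sketched in plain Python; the helper name and example values below are hypothetical illustrations, not the PR's actual code:

```python
def build_catalog_properties(warehouse: str, uri: str) -> dict:
    """Minimal client-side properties for REST catalog init.

    No s3.* keys are included: Lakekeeper resolves endpoint, region,
    credentials, and path-style access from the warehouse record on
    the server and serves them back to the client.
    """
    return {"warehouse": warehouse, "uri": uri}

# Illustrative values only.
props = build_catalog_properties("analytics-wh", "http://lakekeeper:8181/catalog")
assert not any(key.startswith("s3.") for key in props)  # nothing S3-related hardcoded
```

The point of the sketch is the invariant in the last line: after this PR, the client-side property map carries no `s3.*` entries at all.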
What changes were proposed in this PR?
Stop hardcoding `s3.endpoint`, `s3.region`, `s3.path-style-access`, `s3.access-key-id`, and `s3.secret-access-key` at REST-catalog init in both `IcebergUtil.createRestCatalog` (Scala) and `iceberg_utils.create_rest_catalog` (Python). Both helpers now pass only `warehouse` + catalog `uri` (and, on the Scala side, the `FileIO` impl hint).

Why: When a Lakekeeper warehouse is created, its S3 settings (endpoint, region, credentials, path-style) are registered against that warehouse on the server. At catalog init the client only needs `warehouse` + `uri` — Lakekeeper resolves the S3 config from the warehouse record and serves it back. The hardcoded `StorageConfig.s3*` values on the client were redundant, and forcing them everywhere also pinned every warehouse to the single system bucket. Removing them lets each warehouse own its own storage settings.

`StorageConfig.s3*` itself is kept — `pytexera/storage/large_binary_manager.py` still uses it for the non-Iceberg `texera-large-binaries` bucket (R UDF large-binary support), which is out of scope.

Any related issues, documentation, discussions?
Closes #4987
How was this PR tested?
- `sbt "WorkflowCore/compile"` — passes; verifies no other Scala caller depends on the removed properties.
- The changed Python file parses cleanly with `ast.parse`; the only caller (`iceberg_catalog_instance.py`) is updated to match the new `create_rest_catalog` signature.
- End-to-end verification (warehouse with its own S3 settings → REST catalog opened with only `warehouse` + `uri` → table round-trip) requires a running Lakekeeper, which CI doesn't have today. #4276 (draft) wires Lakekeeper into CI; once that lands I'll add the integration test on top of it.

Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)
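Once #4276 lands, the planned integration test could look roughly like the stdlib-only skeleton below; the `LAKEKEEPER_URI` environment-variable gate and all names are hypothetical placeholders, not existing project code:

```python
import os
import unittest

class RestCatalogWarehouseOnlyTest(unittest.TestCase):
    """Planned end-to-end check: create a warehouse with its own S3 settings,
    open a REST catalog with only warehouse + uri, and round-trip a table.
    Skipped unless a Lakekeeper endpoint is available (blocked on #4276)."""

    @unittest.skipUnless(os.environ.get("LAKEKEEPER_URI"),
                         "requires a running Lakekeeper; see #4276")
    def test_warehouse_only_init(self):
        # Would: create a warehouse with dedicated S3 settings
        # -> open the REST catalog with only {warehouse, uri}
        # -> write a table, read it back, and compare.
        ...
```

The skip gate keeps the test inert in today's CI while documenting exactly what the eventual Lakekeeper-backed run should cover.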