refactor(amber): stop hardcoding S3 in REST catalog init by mengw15 · Pull Request #4988 · apache/texera

mengw15 · 2026-05-08T20:49:40Z

What changes were proposed in this PR?

Stop hardcoding s3.endpoint, s3.region, s3.path-style-access, s3.access-key-id and s3.secret-access-key at REST-catalog init in both IcebergUtil.createRestCatalog (Scala) and iceberg_utils.create_rest_catalog (Python). Both helpers now pass only warehouse + catalog uri (and on the Scala side the FileIO impl hint).

Why: When a Lakekeeper warehouse is created, its S3 settings (endpoint, region, credentials, path-style) are registered against that warehouse on the server. At catalog init the client only needs warehouse + uri — Lakekeeper resolves the S3 config from the warehouse record and serves it back. The hardcoded StorageConfig.s3* values on the client were redundant, and forcing them everywhere also pinned every warehouse to the single system bucket. Removing them lets each warehouse own its own storage settings.

StorageConfig.s3* itself is kept — pytexera/storage/large_binary_manager.py still uses it for the non-Iceberg texera-large-binaries bucket (R UDF large-binary support), which is out of scope.

Any related issues, documentation, discussions?

Closes #4987

How was this PR tested?

sbt "WorkflowCore/compile" — passes; verifies no other Scala caller depends on the removed properties.
Python edits parse cleanly via ast.parse; the only caller (iceberg_catalog_instance.py) is updated to match the new create_rest_catalog signature.

End-to-end verification (warehouse with its own S3 settings → REST catalog opened with only warehouse + uri → table round-trip) requires a running Lakekeeper, which CI doesn't have today. #4276 (draft) wires Lakekeeper into CI; once that lands I'll add the integration test on top of it.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

codecov-commenter · 2026-05-08T20:52:51Z

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.72%. Comparing base (deb0e2c) to head (754c29a).

Files with missing lines	Patch %	Lines
...ala/org/apache/texera/amber/util/IcebergUtil.scala	0.00%	3 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #4988      +/-   ##
============================================
- Coverage     42.72%   42.72%   -0.01%     
+ Complexity     2185     2184       -1     
============================================
  Files          1031     1031              
  Lines         38152    38146       -6     
  Branches       4004     4004              
============================================
- Hits          16301    16298       -3     
+ Misses        20831    20827       -4     
- Partials       1020     1021       +1

Flag	Coverage Δ		*Carryforward flag
access-control-service	`39.53% <ø> (ø)`
agent-service	`33.72% <ø> (ø)`		Carriedforward from 0609d3d
amber	`43.21% <0.00%> (-0.01%)`	⬇️
computing-unit-managing-service	`0.00% <ø> (ø)`
config-service	`0.00% <ø> (ø)`
file-service	`32.18% <ø> (ø)`
frontend	`33.08% <ø> (ø)`		Carriedforward from 0609d3d
python	`88.90% <ø> (ø)`
workflow-compiling-service	`47.72% <ø> (ø)`

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Yicong-Huang · 2026-05-08T23:48:03Z

Thanks @mengw15 can you make the description a bit concise about why this change is needed? Also please add tests to confirm this change works.

mengw15 · 2026-05-09T00:30:57Z

Thanks @mengw15 can you make the description a bit concise about why this change is needed? Also please add tests to confirm this change works.

Thanks for the questions.

Why the change is needed. When a Lakekeeper warehouse is created, the S3 settings (endpoint, region, credentials, path-style, etc.) are already registered against that warehouse on the server side. At REST-catalog init the client only needs the warehouse identifier and uri — Lakekeeper resolves and serves the S3 config from the warehouse record. The previously hardcoded s3.* properties from StorageConfig were therefore redundant on the client; deleting them lets each warehouse own its own storage settings instead of all warehouses being forced onto the system bucket. I'll tighten the PR description to say just this.

About tests. End-to-end verification needs a running Lakekeeper, which CI doesn't have yet. #4276 (draft) adds Lakekeeper to CI; once that lands I'll layer an integration test on top of it that creates a warehouse with its own S3 settings, opens a REST catalog with only warehouse + uri, and round-trips a table.

mengw15 added 2 commits May 8, 2026 13:16

change

25f6020

change

de6d9b2

github-actions Bot added python refactor Refactor the code common labels May 8, 2026

github-actions Bot assigned mengw15 May 8, 2026

mengw15 requested a review from bobbai00 May 8, 2026 20:50

Merge branch 'main' into refactor/remove-hardcoded-s3-from-rest-catalog

754c29a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(amber): stop hardcoding S3 in REST catalog init#4988

refactor(amber): stop hardcoding S3 in REST catalog init#4988
mengw15 wants to merge 3 commits intoapache:mainfrom
mengw15:refactor/remove-hardcoded-s3-from-rest-catalog

mengw15 commented May 8, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented May 8, 2026 •

edited

Loading

Uh oh!

Yicong-Huang commented May 8, 2026

Uh oh!

mengw15 commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

mengw15 commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this PR?

Any related issues, documentation, discussions?

How was this PR tested?

Was this PR authored or co-authored using generative AI tooling?

Uh oh!

codecov-commenter commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Yicong-Huang commented May 8, 2026

Uh oh!

mengw15 commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mengw15 commented May 8, 2026 •

edited

Loading

codecov-commenter commented May 8, 2026 •

edited

Loading