ci: wire Lakekeeper and MinIO into GitHub Actions #4276

Draft
mengw15 wants to merge 3 commits into apache:main from mengw15:Lakekeeper-CI-2

Contributor

@mengw15 commented Mar 9, 2026

What changes were proposed in this PR?

Add amber-rest and python-rest jobs to .github/workflows/build.yml
that boot Lakekeeper + MinIO and re-run the existing Scala/Python test
suites with STORAGE_ICEBERG_CATALOG_TYPE=rest. The existing amber and
python jobs continue to cover the Postgres-catalog path, so both
backends are now exercised end-to-end.

| Job | Mirrors | Differs from sibling by |
| --- | --- | --- |
| amber-rest | amber (same DAO/PyBuilder/WorkflowCore/WorkflowOperator/WorkflowExecutionService jacoco set) | adds MinIO + Lakekeeper services + REST/S3 env; coverage uploaded under amber-rest flag |
| python-rest | python (pytest --cov) | same services + env; coverage under python-rest flag; pinned to Python 3.12 (REST coverage is integration, not version-compat) |

Two supporting changes outside the workflow YAML, both necessary for the
new jobs to actually run:

  • common/workflow-core/build.sbt: fork the test JVM only when
    STORAGE_ICEBERG_CATALOG_TYPE=rest is set, because iceberg-aws
    S3FileIO trips a ClassCastException under sbt's layered classloader.
    Postgres path is unaffected.
  • amber/src/test/python/.../test_iceberg_document.py and
    test_large_binary_manager.py: read catalog_type from the env var
    (default postgres), so the same fixtures drive both backends without
    hardcoded flips.
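
A minimal sketch of how a fixture might pick the backend from the environment, assuming the variable name `STORAGE_ICEBERG_CATALOG_TYPE` used by this PR. The helper name `resolve_catalog_type`, the lowercasing, and the validation against a fixed set are illustrative assumptions, not the code in the changed test files.

```python
import os
from typing import Mapping, Optional

# Assumption: only these two backends are in play, as described in this PR.
SUPPORTED_CATALOG_TYPES = ("postgres", "rest")


def resolve_catalog_type(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the catalog type named by the env var, defaulting to postgres."""
    source = os.environ if env is None else env
    value = source.get("STORAGE_ICEBERG_CATALOG_TYPE", "postgres").strip().lower()
    if value not in SUPPORTED_CATALOG_TYPES:
        # Fail early so a typo in the CI env block is visible in the test log.
        raise ValueError(f"unsupported catalog type: {value!r}")
    return value
```

A fixture could then pass `catalog_type=resolve_catalog_type()` into whatever storage config object it builds, so the Postgres and REST jobs differ only in their environment.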

Any related issues, documentation, discussions?

Closes #4994 (sub-issue of #4126)

How was this PR tested?

By CI itself: the new amber-rest and python-rest jobs run on every
push to this PR, and failures (Lakekeeper boot, warehouse init, REST-path
test breakage) surface as red checks on those two jobs.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

@mengw15 marked this pull request as draft March 9, 2026 11:34
@github-actions (bot) added the engine, dependencies, ddl-change, python, ci, service, and common labels Mar 9, 2026
@mengw15 self-assigned this Apr 6, 2026
@mengw15 changed the title from "feat: introduce Result Service using Lakekeeper as REST catalog for Iceberg - CI" to "ci: wire Lakekeeper and MinIO into GitHub Actions" May 9, 2026
Comment thread amber/requirements.txt
Comment on lines +47 to +49
s3fs==2025.9.0
aiobotocore==2.25.1
botocore==1.40.53
Contributor

These requirements are for pyamber. If those are dev-only requirements, please add them to dev-requirements.txt.

Comment thread common/workflow-core/build.sbt Outdated
Comment on lines +40 to +43
// Fork a separate JVM for tests to avoid sbt classloader conflicts
// (iceberg-aws S3FileIO hits ClassCastException with layered classloaders)
Test / fork := true
Test / baseDirectory := (ThisBuild / baseDirectory).value
Contributor

Do we really have to bundle MinIO and its tests into workflow-core?

Comment on lines +50 to +52
s3_region="us-west-2",
s3_auth_username="texera_minio",
s3_auth_password="password",
Contributor

Are those changes intentional?

Contributor Author

Thanks for pointing this out! This PR is still a draft, and I’ll do some cleanup before marking it ready for review.

@mengw15 force-pushed the Lakekeeper-CI-2 branch from 214c671 to a1a4d33 May 9, 2026 01:13
@github-actions (bot) removed the engine and ddl-change labels May 9, 2026
codecov-commenter commented May 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.78%. Comparing base (2652315) to head (4d09d1e).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #4276      +/-   ##
============================================
+ Coverage     42.72%   42.78%   +0.06%     
- Complexity     2185     2189       +4     
============================================
  Files          1031     1031              
  Lines         38152    38152              
  Branches       4004     4004              
============================================
+ Hits          16302    16325      +23     
+ Misses        20831    20809      -22     
+ Partials       1019     1018       -1     
| Flag | Coverage Δ |
| --- | --- |
| access-control-service | 39.53% <ø> (ø) |
| agent-service | 33.72% <ø> (ø) |
| amber | 43.19% <ø> (-0.03%) ⬇️ |
| amber-rest | 43.23% <ø> (?) |
| computing-unit-managing-service | 0.00% <ø> (ø) |
| config-service | 0.00% <ø> (ø) |
| file-service | 32.18% <ø> (ø) |
| frontend | 33.08% <ø> (ø) |
| python | 88.84% <ø> (-0.06%) ⬇️ |
| python-rest | 88.95% <ø> (?) |
| workflow-compiling-service | 47.72% <ø> (ø) |


Add `amber-rest` and `python-rest` jobs to build.yml that boot a
Lakekeeper REST catalog backed by MinIO and re-run the existing Scala
and Python test suites with `STORAGE_ICEBERG_CATALOG_TYPE=rest`. The
existing `amber` and `python` jobs continue to cover the Postgres
catalog path; both backends are now exercised end-to-end.

- `amber-rest`: starts MinIO + Lakekeeper (migrate -> serve -> health
  check -> init warehouse), runs `sbt "WorkflowCore/test"
  "WorkflowOperator/test" "WorkflowExecutionService/test"` against REST.
- `python-rest`: same service setup; runs `pytest -sv` with REST env.
- `common/workflow-core/build.sbt`: fork the test JVM only when
  `STORAGE_ICEBERG_CATALOG_TYPE=rest` is set, so iceberg-aws S3FileIO
  works under sbt's layered classloader. Postgres path unaffected.
- Two test fixtures now read `catalog_type` from the env var (default
  `postgres`) so the same suite drives both backends without code edits.

Closes apache#4994
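
The migrate -> serve -> health check -> init warehouse ordering implies that the job has to wait for Lakekeeper to answer before it creates the warehouse. A minimal Python sketch of such a wait loop follows. The function names, the retry defaults, and the idea of probing an HTTP URL are assumptions for illustration; the actual workflow steps may do this differently (for example in shell), and no real Lakekeeper endpoint path is implied.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> Callable[[], bool]:
    """Build a probe that reports whether `url` answers with HTTP 200."""
    def check() -> bool:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    return check


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `check` until it returns True; return the 1-based attempt number."""
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return attempt
        except OSError:
            # Connection refused or HTTP errors while the service is booting.
            pass
        if attempt < attempts:
            sleep(delay_seconds)
    raise TimeoutError(f"service not healthy after {attempts} attempts")
```

With a hypothetical health URL in `health_url`, a step could call `wait_until_healthy(http_ok(health_url))` and only then run the warehouse initialization, so a slow boot shows up as a timeout rather than as a confusing test failure later.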
@mengw15 force-pushed the Lakekeeper-CI-2 branch from 1ca58ce to 33fb0d8 May 9, 2026 01:33
mengw15 and others added 2 commits May 8, 2026 18:33
Lakekeeper requires both PG_DATABASE_URL_READ and PG_DATABASE_URL_WRITE
to be set (designed for primary + read-replica deployments). CI has a
single Postgres so both point at the same URL; declare them once via
the step env block and re-use across the migrate + serve docker
invocations instead of repeating the literal string four times per
job.
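
The same "declare once, reuse for migrate and serve" idea can be sketched in Python. The variable names `PG_DATABASE_URL_READ` and `PG_DATABASE_URL_WRITE` and the `migrate` / `serve` subcommands are taken from the commit message above; the function name, the placeholder image name, and the exact `docker run` shape are assumptions, and the real workflow uses a step-level env block in YAML rather than a script like this.

```python
from typing import Dict, List


def lakekeeper_commands(
    pg_url: str,
    image: str = "lakekeeper-image-placeholder",  # hypothetical image name
) -> Dict[str, List[str]]:
    """Build docker argument lists for migrate and serve from one shared URL."""
    # CI has a single Postgres, so the read and write URLs are identical.
    shared_env = {
        "PG_DATABASE_URL_READ": pg_url,
        "PG_DATABASE_URL_WRITE": pg_url,
    }
    env_flags = [
        part
        for name, value in shared_env.items()
        for part in ("-e", f"{name}={value}")
    ]
    return {
        subcommand: ["docker", "run", "--rm", *env_flags, image, subcommand]
        for subcommand in ("migrate", "serve")
    }
```

Because both argument lists are derived from the same dictionary, changing the database URL in one place updates every invocation, which is the duplication the commit removes.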
Labels

ci, common, dependencies, python, service

Development

Successfully merging this pull request may close these issues.

Wire Lakekeeper and MinIO into GitHub Actions CI

3 participants