[SPARK-56535][BUILD] Fix CI & base image build issues #55432

Open
holdenk wants to merge 43 commits into apache:branch-3.5 from holdenk:SPARK-56535-fix-base-image-build

holdenk commented Apr 20, 2026

What changes were proposed in this pull request?

Update the base image in the CI Dockerfile (dev/infra/Dockerfile) to a supported Ubuntu release, and automatically run apt-get update when an apt-get install fails.

Why are the changes needed?

Two reasons:

  1. Ubuntu focal is EOL and we're already using 22.04 in the GHA directly, so we need to migrate to a non-EOL Ubuntu version for testing. Currently the build fails on a cache miss because it cannot do an apt-get install.
  2. Docker layer caching means that the apt-get update result can be cached but stale, so a subsequent apt-get install fails.
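The "refresh the index when an install fails" idea from the second point can be sketched as a small shell helper. This is only an illustration, not the code in this PR: the function name `install_with_refresh` and the `APT_GET` override variable are hypothetical, and the real Dockerfile may structure its `RUN` steps differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from the PR): try the install with the
# package index that is already present, and only if that fails refresh
# the index with `apt-get update` and try once more.

# APT_GET is an assumed override hook so the helper can be exercised
# without touching the real package manager.
APT_GET="${APT_GET:-apt-get}"

install_with_refresh() {
  # First attempt: uses whatever (possibly cached and stale) index exists.
  if "$APT_GET" install -y --no-install-recommends "$@"; then
    return 0
  fi
  echo "apt-get install failed; refreshing package index and retrying" >&2
  # Second attempt: only runs the install if the update itself succeeded.
  "$APT_GET" update && "$APT_GET" install -y --no-install-recommends "$@"
}

# Example use inside an image build script (package names are placeholders):
#   install_with_refresh curl ca-certificates
```

The retry happens at most once, so a genuinely missing package still fails the build instead of looping.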

Does this PR introduce any user-facing change?

No, CI only.

How was this patch tested?

Running through CI

Was this patch authored or co-authored using generative AI tooling?

Auto-complete with Copilot was turned on, but none of its suggestions were useful except for some comments.

Claude was used to add resilient retry logic to Docker operations in the JDBC integration test suite, to handle transient failures from Docker registries and daemons that have made the tests flaky (added here instead of in 4 and backporting, since the classes have been rewritten in 4).

Claude was also used to suggest versions to pin back for roxygen issues during the build.

sfc-gh-hkarau and others added 7 commits April 17, 2026 16:01
…o that we don't get a partial cache fetch error.

Co-authored-by: Holden Karau <holden@pigscanfly.ca>
@holdenk holdenk force-pushed the SPARK-56535-fix-base-image-build branch from 7bb3ffe to 737dd17 Compare April 20, 2026 18:53
sfc-gh-hkarau and others added 5 commits April 20, 2026 12:04
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
…k python packages that don't work in 3.9

Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
…it and building from src fails

Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
@holdenk holdenk changed the title [WIP][SPARK-56535][BUILD] Fix base image build [SPARK-56535][BUILD] Fix base image build May 1, 2026
@holdenk holdenk marked this pull request as ready for review May 1, 2026 18:53
holdenk commented May 1, 2026

CC @devin-petersohn, who's probably got a good handle on old versions of Python: does this look reasonable-ish?

Comment thread dev/infra/Dockerfile
# Image for building and testing Spark branches. Based on Ubuntu 22.04.
# See also in https://hub.docker.com/_/ubuntu
-FROM ubuntu:focal-20221019
+FROM ubuntu:jammy
Should we pin this?

I was going back and forth on this. Given that we do an apt-get update anyway, I personally think pinning it is actually counterproductive.

Co-authored-by: Holden Karau <holden@pigscanfly.ca>
@holdenk holdenk force-pushed the SPARK-56535-fix-base-image-build branch from 9abfefa to 6813916 Compare May 2, 2026 07:01
@holdenk holdenk changed the title [SPARK-56535][BUILD] Fix base image build [SPARK-56535][BUILD] Fix CI & base image build issues May 6, 2026
sfc-gh-hkarau and others added 12 commits May 6, 2026 13:19
Docker Hub occasionally returns transient 5xx responses (e.g. 502 Bad
Gateway on manifest HEAD requests), which currently aborts suites like
PostgresKrbIntegrationSuite. Wrap the pull/inspect calls in an
exponential-backoff retry so flaky GitHub CI runs survive these blips.

https://claude.ai/code/session_01Py9jZBMMdNaCBJ4vvHc3kd

Co-authored-by: Claude <noreply@anthropic.com>
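
The exponential-backoff retry described in the commit message above can be sketched in shell. The PR applies it inside the Scala JDBC integration test suite, and that code is not shown here, so this is an analogous illustration only: the function name `retry_with_backoff` and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of retry with exponential backoff (not the PR's
# Scala implementation). Usage:
#   retry_with_backoff <max_attempts> <initial_delay_seconds> <command...>

retry_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  # Re-run the command until it succeeds or attempts are exhausted.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))      # double the wait before the next attempt
    attempt=$((attempt + 1))
  done
}

# Example (image name is a placeholder):
#   retry_with_backoff 5 1 docker pull some-registry/some-image:tag
```

Doubling the delay gives a registry that is returning transient 5xx responses progressively more time to recover, while the attempt cap keeps a real outage from hanging the suite indefinitely.
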
mypy 0.991 (pinned for Python 3.8/3.9 support on this branch) crashes
during cache serialization on pydantic v2's recursive JsonValue type.
pydantic isn't a direct PySpark dep but gets pulled in transitively via
mlflow in the lint env. follow_imports = skip prevents mypy from
analyzing pydantic at all, sidestepping the assertion.

Co-authored-by: Claude <noreply@anthropic.com>
* Workaround roxygen2 'cannot set an attribute on a builtin' in create-rd.sh

When roxygen2 processes @family members for topics like dim.Rd, it
calls add_s3_metadata to mark s3 generics. For SparkR, the lookup
resolves to base R primitives (dim, nrow, ncol, ifelse, ...) that
SparkR registers S4 methods for. R disallows setting attributes on
builtins, so `class(val) <- c("s3generic", "function")` aborts with
"cannot set an attribute on a 'builtin'", failing the whole Rd build.

Monkey-patch roxygen2's internal add_s3_metadata in create-rd.sh to
swallow that specific error and return the primitive unchanged, so
documentation generation can proceed regardless of the installed
roxygen2 version.

* Skip cleanClosure for primitive functions in SparkR

When SparkR's RDD machinery wraps a user closure, cleanClosure() walks
the closure and calls environment(func) <- newEnv. For primitive
functions like `+`, `max`, `min`, recent R versions raise the warning
"setting environment(<primitive function>) is not possible and trying
it is deprecated", which can be promoted to an error and breaks
reduce/reduceByKey-style RDD ops (test_rdd.R count by values, maximum,
minimum).

Primitives have no R-level closure to clean, so return them unchanged.

---------

Co-authored-by: Claude <noreply@anthropic.com>

3 participants