[SPARK-56535][BUILD] Fix CI & base image build issues #55432
Open
holdenk wants to merge 43 commits into apache:branch-3.5
…o that we don't get a partial cache fetch error. Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Force-pushed: 7bb3ffe to 737dd17
…k python packages that don't work in 3.9 Co-authored-by: Holden Karau <holden@pigscanfly.ca>
…it and building from src fails Co-authored-by: Holden Karau <holden@pigscanfly.ca>
holdenk (Contributor, Author):
CC @devin-petersohn, who's probably got a good handle on old versions of Python: does this look reasonable-ish?
  # Image for building and testing Spark branches. Based on Ubuntu 22.04.
  # See also in https://hub.docker.com/_/ubuntu
- FROM ubuntu:focal-20221019
+ FROM ubuntu:jammy
Contributor:
Should we pin this?
holdenk (Contributor, Author):
I was going back and forth on this. Given that we do an apt-get update anyway, I personally think pinning it is actually counterproductive.
Force-pushed: 9abfefa to 6813916
…hich sort of comes under the WTF view of package management, so let's do more explicit installs and also build up in such a way that the install actually works.
…ute on a 'builtin'
Docker Hub occasionally returns transient 5xx responses (e.g. 502 Bad Gateway on manifest HEAD requests), which currently aborts suites like PostgresKrbIntegrationSuite. Wrap the pull/inspect calls in an exponential-backoff retry so flaky GitHub CI runs survive these blips.

https://claude.ai/code/session_01Py9jZBMMdNaCBJ4vvHc3kd

Co-authored-by: Claude <noreply@anthropic.com>
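For illustration, the retry shape this commit message describes could look roughly like the following. This is a Python sketch rather than the Scala in the integration suite, and it is not the code from this PR: `retry_with_backoff`, `TransientRegistryError`, and all default values are hypothetical names and numbers chosen for the example.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientRegistryError(Exception):
    """Hypothetical stand-in for a transient 5xx response from a registry."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,            # assumed default, not taken from the PR
    initial_delay_s: float = 1.0,     # assumed default
    max_delay_s: float = 30.0,        # assumed cap on the wait between tries
    retry_on: Tuple[Type[BaseException], ...] = (TransientRegistryError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on the listed errors with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts:
                raise
            sleep(delay)
            # Double the wait each time, but never exceed the cap.
            delay = min(delay * 2, max_delay_s)
    raise AssertionError("unreachable")
```

Only the error types listed in `retry_on` are retried, so a genuine failure (a missing image, say) would still surface immediately. The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.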
mypy 0.991 (pinned for Python 3.8/3.9 support on this branch) crashes during cache serialization on pydantic v2's recursive JsonValue type. pydantic isn't a direct PySpark dep but gets pulled in transitively via mlflow in the lint env. follow_imports = skip prevents mypy from analyzing pydantic at all, sidestepping the assertion.

Co-authored-by: Claude <noreply@anthropic.com>
* Workaround roxygen2 'cannot set an attribute on a builtin' in create-rd.sh

  When roxygen2 processes @family members for topics like dim.Rd, it calls add_s3_metadata to mark s3 generics. For SparkR, the lookup resolves to base R primitives (dim, nrow, ncol, ifelse, ...) that SparkR registers S4 methods for. R disallows setting attributes on builtins, so `class(val) <- c("s3generic", "function")` aborts with "cannot set an attribute on a 'builtin'", failing the whole Rd build. Monkey-patch roxygen2's internal add_s3_metadata in create-rd.sh to swallow that specific error and return the primitive unchanged, so documentation generation can proceed regardless of the installed roxygen2 version.

* Skip cleanClosure for primitive functions in SparkR

  When SparkR's RDD machinery wraps a user closure, cleanClosure() walks the closure and calls environment(func) <- newEnv. For primitive functions like `+`, `max`, `min`, recent R versions raise the warning "setting environment(<primitive function>) is not possible and trying it is deprecated", which can be promoted to an error and breaks reduce/reduceByKey-style RDD ops (test_rdd.R count by values, maximum, minimum). Primitives have no R-level closure to clean, so return them unchanged.

---------

Co-authored-by: Claude <noreply@anthropic.com>
… for release we'll use conda anyways
What changes were proposed in this pull request?
Update the base image in the CI infra Dockerfile to a supported Ubuntu release, and automatically run apt-get update when an apt-get install fails.
Why are the changes needed?
Two reasons:
Does this PR introduce any user-facing change?
No, CI only.
How was this patch tested?
Running through CI
Was this patch authored or co-authored using generative AI tooling?
Auto-complete with Copilot was turned on, but none of its suggestions were useful except for some comments.
Claude was used to add resilient retry logic to Docker operations in the JDBC integration test suite, to handle transient failures from Docker registries and daemons, which have been flaky during testing (added here instead of in 4 and backporting, since the classes have been rewritten in 4).
Claude was also used to suggest versions to pin back to for the roxygen issues during the build.