[SPARK-56964][INFRA] Share Maven precompile artifact across maven_test matrix #55766
zhengruifeng wants to merge 6 commits into
Follow-up to SPARK-56768. Adds a `precompile-maven` job to `maven_test.yml` that runs `mvn clean install -DskipTests` once and publishes the resulting `target/` trees plus `~/.m2/repository/org/apache/spark/` as a GitHub Actions artifact. Each of the 12 matrix entries now consumes that artifact instead of running its own `mvn clean install` from scratch.

The Maven version of the optimization differs from the SBT one in two places:

1. We tar two pieces and upload them as a single multi-file artifact: `compile-target.tar.gz` (workspace target/ trees) and `compile-m2-spark.tar.gz` (the Spark portion of the local Maven repository, needed for cross-module dependency resolution at `mvn -pl X test` time).
2. The artifact name is JDK-tagged `spark-maven-compile-<branch>-java<java>-<run_id>` because the build_maven_*.yml callers use different JDKs (17, 21, 25) and each produces non-interchangeable bytecode.

Same optional/fallback design as SPARK-56768:

- `precompile-maven` is `continue-on-error: true`; a failure does not fail the workflow run.
- The matrix uses `if: (!cancelled())` so it runs even on precompile failure or cancellation.
- The "Download precompiled artifact" step is gated on `needs.precompile-maven.result == 'success'` and has `continue-on-error: true`.
- The "Extract precompiled artifact" step is gated on the download succeeding.
- Inside the "Run tests" bash, the local `mvn clean install` is run only when `steps.extract-precompiled.outcome != 'success'`. Otherwise the artifact's classes/jars are used directly, and the step logs `Reusing precompiled artifact, skipping local Maven clean install.` for visibility.

The hive-thriftserver special case (line ~228, "To avoid a compilation loop") still does its own `clean install` and is not touched by this PR; it does ~1 of 12 entries' worth of redundant work, which is acceptable.

Estimated saving: roughly 11 of the 12 matrix entries skip ~25-40m of Maven clean install each, netting ~300m+ of CI compute saved per scheduled run, per JDK.

Generated-by: Claude Code (Opus 4.7)
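The two-tarball packaging described in this commit could look roughly like the sketch below. It is not the workflow's actual step: the `package_compile_output` function, its arguments, and the exact `find`/`tar` invocations are assumptions made for illustration. Only the two archive names come from this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not from the PR): archive every target/ directory
# under a workspace, plus the Spark portion of a local Maven repository.
#   $1 = workspace root, $2 = local Maven repository root, $3 = output dir
package_compile_output() {
  local workspace="$1" m2_repo="$2" out_dir="$3"
  mkdir -p "$out_dir"
  out_dir="$(cd "$out_dir" && pwd)"   # make absolute before changing directory

  # Collect target/ directories relative to the workspace. -prune stops
  # find from descending into a target/ tree it has already matched.
  (
    cd "$workspace"
    find . -type d -name target -prune -print0 \
      | tar -czf "$out_dir/compile-target.tar.gz" --null -T -
  )

  # Archive org/apache/spark relative to the repository root so it can be
  # extracted straight into another machine's repository directory.
  tar -czf "$out_dir/compile-m2-spark.tar.gz" -C "$m2_repo" org/apache/spark
}
```

A consumer would then extract the first archive at the workspace root and the second under its own local repository root, for example with `tar -xzf compile-m2-spark.tar.gz -C ~/.m2/repository`.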
…connect

Eleven of the 12 matrix entries wipe assembly/target/ immediately after extraction via the existing `mvn clean -pl assembly` step (SPARK-51628, which exists to keep the SPARK-51600 fix path covered by the daily Maven test). Including assembly in the artifact wastes upload + download bandwidth for those 11 entries.

This commit:

1. Excludes `assembly/` from the find pattern in the precompile-maven "Package compile output" step. Uses `-prune` so any nested target/ dirs under assembly are also excluded.
2. Adds an explicit `mvn install -pl assembly` step in the matrix entry's bash, gated on `MODULES_TO_TEST = "connect"` and the artifact reuse path. The connect entry is the only one that needs the assembly built (SPARK-51628 leaves it out of the cleanup for that reason); now we build it on demand instead of carrying it around for entries that throw it away.

The SPARK-51628 cleanup step (`mvn clean -pl assembly` for non-connect) still runs and is now a near-no-op for the reuse path; it remains a correctness guard for the fallback path that does run `clean install`.

Generated-by: Claude Code (Opus 4.7)
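To illustrate how `-prune` can keep everything under `assembly/` out of the list of packaged directories, here is a small sketch. The `list_target_dirs_without_assembly` function and the exact `find` expression are assumptions for illustration and may differ from the "Package compile output" step in the workflow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not from the PR): list target/ directories under a
# workspace while skipping the top-level assembly/ module entirely.
list_target_dirs_without_assembly() {
  local workspace="$1"
  (
    cd "$workspace"
    # Left of -o:  prune ./assembly, so nothing beneath it is visited.
    # Right of -o: print each target/ directory and do not descend into it.
    find . -path ./assembly -prune -o -type d -name target -prune -print
  )
}
```

Because the first branch prunes the whole `./assembly` subtree, nested directories such as `assembly/sub/target` never reach the second branch, which matches the intent stated in item 1.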
Mirror the comment used on the existing matrix-job cache steps so a future maintainer knows the macOS gate on these new cache steps is a workaround for the upstream GHA hashFiles failure tracked in SPARK-54466 / actions/runner-images#13341, and can be removed once that issue is resolved. Generated-by: Claude Code (Opus 4.7)
The `mvn clean -pl assembly` step exists to wipe assembly/target/ so tests exercise the SPARK-51600 prepend fallback. On the precompile reuse path the assembly module is already excluded from the artifact, so the cleanup is a no-op (~5-10s of wasted Maven invocation per non-connect entry, ~50-100s per scheduled run). Move the cleanup into the fallback branch, where it's still needed. The reuse path's regression coverage is preserved by the artifact having no assembly to begin with. Generated-by: Claude Code (Opus 4.7)
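The resulting branch structure can be sketched as follows. This is an illustration only: the `plan_pre_test_steps` function is hypothetical, it echoes abbreviated commands instead of running Maven, and the real invocations carry profiles and flags that are omitted here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not from the PR): print the Maven steps one matrix
# entry would run before its tests, without executing them.
#   $1 = outcome of the extract step ("success" or anything else)
#   $2 = module group under test (for example "connect")
plan_pre_test_steps() {
  local extract_outcome="$1" modules_to_test="$2"
  if [ "$extract_outcome" = "success" ]; then
    echo "Reusing precompiled artifact, skipping local Maven clean install."
    # The artifact carries no assembly output, so only connect builds it.
    if [ "$modules_to_test" = "connect" ]; then
      echo "mvn install -pl assembly"
    fi
  else
    echo "mvn clean install"
    # The fallback build produced assembly/target/, so non-connect entries
    # wipe it here to keep exercising the SPARK-51600 fallback.
    if [ "$modules_to_test" != "connect" ]; then
      echo "mvn clean -pl assembly"
    fi
  fi
}

plan_pre_test_steps success core
plan_pre_test_steps failure core
```

On the reuse path a non-connect entry runs no assembly-related Maven command at all, which is where the per-entry saving described above comes from.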
REVERT BEFORE MERGE. Adds `push:` to the trigger list and removes the `if: github.repository == 'apache/spark'` job-level gate so each push to this branch on the fork fires build_maven.yml. This exercises maven_test.yml end-to-end with the precompile-maven changes from this PR. Generated-by: Claude Code (Opus 4.7)
c496b74 to c43e36c
Measured CI time: before vs. after
Comparing a recent scheduled
Runs
Per-matrix-entry duration (sorted by "before")
Every entry drops by 28–53 min (≈40 min on average), matching the redundant
Aggregate
On the wall-clock delta: the +1h 24m is mostly fork-runner queueing. In the after-run, matrix jobs started in a stagger between 10:38 and 11:47 (slowest entry waited ~1h 17m for a runner), whereas on apache/spark all 12 entries start within 3 s. Netting out the queue and looking at
This also confirms the PR description's "~315–325m (~5h) net saved per run" estimate is actually conservative on this run (measured ~7h 25m).
…s PR" This reverts commit c43e36c.
…t matrix

### What changes were proposed in this pull request?

Follow-up to [SPARK-56768](https://issues.apache.org/jira/browse/SPARK-56768) (#55726), which introduced the same kind of shared-precompile pattern for the SBT-driven `build_and_test.yml`. This PR applies the analogous optimization to `.github/workflows/maven_test.yml` - the reusable workflow that the scheduled `build_maven*.yml` jobs call to run Maven-based scala tests across multiple JDK versions.

Each of the 12 matrix entries today runs three steps back-to-back:

1. `mvn -DskipTests <profiles> clean install` (~25-40m of redundant compile, identical across all entries)
2. `mvn clean -pl assembly` (small cleanup, conditional on module)
3. `mvn -pl <TEST_MODULES> ... test` (the actual per-entry test phase)

Step 1 is byte-equivalent across every matrix entry: same 9 Maven profiles, same `-DskipTests`, same `-Djava.version=<input>`. This PR factors it into a single `precompile-maven` job whose output every entry consumes.

### Concrete changes

- New `precompile-maven` job runs `mvn -DskipTests <profiles> clean install` once on the same `runs-on: ${{ inputs.os }}` runner. The same shell wrapper, same `MAVEN_OPTS`, same profile set, same `JAVA_VERSION/-ea` substitution as the matrix entries use today.
- The job tars two pieces and uploads them as a multi-file artifact:
  - `compile-target.tar.gz` - all `*/target/` directories from the workspace.
  - `compile-m2-spark.tar.gz` - `~/.m2/repository/org/apache/spark/`, needed by the matrix's `mvn -pl X test` to resolve cross-module Spark dependencies that aren't in the reactor.

  Artifact name: `spark-maven-compile-<branch>-java<java>-<run_id>`. The JDK is encoded in the name because `build_maven.yml`, `build_maven_java21.yml`, `build_maven_java25.yml` use different JDKs and bytecode is JDK-specific.
- The `build` matrix job adds `precompile-maven` to `needs:` and uses `if: (!cancelled())` so the matrix runs even if precompile fails or is cancelled.
- New "Download precompiled artifact" / "Extract precompiled artifact" steps with the same optional/fallback design as the SBT version:
  - `if: needs.precompile-maven.result == 'success'` on download.
  - `continue-on-error: true` on both steps.
  - `if: steps.download-precompiled.outcome == 'success'` on extract.
- Inside the existing "Run tests" bash, the `mvn clean install` line is gated:

  ```bash
  if [ "${{ steps.extract-precompiled.outcome }}" = "success" ]; then
    echo "Reusing precompiled artifact, skipping local Maven clean install."
  else
    ./build/mvn ... clean install
  fi
  ```

  The rest of the bash (the `clean -pl assembly` cleanup and the per-entry `test` invocations) is unchanged.

### Optional: graceful fallback if precompile fails

Same pattern as the SBT extensions:

- `precompile-maven` is `continue-on-error: true` - a failed or cancelled precompile does not fail the workflow.
- Download/extract have `continue-on-error: true` and skip if the upstream step didn't succeed.
- The bash runs the original `mvn clean install` whenever the artifact wasn't usable.

So a precompile failure degrades to today's behavior, not a workflow failure.

### Why two artifact files

Maven's `mvn -pl X test` resolves cross-module dependencies (other Spark modules) from `~/.m2/repository/org/apache/spark/` rather than from the workspace's `target/`. We need both:

- `target/` so the matrix entry's main/test classes for module X are present (Maven sees they're up-to-date and skips re-compilation thanks to mtime preservation by `tar`).
- `~/.m2/repository/org/apache/spark/` so the artifact resolution for inter-module Spark deps doesn't fall back to "module not found" or trigger a recursive build.

The matrix entry extracts both into their respective locations (`./*/target/...` for the first, `~/.m2/repository/org/apache/spark/` for the second).

### Measured savings

Comparing the apache/spark scheduled `build_maven.yml` run on 2026-05-17 ([25992372470](https://github.com/apache/spark/actions/runs/25992372470)) against the validation push of this PR on 2026-05-20 ([26153415924](https://github.com/zhengruifeng/spark/actions/runs/26153415924)), both JDK 17 / Scala 2.13 / Hadoop 3:

| | Before | After | Δ |
|---|---:|---:|---:|
| Sum of 12 matrix entries | 17:58:04 | 9:44:11 | −8:13:53 |
| + new `precompile-maven` job | | 0:49:24 | |
| **Total CI compute per run** | **17:58:04** | **10:33:35** | **−7:24:29 (−41%)** |

Every matrix entry drops by 28–53 min (≈40 min average), matching the redundant `mvn -DskipTests … clean install` (~25–40 min) that this PR removes from each entry. Multiplied across the three scheduled Maven workflows (JDK 17 / 21 / 25), the daily saving is ~22 h of org-shared CI capacity.

See [this comment](#55766 (comment)) for the full per-entry breakdown and notes on the wall-clock trade-off (precompile + matrix is sequential, so end-to-end wall-clock grows by ~20 min on official infra; the much larger compute saving comes from removing the redundant compile from every matrix entry).

The `sql/hive-thriftserver` matrix entry has a special case ("To avoid a compilation loop ... run `clean install` instead") that re-runs `clean install` regardless. In the measured run that entry still saved ~39 min, likely because the cached `~/.m2/repository/org/apache/spark/` from the precompile artifact shortens its re-run.

### Does this PR introduce _any_ user-facing change?

No. CI infrastructure change only.

### How was this patch tested?

Exercised end-to-end by validation run [26153415924](https://github.com/zhengruifeng/spark/actions/runs/26153415924) of `build_maven.yml` on the PR branch (JDK 17). Both expected log signatures appeared:

- `precompile-maven` job: `[INFO] BUILD SUCCESS` from Maven, plus the `ls -lh compile-target.tar.gz compile-m2-spark.tar.gz` line.
- Matrix entries' "Run tests" step: `Reusing precompiled artifact, skipping local Maven clean install.`

The fallback path (full `mvn clean install` when the artifact is missing or extraction fails) is preserved by `continue-on-error: true` on the precompile job and the download/extract steps; on that path each matrix entry runs `mvn clean install` itself, identical to today's behavior.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

Closes #55766 from zhengruifeng/share-precompile-maven-test.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(cherry picked from commit 74816d7)
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
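The totals in the "Measured savings" table can be re-derived with a few lines of shell. This is only an arithmetic check on the figures quoted in the description: the `to_seconds` and `to_hms` helpers are illustrative names, not part of the workflow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Convert H:MM:SS to seconds. The 10# prefix forces base 10 so that
# zero-padded fields such as "04" are not parsed as octal.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Convert seconds back to H:MM:SS.
to_hms() {
  printf '%d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

before=$(to_seconds 17:58:04)        # sum of 12 matrix entries, before
after_matrix=$(to_seconds 9:44:11)   # sum of 12 matrix entries, after
precompile=$(to_seconds 0:49:24)     # new precompile-maven job
after_total=$(( after_matrix + precompile ))
saved=$(( before - after_total ))

echo "after total: $(to_hms "$after_total")"
echo "saved:       $(to_hms "$saved")"
echo "saved pct:   $(( (saved * 100 + before / 2) / before ))%"
echo "3 workflows: $(to_hms $(( saved * 3 )))"
```

The script reproduces 10:33:35, 7:24:29 and 41% from the table, and three times the per-run saving comes to 22:13:27, in line with the "~22 h" daily figure, under the assumption that the JDK 21 and JDK 25 workflows save about as much as the measured JDK 17 run.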
thanks, merged to master/4.x/4.2
late LGTM
A new Maven job has been launched to verify the effectiveness: