[SPARK-54597][BUILD] Upgrade lz4-java to 1.10.0 #53327
dongjoon-hyun wants to merge 4 commits into apache:master
|
This is a dependency-only PR, cc @dbtsai , @HyukjinKwon , @LuciferYang , @yawkat , @SteNicholas . To be clear, the security issue is not in the scope of this PR.
|
Thank you, @HyukjinKwon . I'm going to add one more commit to ban this library explicitly.
|
@dongjoon-hyun, does it still need to switch fastDecompressor to safeDecompressor after upgrade? |
Exactly, that's @dbtsai 's contribution, @SteNicholas . This PR doesn't aim to do that. He will rebase his PR after merging this independently. |
|
Thank you, @LuciferYang ! I'll update it. |
|
@dongjoon-hyun, I just want to confirm whether to switch fastDecompressor to safeDecompressor after upgrading to 1.10.0.
@SteNicholas What I can say here is that it's beyond the scope of this PR. Technically, we don't yet know what decision we are eventually going to make on the following, because it's still
|
Please don't get me wrong. I'm trying to help that PR move forward by reducing the gap. |
|
LGTM ~ |
|
Thank you all! Merged to master for Apache Spark 4.2.0 (for now) |
|
Just FYI: lz4 is known for its very high speed, and this upgrade is not free. My test shows it has a performance impact; see #53453.
|
@pan3793 please make an issue on the lz4-java project for this as well. there may be room for improvement on our side, maybe some compiler flags. not sure if i'll have time to look at it, but maybe someone else will. |
|
@yawkat okay, will open an issue and forward this message. update: opened yawkat/lz4-java#30 |
|
thanks! |
|
@pan3793 Of course, it's one of the key considerations.
… in SBT build

### What changes were proposed in this pull request?

This PR is a followup of #53327 that explicitly excludes lz4-java in the SBT build.

### Why are the changes needed?

For some reason, SBT still tries to look for it:

```
2025-12-21T08:16:32.3447761Z [info] Jar hash: 61bb3bb74c3d32b7ae527652d9d8c46efa6d04fc
2025-12-21T08:16:33.2910680Z [error] lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
2025-12-21T08:16:33.2912312Z [error] file:/home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.2913430Z [error]
2025-12-21T08:16:33.2914325Z [error] at lmcoursier.internal.shaded.coursier.Artifacts$.$anonfun$fetchArtifacts$9(Artifacts.scala:365)
2025-12-21T08:16:33.2915570Z [error] at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1(Task.scala:14)
2025-12-21T08:16:33.2916784Z [error] at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1$adapted(Task.scala:14)
2025-12-21T08:16:33.2917884Z [error] at lmcoursier.internal.shaded.coursier.util.Task$.wrap(Task.scala:82)
2025-12-21T08:16:33.2918859Z [error] at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$2(Task.scala:14)
2025-12-21T08:16:33.2919771Z [error] at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
2025-12-21T08:16:33.2920635Z [error] at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:51)
2025-12-21T08:16:33.2921512Z [error] at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:74)
2025-12-21T08:16:33.2922869Z [error] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2025-12-21T08:16:33.2924071Z [error] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2025-12-21T08:16:33.2925145Z [error] at java.base/java.lang.Thread.run(Thread.java:840)
2025-12-21T08:16:33.2926563Z [error] Caused by: lmcoursier.internal.shaded.coursier.cache.ArtifactError$NotFound: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.2928288Z [error] at lmcoursier.internal.shaded.coursier.cache.internal.Downloader.$anonfun$checkFileExists$1(Downloader.scala:603)
2025-12-21T08:16:33.2929450Z [error] at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
2025-12-21T08:16:33.2930146Z [error] at scala.util.Success.$anonfun$map$1(Try.scala:255)
2025-12-21T08:16:33.2930723Z [error] at scala.util.Success.map(Try.scala:213)
2025-12-21T08:16:33.2931387Z [error] at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
2025-12-21T08:16:33.2932190Z [error] at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:42)
2025-12-21T08:16:33.2933052Z [error] at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:74)
2025-12-21T08:16:33.2934069Z [error] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2025-12-21T08:16:33.2938645Z [error] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2025-12-21T08:16:33.2939423Z [error] at java.base/java.lang.Thread.run(Thread.java:840)
2025-12-21T08:16:33.2940265Z [error] lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
2025-12-21T08:16:33.2941556Z [error] file:/home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.2942421Z [error]
2025-12-21T08:16:33.2943007Z [error] at lmcoursier.internal.shaded.coursier.Artifacts$.$anonfun$fetchArtifacts$9(Artifacts.scala:365)
2025-12-21T08:16:33.2944078Z [error] at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1(Task.scala:14)
2025-12-21T08:16:33.2945450Z [error] at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$extension$1$adapted(Task.scala:14)
2025-12-21T08:16:33.2946441Z [error] at lmcoursier.internal.shaded.coursier.util.Task$.wrap(Task.scala:82)
2025-12-21T08:16:33.2947312Z [error] at lmcoursier.internal.shaded.coursier.util.Task$.$anonfun$flatMap$2(Task.scala:14)
2025-12-21T08:16:33.2948105Z [error] at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
2025-12-21T08:16:33.2948811Z [error] at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:51)
2025-12-21T08:16:33.2949547Z [error] at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:74)
2025-12-21T08:16:33.2950403Z [error] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2025-12-21T08:16:33.2951391Z [error] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2025-12-21T08:16:33.2952135Z [error] at java.base/java.lang.Thread.run(Thread.java:840)
2025-12-21T08:16:33.2953218Z [error] Caused by: lmcoursier.internal.shaded.coursier.cache.ArtifactError$NotFound: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.2954841Z [error] at lmcoursier.internal.shaded.coursier.cache.internal.Downloader.$anonfun$checkFileExists$1(Downloader.scala:603)
2025-12-21T08:16:33.2955801Z [error] at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
2025-12-21T08:16:33.2956376Z [error] at scala.util.Success.$anonfun$map$1(Try.scala:255)
2025-12-21T08:16:33.2956861Z [error] at scala.util.Success.map(Try.scala:213)
2025-12-21T08:16:33.2957389Z [error] at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
2025-12-21T08:16:33.2958305Z [error] at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:42)
2025-12-21T08:16:33.2959058Z [error] at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:74)
2025-12-21T08:16:33.2959915Z [error] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2025-12-21T08:16:33.2960919Z [error] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2025-12-21T08:16:33.2961677Z [error] at java.base/java.lang.Thread.run(Thread.java:840)
2025-12-21T08:16:33.2996977Z [error] (streaming-kafka-0-10 / update) lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
2025-12-21T08:16:33.2998744Z [error] file:/home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.3000432Z [error] (sql-kafka-0-10 / update) lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: Error fetching artifacts:
2025-12-21T08:16:33.3002097Z [error] file:/home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar: not found: /home/spark-rm/.m2/repository/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar
2025-12-21T08:16:33.3032908Z [error] Total time: 361 s (0:06:01.0), completed Dec 21, 2025, 8:16:33 AM
```

which seems to be breaking the release build https://github.com/apache/spark/actions/workflows/release.yml

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I cannot reproduce this properly locally. This fix is inferred from the log. I will monitor the build.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53556 from HyukjinKwon/SPARK-54597-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…gression

### What changes were proposed in this pull request?

Previously, lz4-java was upgraded to 1.10.x to address CVEs:

- #53327
- #53347
- #53971

However, this causes a significant performance drop; see the benchmark report at

- #53453

This PR follows the [suggestion](#53290 (comment)) to migrate to safeDecompressor.

### Why are the changes needed?

Mitigate performance regression.

### Does this PR introduce _any_ user-facing change?

No, except for performance.

### How was this patch tested?

GHA for functionality, [benchmark](#53453 (comment)) for performance.

> TL;DR - my test results show lz4-java 1.10.1 is about 10~15% slower on lz4 compression than 1.8.0, and is about 5% slower on lz4 decompression even after migrating to the suggested safeDecompressor

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53454 from pan3793/SPARK-54571.

Lead-authored-by: Cheng Pan <chengpan@apache.org>
Co-authored-by: pan3793 <pan3793@users.noreply.github.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
… CVE-2025-12183 and CVE-2025-66566

### What changes were proposed in this pull request?

- Bump lz4-java version from 1.8.0 to 1.10.4 to resolve CVE-2025-12183 and CVE-2025-66566.
- `Lz4Decompressor` follows the [suggestion](apache/spark#53290 (comment)) to move from `fastDecompressor` to `safeDecompressor` to mitigate the performance regression.

Backport:

- apache/spark#53327
- apache/spark#53347
- apache/spark#53971
- apache/spark#53454
- apache/spark#54585

### Why are the changes needed?

- [CVE-2025-12183](https://sites.google.com/sonatype.com/vulnerabilities/cve-2025-12183): Various lz4-java compression and decompression implementations do not guard against out-of-bounds memory access. Untrusted input may lead to denial of service and information disclosure. Vulnerable Maven coordinates: org.lz4:lz4-java up to and including 1.8.0.
- [CVE-2025-66566](GHSA-cmp6-m4wj-q63q): Insufficient clearing of the output buffer in Java-based decompressor implementations in lz4-java 1.10.0 and earlier allows remote attackers to read previous buffer contents via crafted compressed input. In applications where the output buffer is reused without being cleared, this may lead to disclosure of sensitive data. JNI-based implementations are not affected.

Therefore, the lz4-java version should be upgraded to 1.10.4.

### Does this PR resolve a correctness bug?

No.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

Closes #3555 from SteNicholas/CELEBORN-2218.

Lead-authored-by: SteNicholas <programgeek@163.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: SteNicholas <programgeek@163.com>
… CVE-2025-12183 and CVE-2025-66566

- Bump lz4-java version from 1.8.0 to 1.10.4 to resolve CVE-2025-12183 and CVE-2025-66566.
- `Lz4Decompressor` follows the [suggestion](apache/spark#53290 (comment)) to move from `fastDecompressor` to `safeDecompressor` to mitigate the performance regression.

Backport:

- apache/spark#53327
- apache/spark#53347
- apache/spark#53971
- apache/spark#53454
- apache/spark#54585

- [CVE-2025-12183](https://sites.google.com/sonatype.com/vulnerabilities/cve-2025-12183): Various lz4-java compression and decompression implementations do not guard against out-of-bounds memory access. Untrusted input may lead to denial of service and information disclosure. Vulnerable Maven coordinates: org.lz4:lz4-java up to and including 1.8.0.
- [CVE-2025-66566](GHSA-cmp6-m4wj-q63q): Insufficient clearing of the output buffer in Java-based decompressor implementations in lz4-java 1.10.0 and earlier allows remote attackers to read previous buffer contents via crafted compressed input. In applications where the output buffer is reused without being cleared, this may lead to disclosure of sensitive data. JNI-based implementations are not affected.

Therefore, the lz4-java version should be upgraded to 1.10.4.

No.

No.

CI.

Closes #3555 from SteNicholas/CELEBORN-2218.

Lead-authored-by: SteNicholas <programgeek@163.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: SteNicholas <programgeek@163.com>
(cherry picked from commit dca3749)
Signed-off-by: SteNicholas <programgeek@163.com>
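
The CVE-2025-66566 description in the commit messages above depends on an output buffer being reused without being cleared. The following is a minimal sketch of that general class of problem, using a toy two-opcode format that is not LZ4. It does not reproduce lz4-java's code or the actual exploit, and the real mechanism may differ. `StaleBufferDemo`, `decode`, `reuseBuffer`, and the `strict` flag are hypothetical names introduced only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Toy illustration only: NOT LZ4 and NOT lz4-java's implementation.
final class StaleBufferDemo {

    // Toy format, a sequence of ops:
    //   0x00, n, b1..bn  -> append n literal bytes to the output
    //   0x01, off, n     -> append n bytes copied from `off` bytes behind the output cursor
    // Returns the number of bytes written to dest.
    static int decode(byte[] src, byte[] dest, boolean strict) {
        int s = 0; // read cursor in src
        int d = 0; // write cursor in dest
        while (s < src.length) {
            int op = src[s++];
            if (op == 0x00) {
                int n = src[s++] & 0xFF;
                if (s + n > src.length || d + n > dest.length) {
                    throw new IllegalArgumentException("literal run out of bounds");
                }
                System.arraycopy(src, s, dest, d, n);
                s += n;
                d += n;
            } else if (op == 0x01) {
                int off = src[s++] & 0xFF;
                int n = src[s++] & 0xFF;
                // This check only keeps the copy inside the array.
                if (off > d || d + n > dest.length) {
                    throw new IllegalArgumentException("match out of bounds");
                }
                // Stricter check (assumption for this toy format): the match must start
                // at bytes already produced by this call, so offset 0 is rejected.
                if (strict && off == 0) {
                    throw new IllegalArgumentException("match does not point at decoded data");
                }
                // Byte-by-byte copy; with off == 0 this copies each byte onto itself,
                // so whatever was left in dest from an earlier call becomes "output".
                for (int i = 0; i < n; i++) {
                    dest[d + i] = dest[d - off + i];
                }
                d += n;
            } else {
                throw new IllegalArgumentException("unknown op " + op);
            }
        }
        return d;
    }

    // Decodes a "secret" message, then decodes crafted input into the same buffer.
    static String reuseBuffer(boolean strict) {
        byte[] buf = new byte[16];
        byte[] secret = "password=hunter2".getBytes(StandardCharsets.US_ASCII);
        byte[] first = new byte[2 + secret.length];
        first[0] = 0x00;
        first[1] = (byte) secret.length;
        System.arraycopy(secret, 0, first, 2, secret.length);
        decode(first, buf, strict);

        // Two literal bytes "hi", then a 14-byte match with offset 0.
        byte[] crafted = {0x00, 2, 'h', 'i', 0x01, 0, 14};
        int len = decode(crafted, buf, strict);
        return new String(buf, 0, len, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(reuseBuffer(false)); // prints hissword=hunter2
        try {
            reuseBuffer(true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch the leak disappears either when the decoder validates the match offset or when the caller clears the buffer before reuse (for example with `java.util.Arrays.fill`), which matches the advisory's condition that the buffer is "reused without being cleared".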

### What changes were proposed in this pull request?

This PR aims to upgrade `lz4-java` to 1.10.0 and exclude the legacy groupID version.

### Why are the changes needed?

Since `lz4-java` changed its repository, we had better depend on the live repository for future maintenance.

### Does this PR introduce any user-facing change?

No Spark behavior change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.