[SPARK-54693][CORE][TESTS] Add LZ4TPCDSDataBenchmark#53453
[SPARK-54693][CORE][TESTS] Add LZ4TPCDSDataBenchmark#53453pan3793 wants to merge 9 commits intoapache:masterfrom
Conversation
|
TL;DR - my test results show lz4-java 1.10.1 is about 10~15% slower on lz4 compression than 1.8.0, and is about 5% slower on lz4 decompression even with migrating to suggested safeDecompressor (#53454) Here is my local test result: lz4-java 1.8.0 with fastDecompressor (state before aa65bda) lz4-java 1.10.1 with fastDecompressor (current master state, significant decompress perf drop!!!) lz4-java 1.10.1 with safeDecompressor (#53454) |
core/src/test/scala/org/apache/spark/io/LZ4TPCDSDataBenchmark.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/io/LZ4TPCDSDataBenchmark.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/io/LZ4TPCDSDataBenchmark.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/io/LZ4TPCDSDataBenchmark.scala
Outdated
Show resolved
Hide resolved
…21, Scala 2.13, split 1 of 1)
…17, Scala 2.13, split 1 of 1)
… (JDK 21, Scala 2.13, split 1 of 1)
… (JDK 17, Scala 2.13, split 1 of 1)
|
@LuciferYang, I did a refactor to make |
| /** | ||
| * Any code before running any benchmark, e.g., data preparation | ||
| */ | ||
| def beforeAll(): Unit = {} |
There was a problem hiding this comment.
My suggestion is not to add the beforeAll() method in the current pr If we find it to be widely applicable, we can add this method in a subsequent pr and also extract beforeAll() for more microbenchmark scenarios.
|
Merged into master. Thanks @pan3793 |
|
Although I didn't get a chance to take a look at the detail, thank you so much for adding additional benchmark, @pan3793 and @LuciferYang ! |
…gression ### What changes were proposed in this pull request? Previously, lz4-java was upgraded to 1.10.x to address CVEs, - #53327 - #53347 - #53971 while this casues significant performance drop, see the benchmark report at - #53453 this PR follows the [suggestion](#53290 (comment)) to migrate to safeDecompressor. ### Why are the changes needed? Mitigate performance regression. ### Does this PR introduce _any_ user-facing change? No, except for performance. ### How was this patch tested? GHA for functionality, [benchmark](#53453 (comment)) for performance. > TL;DR - my test results show lz4-java 1.10.1 is about 10~15% slower on lz4 compression than 1.8.0, and is about ~5% slower on lz4 decompression even with migrating to suggested safeDecompressor ### Was this patch authored or co-authored using generative AI tooling? No. Closes #53454 from pan3793/SPARK-54571. Lead-authored-by: Cheng Pan <chengpan@apache.org> Co-authored-by: pan3793 <pan3793@users.noreply.github.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
What changes were proposed in this pull request?
Add
LZ4TPCDSDataBenchmark, testLZ4CompressionCodecagainst TPCDScatalog_sales.dat(SF1), the size is about 283M.Why are the changes needed?
Add a benchmark to measure the perf impact of lz4 security upgrading.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added benchmark result. Since the change has a refactor that touched
ZSTDTPCDSDataBenchmark, also updated its benchmark results.Was this patch authored or co-authored using generative AI tooling?
No.