[SPARK-56748][TESTS] Unify workspace test helpers via TestEnvHelper trait #55710
Draft
zhengruifeng wants to merge 5 commits into apache:master from
### What changes were proposed in this pull request?

Introduces `org.apache.spark.util.SparkTestPaths` in `common-utils` test sources, holding the two helpers that are currently duplicated across module boundaries:

- `getWorkspaceFilePath(parts...)`: resolve a path relative to `spark.test.home` / `SPARK_HOME`.
- `regenerateGoldenFiles`: env-var check for `SPARK_GENERATE_GOLDEN_FILES=1`.

The trait is mixed into:

- `SparkTestSuite` (core): drops local copies of both helpers.
- `ConnectFunSuite` (connect client): drops the local `getWorkspaceFilePath` (which carried a `// Borrowed from SparkFunSuite` comment).
- `LogKeysSuite` (common-utils): drops local copies of both.

Also removes the now-redundant `private val regenerateGoldenFiles` in `PlanGenerationTestSuite` (it would have shadowed/clashed with the inherited `protected def`).

Living in `common-utils` is required because some consumers (`LogKeysSuite` in `common-utils`, `ConnectFunSuite` in the shaded Spark Connect client) cannot reach `SparkFunSuite` / `SparkTestSuite` in `spark-core` due to module dependency direction.

### Why are the changes needed?

Cleanup. The two helpers are byte-for-byte identical at three call sites (four counting the `regenerateGoldenFiles` clone), and the connect copy explicitly comments that it was borrowed from `SparkFunSuite`. Consolidating prevents drift.

### Does this PR introduce _any_ user-facing change?

No. Test-only refactor.

### How was this patch tested?

`build/sbt common-utils/Test/compile core/Test/compile connect-client-jvm/Test/compile sql/Test/compile` succeeds. No behavior change.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)
The trait covers env-var-driven test plumbing — both the workspace path resolution (via `spark.test.home` / `SPARK_HOME`) and the golden-file regeneration flag (`SPARK_GENERATE_GOLDEN_FILES`). The earlier name was too narrow for the latter. Generated-by: Claude Code (Opus 4.7)
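For context on the golden-file flag, here is a rough sketch of the compare-or-regenerate pattern a suite usually builds on top of it. `GoldenFileCheck` and `check` are hypothetical names, not Spark APIs. The `regenerate` argument stands in for the inherited `regenerateGoldenFiles`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical compare-or-regenerate helper; not a Spark API. In a suite, the
// `regenerate` argument would come from the inherited `regenerateGoldenFiles`.
object GoldenFileCheck {
  def check(golden: Path, actual: String, regenerate: Boolean): Unit = {
    if (regenerate) {
      // SPARK_GENERATE_GOLDEN_FILES=1: overwrite the expected output.
      Files.createDirectories(golden.getParent)
      Files.write(golden, actual.getBytes(StandardCharsets.UTF_8))
    } else {
      // Normal run: the checked-in file is the source of truth.
      val expected = new String(Files.readAllBytes(golden), StandardCharsets.UTF_8)
      assert(expected == actual, s"Golden file mismatch for $golden")
    }
  }
}
```

Because the flag comes from the environment rather than from a test argument, every suite that compares against golden files needs the same check. That is why the helper is worth sharing.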
Drop the redundant `Spark` prefix; everything in `org.apache.spark.util` is already Spark-scoped. Generated-by: Claude Code (Opus 4.7)
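A minimal sketch of what the renamed trait could look like. It assumes a `(first: String, more: String*)` signature for `getWorkspaceFilePath` and throws a plain `IllegalStateException` where the real helper may use ScalaTest's `fail`. `ExampleSuite` is a hypothetical consumer. The actual `TestEnvHelper` in this PR may differ in detail.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical sketch of TestEnvHelper; signatures and error handling are assumptions.
trait TestEnvHelper {

  /** Resolve a path under the Spark checkout, preferring the `spark.test.home`
   *  system property and falling back to the `SPARK_HOME` environment variable. */
  protected def getWorkspaceFilePath(first: String, more: String*): Path = {
    val sparkHome = sys.props.get("spark.test.home")
      .orElse(sys.env.get("SPARK_HOME"))
      .getOrElse(throw new IllegalStateException("spark.test.home or SPARK_HOME is not set."))
    Paths.get(sparkHome, (first +: more): _*)
  }

  /** True when golden files should be rewritten instead of compared. */
  protected def regenerateGoldenFiles: Boolean =
    System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
}

// Hypothetical consumer standing in for SparkTestSuite or LogKeysSuite.
object ExampleSuite extends TestEnvHelper {
  def goldenFile: Path = getWorkspaceFilePath("sql", "core", "golden.txt")
  def shouldRegenerate: Boolean = regenerateGoldenFiles
}
```

The trait depends only on the JDK and the Scala standard library, with no ScalaTest or `spark-core` types. That is what lets it live in `common-utils` test sources and still be mixed into suites in higher modules.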
Mixing `TestEnvHelper` (defined in `common-utils` test sources) into `ConnectFunSuite` puts it in the parent chain of every connect-client test class. When a closure from such a test is sent to the connect server (e.g. UDFs in `KeyValueGroupedDatasetE2ETestSuite`), the server JVM tries to load `TestEnvHelper.class` and fails, because the `common-utils` test JAR is not registered as an artifact (only the `connect-client-jvm` test-classes directory is, via `RemoteSparkSession`). Restore `ConnectFunSuite`'s local `getWorkspaceFilePath` and `PlanGenerationTestSuite`'s local `regenerateGoldenFiles`. The `TestEnvHelper` trait remains shared between `SparkTestSuite` (core) and `LogKeysSuite` (common-utils), where the cross-classloader path does not apply. Generated-by: Claude Code (Opus 4.7)
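The failure mode follows from JVM class loading. Defining a class requires loading its superclass and every superinterface, and a Scala trait compiles to an interface. The sketch below uses hypothetical stand-ins (`EnvTrait` for `TestEnvHelper`, `BaseSuite` for `ConnectFunSuite`, `ClientSuite` for a concrete test class). It shows that a trait mixed into a base suite appears in the parent chain of every subclass. A server that has to load a test class, for example to deserialize a closure defined in it, therefore also has to be able to load the trait.

```scala
// Hypothetical stand-ins: EnvTrait plays TestEnvHelper, BaseSuite plays
// ConnectFunSuite, ClientSuite plays a concrete connect-client test class.
trait EnvTrait
class BaseSuite extends EnvTrait
class ClientSuite extends BaseSuite

object ParentChain {
  // Every superclass and superinterface, transitively, that the JVM must be
  // able to load before it can define class `c`.
  def of(c: Class[_]): Set[Class[_]] = {
    val direct: Seq[Class[_]] =
      Option[Class[_]](c.getSuperclass).toSeq ++ c.getInterfaces.toSeq
    direct.toSet ++ direct.flatMap(of)
  }
}
```

`ClientSuite` never mentions `EnvTrait` directly. It inherits the dependency through `BaseSuite`, so every `ConnectFunSuite` subclass is affected, not just the suites that call the helpers.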
HyukjinKwon approved these changes on May 7, 2026.
Re-extends `ConnectFunSuite` with `TestEnvHelper`, but this time also uploads `common-utils/target/<scala>/test-classes` to the connect server's artifact path so the server can resolve `TestEnvHelper.class` when deserializing client-side closures. Without this, every test class extending `ConnectFunSuite` had `TestEnvHelper` in its parent chain, and any closure sent to the server (UDFs, KV grouped operations) would fail with `SparkClassNotFoundException: Failed to load class: org.apache.spark.util.TestEnvHelper`. Generated-by: Claude Code (Opus 4.7)
### What changes were proposed in this pull request?

Introduces `org.apache.spark.util.TestEnvHelper` in `common-utils` test sources, holding the two env-var-driven helpers that are currently duplicated across module boundaries:

- `getWorkspaceFilePath(parts...)`: resolve a path relative to `spark.test.home` / `SPARK_HOME`.
- `regenerateGoldenFiles`: env-var check for `SPARK_GENERATE_GOLDEN_FILES=1`.

The trait is mixed into:

- `SparkTestSuite` (core): drops local copies of both helpers.
- `ConnectFunSuite` (connect client): drops the local `getWorkspaceFilePath` (which carried a `// Borrowed from SparkFunSuite` comment).
- `LogKeysSuite` (common-utils): drops local copies of both.

Also removes the now-redundant `private val regenerateGoldenFiles` in `PlanGenerationTestSuite` (it would have shadowed/clashed with the inherited `protected def`).

Living in `common-utils` is required because some consumers (`LogKeysSuite` in `common-utils`, `ConnectFunSuite` in the shaded Spark Connect client) cannot reach `SparkFunSuite` / `SparkTestSuite` in `spark-core` due to module dependency direction.

### Why are the changes needed?

Cleanup. The two helpers are byte-for-byte identical at three call sites (four counting the `regenerateGoldenFiles` clone), and the connect copy explicitly comments that it was borrowed from `SparkFunSuite`. Consolidating prevents drift.

### Does this PR introduce _any_ user-facing change?

No. Test-only refactor.

### How was this patch tested?

`build/sbt common-utils/Test/compile core/Test/compile connect-client-jvm/Test/compile sql/Test/compile` succeeds. No behavior change.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)