Skip to content

[SPARK-56748][TESTS] Unify workspace test helpers via TestEnvHelper trait#55710

Draft
zhengruifeng wants to merge 5 commits intoapache:masterfrom
zhengruifeng:unify-workspace-test-helpers
Draft

[SPARK-56748][TESTS] Unify workspace test helpers via TestEnvHelper trait#55710
zhengruifeng wants to merge 5 commits intoapache:masterfrom
zhengruifeng:unify-workspace-test-helpers

Conversation

@zhengruifeng
Copy link
Copy Markdown
Contributor

@zhengruifeng zhengruifeng commented May 6, 2026

What changes were proposed in this pull request?

Introduces org.apache.spark.util.TestEnvHelper in common-utils test sources, holding the two env-var-driven helpers that are currently duplicated across module boundaries:

  • getWorkspaceFilePath(parts...) — resolve a path relative to spark.test.home / SPARK_HOME.
  • regenerateGoldenFiles — env-var check for SPARK_GENERATE_GOLDEN_FILES=1.

The trait is mixed into:

  • SparkTestSuite (core) — drops local copies of both helpers.
  • ConnectFunSuite (connect client) — drops the local getWorkspaceFilePath (which carried a // Borrowed from SparkFunSuite comment).
  • LogKeysSuite (common-utils) — drops local copies of both.

Also removes the now-redundant private val regenerateGoldenFiles in PlanGenerationTestSuite (it would have shadowed/clashed with the inherited protected def).

Living in common-utils is required because some consumers (LogKeysSuite in common-utils, ConnectFunSuite in the shaded Spark Connect client) cannot reach SparkFunSuite / SparkTestSuite in spark-core due to module dependency direction.

Why are the changes needed?

Cleanup. The two helpers are byte-for-byte identical at three call sites (four counting the regenerateGoldenFiles clone), and the connect copy explicitly comments that it was borrowed from SparkFunSuite. Consolidating prevents drift.

Does this PR introduce any user-facing change?

No. Test-only refactor.

How was this patch tested?

build/sbt common-utils/Test/compile core/Test/compile connect-client-jvm/Test/compile sql/Test/compile succeeds. No behavior change.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

### What changes were proposed in this pull request?

Introduces `org.apache.spark.util.SparkTestPaths` in `common-utils` test sources, holding the two helpers that are currently duplicated across module boundaries:

- `getWorkspaceFilePath(parts...)` — resolve a path relative to `spark.test.home` / `SPARK_HOME`.
- `regenerateGoldenFiles` — env-var check for `SPARK_GENERATE_GOLDEN_FILES=1`.

The trait is mixed into:
- `SparkTestSuite` (core) — drops local copies of both helpers.
- `ConnectFunSuite` (connect client) — drops the local `getWorkspaceFilePath` (which carried a `// Borrowed from SparkFunSuite` comment).
- `LogKeysSuite` (common-utils) — drops local copies of both.

Also removes the now-redundant `private val regenerateGoldenFiles` in `PlanGenerationTestSuite` (it would have shadowed/clashed with the inherited `protected def`).

Living in `common-utils` is required because some consumers (`LogKeysSuite` in `common-utils`, `ConnectFunSuite` in the shaded Spark Connect client) cannot reach `SparkFunSuite` / `SparkTestSuite` in `spark-core` due to module dependency direction.

### Why are the changes needed?

Cleanup. The two helpers are byte-for-byte identical at three call sites (four counting the `regenerateGoldenFiles` clone), and the connect copy explicitly comments that it was borrowed from `SparkFunSuite`. Consolidating prevents drift.

### Does this PR introduce _any_ user-facing change?

No. Test-only refactor.

### How was this patch tested?

`build/sbt common-utils/Test/compile core/Test/compile connect-client-jvm/Test/compile sql/Test/compile` succeeds. No behavior change.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)
The trait covers env-var-driven test plumbing — both the workspace path
resolution (via `spark.test.home` / `SPARK_HOME`) and the golden-file
regeneration flag (`SPARK_GENERATE_GOLDEN_FILES`). The earlier name was
too narrow for the latter.

Generated-by: Claude Code (Opus 4.7)
@zhengruifeng zhengruifeng changed the title [MINOR][TESTS] Unify workspace test helpers via SparkTestPaths trait [MINOR][TESTS] Unify workspace test helpers via SparkTestEnvHelper trait May 6, 2026
Drop the redundant `Spark` prefix; everything in `org.apache.spark.util`
is already Spark-scoped.

Generated-by: Claude Code (Opus 4.7)
@zhengruifeng zhengruifeng changed the title [MINOR][TESTS] Unify workspace test helpers via SparkTestEnvHelper trait [MINOR][TESTS] Unify workspace test helpers via TestEnvHelper trait May 6, 2026
@zhengruifeng zhengruifeng changed the title [MINOR][TESTS] Unify workspace test helpers via TestEnvHelper trait [SPARK-56748][TESTS] Unify workspace test helpers via TestEnvHelper trait May 6, 2026
Mixing `TestEnvHelper` (defined in `common-utils` test sources) into
`ConnectFunSuite` puts it in the parent chain of every connect-client
test class. When a closure from such a test is sent to the connect
server (e.g. UDFs in `KeyValueGroupedDatasetE2ETestSuite`), the server
JVM tries to load `TestEnvHelper.class` and fails because
`common-utils` test JAR is not registered as an artifact (only
`connect-client-jvm` test-classes is, via `RemoteSparkSession`).

Restore `ConnectFunSuite`'s local `getWorkspaceFilePath` and
`PlanGenerationTestSuite`'s local `regenerateGoldenFiles`. The
`TestEnvHelper` trait remains shared between `SparkTestSuite` (core)
and `LogKeysSuite` (common-utils), where the cross-classloader path
does not apply.

Generated-by: Claude Code (Opus 4.7)
Re-extends `ConnectFunSuite` with `TestEnvHelper`, but this time also
uploads `common-utils/target/<scala>/test-classes` to the connect
server's artifact path so the server can resolve `TestEnvHelper.class`
when deserializing client-side closures.

Without this, every test class extending `ConnectFunSuite` had
`TestEnvHelper` in its parent chain, and any closure sent to the server
(UDFs, KV grouped operations) would fail with
`SparkClassNotFoundException: Failed to load class:
org.apache.spark.util.TestEnvHelper`.

Generated-by: Claude Code (Opus 4.7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants