[SPARK-56448][CONNECT] Fix NPE on Spark Connect client restart due to ammonite compile cache#55720
Open
yadavay-amzn wants to merge 1 commit intoapache:masterfrom
Open
[SPARK-56448][CONNECT] Fix NPE on Spark Connect client restart due to ammonite compile cache#55720yadavay-amzn wants to merge 1 commit intoapache:masterfrom
yadavay-amzn wants to merge 1 commit intoapache:masterfrom
Conversation
… ammonite compile cache The Spark Connect REPL uses Ammonite. Ammonite's default Storage.Folder persists compiled predef classes under ~/.ammonite/<version>/cache. On a subsequent REPL start from the same working directory, the cached CodePredef class is reloaded but its reference to the per-session ArgsPredef helper is stale, producing a NullPointerException during predef initialization. Use Storage.InMemory for the Connect REPL's compile cache so every session starts fresh. Extract the Main construction into a package- private helper to keep the test localised to unit-level. Regression test added: asserts that the ammonite.Main returned by ConnectRepl.newAmmoniteMain exposes no on-disk cache directory (storageBackend.dirOpt.isEmpty) and is an instance of Storage.InMemory. The dirOpt assertion is an observable behavioural check -- if the Storage.InMemory wiring is reverted, ammonite.Main falls back to Storage.Folder(~/.ammonite) and the test fails with a clear message rather than silently compiling.
attilapiros
requested changes
May 6, 2026
Contributor
attilapiros
left a comment
There was a problem hiding this comment.
I think this test should be refined to check for the actual NPE rather than the internal state of dirOpt (while this state currently triggers the crash in this Ammonite version, it's an indirect check).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Fixes SPARK-56448 (https://issues.apache.org/jira/browse/SPARK-56448).
The Spark Connect REPL uses Ammonite. Ammonite's default
Storage.Folderpersists compiled predef classes under
~/.ammonite/<version>/cache. On asubsequent REPL start from the same working directory, the cached
CodePredefclass is reloaded but its reference to the per-sessionArgsPredefhelper is stale, producing aNullPointerExceptionduringpredef initialization.
This PR switches the Connect REPL's compile cache to
Storage.InMemoryso every session starts fresh and no stale cache is carried across
restarts. The
ammonite.Mainconstruction is extracted into apackage-private helper (
ConnectRepl.newAmmoniteMain) to keep theregression test localised to unit-level.
Why are the changes needed?
The stale-cache failure is a user-visible crash on every second
invocation of
bin/spark-shell --remote sc://...from the same workingdirectory. Repro steps are on the JIRA.
Does this PR introduce any user-facing change?
There is one minor observable tradeoff: because the compile cache is
now in-memory rather than persisted, each REPL start recompiles the
predef instead of reading the cached classfiles. This adds ~a few
hundred milliseconds to subsequent REPL startups but eliminates the
NPE. We believe this is the correct tradeoff — a small startup cost
is preferable to a hard failure.
How was this patch tested?
Added
ConnectReplSuitewith a regression test that asserts theammonite.Mainreturned byConnectRepl.newAmmoniteMainexposes noon-disk cache directory (
storageBackend.dirOpt.isEmpty) and is aninstance of
Storage.InMemory.The
dirOptcheck is an observable behavioural assertion: if theStorage.InMemory()wiring is reverted,ammonite.Mainfalls back tothe default
Storage.Folder(Defaults.ammoniteHome)and the assertionfails with a clear message rather than silently compiling. I verified
this locally by temporarily reverting only the
Storage.InMemory()line and re-running the test; it fails with
Some(/home/<user>/.ammonite) was not empty. Restoring the fix makesthe test pass.
Targeted test run:
Also ran
./dev/scalastylelocally — passes.Was this patch authored or co-authored using generative AI tooling?
Yes