[SPARK-55092][SQL] Disable partition grouping in `KeyGroupedPartitioning` when not needed by peter-toth · Pull Request #53859 · apache/spark

peter-toth · 2026-01-19T18:53:45Z

What changes were proposed in this pull request?

Currently, KeyGroupedPartitioning always groups partitions by key, regardless of whether grouping is actually needed. This behaviour decreases parallelism and can lead to slower performance.

This PR disables partition grouping of a scan with KeyGroupedPartitioning output partitioning if:

- a shuffle is inserted above the scan, which means that grouping is not needed for the parent, and
- grouping is not needed for the intermediate nodes either.

We can't disable partition grouping of a scan in a main query if it contributes to the output partitioning of the query result, because we don't know whether the query is cached/checkpointed and how the output of the query will be used later. The output must keep KeyGroupedPartitioning semantics in this case.
But we can disable partition grouping in subqueries when grouping is not needed for anything in the subquery plan. This is actually necessary to make sure broadcast exchange reuse happens correctly during dynamic partition pruning.
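The two conditions above can be illustrated with a small, self-contained sketch. It is a toy model, not the PR's code: the `Plan` node types, `needsGrouping`, and `shuffleOver` are hypothetical names invented for this illustration, and the only assumption is that an aggregate-like node relies on grouped input while a filter-like node does not.

```scala
object GroupingSketch {
  // Toy plan nodes; these are NOT Spark's physical operators.
  sealed trait Plan
  final case class Scan(table: String, grouped: Boolean = true) extends Plan
  final case class Filter(child: Plan) extends Plan
  final case class Aggregate(child: Plan) extends Plan // assumed to rely on grouped input
  final case class Shuffle(child: Plan) extends Plan

  // Assumption for this sketch: only Aggregate needs key-grouped input.
  def needsGrouping(p: Plan): Boolean = p match {
    case _: Aggregate => true
    case _            => false
  }

  // Walk down from the child of a new shuffle. Scans are ungrouped only if
  // every node on the way down does not need grouping.
  def disableGrouping(p: Plan): Plan = p match {
    case s: Scan                   => s.copy(grouped = false)
    case f: Filter                 => f.copy(child = disableGrouping(f.child))
    case other if needsGrouping(other) => other // stop: grouping is still required here
    case other                     => other     // e.g. an existing Shuffle: leave untouched
  }

  // Inserting a shuffle means the parent no longer needs the child's grouping.
  def shuffleOver(child: Plan): Plan = Shuffle(disableGrouping(child))
}
```

In this model, a shuffle placed over `Filter(Scan("b"))` ungroups the scan, while a shuffle placed over `Aggregate(Scan("b"))` leaves it grouped because an intermediate node still depends on it.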

Why are the changes needed?

Improve performance.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New UT added.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions · 2026-01-19T18:53:55Z

JIRA Issue Information

=== Improvement SPARK-55092 ===
Summary: KeyGroupedPartitionig shouldn't group partitions when not needed
Assignee: None
Status: Open
Affected: ["4.2.0"]

This comment was automatically generated by GitHub Actions

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

peter-toth · 2026-01-20T20:01:58Z

cc @szehon-ho , @sunchao, @viirya, @dongjoon-hyun

dongjoon-hyun · 2026-01-21T00:04:32Z

Thank you for pinging me, @peter-toth .

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

dongjoon-hyun

Although we need to handle copyFromTag independently, the proposal itself sounds reasonable to me. Do you think you can share some supporting performance numbers based on the existing benchmark or from your production environment?

Why are the changes needed?

Improve performance.

peter-toth · 2026-01-21T14:32:06Z

> Although we need to handle copyFromTag independently, the proposal itself sounds reasonable to me. Do you think you can share some supporting performance numbers based on the existing benchmark or from your production environment?

Numbers depend heavily on the use case. In our case, a customer would like to use SPJ between tables A and B. Both tables are storage partitioned, but B is storage partitioned by some columns that don't match the join condition of the query. In this case, "one side shuffle" can help if spark.sql.sources.v2.bucketing.shuffle.enabled is enabled, so that only B is shuffled, but the unnecessary grouping of partitions still happens for B. And while storage partitioning of B helps in other queries, in this particular one it significantly reduces the number of partitions and slows down the stage before the shuffle.

The optimization in this PR is similar to what spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled does to keep one side partially clustered (ungrouped) in SPJ. But this optimization kicks in for "one side shuffle" SPJ and when there is no SPJ.
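To make the parallelism effect concrete, here is a minimal sketch with made-up data. `InputPartition`, the key values, and the file names are hypothetical; the only point is that grouping by key yields one task per distinct key, whereas an ungrouped scan yields one task per input partition.

```scala
object ParallelismSketch {
  // A hypothetical input partition reported by a storage-partitioned source.
  final case class InputPartition(key: String, file: String)

  // With grouping, all input partitions that share a key are read by one task.
  def groupedTasks(parts: Seq[InputPartition]): Int =
    parts.map(_.key).distinct.size

  // Without grouping, every input partition becomes its own task.
  def ungroupedTasks(parts: Seq[InputPartition]): Int =
    parts.size

  // Illustrative data: 2 partition keys, 4 files each.
  val parts: Seq[InputPartition] =
    for {
      k <- Seq("2026-01", "2026-02")
      i <- 1 to 4
    } yield InputPartition(k, s"$k/part-$i")
}
```

With this data, the stage before the shuffle runs 2 tasks when partitions are grouped and 8 tasks when they are not, which is the kind of gap the comment above describes.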

dongjoon-hyun

+1, LGTM. Thank you, @peter-toth .

BTW, this PR is not rebased to the master after merging #53884 . Did I understand correctly?

peter-toth · 2026-01-22T09:03:02Z

> BTW, this PR is not rebased to the master after merging #53884 . Did I understand correctly?

You are right. I've just rebased it. The diff should be ok now.

dongjoon-hyun · 2026-01-22T09:03:32Z

Thank you!

peter-toth · 2026-01-22T09:05:39Z

@szehon-ho , @sunchao , @viirya , do you have any concerns or comments?

szehon-ho · 2026-01-27T00:28:10Z

hi , this seems useful, i will try to review this week, but if it looks ok to @sunchao @viirya , go ahead

szehon-ho · 2026-01-28T18:52:19Z

sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala

      sparkSession: SparkSession,
      adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None,
      subquery: Boolean): Seq[Rule[SparkPlan]] = {
+    val requiredDistribution = if (subquery) {


not sure i get this, if its not a subquery we pass in any requiredDistribution?

Yeah, let me change this tomorrow and pass in subquery directly into EnsureRequirements, that way this will be much cleaner.

Fixed in b04bb61 and added comments in c28fc3f.

szehon-ho · 2026-01-28T18:54:55Z

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

+                val newChild = disableKeyGroupingIfNotNeeded(c)
+                ShuffleExchangeExec(newPartitioning, newChild, so, ps)
+              case _ =>
+                val newChild = disableKeyGroupingIfNotNeeded(child)


could we make a method createShuffleExchangeExec(..., disableGrouping: Boolean) to reduce duplication?

Done in b04bb61.

sql/core/src/main/scala/org/apache/spark/sql/execution/joins/StoragePartitionJoinParams.scala

szehon-ho · 2026-01-28T18:57:44Z

@chirag-s-db if you also want to take a look?

viirya · 2026-01-28T19:27:59Z

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

+    }
+  }
+
+  private def populateNoGroupingPartitionInfo(plan: SparkPlan): SparkPlan = plan match {


This looks like it can be done with the transform API?

Yes, I can change this to use transform() APIs.
Wanted to make it similar to the other 2 populate...() methods. Shall I change those as well?

I modified all 3 populate...()s in b04bb61.
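For readers unfamiliar with the pattern being discussed, the sketch below contrasts with hand-written recursion by using a generic bottom-up rewrite driven by a partial function, similar in spirit to the `transformUp` method on Spark's tree nodes. The `Node` hierarchy and `ungroupAll` are hypothetical and only mimic the idea; they are not the code changed in b04bb61.

```scala
object TransformSketch {
  // Toy tree with a generic bottom-up rewrite.
  sealed trait Node {
    def children: Seq[Node]
    def withChildren(cs: Seq[Node]): Node

    // Rewrite children first, then apply the rule to the rebuilt node if it matches.
    def transformUp(rule: PartialFunction[Node, Node]): Node = {
      val rewritten = withChildren(children.map(_.transformUp(rule)))
      rule.applyOrElse(rewritten, (n: Node) => n)
    }
  }

  final case class Scan(name: String, grouped: Boolean = true) extends Node {
    def children: Seq[Node] = Nil
    def withChildren(cs: Seq[Node]): Node = this
  }

  final case class Project(child: Node) extends Node {
    def children: Seq[Node] = Seq(child)
    def withChildren(cs: Seq[Node]): Node = copy(child = cs.head)
  }

  final case class Join(left: Node, right: Node) extends Node {
    def children: Seq[Node] = Seq(left, right)
    def withChildren(cs: Seq[Node]): Node = copy(left = cs(0), right = cs(1))
  }

  // The rule names only the node it cares about; traversal is handled once, generically.
  def ungroupAll(plan: Node): Node =
    plan.transformUp { case s: Scan => s.copy(grouped = false) }
}
```

The design benefit is that each `populate...()`-style method shrinks to a single `case`, and adding a new node type does not require touching every traversal.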

viirya · 2026-01-28T19:30:12Z

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

        child, values, joinKeyPositions, reducers, applyPartialClustering, replicatePartitions))
  }

+  private def disableKeyGroupingIfNotNeeded(child: SparkPlan) = {


More detailed comments on this method would be good, e.g., the conditions under which grouping can be safely disabled, etc.

I added comments in b04bb61.

viirya · 2026-01-28T19:53:56Z

sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala

      sparkSession: SparkSession,
      adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None,
      subquery: Boolean): Seq[Rule[SparkPlan]] = {
+    val requiredDistribution = if (subquery) {


Can we add more detailed comments here? It looks confusing without any context when looking at the code here.

I changed this to pass in subquery directly and added comments in c28fc3f.

viirya

Looks like a reasonable improvement.

chirag-s-db · 2026-01-29T16:21:51Z

Could we also support the case where the KeyGroupedPartitioning is the output of the plan with the following approach?

1. By default, disableGrouping is true in both the scan and in a field in the KeyGroupedPartitioning.
2. With disableGrouping=true, a KeyGroupedPartitioning can only satisfy the requirements of an UnspecifiedDistribution (or any distribution if there is only a single partition). However, with disableGrouping=true, we don't allow a KeyGroupedPartitioning (w/ > 1 partition) to satisfy the requirements of a Clustered or Ordered distribution.
3. In EnsureRequirements, we add a new case here that checks if the KeyGroupedPartitioning could satisfy the requirement if it were to be grouped (essentially, if the current implementation of satisfies for KeyGroupedPartitioning that take into consideration whether grouping is enabled or disabled). If so, then we push down disableGrouping=false to both the scan and the KeyGroupedPartitioning, after which point we know that the partitioning has been used to satisfy some requirements, so we must do grouping. If we can't push down to the scan (for example, if the plan reporting KeyGroupedPartitioning is checkpointed), then we just add a shuffle as normal.

One advantage of this approach is that it allows us to avoid grouping for the (presumably not uncommon) case of a simple scan from a partitioned table, and it should still be safe for checkpointed scans (as the checkpointed scans would have a KeyGroupedPartitioning w/ disableGrouping=true, which would not satisfy most required distributions). This approach should also decrease the complexity of the EnsureRequirements changes (since we wouldn't have to catch all the cases in which a KeyGroupedPartitioning scan doesn't contribute to the output partitioning of the plan).
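A rough model of this proposal might look as follows. Everything here is a simplified assumption made for illustration: the `Distribution` types, the subset check used for clustered distributions, the omission of ordered distributions, and the `Decision` values are not Spark's real definitions.

```scala
object SatisfiesSketch {
  sealed trait Distribution
  case object UnspecifiedDistribution extends Distribution
  final case class ClusteredDistribution(keys: Seq[String]) extends Distribution

  final case class KeyGroupedPartitioning(
      keys: Seq[String],
      numPartitions: Int,
      disableGrouping: Boolean = true) { // ungrouped by default, as proposed

    // What the partitioning would satisfy if its partitions were grouped
    // (simplified: partition keys must be a subset of the clustering keys).
    def satisfiesIfGrouped(d: Distribution): Boolean = d match {
      case UnspecifiedDistribution         => true
      case ClusteredDistribution(required) =>
        numPartitions == 1 || (keys.nonEmpty && keys.forall(required.contains))
    }

    // While ungrouped, only UnspecifiedDistribution (or a single partition) is satisfied.
    def satisfies(d: Distribution): Boolean = d match {
      case UnspecifiedDistribution => true
      case _ => numPartitions == 1 || (!disableGrouping && satisfiesIfGrouped(d))
    }
  }

  sealed trait Decision
  case object UseAsIs extends Decision        // requirement already met
  case object EnableGrouping extends Decision // push disableGrouping=false down to the scan
  case object AddShuffle extends Decision     // cannot push down, or keys do not match

  def decide(p: KeyGroupedPartitioning, d: Distribution, canPushDown: Boolean): Decision =
    if (p.satisfies(d)) UseAsIs
    else if (p.satisfiesIfGrouped(d) && canPushDown) EnableGrouping
    else AddShuffle
}
```

Here `canPushDown = false` stands in for the checkpointed case: the partitioning would satisfy the requirement if grouped, but since the flag cannot reach the scan, a shuffle is added instead.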

FYI @szehon-ho

peter-toth · 2026-01-29T18:05:14Z

> One advantage of this approach is that it allows us to avoid grouping for the (presumably not uncommon) case of a simple scan from a partitioned table, and it should still be safe for checkpointed scans (as the checkpointed scans would have a KeyGroupedPartitioning w/ disableGrouping=true, which would not satisfy most required distributions). This approach should also decrease the complexity of the EnsureRequirements changes (since we wouldn't have to catch all the cases in which a KeyGroupedPartitioning scan doesn't contribute to the output partitioning of the plan).

My concern with this approach is that we can introduce an extra shuffle above the checkpointed (ungrouped) data.

szehon-ho · 2026-01-29T22:38:28Z

I really like @chirag-s-db 's idea, it is quite clean. Else we have to guard everywhere we make a Shuffle to disable it. But i also see the point about missing the opportunity to avoid shuffle for a checkpointed KeyGrouped RDD. Although i guess its not a very common case. So in short , no strong opinion either way. @sunchao @viirya wondering any opinion?

peter-toth · 2026-01-30T09:41:28Z

Please note that not only checkpointed RDDs, but cached RDDs would also need an extra shuffle.

Actually, I wonder if partition grouping by key is at the right place in BatchScanExec, or whether it could be a new operator that does the grouping. The operator should reside between a consumer that requires ClusteredDistribution and a producer that provides "partitions with keys" partitioning. We could move spjParams to the new operator, and its output partitioning would be KeyGroupedPartitioning. EnsureRequirements could insert the operator if needed, similarly to how it inserts exchanges now. As it would be a new operator, it could be inserted on top of BatchScanExec as well as LogicalRDD (checkpointed plan) and InMemoryTableScanExec (cached plan).

I’m happy to put together a POC PR; just let me know.
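The operator idea can be sketched as a toy planner step. The node names below (`BatchScan`, `CheckpointedScan`, `CachedScan`, `GroupPartitions`) are hypothetical stand-ins, key matching is ignored entirely, and the sketch says nothing about how the draft PR actually implements it.

```scala
object GroupOperatorSketch {
  sealed trait Plan
  // Producers that expose "partitions with keys" but do not group them themselves.
  final case class BatchScan(table: String) extends Plan
  final case class CheckpointedScan(name: String) extends Plan
  final case class CachedScan(name: String) extends Plan
  // A producer with no keyed partitions at all.
  final case class PlainScan(name: String) extends Plan
  // Hypothetical operator that groups partitions by key.
  final case class GroupPartitions(child: Plan) extends Plan
  final case class Shuffle(child: Plan) extends Plan

  def hasKeyedPartitions(p: Plan): Boolean = p match {
    case _: BatchScan | _: CheckpointedScan | _: CachedScan => true
    case _                                                   => false
  }

  // Called when a consumer requires a clustered distribution from `child`.
  // Grouping is inserted on demand, the same way an exchange would be.
  def ensureClustered(child: Plan): Plan = child match {
    case g: GroupPartitions         => g                  // already grouped
    case s: Shuffle                 => s                  // already shuffled
    case p if hasKeyedPartitions(p) => GroupPartitions(p) // cheap: no data movement
    case p                          => Shuffle(p)         // fall back to an exchange
  }
}
```

Because grouping lives in its own node, a scan that nobody asks to be clustered is simply left alone, and checkpointed or cached inputs get the same treatment as a fresh scan.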

szehon-ho · 2026-02-01T03:46:36Z

yea that sounds like it would be cleaner, and cover the checkpoint/ cache case. im not sure the detail about the disableGrouping by default will work, but curious to see, thanks @peter-toth

peter-toth · 2026-02-15T20:21:06Z

Just a quick update: I have opened a draft PR, #54330, to implement the above idea and extract the partition grouping logic from BatchScanExec into a new operator. It is still a draft, but feedback is always welcome.

github-actions bot added the SQL label Jan 19, 2026

peter-toth marked this pull request as draft January 19, 2026 18:53

peter-toth force-pushed the SPARK-55092-kgp-do-not-group-partitions-when-not-needed branch from 04eed9a to a0e1d99 Compare January 19, 2026 19:06

peter-toth changed the title ~~[SPARK-55092][SQL] KeyGroupedPartitioning don't group partitions when not needed~~ [WIP][SPARK-55092][SQL] KeyGroupedPartitioning don't group partitions when not needed Jan 19, 2026

peter-toth force-pushed the SPARK-55092-kgp-do-not-group-partitions-when-not-needed branch from a0e1d99 to 1dc157a Compare January 20, 2026 11:36

peter-toth changed the title ~~[WIP][SPARK-55092][SQL] KeyGroupedPartitioning don't group partitions when not needed~~ [WIP][SPARK-55092][SQL] Disable partition grouping in KeyGroupedPartitioning when not needed Jan 20, 2026

peter-toth changed the title ~~[WIP][SPARK-55092][SQL] Disable partition grouping in KeyGroupedPartitioning when not needed~~ [SPARK-55092][SQL] Disable partition grouping in KeyGroupedPartitioning when not needed Jan 20, 2026

peter-toth commented Jan 20, 2026

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

peter-toth marked this pull request as ready for review January 20, 2026 12:02

peter-toth force-pushed the SPARK-55092-kgp-do-not-group-partitions-when-not-needed branch from 1dc157a to f4fbeed Compare January 20, 2026 12:05

peter-toth marked this pull request as draft January 20, 2026 14:28

peter-toth changed the title ~~[SPARK-55092][SQL] Disable partition grouping in KeyGroupedPartitioning when not needed~~ [WIP][SPARK-55092][SQL] Disable partition grouping in KeyGroupedPartitioning when not needed Jan 20, 2026

peter-toth changed the title ~~[WIP][SPARK-55092][SQL] Disable partition grouping in KeyGroupedPartitioning when not needed~~ [SPARK-55092][SQL] Disable partition grouping in KeyGroupedPartitioning when not needed Jan 20, 2026

peter-toth marked this pull request as ready for review January 20, 2026 18:17

dongjoon-hyun reviewed Jan 21, 2026

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala

dongjoon-hyun reviewed Jan 21, 2026

peter-toth mentioned this pull request Jan 21, 2026

[SPARK-55113][SQL] EnsureRequirements should copy tags #53884

Closed

dongjoon-hyun approved these changes Jan 22, 2026

peter-toth added 5 commits January 22, 2026 10:01

[SPARK-55092][SQL] KeyGroupedPartitioning shouldn't group partition…

c63f709

…s when not needed

Explicitly require UnspecifiedDistribution from subqueries to disab…

7f79f9e

…le partition grouping

add test for main query partitioning

b5ab00a

fix and elaborate on asserts

de7b287

handle case when one side SPJ is disabled

4fd8026

peter-toth force-pushed the SPARK-55092-kgp-do-not-group-partitions-when-not-needed branch from f63091b to 4fd8026 Compare January 22, 2026 09:02

szehon-ho reviewed Jan 28, 2026

viirya reviewed Jan 28, 2026

peter-toth added 4 commits January 29, 2026 10:54

fix review findings

b04bb61

add more comments

c28fc3f

Merge branch 'master' into SPARK-55092-kgp-do-not-group-partitions-wh…

dd62e72

…en-not-needed

fix subquery passing

2a882ec

fix subquery passing in AQE

77b78d9

peter-toth mentioned this pull request Feb 7, 2026

[SPARK-55411][SQL] SPJ may throw ArrayIndexOutOfBoundsException when join keys are less than cluster keys #54182

Closed

peter-toth mentioned this pull request Feb 15, 2026

[WIP][SPARK-55535][SQL] Refactor KeyGroupedPartitioning and Storage Partition Join #54330

Draft
