[WIP][SPARK-55535][SQL] Refactor KeyGroupedPartitioning and Storage Partition Join #54330

Draft
peter-toth wants to merge 2 commits into apache:master from peter-toth:SPARK-55535-refactor-kgp-and-spj

@peter-toth peter-toth commented Feb 15, 2026

What changes were proposed in this pull request?

This is a work-in-progress PR to replace KeyGroupedPartitioning with KeyedPartitioning and to move the partition grouping logic out of BatchScanExec into a new GroupPartitionsExec operator. KeyedPartitioning represents a partitioning where the partition keys are known. It may or may not be grouped (clustered) by partition keys. When grouping is required, the new operator can be inserted anywhere in a plan (similarly to how exchanges are inserted) to create the necessary grouped/replicated partitions by key.
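
To illustrate the idea, here is a minimal, self-contained sketch of what "grouping by partition keys" means. It is not the code in this PR: `GroupPartitionsSketch`, `KeyedPartition`, `isGrouped` and `groupByKey` are hypothetical names for a simplified model and are not Spark classes or methods. A keyed partitioning counts as grouped when every key occurs in exactly one partition; the grouping step merges partitions that share a key.

```scala
// Hypothetical, simplified model of keyed partitions.
// None of these names are part of Spark's API.
object GroupPartitionsSketch {
  // An input partition tagged with its (known) partition key.
  final case class KeyedPartition(key: Seq[Int], splitId: Int)

  // A keyed partitioning is "grouped" when no key appears in more than one partition.
  def isGrouped(parts: Seq[KeyedPartition]): Boolean =
    parts.map(_.key).distinct.size == parts.size

  // Merge partitions that share a key, keeping the first-seen order of keys
  // and the original order of partitions within each group.
  def groupByKey(parts: Seq[KeyedPartition]): Seq[(Seq[Int], Seq[KeyedPartition])] = {
    val keyOrder = parts.map(_.key).distinct
    val byKey = parts.groupBy(_.key)
    keyOrder.map(k => k -> byKey(k))
  }
}
```

In this model, a scan could report ungrouped keyed partitions as they are, and a separate grouping step would run only where a parent operator actually requires grouped input.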

Why are the changes needed?

To solve the issue of unnecessary partition grouping (#53859) and restore the cleanness and simplicity of BatchScanExec.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UTs adjusted.

Was this patch authored or co-authored using generative AI tooling?

No.

…roupPartitionsExec` operator, remove old code
@peter-toth peter-toth force-pushed the SPARK-55535-refactor-kgp-and-spj branch from 5122de6 to 114aee5 on February 15, 2026 20:26
case h: HashPartitioningLike => expandOutputPartitioning(h)
case c: PartitioningCollection => expandOutputPartitioning(c)
case other => other
val expandedPartitioning = expandOutputPartitioning(streamedPlan.outputPartitioning)

BroadcastHashJoinExec related changes are extracted to a separate PR: #54335, since those changes are valid on their own.
