[SPARK-55477] Add SequentialUnionManager for managing sequential source processing #54257

Open
ericm-db wants to merge 1 commit into apache:master from ericm-db:pr1-sequential-union-manager

@ericm-db
Contributor

What changes were proposed in this pull request?

This adds SequentialUnionManager, which manages state and lifecycle for SequentialStreamingUnion nodes during streaming execution. The manager:

  • Tracks which source is currently active in a sequential union
  • Manages transitions between sources when exhaustion is detected
  • Handles just-in-time preparation of sources with AvailableNow semantics
  • Provides serializable offset representation for checkpoint recovery
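
To make the responsibilities above concrete, here is a minimal conceptual sketch in Python. It is not the code in this PR and not Spark's API: `SequentialManagerSketch`, `UnionOffsetSketch`, and every field and method name are hypothetical, and JSON is assumed as the serialization format purely for illustration. It only shows the general shape of tracking an active source, advancing on exhaustion, and round-tripping that state through a serializable offset.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass(frozen=True)
class UnionOffsetSketch:
    """Hypothetical serializable state; the real offset's fields are not shown in this PR text."""
    active_source: str
    completed_sources: List[str]

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "UnionOffsetSketch":
        data = json.loads(text)
        return UnionOffsetSketch(data["active_source"], list(data["completed_sources"]))


class SequentialManagerSketch:
    """Hypothetical manager: tracks which source is active, in a fixed order."""

    def __init__(self, source_names: List[str]) -> None:
        if not source_names:
            raise ValueError("at least one source is required")
        self._names = list(source_names)
        self._index = 0

    @property
    def active_source(self) -> str:
        return self._names[self._index]

    @property
    def is_on_final_source(self) -> bool:
        return self._index == len(self._names) - 1

    def on_source_exhausted(self) -> bool:
        """Move to the next source; return False if already on the final one."""
        if self.is_on_final_source:
            return False
        self._index += 1
        return True

    def current_offset(self) -> UnionOffsetSketch:
        # Everything before the active index has been fully drained.
        return UnionOffsetSketch(self.active_source, self._names[: self._index])

    @classmethod
    def restore(cls, source_names: List[str], offset: UnionOffsetSketch) -> "SequentialManagerSketch":
        """Rebuild the manager from a previously serialized offset."""
        manager = cls(source_names)
        manager._index = manager._names.index(offset.active_source)
        return manager
```

In this sketch, restoring from `UnionOffsetSketch.from_json(offset.to_json())` resumes on the same active source, which is the property a checkpointed representation would need to provide.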

Why are the changes needed?

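Sequential union execution needs a component that decides which source is active at any point and when to move on to the next one. SequentialUnionManager fills that role.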

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests

Was this patch authored or co-authored using generative AI tooling?

No

Key design points:
- Sequential draining: Each non-final source is prepared with AvailableNow,
  drained to exhaustion, then transitions to the next source
- Just-in-time preparation: Sources are prepared immediately before draining
  to capture the freshest bound point
- Checkpoint integration: State is serialized as SequentialUnionOffset for
  durability across restarts
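
The draining and just-in-time points can be illustrated with a small toy model in Python. This is a sketch under stated assumptions, not the PR's implementation: `GrowingSource`, `prepare`, `read_batch`, and `drain_sequentially` are hypothetical names, a source is modeled as an append-only list, and "preparing with AvailableNow" is approximated by freezing the end point at the current end of the log. The final source is left unbounded; the toy loop simply stops when it has no more data, whereas a live source would keep running.

```python
from typing import Iterator, List, Optional, Tuple


class GrowingSource:
    """Toy source (hypothetical): an append-only log plus a read position."""

    def __init__(self, name: str, records: List[int]) -> None:
        self.name = name
        self.log = list(records)
        self.position = 0
        self.bound: Optional[int] = None  # set by prepare()

    def prepare(self) -> None:
        """Freeze the end point at whatever is in the log right now."""
        self.bound = len(self.log)

    def read_batch(self, max_records: int) -> List[int]:
        end = len(self.log) if self.bound is None else self.bound
        stop = min(self.position + max_records, end)
        batch = self.log[self.position:stop]
        self.position = stop
        return batch

    def is_exhausted(self) -> bool:
        return self.bound is not None and self.position >= self.bound


def drain_sequentially(
    sources: List[GrowingSource], batch_size: int = 2
) -> Iterator[Tuple[str, List[int]]]:
    """Yield (source name, batch) pairs, one source at a time, in order."""
    if not sources:
        return
    for source in sources[:-1]:
        # Just-in-time: the bound is captured only when this source's turn
        # starts, so records that arrived while earlier sources were
        # draining are still included.
        source.prepare()
        while not source.is_exhausted():
            yield source.name, source.read_batch(batch_size)
    final = sources[-1]  # never bounded in this sketch
    batch = final.read_batch(batch_size)
    while batch:
        yield final.name, batch
        batch = final.read_batch(batch_size)
```

If a record is appended to the second source while the first is still draining, the second source's bound includes it, because `prepare()` runs only once the first source is exhausted. Preparing every source up front would have frozen that bound earlier and deferred the record.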

This is a foundational component for the sequential union execution feature,
which enables backfill-to-live streaming scenarios.
ericm-db changed the title from "Add SequentialUnionManager for managing sequential source processing" to "[SPARK-55477] Add SequentialUnionManager for managing sequential source processing" on Feb 11, 2026