[FLINK-37399][runtime][source] Buffer watermarks for watermark alignment #27589

pnowojski · 2026-02-11T15:30:36Z

This change depends on #27440; please ignore the first few commits from Roman.

What is the purpose of the change

Before this change, enabling watermark alignment could prevent backlogged jobs
from using all available resources. Watermark alignment configured with
maxAllowedWatermarkDrift and updateInterval inadvertently capped the backlog processing
speed at maxAllowedWatermarkDrift (event time) / updateInterval (wall clock). For example,
with maxAllowedWatermarkDrift=30s and updateInterval=1s, the backlog could not be processed
faster than 30s (event time) / 1s (wall clock). In that case, if a job had 1 day of records
to process in the backlog (for example after 24h of downtime), the backlog could not be
processed in less than 48 minutes, regardless of the available resources and the number
of actual records.
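
The 48-minute figure follows directly from the cap described above. As a rough sketch (the class and method names are hypothetical and not part of the PR), the lower bound on catch-up time can be computed like this:

```java
import java.time.Duration;

// Hypothetical helper, for illustration only; nothing like it exists in the PR.
final class AlignmentBacklogBound {

    /**
     * Lower bound on the wall-clock time needed to drain a backlog when event time
     * can advance by at most maxDrift per updateInterval of wall-clock time.
     */
    static Duration minCatchUpTime(Duration backlog, Duration maxDrift, Duration updateInterval) {
        // wall-clock time = backlog * updateInterval / maxDrift, computed in milliseconds
        long millis =
                Math.multiplyExact(backlog.toMillis(), updateInterval.toMillis())
                        / maxDrift.toMillis();
        return Duration.ofMillis(millis);
    }

    public static void main(String[] args) {
        Duration bound =
                minCatchUpTime(Duration.ofHours(24), Duration.ofSeconds(30), Duration.ofSeconds(1));
        System.out.println(bound.toMinutes()); // prints 48
    }
}
```

With a 24h backlog, a 30s drift and a 1s update interval, the effective speed-up is capped at 30x, which gives 24h / 30 = 48 minutes.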

This change adds SamplingWatermarksRingBuffer, which hides the latency between
SourceOperators and the SourceCoordinator. For more information, please see the ticket.

Brief change log

please check individual commits

Verifying this change

PR adds new unit tests

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (yes / no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
The serializers: (yes / no / don't know)
The runtime per-record code paths (performance sensitive): (yes / no / don't know)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
The S3 file system connector: (yes / no / don't know)

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

flinkbot · 2026-02-11T15:37:46Z

CI report:

96bba6e Azure: FAILURE

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure: re-run the last Azure build

davidradl · 2026-02-11T15:39:00Z

flink-core/src/main/java/org/apache/flink/configuration/CheckpointingOptions.java

+                    .booleanType()
+                    .defaultValue(false)
+                    .withDescription(
+                            "Don't pull any data from sources until the first checkpoint is triggered. "


nit: could we rephrase this to "Only pull data from sources after the first checkpoint is triggered."?

davidradl · 2026-02-11T15:39:46Z

flink-core/src/main/java/org/apache/flink/configuration/CheckpointingOptions.java

+                    .withDescription(
+                            "Don't pull any data from sources until the first checkpoint is triggered. "
+                                    + "This might be helpful in reducing recovery times in cases where "
+                                    + "recovered records from unaligned checkpoint compete with new incoming records for processing. "


nit:
from unaligned checkpoint -> from an unaligned checkpoint

davidradl · 2026-02-11T15:42:07Z

flink-core/src/main/java/org/apache/flink/configuration/PipelineOptions.java

+                            "Controls size of the ring buffer used to smooth out watermark alignment "
+                                    + "due to the inherent latency of the alignment process. Allowed watermarks "
+                                    + "are announced at the updateInterval and this means they are often out of date "
+                                    + "after the round trip. To address this problem, when pausing consumption of records, "


What does "round trip" mean in this context?

After the sentence "this means they are often out of date after the round trip", can we extend it with "which means that..."?

davidradl · 2026-02-11T15:45:01Z

flink-core/src/main/java/org/apache/flink/configuration/PipelineOptions.java

+                                    + "max allowed watermark is not checked against the latest value of the watermark in "
+                                    + "any given split/source, but against the oldest value in the ring buffer, that is "
+                                    + "updated at every updateInterval. This is the config option that controls "
+                                    + "the size of the ring buffer. The default buffer size is 3. Buffer sizes below 2 "


nit:
Buffer sizes below 2 -> Buffer sizes of 1. If there is no use case for buffer size 1, I wonder whether we should validate the option so that the minimum value is 2.
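
To make the behaviour in the option description concrete (the pause check compares against the oldest buffered sample rather than the latest watermark), here is a minimal sketch. It is not the PR's SamplingWatermarksRingBuffer: the class name `WatermarkSampleBuffer`, its methods, and the choice to reject capacities below 1 are assumptions made for illustration.

```java
// Hypothetical sketch of a sampling ring buffer; not the class added by the PR.
final class WatermarkSampleBuffer {
    private final long[] samples;
    private int next;  // slot that the next sample will overwrite
    private int count; // number of valid samples stored so far

    WatermarkSampleBuffer(int capacity) {
        // Assumption: only non-positive capacities are rejected here. Whether the
        // minimum should be 2 is the open question raised in the comment above.
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1, was " + capacity);
        }
        this.samples = new long[capacity];
    }

    /** Assumed to be called once per updateInterval with the current local watermark. */
    void sample(long watermark) {
        samples[next] = watermark;
        next = (next + 1) % samples.length;
        if (count < samples.length) {
            count++;
        }
    }

    /** Oldest retained sample, or Long.MIN_VALUE if nothing has been sampled yet. */
    long oldest() {
        if (count == 0) {
            return Long.MIN_VALUE;
        }
        // Until the buffer is full the oldest sample sits in slot 0;
        // once full, the slot about to be overwritten holds the oldest one.
        return samples[count < samples.length ? 0 : next];
    }

    /** Pause only when even the oldest sample is ahead of the allowed watermark. */
    boolean shouldPause(long maxAllowedWatermark) {
        return oldest() > maxAllowedWatermark;
    }
}
```

In this sketch a capacity of 1 makes the oldest sample equal to the most recent one, so the buffer adds no extra slack beyond the sampling itself.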

davidradl · 2026-02-11T17:10:18Z

flink-core/src/main/java/org/apache/flink/configuration/PipelineOptions.java

+
+    @Experimental
+    public static final ConfigOption<Integer> WATERMARK_ALIGNMENT_BUFFER_SIZE =
+            key("pipeline.watermark-alignment.buffer-size")


I am wondering about the name buffer-size. How about ring-buffer-capacity or, if you prefer "size", ring-buffer-size?

davidradl · 2026-02-11T17:14:39Z

...rc/main/java/org/apache/flink/runtime/jobgraph/tasks/CheckpointCoordinatorConfiguration.java

@@ -136,10 +141,19 @@ private CheckpointCoordinatorConfiguration(
                !isUnalignedCheckpointsEnabled || maxConcurrentCheckpoints <= 1,
                "maxConcurrentCheckpoints can't be > 1 if UnalignedCheckpoints enabled");

+        // max "in between duration" can be one year - this is to prevent numeric overflows
+        if (minPauseBetweenCheckpoints > 365L * 24 * 60 * 60 * 1_000) {


nit: could you extract this number into a constant?
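
One possible shape for that, as a sketch only: the constant name, the holder class and the helper method below are hypothetical, and the body of the `if` in the diff is not shown above, so the sketch covers only the comparison.

```java
import java.time.Duration;

// Hypothetical holder class; in the PR the check lives in CheckpointCoordinatorConfiguration.
final class CheckpointPauseLimits {

    /** Upper bound for the min pause between checkpoints: one year in milliseconds. */
    static final long MAX_MIN_PAUSE_BETWEEN_CHECKPOINTS_MS = Duration.ofDays(365).toMillis();

    /** Same comparison as in the diff, with the magic number replaced by the named constant. */
    static boolean exceedsLimit(long minPauseBetweenCheckpoints) {
        return minPauseBetweenCheckpoints > MAX_MIN_PAUSE_BETWEEN_CHECKPOINTS_MS;
    }
}
```

`Duration.ofDays(365).toMillis()` yields the same value as `365L * 24 * 60 * 60 * 1_000` while stating the unit explicitly.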

davidradl · 2026-02-11T17:25:31Z

flink-core/src/main/java/org/apache/flink/configuration/PipelineOptions.java

+    public static final ConfigOption<Integer> WATERMARK_ALIGNMENT_BUFFER_SIZE =
+            key("pipeline.watermark-alignment.buffer-size")
+                    .intType()
+                    .defaultValue(3)


I am curious how a user would know what value to set this to, and what guidance we can give.

Is it feasible to work out dynamically how many watermarks we need to be aware of, rather than hard-coding the number in a ring buffer?

Efrat19

Thank you for these valuable contributions.

With more frequent checkpoints, ids can be duplicated in RestoreUpgradedJobITCase. This change adds a simple deduplication before the assertion.

rkhachatryan

LGTM, I've updated RestoreUpgradedJobITCase to fix the failure.

Thanks for open-sourcing these changes!

rkhachatryan · 2026-02-12T13:04:35Z

@flinkbot run azure

rkhachatryan and others added 14 commits January 23, 2026 08:59

[hotfix] Close OutputWriter in SourceOperatorStreamTaskTest

845e5cb

[FLINK-38939] Pause Sources until the first checkpoint barrier is rec…

168cff8

…eived This allows to prioritize processing of recovered records (when recovering from an unaligned checkpoint)

[hotfix] Try to get last checkpoint on recovery regardless of checkpo…

16d7397

…inting interval The check doesn't make sense because checkpointing might be disabled before recovery; or there might be a manual checkpoint.

[hotfix] Move checkpointing configuration code to CheckpointCoordinat…

40a2ff5

…orConfiguration

[FLINK-38939] Minimize checkpoint trigger delay if sources are waitin…

708a116

…g for a checkpoint

[hotfix][tests] Increase min pause between checkpoints in migration t…

7729a99

…ests

[hotfix][tests] Extract updateIntervalMillis in SourceOperatorAlignme…

63cc868

…ntTest

[hotfix][tests] Refactor SourceOperatorSplitWatermarkAlignmentTest

9d5e00f

[hotfix][tests] Create Builder for SourceOperatorTestContext

c39cb7e

[FLINK-37399][runtime] Add SamplingWatermarkRingBuffer

2314228

[FLINK-37399][tests] Randomize watermark alignment buffer size in ITC…

bfeebcc

…ases

[FLINK-37399][docs] Regenerate docs

fb4a89b

[hotfix] Fix formatting

d941764

davidradl reviewed Feb 11, 2026

github-actions bot added the community-reviewed PR has been reviewed by the community. label Feb 12, 2026

Efrat19 reviewed Feb 12, 2026

[hotfix][tests] De-duplicate operator ids in RestoreUpgradedJobITCase

96bba6e

With more frequent checkpoints, ids can be duplicated in RestoreUpgradedJobITCase. This change adds a simple deduplication before the assertion.
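
As an illustration of what such a deduplication step might look like (a sketch under assumptions: the actual test code is not shown here, and the helper name and the use of `String` ids are hypothetical):

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; the real change in RestoreUpgradedJobITCase may look different.
final class OperatorIdDedup {

    /** Drops repeated ids while keeping the order of first occurrence. */
    static List<String> distinctIds(List<String> ids) {
        return ids.stream().distinct().collect(Collectors.toList());
    }
}
```

The test would then assert on the result of `distinctIds(...)` instead of the raw list, so ids that show up again after additional checkpoints no longer fail the comparison.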

rkhachatryan approved these changes Feb 12, 2026

rkhachatryan mentioned this pull request Feb 12, 2026

[FLINK-38939][runtime] Pause sources until the 1st checkpoint to prioritize processing recovered records #27440

Closed

5 participants
