[improve][offload] Coalesce automatic offload triggers to reduce retry loops and ledger scans by void-ptr974 · Pull Request #25793 · apache/pulsar

void-ptr974 · 2026-05-16T02:47:12Z

Related issue

Motivation

Automatic managed ledger offload can be triggered repeatedly while a previous offload is still running, for example around ledger rollover or topic load.

Before this change, every automatic trigger could independently enter the offload path:

1. read offload policies
2. scan the managed ledger's ledger list
3. try to acquire the offload mutex
4. fail because another offload is already running
5. schedule another retry after 100ms
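To give a rough sense of how this adds up, here is a small back-of-the-envelope model of the steps above. It is not code from this PR: the class and method names are hypothetical, and it assumes the simplest case, where all triggers arrive at the same moment and each one retries every 100ms until the running offload releases the mutex. Every counted attempt stands for one policy read plus one ledger-list scan.

```java
public class UncoalescedRetryModel {
    // Retry delay described in the PR text for a failed mutex acquisition.
    static final long RETRY_DELAY_MS = 100;

    /**
     * Counts failed offload attempts made by `triggers` automatic triggers that
     * all arrive at t = 0 while another offload holds the mutex until
     * t = offloadMs. Simplified model, not the real scheduling code.
     */
    static long failedAttempts(int triggers, long offloadMs) {
        long attempts = 0;
        for (int i = 0; i < triggers; i++) {
            // Each trigger tries at t = 0, 100, 200, ... while the mutex is held.
            for (long t = 0; t < offloadMs; t += RETRY_DELAY_MS) {
                attempts++;
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // 5 triggers, 1s offload: 5 * 10 = 50 wasted attempts.
        System.out.println(failedAttempts(5, 1_000));
        // 20 triggers, 30s offload: 20 * 300 = 6000 wasted attempts.
        System.out.println(failedAttempts(20, 30_000));
    }
}
```

Under this model the wasted work grows with both the number of triggers and the offload duration, which is the pattern the change is meant to remove.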

When offload is slow, or when a managed ledger has many ledgers, repeated automatic triggers can build up unnecessary scheduler/executor work. These retries do not improve the final offload result because only one offload can run at a time.

Modifications

This change coalesces automatic offload triggers in ManagedLedgerImpl.

After this change:

- there is at most one in-flight automatic offload
- repeated automatic triggers during an in-flight offload are merged into one pending rerun
- after the current automatic offload completes, one follow-up pass runs if any trigger arrived meanwhile
- explicit/manual offload requests keep their existing CompletableFuture<Position> behavior
- the automatic offload sentinel is renamed to make it clear that its Position value is not consumed
- duplicate getOffloadPolicies() calls are avoided in the appendable offloader policy lookup path
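The coalescing rules in the first three items can be pictured as a three-state machine. The sketch below is illustrative only and is not the code in ManagedLedgerImpl: the class name, the `State` enum, and the `Supplier<CompletableFuture<Void>>` standing in for one offload pass are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class AutoOffloadCoalescer {
    // Hypothetical states: nothing running, one pass running, one pass
    // running with exactly one follow-up pass requested.
    enum State { IDLE, RUNNING, RUNNING_WITH_PENDING_RERUN }

    private final AtomicReference<State> state = new AtomicReference<>(State.IDLE);
    private final Supplier<CompletableFuture<Void>> offloadPass;

    AutoOffloadCoalescer(Supplier<CompletableFuture<Void>> offloadPass) {
        this.offloadPass = offloadPass;
    }

    State state() {
        return state.get();
    }

    /** Called for every automatic trigger; never schedules a retry. */
    void trigger() {
        while (true) {
            State current = state.get();
            if (current == State.IDLE) {
                if (state.compareAndSet(State.IDLE, State.RUNNING)) {
                    startPass();
                    return;
                }
            } else if (current == State.RUNNING) {
                // Merge this trigger into a single pending rerun.
                if (state.compareAndSet(State.RUNNING, State.RUNNING_WITH_PENDING_RERUN)) {
                    return;
                }
            } else {
                return; // A rerun is already pending; nothing more to record.
            }
        }
    }

    private void startPass() {
        CompletableFuture<Void> pass;
        try {
            pass = offloadPass.get();
        } catch (RuntimeException e) {
            pass = CompletableFuture.failedFuture(e);
        }
        // Success or failure, the state must move on.
        pass.whenComplete((ignored, error) -> onPassComplete());
    }

    private void onPassComplete() {
        while (true) {
            State current = state.get();
            if (current == State.RUNNING_WITH_PENDING_RERUN) {
                if (state.compareAndSet(current, State.RUNNING)) {
                    startPass(); // One follow-up pass for all merged triggers.
                    return;
                }
            } else if (state.compareAndSet(current, State.IDLE)) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger passes = new AtomicInteger();
        List<CompletableFuture<Void>> started = new ArrayList<>();
        AutoOffloadCoalescer coalescer = new AutoOffloadCoalescer(() -> {
            passes.incrementAndGet();
            CompletableFuture<Void> pass = new CompletableFuture<>();
            started.add(pass);
            return pass;
        });
        for (int i = 0; i < 50; i++) {
            coalescer.trigger(); // 50 triggers while the first pass is in flight
        }
        System.out.println(passes.get());      // 1
        started.get(0).complete(null);         // first pass finishes
        System.out.println(passes.get());      // 2 (the single follow-up pass)
        started.get(1).complete(null);
        System.out.println(coalescer.state()); // IDLE
    }
}
```

The compare-and-set loops are one way to keep the transitions race-free without holding a lock across the offload; the actual change may use a different mechanism.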

Impact

The final automatic offload progression is preserved: if new ledgers become eligible while an offload is running, the follow-up pass still picks them up.

The main benefit is reducing unnecessary work under slow offload or large-ledger workloads:

- avoids repeated 100ms automatic offload retry loops
- reduces redundant offload policy reads
- reduces redundant ledger-list scans
- lowers scheduler and executor pressure while offload is already in progress
- keeps explicit/manual offload behavior unchanged

In practice, repeated automatic triggers while one offload is running are reduced from many independent retry loops to one active run plus one pending rerun.

Verifying this change

Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

- Added a test that repeated automatic triggers during an in-flight offload do not create independent retry loops.
- Added a test that a coalesced automatic trigger causes one follow-up offload pass.
- Added a test that automatic offload state is released when offload thresholds are disabled, so later valid triggers can still run.
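The third test guards against a subtle failure mode: if a trigger marks automatic offload as in flight and then exits early because thresholds are disabled, the flag has to be cleared or every later trigger is silently dropped. The following is a minimal synchronous sketch of that property, not the test or production code from this PR. The class, its fields, and the convention that a negative threshold means "disabled" are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class AutoOffloadStateRelease {
    private final AtomicBoolean autoOffloadInFlight = new AtomicBoolean(false);
    // Assumption for this sketch: a negative threshold disables automatic offload.
    private volatile long thresholdBytes;
    private int passesStarted;

    AutoOffloadStateRelease(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    void setThresholdBytes(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    boolean isInFlight() {
        return autoOffloadInFlight.get();
    }

    int passesStarted() {
        return passesStarted;
    }

    /** Returns true if this trigger actually ran an offload pass. */
    boolean maybeOffload(long ledgerSizeBytes) {
        if (!autoOffloadInFlight.compareAndSet(false, true)) {
            return false; // Another automatic offload owns the state.
        }
        try {
            long threshold = thresholdBytes;
            if (threshold < 0 || ledgerSizeBytes < threshold) {
                return false; // Early exit; the finally block still releases.
            }
            passesStarted++; // Stand-in for the real offload work.
            return true;
        } finally {
            autoOffloadInFlight.set(false);
        }
    }

    public static void main(String[] args) {
        AutoOffloadStateRelease ledger = new AutoOffloadStateRelease(-1);
        System.out.println(ledger.maybeOffload(1_000)); // false: thresholds disabled
        System.out.println(ledger.isInFlight());        // false: state was released
        ledger.setThresholdBytes(500);
        System.out.println(ledger.maybeOffload(1_000)); // true: later trigger still runs
    }
}
```

In the real asynchronous path the release would happen in a completion callback rather than a `finally` block, but the invariant being tested is the same.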

Local verification:

git diff --check
./gradlew :managed-ledger:test --tests org.apache.bookkeeper.mledger.impl.OffloadPrefixTest
./gradlew :managed-ledger:test --tests org.apache.bookkeeper.mledger.impl.OffloadLedgerDeleteTest

Does this pull request potentially affect one of the following parts:

This only changes the internal scheduling behavior of automatic managed ledger offload. Public APIs, configs, metrics, explicit/manual offload behavior, and offload metadata semantics are unchanged.

Automatic offload can be triggered repeatedly while a previous offload is still running. Each trigger may read offload policies, scan the ledger list, fail to acquire the offload mutex, and schedule another 100ms retry. With slow offload or many ledgers, this can create unnecessary scheduler and executor pressure.

Coalesce automatic triggers so there is at most one in-flight automatic offload and one pending rerun. If another trigger arrives during an in-flight offload, run one follow-up pass after the current offload completes. This keeps the final offload progression behavior while avoiding repeated retry loops, policy reads, and ledger scans.

Keep explicit offload requests unchanged, and rename the automatic sentinel to make it clear that its Position value is not consumed.

Add tests for trigger coalescing, coalesced reruns, and automatic state release when offload thresholds are disabled.

void-ptr974 changed the title ~~[improve][ml] Coalesce automatic offload triggers to reduce retry loops and ledger scans~~ [improve][offload] Coalesce automatic offload triggers to reduce retry loops and ledger scans May 16, 2026

void-ptr974 mentioned this pull request May 23, 2026

[Improve][Offload] Automatic managed ledger offload triggers can create redundant retry loops #25859

Open


void-ptr974 wants to merge 1 commit into apache:master from void-ptr974:fix_ml_auto_offload_coalesce
