Describe the issue
Automatic managed ledger offload can be triggered repeatedly while a previous automatic offload is still running, for example around ledger rollover or topic load.
Since only one offload can run at a time, repeated automatic triggers do not improve the final offload result. Instead, each trigger can independently enter the offload path, scan ledgers, fail to acquire the offload mutex, and schedule another retry.
Problem
Before automatic triggers are coalesced, repeated triggers can perform the same work independently:
- Read offload policies.
- Scan the managed ledger's ledger list.
- Try to acquire the offload mutex.
- Fail because another offload is already running.
- Schedule another retry after 100ms.
When offload is slow, or when a managed ledger has many ledgers, these retries can build up unnecessary scheduler and executor work. They can also repeat policy reads and ledger-list scans that do not change the final offload result.
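To give a rough sense of scale, the retry build-up described above can be modeled with simple arithmetic. This is only an illustrative sketch, not code from the managed ledger: the class name `UncoalescedRetryModel` and its method are hypothetical. It assumes all triggers arrive at the moment the running offload starts and that each one retries on a fixed delay until the offload finishes.

```java
// Hypothetical back-of-the-envelope model; not part of the managed ledger code.
final class UncoalescedRetryModel {

    /**
     * Counts the failed mutex attempts made by independent automatic triggers
     * while another offload holds the mutex.
     *
     * Assumptions (illustrative only): every trigger arrives at t = 0, the
     * running offload holds the mutex for offloadMillis, and each trigger
     * retries every retryMillis until the mutex is free.
     */
    static long countFailedAttempts(int triggers, long offloadMillis, long retryMillis) {
        long failed = 0;
        for (int i = 0; i < triggers; i++) {
            // Each attempt repeats the policy read and the ledger-list scan
            // before failing to acquire the mutex.
            for (long t = 0; t < offloadMillis; t += retryMillis) {
                failed++;
            }
        }
        return failed;
    }
}
```

Under these assumptions, 5 triggers that wait on a 1000 ms offload with the 100 ms retry delay make 50 failed attempts in total, none of which changes the final offload result.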
Expected behavior
Automatic managed ledger offload triggers should be coalesced while an automatic offload is already in progress:
- There should be at most one in-flight automatic offload.
- Repeated automatic triggers during the in-flight offload should be merged into one pending rerun.
- After the current automatic offload completes, one follow-up pass should run if any trigger arrived meanwhile.
- New ledgers that become eligible while an offload is running should still be picked up by the follow-up pass.
- Explicit/manual offload requests should keep the existing CompletableFuture<Position> behavior.
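One possible way to express the coalescing rules above is a small three-state machine. The sketch below is an assumption made for illustration, not the implementation in the related PR: the class `AutoOffloadCoalescer`, its state names, and the synchronous `Runnable` pass are all hypothetical, and the real offload path is asynchronous.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of trigger coalescing; not the code from the related PR.
final class AutoOffloadCoalescer {
    private static final int IDLE = 0;
    private static final int RUNNING = 1;
    private static final int RUNNING_WITH_PENDING = 2;

    private final AtomicInteger state = new AtomicInteger(IDLE);
    private final Runnable offloadPass; // stands in for one automatic offload pass

    AutoOffloadCoalescer(Runnable offloadPass) {
        this.offloadPass = offloadPass;
    }

    /** Called for every automatic trigger (for example on ledger rollover). */
    void trigger() {
        while (true) {
            int s = state.get();
            if (s == IDLE) {
                if (state.compareAndSet(IDLE, RUNNING)) {
                    runUntilNoPending();
                    return;
                }
            } else if (s == RUNNING) {
                // Merge this trigger into a single pending rerun.
                if (state.compareAndSet(RUNNING, RUNNING_WITH_PENDING)) {
                    return;
                }
            } else {
                return; // A rerun is already pending; nothing more to record.
            }
        }
    }

    private void runUntilNoPending() {
        while (true) {
            try {
                offloadPass.run();
            } catch (RuntimeException e) {
                state.set(IDLE); // Release the state so later triggers can still run.
                throw e;
            }
            if (state.compareAndSet(RUNNING, IDLE)) {
                return; // No trigger arrived during the pass.
            }
            // State was RUNNING_WITH_PENDING: clear the flag and do one follow-up pass.
            state.set(RUNNING);
        }
    }
}
```

In this sketch, any number of triggers that arrive during a pass collapse into one follow-up pass, which rescans the ledger list and so can pick up ledgers that became eligible in the meantime. Explicit or manual offload requests are deliberately left out, since they should keep their existing CompletableFuture<Position> behavior.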
Actual behavior
Each automatic trigger can independently enter the offload path and schedule its own retry loop when another offload is already running. Under slow offload or large-ledger workloads, this can cause redundant 100ms retries, repeated policy reads, repeated ledger-list scans, and unnecessary scheduler/executor pressure.
Impact
The issue is mostly about avoiding duplicate work and reducing pressure while automatic offload is already in progress. The final automatic offload progression should be preserved: if new ledgers become eligible while an offload is running, a follow-up pass can still offload them.
Verification / reproducer
The scenario is covered by tests added in #25793:
- Repeated automatic triggers during an in-flight offload do not create independent retry loops.
- A coalesced automatic trigger causes one follow-up offload pass.
- Automatic offload state is released when offload thresholds are disabled, so later valid triggers can still run.
Local verification from the related PR:
git diff --check
./gradlew :managed-ledger:test --tests org.apache.bookkeeper.mledger.impl.OffloadPrefixTest
./gradlew :managed-ledger:test --tests org.apache.bookkeeper.mledger.impl.OffloadLedgerDeleteTest
Affected area
Managed ledger automatic offload scheduling and retry behavior.
Related PR
#25793