Skip to content

Conversation

@zhaizhibo
Copy link
Contributor

Motivation

There ard two scenarios where pendingAddEntries can leak in a ManagedLedger.

  1. Leak in Fenced state
    After an add operation, there may be a need to update the ledger's properties. If the ZooKeeper update fails, the callback sets the ManagedLedger state to fenced. Subsequent write operations that fail will trigger ManagedLedgerImpl#ledgerClosed. For a ManagedLedger in the fenced state, this can result in pendingAddEntries not being properly cleaned up, causing a leak.
  2. Leak in Closed state
    The ManagedLedger needs to rollover because the current ledger is full, switching its state to closedLedger or creatingLedger. New incoming add requests are appended to the pendingAddEntries queue but have not yet been initiated. If the ManagedLedger is closed (state set to closed) after the rollover is executed but before its callback returns, the requests in the pendingAddEntries queue will never be completed, resulting in a leak.

Modifications

  1. add a new function clearNotInitiatedPendingAddEntries(), which will be called after ManagedLedger state be set to closed or fenced. This method will immediately fail all OpAddEntry operations in the pendingAddEntries queue that have not yet been initiated and remove them from the queue.
  2. In the OpAddEntry#handleAddFail method, add a check for the fenced state. If the ManagedLedger is fenced, we should also clear all subsequent pending write requests in the queue to prevent leaks.

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 11, 2026
@zhaizhibo zhaizhibo changed the title [fix] fix ManagedLedgerImpl's pendingAddEntries leak. [fix] Fix ManagedLedgerImpl's pendingAddEntries leak. Feb 11, 2026
@lhotari lhotari requested a review from Copilot February 11, 2026 12:59
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses pendingAddEntries leaks in ManagedLedgerImpl when the managed ledger transitions to Closed or Fenced while there are queued add operations that were never initiated.

Changes:

  • Add OpAddEntry.closeIfNotInitiated() and use it to fail/remove queued-but-not-initiated add operations on close/fence.
  • Schedule the post-rollover metadata-update completion to run on the managed ledger executor thread.
  • Clear pending writes when ledgerClosed() is invoked while in Fenced state.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpAddEntry.java Adds a helper to atomically close an OpAddEntry if it has not been initiated.
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java Adds selective clearing of not-initiated pending adds on close/fence and adjusts rollover callback execution/threading and fenced handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


factory.close(this);
STATE_UPDATER.set(this, State.Closed);
executor.execute(() -> clearNotInitiatedPendingAddEntries(new ManagedLedgerException("Managed ledger is closed")));
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In asyncClose(), the new failure passed to clearNotInitiatedPendingAddEntries() is a generic ManagedLedgerException("Managed ledger is closed"). This makes failures for queued-but-not-initiated adds inconsistent with other closed paths (eg internalAsyncAddEntry uses ManagedLedgerAlreadyClosedException) and can break callers that key off the specific subtype. Consider using ManagedLedgerAlreadyClosedException (or at least matching the existing close message pattern) here.

Suggested change
executor.execute(() -> clearNotInitiatedPendingAddEntries(new ManagedLedgerException("Managed ledger is closed")));
executor.execute(() -> clearNotInitiatedPendingAddEntries(
new ManagedLedgerAlreadyClosedException("Managed ledger is closed")));

Copilot uses AI. Check for mistakes.
maybeUpdateCursorBeforeTrimmingConsumedLedger();
}
metadataMutex.unlock();
// make sure that pendingAddEntries' opeartions are executed in the same thread
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment has a typo: "opeartions" → "operations".

Suggested change
// make sure that pendingAddEntries' opeartions are executed in the same thread
// make sure that pendingAddEntries' operations are executed in the same thread

Copilot uses AI. Check for mistakes.
Comment on lines 1759 to 1769
LedgerHandle originalCurrentLedger = currentLedger;
ledgers.put(lh.getId(), newLedger);
currentLedger = lh;
currentLedgerTimeoutTriggered = new AtomicBoolean();
currentLedgerEntries = 0;
currentLedgerSize = 0;
updateLedgersIdsComplete(originalCurrentLedger);
mbean.addLedgerSwitchLatencySample(System.currentTimeMillis()
- lastLedgerCreationInitiationTimestamp, TimeUnit.MILLISECONDS);
// May need to update the cursor position
maybeUpdateCursorBeforeTrimmingConsumedLedger();
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new executor.execute() block updates ledgers/currentLedger and calls updateLedgersIdsComplete() without re-checking the ManagedLedger state. Since this work is now deferred, it can run after asyncClose()/setFenced and potentially transition the state back to LedgerOpened and/or touch ledgers after closure. Consider checking STATE_UPDATER.get(this) for Closed or isFenced inside the executor task and bailing out (while still releasing metadataMutex) if the ledger is no longer writable.

Suggested change
LedgerHandle originalCurrentLedger = currentLedger;
ledgers.put(lh.getId(), newLedger);
currentLedger = lh;
currentLedgerTimeoutTriggered = new AtomicBoolean();
currentLedgerEntries = 0;
currentLedgerSize = 0;
updateLedgersIdsComplete(originalCurrentLedger);
mbean.addLedgerSwitchLatencySample(System.currentTimeMillis()
- lastLedgerCreationInitiationTimestamp, TimeUnit.MILLISECONDS);
// May need to update the cursor position
maybeUpdateCursorBeforeTrimmingConsumedLedger();
// Re-check state after deferring to executor to avoid mutating a closed/fenced ledger
if (STATE_UPDATER.get(ManagedLedgerImpl.this) == State.Closed || isFenced()) {
if (log.isDebugEnabled()) {
log.debug("[{}] Skipping ledger update after create complete because ledger is "
+ "closed or fenced", name);
}
} else {
LedgerHandle originalCurrentLedger = currentLedger;
ledgers.put(lh.getId(), newLedger);
currentLedger = lh;
currentLedgerTimeoutTriggered = new AtomicBoolean();
currentLedgerEntries = 0;
currentLedgerSize = 0;
updateLedgersIdsComplete(originalCurrentLedger);
mbean.addLedgerSwitchLatencySample(System.currentTimeMillis()
- lastLedgerCreationInitiationTimestamp, TimeUnit.MILLISECONDS);
// May need to update the cursor position
maybeUpdateCursorBeforeTrimmingConsumedLedger();
}

Copilot uses AI. Check for mistakes.
Comment on lines 1945 to 1946
} else if (state == State.Fenced) {
clearPendingAddEntries(new ManagedLedgerFencedException("Managed ledger is fenced"));
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ledgerClosed() only checks state == State.Fenced, but State.FencedForDeletion also returns true for state.isFenced(). In that case pending add entries would still not be cleared, which can reintroduce the leak for fenced-for-deletion. Consider using state.isFenced() here (and keep the exception message aligned with the specific fenced state if needed).

Suggested change
} else if (state == State.Fenced) {
clearPendingAddEntries(new ManagedLedgerFencedException("Managed ledger is fenced"));
} else if (state.isFenced()) {
clearPendingAddEntries(new ManagedLedgerFencedException(
"Managed ledger is fenced (" + state + ")"));

Copilot uses AI. Check for mistakes.
Comment on lines +2081 to +2089
void clearNotInitiatedPendingAddEntries(ManagedLedgerException e) {
Iterator<OpAddEntry> iterator = pendingAddEntries.iterator();
while (iterator.hasNext()) {
OpAddEntry op = iterator.next();
if (op.closeIfNotInitiated()) {
op.failed(e);
iterator.remove();
}
}
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are existing tests in this module for pendingAddEntries behavior, but the new leak-prevention logic (clearNotInitiatedPendingAddEntries + OpAddEntry.closeIfNotInitiated) is not covered. Please add tests for the two scenarios described in the PR (close/fence occurring while ops are queued but not yet initiated, and fenced write-failure path clearing subsequent queued ops) to prevent regressions.

Copilot uses AI. Check for mistakes.
@zhaizhibo
Copy link
Contributor Author

Pull request overview

This PR addresses pendingAddEntries leaks in ManagedLedgerImpl when the managed ledger transitions to Closed or Fenced while there are queued add operations that were never initiated.

Changes:

  • Add OpAddEntry.closeIfNotInitiated() and use it to fail/remove queued-but-not-initiated add operations on close/fence.
  • Schedule the post-rollover metadata-update completion to run on the managed ledger executor thread.
  • Clear pending writes when ledgerClosed() is invoked while in Fenced state.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpAddEntry.java Adds a helper to atomically close an OpAddEntry if it has not been initiated.
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java Adds selective clearing of not-initiated pending adds on close/fence and adjusts rollover callback execution/threading and fenced handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-not-needed Your PR changes do not impact docs ready-to-test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants