[Bug][ML] ManagedLedger.terminate() can race with ledger rollover and make a terminated ledger writable again #25858

@void-ptr974

Describe the bug

ManagedLedger.terminate() can race with ledger rollover. A delayed rollover callback can move a managed ledger that has already entered the Terminated state back to LedgerOpened.

Problem

ManagedLedger.terminate() seals the managed ledger at the current BookKeeper committed boundary. After termination, no new entries should be accepted and the managed ledger should not become writable again.

The key invariant should be:

Any add operation that is acknowledged successfully to the caller must have a position less than or equal to the final terminatedPosition.
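The invariant can be written as a single comparison. The following is a minimal sketch, not the Pulsar implementation: `Pos` and `ackAllowed` are hypothetical names, and the ordering by ledger ID first and entry ID second is an assumption about how positions compare.

```java
// Hypothetical sketch of the termination invariant; names are illustrative only.
class TerminatedPositionSketch {

    // Stand-in for a managed ledger position.
    // Assumption: positions are ordered by ledgerId first, then by entryId.
    static final class Pos implements Comparable<Pos> {
        final long ledgerId;
        final long entryId;

        Pos(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        @Override
        public int compareTo(Pos other) {
            int byLedger = Long.compare(ledgerId, other.ledgerId);
            return byLedger != 0 ? byLedger : Long.compare(entryId, other.entryId);
        }
    }

    // An add may be acknowledged successfully only if its position is at or
    // before the final terminated position.
    static boolean ackAllowed(Pos acked, Pos terminatedPosition) {
        return acked.compareTo(terminatedPosition) <= 0;
    }

    public static void main(String[] args) {
        Pos terminated = new Pos(5, 9);
        System.out.println(ackAllowed(new Pos(5, 9), terminated));  // at the boundary
        System.out.println(ackAllowed(new Pos(5, 10), terminated)); // past the boundary
        System.out.println(ackAllowed(new Pos(6, 0), terminated));  // in a later ledger
    }
}
```

A successful acknowledgement at a position in a later ledger, such as one created by a rollover that completed after termination, would violate the invariant under this check.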

However, there is a race between terminate() and ledger rollover:

  1. An add fills the current ledger and triggers rollover.
  2. The managed ledger moves into ClosingLedger / CreatingLedger.
  3. terminate() runs before the rollover create/switch callback finishes and marks the managed ledger as Terminated.
  4. The delayed createComplete() or updateLedgersIdsComplete() callback resumes the old rollover flow.
  5. The callback can set the state back to LedgerOpened, making a terminated managed ledger writable again.
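
The interleaving above can be replayed deterministically with a simplified state holder. This is only a sketch under stated assumptions: the class and method names are hypothetical, the real callbacks do far more than flip a state field, and the compare-and-set guard is one possible way to stop a late callback from reopening the ledger, not necessarily the approach taken in the related PR.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the managed ledger write state.
class RolloverRaceSketch {
    enum State { LedgerOpened, ClosingLedger, CreatingLedger, Terminated }

    final AtomicReference<State> state = new AtomicReference<>(State.LedgerOpened);

    // Steps 1-2: a full ledger starts a rollover.
    void startRollover() {
        state.set(State.CreatingLedger);
    }

    // Step 3: terminate() runs while the rollover is still pending.
    void terminate() {
        state.set(State.Terminated);
    }

    // Steps 4-5 (problematic): the late callback overwrites whatever state it finds.
    void unguardedRolloverComplete() {
        state.set(State.LedgerOpened);
    }

    // Guarded variant: reopen only if the rollover still owns the state.
    boolean guardedRolloverComplete() {
        return state.compareAndSet(State.CreatingLedger, State.LedgerOpened);
    }

    // Replays the interleaving from the list above and returns the final state.
    static State replay(boolean guarded) {
        RolloverRaceSketch ml = new RolloverRaceSketch();
        ml.startRollover();
        ml.terminate();
        if (guarded) {
            ml.guardedRolloverComplete();
        } else {
            ml.unguardedRolloverComplete();
        }
        return ml.state.get();
    }

    public static void main(String[] args) {
        System.out.println("unguarded callback: " + replay(false));
        System.out.println("guarded callback:   " + replay(true));
    }
}
```

In this model the unguarded callback ends in LedgerOpened and the guarded one stays in Terminated, because the compare-and-set fails once terminate() has replaced CreatingLedger.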

Expected behavior

After terminate() takes ownership of the managed ledger state:

  • Terminated should remain the final write state.
  • Queued adds that were not sent to BookKeeper should fail with ManagedLedgerTerminatedException.
  • In-flight adds already sent to BookKeeper should only succeed if they are included in the final LAC / terminatedPosition.
  • Late ledger create or ledger switch callbacks should not reopen the managed ledger.
  • Termination should not create or switch to another writable ledger for pending writes.
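
The expected handling of pending adds can be summarized as a small decision function. This is an illustrative sketch: `PendingAdd`, `Outcome`, and `resolve` are hypothetical names, and `FAILED_TERMINATED` stands in for failing the add with ManagedLedgerTerminatedException. It assumes a single ledger, so the final LAC can be compared directly against entry IDs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how pending adds should resolve once terminate() owns the state.
class TerminationOutcomeSketch {
    enum Outcome { ACKED, FAILED_TERMINATED }

    static final class PendingAdd {
        final boolean sentToBookKeeper;
        final long entryId; // meaningful only when sentToBookKeeper is true

        PendingAdd(boolean sentToBookKeeper, long entryId) {
            this.sentToBookKeeper = sentToBookKeeper;
            this.entryId = entryId;
        }
    }

    static Outcome resolve(PendingAdd add, long finalLac) {
        // Queued adds that never reached BookKeeper fail; no new ledger is created for them.
        if (!add.sentToBookKeeper) {
            return Outcome.FAILED_TERMINATED;
        }
        // In-flight adds succeed only if the final LAC covers them.
        return add.entryId <= finalLac ? Outcome.ACKED : Outcome.FAILED_TERMINATED;
    }

    public static void main(String[] args) {
        long finalLac = 41;
        List<PendingAdd> pending = new ArrayList<>();
        pending.add(new PendingAdd(true, 41));  // in flight, covered by the final LAC
        pending.add(new PendingAdd(true, 42));  // in flight, drained by the ledger close
        pending.add(new PendingAdd(false, -1)); // queued, never sent
        for (PendingAdd add : pending) {
            System.out.println(resolve(add, finalLac));
        }
    }
}
```

Every pending add resolves to exactly one outcome in this model, so no add callback is left hanging.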

Actual behavior

A delayed rollover callback can continue the normal ledger-switch path after termination and move the managed ledger back to LedgerOpened. This breaks termination semantics: pending writes can be handled as normal rollover writes, and add callbacks can be left hanging when the BookKeeper ledger close drains writes that were not included in the final LAC.

Verification / reproducer

The scenario is covered by tests added in #25795:

  • terminateDuringLedgerSwitchKeepsTerminatedState
  • terminatePositionIncludesAddAlreadyAckedByBookKeeper
  • terminateFailsInflightAddDrainedByLedgerClose
  • ledgerSwitchCompletionDoesNotReopenTerminatedLedger

Local verification from the related PR:

./gradlew :managed-ledger:test --tests org.apache.bookkeeper.mledger.impl.ManagedLedgerTerminationTest
./gradlew :managed-ledger:checkstyleMain :managed-ledger:checkstyleTest

Affected area

Managed ledger termination and ledger rollover state transitions.

Related PR

#25795
