KAFKA-20310: Persist previousProducerId and nextProducerId in transaction log #21828
Open
haltandcatchwater wants to merge 1 commit into apache:trunk
…tion log

valueToBytes() was not setting PreviousProducerId or NextProducerId on the TransactionLogValue, causing these fields to reset to -1 (NO_PRODUCER_ID) after coordinator failover. The read path correctly deserializes both fields, so after a failover the transaction metadata loses track of which producer IDs were involved in epoch rotation.

This causes two failure modes:

- Coordinator failover during epoch exhaustion leaves the producer stuck at the exhausted epoch with no recovery path.
- Client retry after epoch rotation gets PRODUCER_FENCED because the previous producer ID no longer matches.

The fix sets both fields in valueToBytes, guarded by logValueVersion >= 1 since they are tagged fields only available in the flexible version.
Summary
TransactionLog.valueToBytes() does not set PreviousProducerId or NextProducerId on the TransactionLogValue when serializing transaction metadata. Both fields exist in the schema (tagged fields at version 1+, tags 0 and 1) and are correctly read back by readTxnRecord(), but because they are never written, they always deserialize as -1 (NO_PRODUCER_ID) after coordinator failover.
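To illustrate the write-path gap, here is a minimal sketch assuming the generated TransactionLogValue message class and its fluent setters (field names are taken from the schema as described above; the surrounding serialization code is simplified):

```java
// A value built the way the current valueToBytes() builds it:
// the tagged producer-id fields are never set.
TransactionLogValue value = new TransactionLogValue()
    .setProducerId(100L)
    .setProducerEpoch((short) 5);
// ... other required fields elided ...

// The schema defaults for the unset tagged fields are -1 (NO_PRODUCER_ID),
// so that is all readTxnRecord() can ever recover after a failover.
assert value.previousProducerId() == -1L;
assert value.nextProducerId() == -1L;
```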
Impact

This causes two failure modes described in KAFKA-20310:
- Coordinator failover during epoch exhaustion: nextProducerId is lost, so prepareComplete() cannot rotate to the new producer ID. The producer is stuck at the exhausted epoch with no recovery path.
- Client retry after epoch rotation failover: prevProducerId is lost, so the validation check prevProducerId == expectedProducerId fails and the client receives PRODUCER_FENCED (see the sketch below).
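A hypothetical sketch of the second failure mode, using illustrative names rather than the exact Kafka validation code:

```java
// After failover, prevProducerId deserialized as NO_PRODUCER_ID (-1)
// because it was never written, so a legitimate retry from the producer
// that just rotated its ID matches neither check.
if (txnMetadata.producerId() != expectedProducerId
        && txnMetadata.prevProducerId() != expectedProducerId) {
    return Errors.PRODUCER_FENCED; // fences the producer's own retry
}
```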
Fix

Set both fields in valueToBytes(), guarded by logValueVersion >= 1 since they are tagged fields only available in the flexible version (1+). At version 0, the fields are not set, preserving backward compatibility.
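A minimal sketch of the shape of the change, assuming the generated TransactionLogValue setters; the real valueToBytes() sets many more fields, and the txnMetadata accessor names here are assumptions:

```java
TransactionLogValue value = new TransactionLogValue()
    .setProducerId(txnMetadata.producerId())
    .setProducerEpoch(txnMetadata.producerEpoch());
// ... other required fields elided ...

if (logValueVersion >= 1) {
    // Tagged fields (tags 0 and 1) exist only in the flexible version,
    // so they are only set when serializing at version 1 or later.
    value.setPreviousProducerId(txnMetadata.prevProducerId());
    value.setNextProducerId(txnMetadata.nextProducerId());
}
```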
Compatibility

These are tagged fields (tags 0 and 1) in the TransactionLogValue schema. Tagged fields are forward/backward compatible by design: records written without them deserialize to the default (-1), which is the existing behavior.

NextProducerEpoch (tag 3) was intentionally left out of this fix because the read path currently hardcodes RecordBatch.NO_PRODUCER_EPOCH rather than reading the field, making it a separate behavioral discussion.
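As a side note on the version guard, here is a sketch of what would happen without it, assuming the standard behavior of Kafka's generated message serializers, which reject non-default values for fields outside their supported versions:

```java
import org.apache.kafka.common.errors.UnsupportedVersionException;
import org.apache.kafka.common.protocol.MessageUtil;

TransactionLogValue value = new TransactionLogValue()
    .setProducerId(100L)
    .setPreviousProducerId(99L); // non-default tagged field (version 1+)
try {
    // Version 0 is not flexible and cannot carry the tagged field.
    MessageUtil.toVersionPrefixedByteBuffer((short) 0, value);
} catch (UnsupportedVersionException e) {
    // Generated serializers refuse to silently drop a non-default value,
    // hence the logValueVersion >= 1 guard in valueToBytes().
}
```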
Tests

Two new tests added to TransactionLogTest:

- shouldRoundTripPreviousAndNextProducerIds: verifies both fields survive serialization at version 1+
- shouldNotPersistProducerIdsAtVersion0: verifies version 0 serialization is unchanged (fields default to NO_PRODUCER_ID)

All 16 existing tests continue to pass.
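A sketch of the round-trip check, under the assumption that the generated TransactionLogValue class and the org.apache.kafka.common.protocol helpers are used directly (the actual tests in TransactionLogTest may go through the coordinator's own read/write entry points):

```java
import java.nio.ByteBuffer;
import org.apache.kafka.common.protocol.ByteBufferAccessor;
import org.apache.kafka.common.protocol.MessageUtil;

TransactionLogValue written = new TransactionLogValue()
    .setProducerId(100L)
    .setPreviousProducerId(99L)  // producer ID before the last rotation
    .setNextProducerId(101L);    // producer ID reserved for the next rotation

// Serialize at version 1 (flexible), then read it back.
ByteBuffer buffer = MessageUtil.toVersionPrefixedByteBuffer((short) 1, written);
short version = buffer.getShort();
TransactionLogValue read = new TransactionLogValue(new ByteBufferAccessor(buffer), version);

// With the fix, both tagged fields survive the round trip instead of reading as -1.
assert read.previousProducerId() == 99L;
assert read.nextProducerId() == 101L;
```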