Skip to content

KAFKA-20310: Persist previousProducerId and nextProducerId in transaction log #21828

Open
haltandcatchwater wants to merge 1 commit into apache:trunk from haltandcatchwater:KAFKA-20310-persist-producer-ids

Summary

TransactionLog.valueToBytes() does not set PreviousProducerId or NextProducerId on the TransactionLogValue when serializing transaction metadata. Both fields exist in the schema (tagged fields 0 and 1, available from version 1) and readTxnRecord() reads them back correctly, but because they are never written, they always deserialize as -1 (NO_PRODUCER_ID) when a new coordinator reloads the transaction log after failover.
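To make the write/read asymmetry concrete, here is a minimal Java model, assuming a transaction log value can be treated as a map of named fields. `LostTaggedFieldsDemo`, `TxnState`, `serialize` and `deserialize` are hypothetical names for illustration, not Kafka code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the pre-fix behavior. The record layout and method
// names are illustrative only; they are not Kafka's TransactionLog API.
final class LostTaggedFieldsDemo {
    static final long NO_PRODUCER_ID = -1L;

    record TxnState(long producerId, long prevProducerId, long nextProducerId) {}

    // Pre-fix write path: the in-memory state knows both IDs, but only the
    // producer ID is copied into the serialized value.
    static Map<String, Long> serialize(TxnState state) {
        Map<String, Long> value = new HashMap<>();
        value.put("ProducerId", state.producerId());
        return value;
    }

    // Read path: it does look up both tagged fields, but a field that was
    // never written falls back to its default, NO_PRODUCER_ID (-1).
    static TxnState deserialize(Map<String, Long> value) {
        return new TxnState(
            value.get("ProducerId"),
            value.getOrDefault("PreviousProducerId", NO_PRODUCER_ID),
            value.getOrDefault("NextProducerId", NO_PRODUCER_ID));
    }

    public static void main(String[] args) {
        TxnState inMemory = new TxnState(1000L, 999L, 1001L);
        // Simulates a new coordinator reloading the log after failover.
        TxnState reloaded = deserialize(serialize(inMemory));
        System.out.println(reloaded.prevProducerId()); // -1
        System.out.println(reloaded.nextProducerId()); // -1
    }
}
```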

Impact

This causes two failure modes described in KAFKA-20310:

  1. Coordinator failover during epoch exhaustion: nextProducerId is lost, so prepareComplete() cannot rotate to the new producer ID. The producer is stuck at the exhausted epoch with no recovery path.

  2. Client retry after a failover that follows epoch rotation: prevProducerId is lost, so the validation check prevProducerId == expectedProducerId fails and the client receives PRODUCER_FENCED.
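Both failure modes can be sketched under simplifying assumptions: `FailoverScenarios`, `producerIdAfterComplete` and `validateRetry` are hypothetical stand-ins for the logic around prepareComplete() and the prevProducerId check, reduced to the comparisons described above.

```java
// Hypothetical model of the two failure modes; the method names and the
// simplified logic are assumptions, not Kafka's TransactionMetadata code.
final class FailoverScenarios {
    static final long NO_PRODUCER_ID = -1L;

    enum Outcome { OK, PRODUCER_FENCED }

    // Failure mode 1: on completion with an exhausted epoch, the coordinator
    // should switch to nextProducerId. If that ID was lost, there is no new ID
    // to switch to and the producer stays on the exhausted one.
    static long producerIdAfterComplete(long producerId, boolean epochExhausted, long nextProducerId) {
        if (epochExhausted && nextProducerId != NO_PRODUCER_ID) {
            return nextProducerId;
        }
        return producerId;
    }

    // Failure mode 2: a client that retries after rotation still presents the
    // old producer ID, which is accepted only if it matches prevProducerId.
    static Outcome validateRetry(long prevProducerId, long expectedProducerId) {
        return prevProducerId == expectedProducerId ? Outcome.OK : Outcome.PRODUCER_FENCED;
    }

    public static void main(String[] args) {
        // Before failover: the coordinator still holds both IDs in memory.
        System.out.println(producerIdAfterComplete(1000L, true, 2000L)); // 2000
        System.out.println(validateRetry(1000L, 1000L));                 // OK
        // After failover with the pre-fix log: both IDs were reloaded as -1.
        System.out.println(producerIdAfterComplete(1000L, true, NO_PRODUCER_ID)); // 1000
        System.out.println(validateRetry(NO_PRODUCER_ID, 1000L));                 // PRODUCER_FENCED
    }
}
```

In both cases the only difference between success and failure is whether the reloaded ID is a real producer ID or the -1 default.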

Fix

Set both fields in valueToBytes(), guarded by logValueVersion >= 1, since they are tagged fields that exist only in the flexible versions (1+). At version 0 the fields are not set, so version 0 output is unchanged and backward compatibility is preserved.
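The version guard can be sketched as follows, as a simplified model rather than the actual patch: `VersionGuardedWrite`, `TxnState` and the map-based value are hypothetical, whereas the real change sets the fields on the TransactionLogValue inside valueToBytes().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the fix; names are illustrative, not Kafka's
// generated TransactionLogValue API.
final class VersionGuardedWrite {
    static final long NO_PRODUCER_ID = -1L;

    record TxnState(long producerId, long prevProducerId, long nextProducerId) {}

    static Map<String, Long> serialize(TxnState state, short logValueVersion) {
        Map<String, Long> value = new HashMap<>();
        value.put("ProducerId", state.producerId());
        // PreviousProducerId (tag 0) and NextProducerId (tag 1) are tagged fields,
        // which only exist in flexible versions (1+). At version 0 they are left
        // unset, so v0 output is identical to the pre-fix behavior.
        if (logValueVersion >= 1) {
            value.put("PreviousProducerId", state.prevProducerId());
            value.put("NextProducerId", state.nextProducerId());
        }
        return value;
    }

    // Unchanged read path: missing fields default to NO_PRODUCER_ID.
    static TxnState deserialize(Map<String, Long> value) {
        return new TxnState(
            value.get("ProducerId"),
            value.getOrDefault("PreviousProducerId", NO_PRODUCER_ID),
            value.getOrDefault("NextProducerId", NO_PRODUCER_ID));
    }

    public static void main(String[] args) {
        TxnState state = new TxnState(1000L, 999L, 1001L);
        // Version 1+: both IDs survive the round trip.
        System.out.println(deserialize(serialize(state, (short) 1)).equals(state)); // true
        // Version 0: IDs are not persisted and read back as -1.
        System.out.println(deserialize(serialize(state, (short) 0)).prevProducerId()); // -1
    }
}
```

The two checks in `main` correspond in spirit to the shouldRoundTripPreviousAndNextProducerIds and shouldNotPersistProducerIdsAtVersion0 tests listed in this PR.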

Compatibility

These are tagged fields (tags 0 and 1) in the TransactionLogValue schema. Tagged fields are forward/backward compatible by design:

  • Older brokers reading version 1 logs ignore unknown tags.
  • Newer brokers reading version 0 logs see the default value (-1), which is the existing behavior.
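Why unknown tags are harmless can be shown with a simplified encoder and decoder. This is an assumption-laden model: `TaggedFieldCompat` and its methods are hypothetical, and it uses fixed-width ints for the count, tag and size where Kafka's flexible versions use unsigned varints. The property it illustrates is that each tagged field carries its own size, so a reader can skip tags it does not know.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of tagged-field encoding. Assumption for illustration:
// count, tag and size are fixed-width ints here, whereas Kafka's flexible
// versions encode them as unsigned varints.
final class TaggedFieldCompat {
    static final long NO_PRODUCER_ID = -1L;

    // Writes: field count, then (tag, size, payload) for each field.
    static ByteBuffer writeTagged(Map<Integer, Long> fields) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + fields.size() * (2 * Integer.BYTES + Long.BYTES));
        buf.putInt(fields.size());
        for (Map.Entry<Integer, Long> field : fields.entrySet()) {
            buf.putInt(field.getKey());
            buf.putInt(Long.BYTES);
            buf.putLong(field.getValue());
        }
        buf.flip();
        return buf;
    }

    // Reads the tagged section, keeping only tags this reader knows. Because
    // every field carries its size, unknown tags can be skipped safely; this is
    // how a reader with an older schema ignores fields it does not recognize.
    static Map<Integer, Long> readTagged(ByteBuffer buf, Set<Integer> knownTags) {
        Map<Integer, Long> known = new HashMap<>();
        int count = buf.getInt();
        for (int i = 0; i < count; i++) {
            int tag = buf.getInt();
            int size = buf.getInt();
            if (knownTags.contains(tag)) {
                known.put(tag, buf.getLong()); // every field in this model is an 8-byte long
            } else {
                buf.position(buf.position() + size); // skip the unknown payload
            }
        }
        return known;
    }

    // A field that was never written (e.g. a version 0 record, which has no
    // tagged section) reads back as its default, NO_PRODUCER_ID.
    static long fieldOrDefault(Map<Integer, Long> tagged, int tag) {
        return tagged.getOrDefault(tag, NO_PRODUCER_ID);
    }

    public static void main(String[] args) {
        Map<Integer, Long> written = Map.of(0, 1000L, 1, 2000L);
        // Reader whose schema predates tags 0 and 1: both fields are skipped.
        System.out.println(readTagged(writeTagged(written), Set.of()));     // {}
        // Reader that knows tags 0 and 1 recovers both values.
        System.out.println(readTagged(writeTagged(written), Set.of(0, 1))); // {0=1000, 1=2000}
        // Version 0 record: no tagged section, so the default applies.
        System.out.println(fieldOrDefault(Map.of(), 0));                    // -1
    }
}
```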

NextProducerEpoch (tag 3) was intentionally left out of this fix because the read path currently hardcodes RecordBatch.NO_PRODUCER_EPOCH rather than reading the field, making it a separate behavioral discussion.

Tests

Two new tests added to TransactionLogTest:

  • shouldRoundTripPreviousAndNextProducerIds — verifies both fields survive serialization at version 1+
  • shouldNotPersistProducerIdsAtVersion0 — verifies version 0 serialization is unchanged (fields default to NO_PRODUCER_ID)

All 16 existing tests continue to pass.

…tion log

valueToBytes() was not setting PreviousProducerId or NextProducerId on
the TransactionLogValue, causing these fields to reset to -1
(NO_PRODUCER_ID) after coordinator failover. The read path deserializes
both fields correctly, but because they were never written, the reloaded
transaction metadata loses track of which producer IDs were involved in
epoch rotation.

This causes two failure modes:
- Coordinator failover during epoch exhaustion leaves the producer
  stuck at the exhausted epoch with no recovery path.
- Client retry after epoch rotation gets PRODUCER_FENCED because
  the previous producer ID no longer matches.

The fix sets both fields in valueToBytes, guarded by logValueVersion >= 1
since they are tagged fields only available in the flexible version.
github-actions bot added the triage, transactions, and small labels on Mar 19, 2026