Course
data-engineering-zoomcamp
Question
How to Inspect Messages in a Kafka Topic Using Offsets?
Answer
When working with Apache Kafka, one of the most common tasks when debugging data streams is inspecting specific messages within a topic. In many cases, errors or unexpected behaviors are associated with a specific offset within a partition.
1. What is an offset in Kafka
In Kafka, each message within a partition has an incremental number called an offset:
- The
offset is unique within each partition.
- There is no global
offset for the entire topic.
- Consumers use
offsets to keep track of which messages they have already processed.
Example:
| Partition |
Offset |
Message |
| 0 |
0 |
event A |
| 0 |
1 |
event B |
| 0 |
2 |
event C |
In this case, a consumer that has read up to offset 1 will start at 2 when it resumes.
2. Why offsets are useful for debugging
When an error occurs in a real-time data stream, it is common to find messages like:
Error processing message at offset 4839201
If we can inspect the messages near that offset we can:
- See what data caused the error.
- Understand what happened before and after.
- Reproduce the problem locally.
3. Viewing offsets and consumer lag
To see how far a consumer has read, you can use the Kafka CLI tool kafka-consumer-groups:
kafka-consumer-groups \
--bootstrap-server localhost:9092 \
--describe \
--group rides-to-postgres
Which you can also use via Docker without installing it locally:
docker run --rm -it --network pyflink_default confluentinc/cp-kafka:7.6.0 kafka-consumer-groups \
--bootstrap-server redpanda:29092 \
--describe \
--group rides-to-postgres
The arguments mean:
--bootstrap-server to specify the broker.
--describe to request information about a consumer group.
--group [group] to specify the consumer group you want information about.
A typical output in our data stream might look like:
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
rides-to-postgres rides 0 27 27 0 kafka-python-2.3.0-0d42c17a-052c-4770-8326-52cf2537e6d4 172.19.0.1 kafka-python-2.3.0
rides-to-postgres rides 1 25 25 0 kafka-python-2.3.0-0d42c17a-052c-4770-8326-52cf2537e6d4 172.19.0.1 kafka-python-2.3.0
rides-to-postgres rides 2 21 21 0 kafka-python-2.3.0-0d42c17a-052c-4770-8326-52cf2537e6d4 172.19.0.1 kafka-python-2.3.0
Key fields:
- CURRENT-OFFSET: last
offset processed by the consumer
- LOG-END-OFFSET: last
offset available in the topic
- LAG: messages pending to be processed
See the official documentation at: kafka-consumer-groups-sh
4. Consuming messages from the beginning of the topic
To inspect all messages in a topic you can use the Kafka CLI tool kafka-console-consumer:
kafka-console-consumer \
--bootstrap-server localhost:9092 \
--topic rides \
--from-beginning
Alternatively, you can use it from a Docker image:
docker run --rm -it --network pyflink_default confluentinc/cp-kafka:7.6.0 kafka-console-consumer \
--bootstrap-server redpanda:29092 \
--topic rides \
--from-beginning
The arguments mean:
--bootstrap-server to specify the broker.
--topic [topic] to specify the topic to query.
--from-beginning to request all messages from the start of the topic.
This is useful for basic exploration, but does not allow jumping to specific offsets.
See the official documentation at: kafka-console-consumer-sh
5. Inspecting messages from a specific offset
A very useful tool for debugging is kcat (formerly kafkacat).
Example:
kcat -C \
-b localhost:9092 \
-t rides \
-p 0 \
-o 25 \
-c 5
You can also use it from a Docker image to avoid installing it locally:
docker run --network pyflink_default edenhill/kcat:1.7.1 -C \
-b redpanda:29092 \
-t rides \
-p 0 \
-o 25 \
-c 5
The arguments mean:
- consumer mode
-C
- broker
-b: localhost:9092 (or redpanda:29092 from within a Docker network)
- topic
-t: rides
- partition
-p: 0
- start at offset
-o: 25
- read up to
-c: 5 messages
This allows you to see exactly what happens starting from a specific offset.
See the official documentation at: kafkacat usage
Conclusion
Knowing how to inspect messages using offsets is a fundamental skill for working with Kafka. It allows you to understand what is happening inside a topic and debug data pipelines efficiently. Tools like kcat or the Kafka CLI commands make it easy to explore specific messages and understand the state of consumers.
Checklist
Course
data-engineering-zoomcamp
Question
How to Inspect Messages in a Kafka Topic Using Offsets?
Answer
When working with Apache Kafka, one of the most common tasks when debugging data streams is inspecting specific messages within a topic. In many cases, errors or unexpected behaviors are associated with a specific
offsetwithin a partition.1. What is an
offsetin KafkaIn Kafka, each message within a partition has an incremental number called an
offset:offsetis unique within each partition.offsetfor the entire topic.offsets to keep track of which messages they have already processed.Example:
In this case, a consumer that has read up to offset
1will start at2when it resumes.2. Why
offsets are useful for debuggingWhen an error occurs in a real-time data stream, it is common to find messages like:
If we can inspect the messages near that
offsetwe can:3. Viewing
offsets and consumer lagTo see how far a consumer has read, you can use the Kafka CLI tool
kafka-consumer-groups:kafka-consumer-groups \ --bootstrap-server localhost:9092 \ --describe \ --group rides-to-postgresWhich you can also use via Docker without installing it locally:
docker run --rm -it --network pyflink_default confluentinc/cp-kafka:7.6.0 kafka-consumer-groups \ --bootstrap-server redpanda:29092 \ --describe \ --group rides-to-postgresThe arguments mean:
--bootstrap-serverto specify the broker.--describeto request information about a consumer group.--group [group]to specify the consumer group you want information about.A typical output in our data stream might look like:
Key fields:
offsetprocessed by the consumeroffsetavailable in the topic4. Consuming messages from the beginning of the topic
To inspect all messages in a topic you can use the Kafka CLI tool
kafka-console-consumer:kafka-console-consumer \ --bootstrap-server localhost:9092 \ --topic rides \ --from-beginningAlternatively, you can use it from a Docker image:
docker run --rm -it --network pyflink_default confluentinc/cp-kafka:7.6.0 kafka-console-consumer \ --bootstrap-server redpanda:29092 \ --topic rides \ --from-beginningThe arguments mean:
--bootstrap-serverto specify the broker.--topic [topic]to specify the topic to query.--from-beginningto request all messages from the start of the topic.This is useful for basic exploration, but does not allow jumping to specific
offsets.5. Inspecting messages from a specific offset
A very useful tool for debugging is kcat (formerly kafkacat).
Example:
kcat -C \ -b localhost:9092 \ -t rides \ -p 0 \ -o 25 \ -c 5You can also use it from a Docker image to avoid installing it locally:
docker run --network pyflink_default edenhill/kcat:1.7.1 -C \ -b redpanda:29092 \ -t rides \ -p 0 \ -o 25 \ -c 5The arguments mean:
-C-b: localhost:9092 (or redpanda:29092 from within a Docker network)-t: rides-p: 0-o: 25-c: 5 messagesThis allows you to see exactly what happens starting from a specific
offset.Conclusion
Knowing how to inspect messages using
offsets is a fundamental skill for working with Kafka. It allows you to understand what is happening inside a topic and debug data pipelines efficiently. Tools like kcat or the Kafka CLI commands make it easy to explore specific messages and understand the state of consumers.Checklist