Skip to content

[FAQ] How to Inspect Messages in a Kafka Topic Using Offsets #242

@elcapo

Description

@elcapo

Course

data-engineering-zoomcamp

Question

How to Inspect Messages in a Kafka Topic Using Offsets?

Answer

When working with Apache Kafka, one of the most common tasks when debugging data streams is inspecting specific messages within a topic. In many cases, errors or unexpected behaviors are associated with a specific offset within a partition.

1. What is an offset in Kafka

In Kafka, each message within a partition has an incremental number called an offset:

  • The offset is unique within each partition.
  • There is no global offset for the entire topic.
  • Consumers use offsets to keep track of which messages they have already processed.

Example:

Partition Offset Message
0 0 event A
0 1 event B
0 2 event C

In this case, a consumer that has read up to offset 1 will start at 2 when it resumes.

2. Why offsets are useful for debugging

When an error occurs in a real-time data stream, it is common to find messages like:

Error processing message at offset 4839201

If we can inspect the messages near that offset we can:

  • See what data caused the error.
  • Understand what happened before and after.
  • Reproduce the problem locally.

3. Viewing offsets and consumer lag

To see how far a consumer has read, you can use the Kafka CLI tool kafka-consumer-groups:

kafka-consumer-groups \
    --bootstrap-server localhost:9092 \
    --describe \
    --group rides-to-postgres

Which you can also use via Docker without installing it locally:

docker run --rm -it --network pyflink_default confluentinc/cp-kafka:7.6.0 kafka-consumer-groups \
    --bootstrap-server redpanda:29092 \
    --describe \
    --group rides-to-postgres

The arguments mean:

  • --bootstrap-server to specify the broker.
  • --describe to request information about a consumer group.
  • --group [group] to specify the consumer group you want information about.

A typical output in our data stream might look like:

GROUP             TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                             HOST            CLIENT-ID
rides-to-postgres rides           0          27              27              0               kafka-python-2.3.0-0d42c17a-052c-4770-8326-52cf2537e6d4 172.19.0.1      kafka-python-2.3.0
rides-to-postgres rides           1          25              25              0               kafka-python-2.3.0-0d42c17a-052c-4770-8326-52cf2537e6d4 172.19.0.1      kafka-python-2.3.0
rides-to-postgres rides           2          21              21              0               kafka-python-2.3.0-0d42c17a-052c-4770-8326-52cf2537e6d4 172.19.0.1      kafka-python-2.3.0

Key fields:

  • CURRENT-OFFSET: last offset processed by the consumer
  • LOG-END-OFFSET: last offset available in the topic
  • LAG: messages pending to be processed

See the official documentation at: kafka-consumer-groups-sh

4. Consuming messages from the beginning of the topic

To inspect all messages in a topic you can use the Kafka CLI tool kafka-console-consumer:

kafka-console-consumer \
    --bootstrap-server localhost:9092 \
    --topic rides \
    --from-beginning

Alternatively, you can use it from a Docker image:

docker run --rm -it --network pyflink_default confluentinc/cp-kafka:7.6.0 kafka-console-consumer \
    --bootstrap-server redpanda:29092 \
    --topic rides \
    --from-beginning

The arguments mean:

  • --bootstrap-server to specify the broker.
  • --topic [topic] to specify the topic to query.
  • --from-beginning to request all messages from the start of the topic.

This is useful for basic exploration, but does not allow jumping to specific offsets.

See the official documentation at: kafka-console-consumer-sh

5. Inspecting messages from a specific offset

A very useful tool for debugging is kcat (formerly kafkacat).

Example:

kcat -C \
    -b localhost:9092 \
    -t rides \
    -p 0 \
    -o 25 \
    -c 5

You can also use it from a Docker image to avoid installing it locally:

docker run --network pyflink_default edenhill/kcat:1.7.1 -C \
    -b redpanda:29092 \
    -t rides \
    -p 0 \
    -o 25 \
    -c 5

The arguments mean:

  • consumer mode -C
  • broker -b: localhost:9092 (or redpanda:29092 from within a Docker network)
  • topic -t: rides
  • partition -p: 0
  • start at offset -o: 25
  • read up to -c: 5 messages

This allows you to see exactly what happens starting from a specific offset.

See the official documentation at: kafkacat usage

Conclusion

Knowing how to inspect messages using offsets is a fundamental skill for working with Kafka. It allows you to understand what is happening inside a topic and debug data pipelines efficiently. Tools like kcat or the Kafka CLI commands make it easy to explore specific messages and understand the state of consumers.

Checklist

  • I have searched existing FAQs and this question is not already answered
  • The answer provides accurate, helpful information
  • I have included any relevant code examples or links

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions