[FAQ] PyFlink job keeps restarting with JSON deserialization error #244

### Course

data-engineering-zoomcamp

### Question

Why does the PyFlink streaming job fail with a JSON deserialization error when consuming records from the Kafka/Redpanda topic?

### Answer

The Green Taxi dataset contains NaN values in some numeric fields (for example `passenger_count`). When the producer serializes rows to JSON and sends them to the Kafka topic, those values end up in the payload as a bare `NaN` token; Python's `json.dumps`, for instance, emits `NaN` by default. `NaN` is not valid JSON, so Flink’s JSON parser rejects the record, the Kafka source fails during deserialization, and the job keeps restarting.

Fix: clean the dataset before producing events by replacing each NaN with `null` or a valid number before serialization.
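
The fix can be sketched in plain Python as follows. This is a minimal example, not the course's producer code: it assumes each row reaches the producer as a `dict`, and the helper names `clean_record` and `serialize` as well as the sample values are made up for illustration.

```python
import json
import math


def clean_record(record: dict) -> dict:
    """Return a copy of the record with float NaN values replaced by None."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }


def serialize(record: dict) -> bytes:
    # None is written as JSON null. allow_nan=False makes json.dumps raise
    # ValueError instead of emitting NaN or Infinity, so a bad row fails
    # in the producer rather than in the Flink job.
    return json.dumps(clean_record(record), allow_nan=False).encode("utf-8")


# Hypothetical row with a missing passenger_count
row = {
    "lpep_pickup_datetime": "2019-10-01 00:26:02",
    "passenger_count": float("nan"),
    "trip_distance": 1.5,
}
print(serialize(row))
```

If the producer is kafka-python's `KafkaProducer`, a function like `serialize` can be passed as its `value_serializer`. Whether to send `null` or a default number depends on how the Flink table declares the column: `null` only works if the column is nullable.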

### Checklist

- [x] I have searched existing FAQs and this question is not already answered
- [x] The answer provides accurate, helpful information
- [x] I have included any relevant code examples or links
