Course
data-engineering-zoomcamp
Question
How can I correctly calculate trip duration using Spark timestamps?
Answer
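Subtracting two Spark timestamp columns directly does not give a numeric duration in hours (in recent Spark versions the result is an interval type).
To compute trip duration, convert both timestamps to Unix time (seconds since the epoch) with unix_timestamp, subtract them, and divide the difference by 3600.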
Example:
from pyspark.sql import functions as F

# Column expression: (dropoff - pickup) in seconds, converted to hours
trip_duration_hours = (
    F.unix_timestamp("tpep_dropoff_datetime")
    - F.unix_timestamp("tpep_pickup_datetime")
) / 3600
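This converts both timestamps to whole seconds, computes the difference, and returns the duration in hours as a floating-point value. Because unix_timestamp works in whole seconds, any sub-second precision in the original timestamps is dropped.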