[FAQ] How can I correctly calculate trip duration using Spark timestamps? #235

@AsherJD-io

Course

data-engineering-zoomcamp

Question

How can I correctly calculate trip duration using Spark timestamps?

Answer

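Subtracting two Spark timestamp columns directly does not give a numeric duration in hours. Depending on the Spark version, direct subtraction yields an interval value rather than a number, or is not supported at all.

To compute trip duration, convert both timestamps to Unix time (seconds since the epoch) with unix_timestamp, subtract the pickup value from the drop-off value, and divide the result by 3600.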

Example:

from pyspark.sql import functions as F

# Drop-off minus pickup in seconds, converted to hours
trip_duration_hours = (
    F.unix_timestamp("tpep_dropoff_datetime")
    - F.unix_timestamp("tpep_pickup_datetime")
) / 3600

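This converts both timestamps to whole seconds since the Unix epoch, subtracts them, and divides by 3600 to give the duration in hours. The result is a column expression, so pass it to withColumn or select to attach it to a DataFrame. Note that unix_timestamp returns whole seconds, so any fractional seconds are dropped.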
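To sanity-check the arithmetic without a Spark session, the same calculation can be reproduced with Python's standard library. This is only a sketch: the helper name duration_hours and the sample pickup and drop-off times are made up for illustration and are not part of the original answer or the course dataset.

```python
from datetime import datetime, timezone


def duration_hours(pickup: datetime, dropoff: datetime) -> float:
    """Hypothetical helper mirroring the Spark expression above."""
    # Whole seconds since the Unix epoch, analogous to F.unix_timestamp
    pickup_s = int(pickup.timestamp())
    dropoff_s = int(dropoff.timestamp())
    # Difference in seconds, converted to hours
    return (dropoff_s - pickup_s) / 3600


# Illustrative values only: a 90-minute trip
pickup = datetime(2025, 1, 1, 10, 0, 0, tzinfo=timezone.utc)
dropoff = datetime(2025, 1, 1, 11, 30, 0, tzinfo=timezone.utc)

duration = duration_hours(pickup, dropoff)
print(duration)  # 1.5
```

A trip of 5400 seconds comes out as 1.5 hours, which is the value the Spark expression should produce for the same pair of timestamps.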

Checklist

  • I have searched existing FAQs and this question is not already answered
  • The answer provides accurate, helpful information
  • I have included any relevant code examples or links
