[FAQ] Unable to download <parquet link> using wget from the TLC Trip Record Data website. #250

@Kimchi21

Course

data-engineering-zoomcamp

Question

Why do both !wget https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-11.parquet and !wget --no-check-certificate https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-11.parquet fail to download the parquet file?

Answer

The download can fail even with --no-check-certificate because the problem is not SSL verification but network-level blocking of Amazon CloudFront domains.

On some networks, requests to the dataset URL are redirected to a block page such as https://blocked.sbmd.cicc.gov.ph/.
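One way to check for this kind of redirect is to compare the host you requested with the host that actually served the response. The following is a minimal sketch using only the Python standard library; the helper names `was_redirected_elsewhere` and `final_url_of` are hypothetical and not part of the course material.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

DATA_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-11.parquet"


def was_redirected_elsewhere(requested_url: str, final_url: str) -> bool:
    """Return True when the final URL is served by a different host."""
    requested_host = (urlparse(requested_url).hostname or "").lower()
    final_host = (urlparse(final_url).hostname or "").lower()
    return requested_host != final_host


def final_url_of(url: str, timeout: float = 30.0) -> str:
    """Open the URL, following redirects, and return the URL actually served.

    Only the response headers are read; the body is not downloaded in full.
    This may raise an exception if the connection or certificate check fails.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()
```

In a notebook cell, `was_redirected_elsewhere(DATA_URL, final_url_of(DATA_URL))` returning `True` suggests the request was intercepted and sent to another host, such as a block page, rather than served by CloudFront.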

Solution 1 - Skip the certificate check

If the failure is a certificate error, try downloading the file with SSL verification disabled. This does not help when the network blocks CloudFront:

!wget --no-check-certificate https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-11.parquet

Solution 2 - Connect to a VPN

If your network is blocking CloudFront entirely, connect to a VPN and run the original command again:

!wget https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-11.parquet

In the reported case, using a VPN bypassed the network block and the download succeeded.
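After downloading, it can be worth confirming that the saved file is really a Parquet file and not an HTML block page saved under the same name. Parquet files begin and end with the 4-byte magic number `PAR1`, so a rough check is possible without any extra libraries. This is a sketch only; the helper name `looks_like_parquet` is hypothetical, and passing the check does not prove the file is complete or uncorrupted.

```python
import os
from pathlib import Path

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path: str) -> bool:
    """Check for the 4-byte Parquet magic number at both ends of the file."""
    file = Path(path)
    # A missing or tiny file cannot hold both magic markers.
    if not file.is_file() or file.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with file.open("rb") as handle:
        head = handle.read(len(PARQUET_MAGIC))
        handle.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = handle.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

For example, `looks_like_parquet("yellow_tripdata_2025-11.parquet")` can be run in the notebook right after the `!wget` cell, assuming wget saved the file under its default name in the working directory.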

Checklist

  • I have searched existing FAQs and this question is not already answered
  • The answer provides accurate, helpful information
  • I have included any relevant code examples or links
