@doconnoronca
Contributor

@doconnoronca doconnoronca commented Jun 15, 2025

I sometimes run into problems on TransSee where my system that automatically downloads GTFS files with libcurl fails because the web server hosting them blocks it.

Two of the reasons this happens are that the web server only allows actual browsers to download files, and that it blocks IP addresses outside the United States. I am in Canada.

This change addresses the problem by adding best practices that advise against both kinds of blocking.

@skinkie
Contributor

skinkie commented Jun 15, 2025

While supportive, the statements have to do with reuse and open data. Is that within the scope of the best practices?
How does your libcurl implementation handle Last-Modified and ETags? That is also part of the best practices.

@doconnoronca
Contributor Author

> While supportive, the statements have to do with reuse and open data. Is that within the scope of the best practices?

There are other best practices related to how the file is hosted, like not requiring a login and supporting the file modification date.

> How does your libcurl implementation handle Last-Modified and ETags? That is also part of the best practices.

I use CURLOPT_TIMECONDITION and CURLOPT_TIMEVALUE to set an If-Modified-Since HTTP header, which works for most sources.
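
For readers unfamiliar with conditional requests, here is a minimal sketch of the same idea in Python, using only the standard library. It is an illustration and not the TransSee code: the function names and the example URL are hypothetical. It sends If-Modified-Since (the header that CURLOPT_TIMECONDITION and CURLOPT_TIMEVALUE produce in libcurl) and, optionally, If-None-Match for servers that provide ETags, and it treats a 304 response as "unchanged".

```python
import email.utils
import urllib.error
import urllib.request


def build_conditional_request(url, last_modified_ts=None, etag=None):
    """Build a GET request that lets the server skip unchanged files."""
    request = urllib.request.Request(url)
    if last_modified_ts is not None:
        # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
        http_date = email.utils.formatdate(last_modified_ts, usegmt=True)
        request.add_header("If-Modified-Since", http_date)
    if etag is not None:
        # Send the ETag back exactly as the server provided it.
        request.add_header("If-None-Match", etag)
    return request


def fetch_if_changed(url, last_modified_ts=None, etag=None, timeout=30):
    """Return the new file bytes, or None if the server answers 304."""
    request = build_conditional_request(url, last_modified_ts, etag)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: keep the cached copy.
            return None
        raise
```

A caller would typically store the Last-Modified time and ETag from each successful download and pass them back on the next poll, for example `fetch_if_changed("https://www.agency.org/gtfs/gtfs.zip", last_modified_ts=saved_ts, etag=saved_etag)`, where `saved_ts` and `saved_etag` are placeholders for the stored values.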

@eliasmbd eliasmbd added Change type: Non-Functional Refers to important updates to the specification that do not significantly affect functionalities. Discussion Period The community engages in conversations to help refine and develop the proposal. labels Jun 18, 2025
@eliasmbd eliasmbd added the Former Governance Applies This proposal is subject to the former governance process which predates July 7, 2025. label Jul 7, 2025
@tzujenchanmbd
Collaborator

I wonder if the current use of negative phrasing (e.g., "should not") might come across as a bit strong, especially considering that the final decision rests with data producers. In some regions, producers may have "compelling reasons" (e.g., political considerations).

Would it make sense to rephrase these statements using more positive, recommendation-based language? This might better respect the position of producers while still conveying the best practices valued by the community.

@skinkie
Contributor

skinkie commented Jul 7, 2025

If you are creating a standard, you must use standard language, not "do whatever you want to do".

@doconnoronca
Contributor Author

"Should not" is a phrase with a specific definition in the standard.

@kurtraschke

I would suggest that any recommendation made to serving GTFS feeds should apply to GTFS-rt as well. (I am thinking, for example, of the GTFS-rt endpoint that sporadically responds with a Cloudflare CAPTCHA...)
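
On the consumer side, one way to notice that a download returned a challenge page instead of a feed is to inspect the payload before parsing it. The sketch below is a heuristic under stated assumptions, not part of any GTFS tooling: the function name `classify_download` is hypothetical, and it relies only on the facts that non-empty ZIP archives begin with the bytes `PK\x03\x04` and that challenge pages are served as HTML. GTFS Realtime protobuf payloads have no magic number, so they fall into the "unknown" bucket and would still need to be parsed.

```python
def classify_download(body, content_type=""):
    """Roughly classify a downloaded payload as 'zip', 'html', or 'unknown'."""
    # Non-empty ZIP archives (such as GTFS Schedule feeds) start with "PK\x03\x04".
    if body.startswith(b"PK\x03\x04"):
        return "zip"
    # Look only at the start of the body, ignoring leading whitespace and case.
    head = body[:512].lstrip().lower()
    if "text/html" in content_type.lower() or head.startswith(
        (b"<!doctype html", b"<html")
    ):
        return "html"
    return "unknown"
```

A poller could log and retry later when the result is "html", rather than handing the bytes to a ZIP or protobuf parser and reporting a confusing decode error.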

@felixguendling

> "Should not" is a phrase with a specific definition in the standard.

I think "should not" is a good fit here, quoting from RFC 2119:

> SHOULD NOT This phrase, or the phrase "NOT RECOMMENDED" mean that
> there may exist valid reasons in particular circumstances when the
> particular behavior is acceptable or even useful, but the full
> implications should be understood and the case carefully weighed
> before implementing any behavior described with this label.


> I would suggest that any recommendation made to serving GTFS feeds should apply to GTFS-rt as well. (I am thinking, for example, of the GTFS-rt endpoint that sporadically responds with a Cloudflare CAPTCHA...)

I agree. Adding it to this PR might help to bring awareness.

@doconnoronca
Contributor Author

> I would suggest that any recommendation made to serving GTFS feeds should apply to GTFS-rt as well. (I am thinking, for example, of the GTFS-rt endpoint that sporadically responds with a Cloudflare CAPTCHA...)

I have had fewer problems with GTFS Realtime, probably because the feeds are often hosted on separate web servers rather than the transit agency's main site, but I ran into one with a geographic block yesterday.

I have added similar best practices for GTFS Realtime.

(I noticed the document is really inconsistent about putting a dash between "real" and "time".)

@doconnoronca
Contributor Author

I'm going to call for a vote. The voting period will end on Aug 21, 2025 at 23:59:59 UTC.

@skinkie
Contributor

skinkie commented Aug 6, 2025

-1 OpenGeo

@felixguendling

+1 MOTIS

@gcamp
Contributor

gcamp commented Aug 7, 2025

+1 Transit

@skinkie are you suggesting we don't mention open data best practices in the GTFS best practices at all, or that we should re-use existing open data best practices? I tried to look, and The Open Definition looks like it's the closest to what GTFS needs, but it's also very broad and strict, probably stricter than we need it to be.

@eliasmbd eliasmbd added Vote to Adopt Community votes to officially adopt the change. and removed Discussion Period The community engages in conversations to help refine and develop the proposal. labels Aug 7, 2025
@skinkie
Contributor

skinkie commented Aug 7, 2025

@gcamp either we follow strict "open data" rules / best practices, or we stick with being a transit standard. While most of us share the first goal (too), we are here for a transit standard.

Using git blame, I cannot see where this line actually came from: "The URL should be directly accessible without requiring a login to access the feed." It is also an open activism claim.

@vkrause

vkrause commented Aug 7, 2025

+1 Transitous

@westontrillium
Contributor

westontrillium commented Aug 7, 2025

+1 Trillium

@skinkie The spec's Best Practices already make value statements regarding open data, so precedent has already been established. This change just adds stipulations to the already-existing best practice for publishing GTFS openly.

If your data is to be considered "public" and "directly...[and] openly downloadable" (https://gtfs.org/documentation/schedule/schedule-best-practices/), thus in compliance with Best Practices, it necessarily cannot simultaneously restrict access—based on geography or otherwise.

@nighthawk

+1 SkedGo

@miklcct
Contributor

miklcct commented Aug 19, 2025

+1 Aubin

@etienne0101
Collaborator

Thank you all for your contributions.
The voting period ended on Aug 21, with 6 votes in favor and 1 vote against. @skinkie, does @westontrillium’s statement above make sense to you?
As a reminder, in the governance model, the PR author (here @doconnoronca) decides whether to re-open the vote.

@skinkie
Contributor

skinkie commented Aug 28, 2025

> If your data is to be considered "public" and "directly...[and] openly downloadable" (https://gtfs.org/documentation/schedule/schedule-best-practices/), thus in compliance with Best Practices, it necessarily cannot simultaneously restrict access—based on geography or otherwise.

My problem is that @isabelle-dr added this, and this was not done with any prior discussion. Hence this is mixing things, and I think it would be even better if this were removed. Given my previous argument: this is not part of a technical specification. And I am fully supportive of open data, but not as a "requirement" in a technical specification.

@stevenmwhite
Contributor

I agree with @skinkie on the delineation between the technical spec and the practice of hosting data in an open way... but isn't that why this is in the "best practices" and not in the spec itself? This seems like a reasonable best practice.

@skinkie
Contributor

skinkie commented Aug 28, 2025

> I agree with @skinkie on the delineation between the technical spec and the practice of hosting data in an open way... but isn't that why this is in the "best practices" and not in the spec itself? This seems to be like a reasonable best practice.

Not trying to be stubborn. But from this point https://gtfs.org/documentation/schedule/schedule-best-practices/#practice-recommendations-organized-by-file onward the best practice becomes a guideline, with examples. Read: better than the "schema".

@doconnoronca
Contributor Author

> Not trying to be stubborn. But from this point https://gtfs.org/documentation/schedule/schedule-best-practices/#practice-recommendations-organized-by-file onward the best practice becomes a guideline, with examples. Read: better than the "schema".

The whole page is best practices, which would be guidelines by definition. The spot you point out is just where best practices for individual files begin.

The best practices used to be a separate document but were merged into the specification to give them more prominence.

@isabelle-dr
Collaborator

A note on adding the statement about data publishing into the Official Spec from the Best Practices.

Previous discussions:

  • Original issue for adding the Dataset Publishing & General Practices, originally in the Best Practice document, into the official spec: #375
  • The associated PR, which added the following statement (among others) and which you voted in favor of, @skinkie:

> Datasets should be published at a public, permanent URL, including the zip file name. (e.g., www.agency.org/gtfs/gtfs.zip). Ideally, the URL should be directly downloadable without requiring login to access the file, to facilitate download by consuming software applications. While it is recommended (and the most common practice) to make a GTFS dataset openly downloadable, if a data provider does need to control access to GTFS for licensing or other reasons, it is recommended to control access to the GTFS dataset using API keys, which will facilitate automatic downloads.

Although this statement was recently added to the spec, it has long existed in the GTFS Best Practices (prior to MobilityData). We see the integration of widely adopted Best Practices into the official spec as more of a consolidation than a new addition.

@etienne0101 etienne0101 removed the Vote to Adopt Community votes to officially adopt the change. label Sep 29, 2025
@github-actions

This pull request has been automatically marked as stale because of lack of recent activity. It may be closed manually after one month of inactivity. Thank you for your contributions.

@github-actions github-actions bot added the Status: Stale Issues and Pull Requests that have remained inactive for 30 calendar days or more. label Dec 29, 2025