Enhancing GTFS Schedule and Realtime with original_trip_id #534
The original_trip_id is added to both GTFS Schedule and GTFS Realtime. This field allows the association of trips across different realtime and schedule standards, e.g., NeTEx and SIRI. It also allows matching between schedule and realtime.
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
|
You need to update the .proto file for the real time field. Please use a larger number and avoid the field numbers I am proposing in #504 , as I intend to produce it as soon as I can for integration with other systems such as Darwin (my static GTFS has this field already). |
Thank you @miklcct, I missed that. I looked at #504 and it seems that 8 is available as field number for the original_trip_id (within TripDescriptor, underneath optional ModifiedTripSelector modified_trip = 7;). This is also the number we currently use in our implementation. Would that interfere with your work? |
|
That's great, so I can continue to use 5 and 6 for trip_headsign and trip_short_name respectively. Are you using 5 or 6 for something else? |
No, we only add I'll push that now. |
According to discussions in google#534 to reflect the documentation in reference.md
|
Critical question: why is this important for passenger information. I read the interoperability argument, but not how that is used. |
It allows consumers to match the GTFS data with external data from other sources. |
|
Concrete examples please. The GTFS ecosystem is that GTFS can be matched with GTFS-RT. In what situation is it valuable to have other trip identifiers. I can think of some, but those should be written down. |
I am using the field to match upstream data from systems of Network Rail, where their IDs are only unique on a single day. |
Are you asking why this matters to the rider? |
Exactly. |
We give an example in the introductory text under "Implementation": "In one case our consumers use the original_trip_id in GTFS Realtime to match with timetable data in the (proprietary) HRDF format." Our consumers use both GTFS and HRDF. However, HRDF reflects certain services in Switzerland better, such as linked trips, because its data structure is more comprehensive and complex. This approach lets us keep the efficient structure of GTFS (one of the reasons for its popularity with our consumers) while providing the full bandwidth of our available customer information by combining it with our other formats.
I would also like to point out this statement by @miklcct, which would be another motivation for this field. This is also true for our SJYID. |
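The matching described above can be sketched roughly as follows. This is only an illustration, not the actual consumer implementation: the realtime message is a plain dict standing in for a decoded GTFS Realtime TripUpdate, the `external_journeys` lookup stands in for timetable data from another format such as HRDF, and the trip_id and record contents are invented. Only the SJYID value is taken from this thread.

```python
def match_external_journey(trip_update, external_journeys):
    """Look up the external timetable record for a realtime trip, if any."""
    descriptor = trip_update.get("trip", {})
    key = descriptor.get("original_trip_id")
    if not key:
        # Producer did not supply the field; no cross-format match is possible.
        return None
    return external_journeys.get(key)


# Hypothetical external timetable, keyed by the same identifier (assumption).
external_journeys = {
    "ch:1:sjyid:100308:87865-001": {"source": "external timetable"},
}

# Hypothetical decoded TripUpdate; trip_id and start_date are invented.
trip_update = {
    "trip": {
        "trip_id": "t101",
        "start_date": "20250219",
        "original_trip_id": "ch:1:sjyid:100308:87865-001",
    }
}

match = match_external_journey(trip_update, external_journeys)
```

A message without `original_trip_id` simply yields `None`, so consumers that do not know the external format are unaffected.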
|
Could you say a little more about why your consumers want to use HRDF together with GTFS and not go either full GTFS/GTFS-RT, HRDF or NeTEx/SIRI? If you're already dealing with the complexities of linked trips in HRDF, would the extra complexity of (say) SIRI-ET make a difference? To summarise: I'm a bit sceptical of changing the GTFS specification to accommodate non-GTFS workflows. |
I share that scepticism. But there is a fundamental thing that both GTFS and NeTEx overlook. Every time a property changes in GTFS or NeTEx, a new identifier must be introduced. You could argue that "this makes a lot of sense", but for some organisations (and even implementers) it does not. They instead manage the validity of properties at different levels. That is why virtually everything is in conflict with everything else once HRDF is mentioned. This is the absolute root cause. |
|
The very reason why this field is needed is that a consumer with local knowledge can use it to reference other passenger-facing systems outside the GTFS world using the |
|
It is not just HRDF. As Stefan mentions, there is a core problem in NeTEx and GTFS: uniqueness of the trip in the file. Traditional public transport has uniqueness of the ServiceJourney per operating day. Even when the trip is slightly different, e.g. on Wednesday, it is still the trip that starts at 08:01 to Zürich. This can be expressed by the global id. We have to split the trip into different ones for GTFS, but for many other use cases (and systems) it is still useful, even crucial, to know that this is indeed the same trip. |
|
So what the PR really does is to accommodate both "worlds". |
|
Can you not do the same "split" when converting to GTFS-RT? BTW, I don't doubt that it would be useful to your consumers but I doubt that it's GTFS's responsibility to deal with other representations of public transport. |
|
From a GTFS standpoint it can also be interesting. For example aggregating all the "truly unique trips". It is a very specific use case, therefore I hope some more examples can be provided. |
|
Non sequitur @leonardehrenfried: With your argument we could also ask why we should produce GTFS at all. HRDF and VDV 454 contain all the necessary information. It can't be the responsibility of Switzerland to facilitate work for others? We believe this PR is a simple way to simplify interactions between the different formats. In an ideal world one can use one realtime stream and a timetable stream of one's choice. |
|
I'll grant you that: it's not a complicated proposal. |
Also there is currently no facility in GTFS (unlike NeTEx) to specify that the modified Wednesday 08:01 is the same trip as the Wednesday 08:01, so if the timetable has been modified from the base timetable, a client with a saved pre-planned journey currently cannot know that the timetable has been changed. It will just fail to find the trip in the updated timetable. I have a use case where a traveller can plan journeys up to months beforehand and save them on the user's device. If the timetable is changed, requiring a new ID in the GTFS, it is impossible for the client to find the new ID (however, I think this is worth another PR to associate a trip to a calendar exception, as it is not the purpose of Roughly speaking, it can be described in the following way: In such a case, a trip running on a modified timetable on Easter can associate with the original Mon-Fri timetable by specifying another new field (NOT |
I honestly think NeTEx did not standardise "global trip id" either. And yes, there is PrivateCode but that is not the "concept" that we mean here? |
This heterogeneity reflects mobility in Europe. When looking at CEN (European) standards you see that they allow a certain amount of interpretation and flexibility, which is the only way to accommodate the many local differences. Standards that are restrictive will (most likely) not change what we do in Europe (or at least in Switzerland). More likely we will not adopt those standards. |
|
The flexibility of standards in interpretation means that it is not possible for implementations to guarantee correct desired behaviour across different feeds in different places |
|
I'd argue that it depends a lot on the passenger what is considered their regular commute. Some examples:
TLDR: The planning perspective's (of large European rail-focused organisations with fixed schedules) notion of a "commute" doesn't match the passengers' notions. Often, GTFS trips are either a) referenced by multiple (relevant) companies or even b) operated by multiple companies. Most of them will likely have their own "original" trip ID for their planning purposes for the same GTFS trip, so effectively for one GTFS trip ID we end up with >1 "original" trip IDs. Once we consider a one-to-many mapping, I see chaos coming. While it's technically feasible to just include all known "original" trip IDs and hope that consumers will recognise any >1 among them, I think this approach needs more thinking and coordination. This is why, in addition to being sceptical about the use case as explained above, I think such a field should stay in an extension for now. (An extension can still be widely agreed-upon and developed in an open and backwards-compatible manner!) |
Of course it is still his commute. My app would tell the user that the wheelchair value is changed from the saved journey and asks the user how to proceed (e.g. picking up another departure).
What I have saved is the complete itinerary, with all the access / transfer / egress in addition to all the PT legs. All the step-by-step details, from how to cycle from my home to a parking, walk to the platform, take the train, take to the office, are saved as part of the regular commute, and can be shown to the user if the refetched PT leg is different. |
|
It seems to me it was a mistake to require the trip_id to be unique. In retrospect it should have been unique for the schedule day. That would have solved quite a few problems I have seen people complain about over the years. Adding a new way to link trips across days and schedule changes could be useful in unexpected ways. |
That is true. However, the reason is not only that the same transport behavior is represented differently in different countries (although it could be done alike), but also because the transport is simply done differently. For the latter the limitations of a standard won't initiate a change in the complete operative infrastructure of a country (costing millions). |
But from the points of a passenger, the concepts are applicable everywhere in the world. We are talking about GTFS, a standard for passenger information, not Transmodel. The concept which we want to introduce is the notion of "the same departure" across timetable changes, which for technical reasons necessitates a change of trip_id. |
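A rough sketch of what "the same departure" across timetable changes could mean for a client with a saved journey. Everything here is an assumption for illustration: `stable_id` is a made-up name for whatever grouping identifier might eventually be specified (it is not a field from this PR), and the trips are simplified dicts that list their service dates directly instead of going through calendar.txt and calendar_dates.txt.

```python
def refetch_saved_trip(saved, trips, service_date):
    """Return the trip that now represents a saved departure, or None."""
    running = [t for t in trips if service_date in t["dates"]]
    # First try the trip_id that was stored with the saved journey.
    for trip in running:
        if trip["trip_id"] == saved["trip_id"]:
            return trip
    # Otherwise fall back to the hypothetical stable identifier.
    for trip in running:
        if trip["stable_id"] == saved["stable_id"]:
            return trip
    return None


saved = {"trip_id": "t100", "stable_id": "dep-0801"}

# Hypothetical updated feed: the 08:01 runs on a modified timetable on
# 19 February and therefore needed a new trip_id for that day.
updated_feed = [
    {"trip_id": "t100", "stable_id": "dep-0801", "dates": {"2025-02-12"}},
    {"trip_id": "t100-mod", "stable_id": "dep-0801", "dates": {"2025-02-19"}},
]

replacement = refetch_saved_trip(saved, updated_feed, "2025-02-19")
```

Without the fallback, the lookup for 19 February would fail, which is the situation described earlier in the thread where the client cannot find the trip in the updated timetable.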
I don't clearly see how linking trips across schedule changes is related to linking trips across days. How does knowing that an operator considers a certain trip on Wednesdays to be "the same" as a certain trip on Tuesdays, help me know that a new replacement trip on Wednesday 19 February corresponds to a specific original trip on Wednesday 19 February? The idea of a dataset being built around a relation called It may be worth noting that a single trip (i.e. with a single |
You are now conflating two issues here.
The use cases are:
|
|
@abyrd you say it is geared towards simpler schedules, and that is fine. However, some schedules are not simple, even though simplifying them for GTFS is often fine for easier consumption. If somebody needs to see the nuances, they need to get the information from somewhere, and for that they need the necessary reference. I agree that this reference will probably not always look the same, due to the lack of homogenisation and master data management. But that's the path forward in my opinion. Let's assume the following scenario:
How do you suggest this should be done in the current situation? |
I am not sure what you mean by "my world", and I am making no statements that things "should not be done". I think I should clarify here that I am not strongly invested in a particular outcome for this PR. I was just trying to encourage communication between commenters who seemed to have different perspectives. I think most or all commenters are sympathetic to your situation and want to help transform your proposal into something that fits well into GTFS and will be well-received and well-understood by the community. My understanding is that some commenters would prefer a field defined by its concrete use cases, focusing on relationships within GTFS, and remaining silent on any relationship to data sets outside GTFS. My intent was to encourage communication about whether the only known use case is rediscovering trips (for example, that someone has bookmarked or purchased a ticket for) when they have been altered by realtime data or other small schedule modifications. That is: are we functionally just talking about a
I understand that passengers will not look directly at ID strings. I was trying to encourage clear description of concrete situations encountered by the passenger where this ID would be used by software. The example provided by @miklcct would require an ID that maintains identity across small modifications to a trip on a particular day. I was asking: do we have any analogous examples of concrete use cases requiring an ID that groups trips across multiple days?
This PR seems to be about adding a field that can contain any "original" identifier for the trip, without attaching any meaning or usage context to that identifier. My understanding of comments from other contributors is that they prefer to see fields that have a specific meaning and use case. To rephrase my question: Why would you prefer a field that can contain any arbitrary external identifier with no stated purpose, instead of a field or fields that contain identifiers with specific purposes, such as matching realtime messages to trips that have been altered? |
|
Unfortunately although it was my original idea, I am now skeptical about this PR. It is impossible to define a precise semantic about what "upstream" data the I think we should disregard the Google extension and define fields which cater for the use cases I mentioned above. |
I also consider these two different cases; I agree with your perspective. It seemed to me that they were being conflated in previous comments about the importance of |
|
I think that we can define a field, called Then we can attach a specific requirement of that ID: Two trips with the same Use case: I want to save my regular commute. The consumer can tell me immediately that my commute will be affected by planned engineering works on a certain day, and the equivalent journey will be a certain replacement.
|
I can think of a few possibilities. I am not promoting these or saying they're desirable, just brainstorming:
Keeping in mind that GTFS consumers should be tolerant of unofficial extensions. If something is clearly specific to your dataset, you could add [edit: I should have listed perhaps the most obvious, an extension field |
|
These most recent comments from @abyrd and @miklcct are getting much closer to something that seems acceptable to me. The ideas have specific, definable purposes. Would If that's true, just something to keep in mind that this would be a new model of thinking. |
|
I had hoped to contribute to this discussion with something useful from a (North American) producer perspective but honestly am finding it challenging to gain a technical understanding of what’s being discussed. It sounds like the conversation may be moving beyond what has been originally proposed anyway, but I wanted to call out that even after reading through the original PR and the entirety of this comment thread, I didn’t find any comprehensive examples using actual data or tables (not just isolated trips.txt snippets) so we can clearly see the proposed file/field relationships. As such, I’m still unclear on whether there are any restrictions for an I’m also having trouble wrapping my head around some of the language used to describe the temporal restrictions for
These statements all seem to be describing the same thing though slightly differently and potentially contradict one another. What exactly are we saying? a single |
|
Here's a real-world example from our implementation. GTFS Schedule URL. We have the following situation. The train S6 goes from Basel Bad Bf to Basel SBB. This is a train that many commuters and other regulars use and for which they know its stops. So they keep in mind the "pattern", especially, because in Switzerland we have a "Taktfahrplan". So the trains are part of that pattern. On occasions "their S6" stops at a different quay. This is what we need to be able to represent to the commuters. And we do as described below using the "original_trip_id" (Note, we named the field based on GTFS Transit. It is used for the described use-case. At least based on what we understood from an exchange with data providers for Google). Swiss Journey ID (SJYID) aka original_trip_id: ch:1:sjyid:100308:87865-001
|
|
@westontrillium @stevenmwhite @abyrd @miklcct - any thoughts/feedback concerning our example? |
|
Whether I am going to support the proposal will depend on how it is defined. If the definition is "an internal identifier which is used to group related trips across different dates or across different version of feeds", I'll support it. If the definition is "an ID which is used to reference external information systems", I'll oppose it. |
|
This pull request has been automatically marked as stale because of lack of recent activity. It may be closed manually after one month of inactivity. Thank you for your contributions. |
|
This conversation is still active, not stale. |
|
This pull request has been automatically marked as stale because of lack of recent activity. It may be closed manually after one month of inactivity. Thank you for your contributions. |

This pull request is related to issue #462
Context: In Switzerland we've introduced the Swiss Journey ID (documentation only in DE/FR/IT: https://www.oev-info.ch/de/datenmanagement/sid4pt-swiss-id-public-transport/swiss-journey-identification-sjyid).
This ID is valid for one operating day and across different days of a scheduled year. It therefore maps to one or more trip_ids.
Proposal: Based on the suggestion by @miklcct (in the referenced issue) we propose to use the original_trip_id (as defined in https://developers.google.com/transit/gtfs/reference?hl=en) in GTFS Schedule and GTFS Realtime to represent constructs such as our SJYID. With this it is possible to combine trips from GTFS Schedule and GTFS Realtime with other standards such as SIRI or NeTEx, which have a similar concept.
Implementation: Since 12.12.2024 we have offered the original_trip_id (filled with our SJYID) as part of GTFS Schedule (doc: https://opentransportdata.swiss/en/cookbook/gtfs/#tripstxt) and GTFS Realtime (doc: https://opentransportdata.swiss/en/cookbook/gtfs-rt/#Trip_updates). In one case our consumers use the original_trip_id in GTFS Realtime to match with timetable data in the (proprietary) HRDF format.
Generalizability: Based on early discussions with other public transport providers, we think this enhancement can benefit many other producers and consumers and increase the interoperability of GTFS with other standards.
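To illustrate the one-to-many relationship mentioned under "Context", here is a minimal sketch that groups trip_ids by original_trip_id. The trips.txt excerpt is invented for illustration: the first SJYID is the one quoted in this conversation, while the second SJYID and all route, service and trip IDs are made up.

```python
import csv
import io
from collections import defaultdict

# Hypothetical trips.txt excerpt (see the assumptions stated above).
TRIPS_TXT = """route_id,service_id,trip_id,original_trip_id
r1,weekday,t100,ch:1:sjyid:100308:87865-001
r1,easter,t101,ch:1:sjyid:100308:87865-001
r1,weekday,t200,ch:1:sjyid:100308:87866-001
"""


def group_by_original_trip_id(trips_txt):
    """Map each original_trip_id to the list of trip_ids that carry it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(trips_txt)):
        original = row.get("original_trip_id")
        if original:  # the field is optional, so skip rows without it
            groups[original].append(row["trip_id"])
    return dict(groups)


groups = group_by_original_trip_id(TRIPS_TXT)
```

Here `groups["ch:1:sjyid:100308:87865-001"]` contains both `t100` and `t101`, i.e. one journey identifier maps to two GTFS trips.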