Enhancing GTFS Schedule and Realtime with original_trip_id #534

davidr1234 · 2025-01-30T16:46:08Z

This pull request is related to issue #462

Context: In Switzerland we've introduced the Swiss Journey ID (documentation only in DE/FR/IT: https://www.oev-info.ch/de/datenmanagement/sid4pt-swiss-id-public-transport/swiss-journey-identification-sjyid).
This ID is valid for one operating day and across different days of a scheduled year. It therefore maps to one or more trip_ids.

Proposal: Based on the suggestion by @miklcct (in the referenced issue) we propose to use the original_trip_id (as defined in https://developers.google.com/transit/gtfs/reference?hl=en) in GTFS Schedule and GTFS Realtime to represent constructs such as our SJYID. With this it is possible to combine trips from GTFS Schedule and GTFS Realtime with other standards such as SIRI or NeTEx, which have a similar concept.

Implementation: Since 12 December 2024 we have offered the original_trip_id (filled with our SJYID) as part of GTFS Schedule (doc: https://opentransportdata.swiss/en/cookbook/gtfs/#tripstxt) and GTFS Realtime (doc: https://opentransportdata.swiss/en/cookbook/gtfs-rt/#Trip_updates). In one case our consumers use the original_trip_id in GTFS Realtime to match with timetable data in the (proprietary) HRDF format.

Generalizability: Based on early discussions with other public transport providers, we think this enhancement can benefit many other producers and consumers and increase the interoperability of GTFS with other standards.

The original_trip_id is added to both GTFS Schedule and GTFS Realtime. This field allows the association of trips across different realtime and schedule standards, e.g., NeTEx and SIRI. It also allows matching between schedule and realtime.
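The following minimal Python sketch makes the one-to-many relationship described above concrete. A consumer indexes trips.txt by original_trip_id, so that a single journey ID resolves to every GTFS trip_id that carries it. The sample rows, the IDs and the function name index_by_original_trip_id are hypothetical. Only the column names come from GTFS and this proposal.

```python
import csv
import io
from collections import defaultdict

# Hypothetical trips.txt excerpt: two GTFS trips share one original_trip_id
# because their timetables differ on Wednesdays.
TRIPS_TXT = """route_id,service_id,trip_id,original_trip_id
R1,mon_tue,t_0801_mo_tu,journey_A
R1,wednesdays,t_0801_we,journey_A
R1,weekdays,t_0915,journey_B
"""


def index_by_original_trip_id(trips_csv: str) -> dict:
    """Map each original_trip_id to the list of trip_ids that carry it."""
    index = defaultdict(list)
    for row in csv.DictReader(io.StringIO(trips_csv)):
        original = row.get("original_trip_id")
        if original:  # the column is optional; skip rows without a value
            index[original].append(row["trip_id"])
    return dict(index)


if __name__ == "__main__":
    print(index_by_original_trip_id(TRIPS_TXT))
```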

google-cla · 2025-01-30T16:46:12Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).


miklcct · 2025-01-31T09:49:39Z

You need to update the .proto file for the real time field. Please use a larger number and avoid the field numbers I am proposing in #504, as I intend to produce it as soon as I can for integration with other systems such as Darwin (my static GTFS has this field already).

davidr1234 · 2025-01-31T12:46:44Z

> You need to update the .proto file for the real time field. Please use a larger number and avoid the field numbers I am proposing in #504, as I intend to produce it as soon as I can for integration with other systems such as Darwin (my static GTFS has this field already).

Thank you @miklcct, I missed that. I looked at #504 and it seems that 8 is available as field number for the original_trip_id (within TripDescriptor, underneath optional ModifiedTripSelector modified_trip = 7;). This is also the number we currently use in our implementation. Would that interfere with your work?

miklcct · 2025-01-31T12:48:21Z

That's great, so I can continue to use 5 and 6 for trip_headsign and trip_short_name respectively.

Are you using 5 or 6 for something else?

davidr1234 · 2025-01-31T12:57:38Z

> That's great, so I can continue to use 5 and 6 for trip_headsign and trip_short_name respectively.

> Are you using 5 or 6 for something else?

No, we only add optional string original_trip_id = 8;

I'll push that now.
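Suppose the field is added to TripDescriptor as `optional string original_trip_id = 8;`. A consumer could then use it as a fallback when a realtime trip_id does not match the static feed, and combine it with the existing start_date field (YYYYMMDD). This sketch uses plain dicts rather than the generated protobuf bindings. The data, the precomputed ACTIVE_DATES table and the function name resolve_trip are hypothetical.

```python
from datetime import date
from typing import Optional

# Hypothetical static data the consumer has already loaded:
# trip_id -> (original_trip_id, service_id)
TRIPS = {
    "t_0801_mo_tu": ("journey_A", "mon_tue"),
    "t_0801_we": ("journey_A", "wednesdays"),
}
# Hypothetical precomputed table: service_id -> set of active service days.
ACTIVE_DATES = {
    "mon_tue": {date(2025, 2, 3), date(2025, 2, 4)},
    "wednesdays": {date(2025, 2, 5)},
}


def resolve_trip(descriptor: dict) -> Optional[str]:
    """Return the scheduled trip_id a realtime TripDescriptor refers to.

    Prefers an exact trip_id match and falls back to
    (original_trip_id, start_date). Returns None if no single match exists.
    """
    trip_id = descriptor.get("trip_id")
    if trip_id in TRIPS:
        return trip_id
    original = descriptor.get("original_trip_id")
    start = descriptor.get("start_date")
    if not original or not start:
        return None
    day = date(int(start[:4]), int(start[4:6]), int(start[6:8]))
    candidates = [
        tid for tid, (orig, service_id) in TRIPS.items()
        if orig == original and day in ACTIVE_DATES.get(service_id, set())
    ]
    # Only accept an unambiguous match.
    return candidates[0] if len(candidates) == 1 else None
```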

According to discussions in google#534 to reflect the documentation in reference.md

skinkie · 2025-01-31T15:40:29Z

Critical question: why is this important for passenger information? I read the interoperability argument, but not how that is used.

miklcct · 2025-01-31T16:13:28Z

> Critical question: why is this important for passenger information? I read the interoperability argument, but not how that is used.

It allows consumers to match the GTFS data with external data from other sources.

skinkie · 2025-01-31T16:16:02Z

Concrete examples, please. The GTFS ecosystem is that GTFS can be matched with GTFS-RT. In what situations is it valuable to have other trip identifiers? I can think of some, but those should be written down.

miklcct · 2025-01-31T16:17:48Z

> Concrete examples, please. The GTFS ecosystem is that GTFS can be matched with GTFS-RT. In what situations is it valuable to have other trip identifiers? I can think of some, but those should be written down.

I am using the field to match upstream data from systems of Network Rail, where their IDs are only unique on a single day.
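When upstream IDs are unique only within a single day, as with the Network Rail IDs mentioned here, the matching key has to include the operating day. The minimal sketch below uses hypothetical IDs and trip names to show why keying by the ID alone loses data.

```python
from datetime import date

# Hypothetical upstream records: the same day-unique ID is reused on
# different days for services that map to different GTFS trips.
UPSTREAM = [
    (date(2025, 1, 31), "X12345", "gtfs_trip_1"),
    (date(2025, 2, 1), "X12345", "gtfs_trip_2"),
]

# Keying by the ID alone silently overwrites the earlier day's entry.
by_id_only = {upstream_id: trip for _, upstream_id, trip in UPSTREAM}

# Keying by (operating day, ID) keeps both.
by_day_and_id = {(day, upstream_id): trip for day, upstream_id, trip in UPSTREAM}


def match(day: date, upstream_id: str):
    """Return the GTFS trip for an upstream (day, ID) pair, or None."""
    return by_day_and_id.get((day, upstream_id))
```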

eliasmbd · 2025-01-31T16:20:05Z

> Critical question: why is this important for passenger information? I read the interoperability argument, but not how that is used.

Are you asking why this matters to the rider?

skinkie · 2025-01-31T16:22:39Z

> Are you asking why this matters to the rider?

Exactly.

davidr1234 · 2025-02-03T10:42:16Z

> Concrete examples, please. The GTFS ecosystem is that GTFS can be matched with GTFS-RT. In what situations is it valuable to have other trip identifiers? I can think of some, but those should be written down.

We give an example in the introductory text under "Implementation": "In one case our consumers use the original_trip_id in GTFS Realtime to match with timetable data in the (proprietary) HRDF format."

Our consumers use both GTFS and HRDF. HRDF, however, can better reflect certain services in Switzerland, such as linked trips, due to its more comprehensive and complex data structure.

This approach allows us to maintain the efficient structure of GTFS (which is one of the reasons for its popularity with our consumers), while providing the full range of our available customer information by combining it with our other formats.

> Concrete examples, please. The GTFS ecosystem is that GTFS can be matched with GTFS-RT. In what situations is it valuable to have other trip identifiers? I can think of some, but those should be written down.

> I am using the field to match upstream data from systems of Network Rail, where their IDs are only unique on a single day.

I would also like to point out this statement by @miklcct, which would be another motivation for this field. This is also true for our SJYID.

leonardehrenfried · 2025-02-03T11:31:53Z

Could you say a little more about why your consumers want to use HRDF together with GTFS, and not go fully with either GTFS/GTFS-RT, HRDF or NeTEx/SIRI? If you're already dealing with the complexities of linked trips in HRDF, would the extra complexity of (say) SIRI-ET make a difference?

To summarise: I'm a bit sceptical of changing the GTFS specification to accommodate non-GTFS workflows.

skinkie · 2025-02-03T11:42:32Z

> To summarise: I'm a bit sceptical of changing the GTFS specification to accommodate non-GTFS workflows.

I share the same skepticism. But there is a fundamental thing both GTFS and NeTEx are overlooking: every time a property changes in GTFS or NeTEx, a new identifier must be introduced. You could argue that "this makes a lot of sense", but for some organisations (and even implementers) it does not. They instead manage the validity of properties at different levels. That is why virtually everything conflicts once HRDF is mentioned. This is the absolute root cause.

miklcct · 2025-02-03T11:47:52Z

The very reason why this field is needed is that a consumer with local knowledge can use it to reference other passenger-facing systems outside the GTFS world using the original_trip_id provided, when other passenger-facing systems use an ID which is not the public code.

ue71603 · 2025-02-03T11:56:19Z

It is not just HRDF. As Stefan mentions, there is a core problem in NeTEx and GTFS: the uniqueness of a trip within the file. Traditional public transport has uniqueness of the ServiceJourney per operating day. Even when the trip is slightly different, e.g. on Wednesday, it is still the trip that starts at 08:01 to Zürich. This can be expressed by the global ID. We have to split the trip into different ones for GTFS, but for many other use cases (and systems) it is still useful, even crucial, to know that this is indeed the same trip.
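Here is a minimal sketch of the split described here. It assumes a journey whose stop times differ only on Wednesday. Each distinct stop-time pattern becomes its own GTFS trip with a new trip_id, and original_trip_id keeps the link back to the single journey. The journey, the times and the trip_id naming scheme are hypothetical.

```python
from collections import defaultdict

# Hypothetical logical journey "08:01 to Zürich": departure times per weekday.
# On Wednesday the second stop is served one minute later.
JOURNEY_ID = "journey_0801_zurich"
PATTERNS = {
    "monday": ("08:01", "08:15", "08:40"),
    "tuesday": ("08:01", "08:15", "08:40"),
    "wednesday": ("08:01", "08:16", "08:40"),
}


def split_into_gtfs_trips(journey_id: str, patterns: dict) -> list:
    """Group weekdays with identical stop times into one GTFS trip each."""
    groups = defaultdict(list)
    for weekday, times in patterns.items():
        groups[times].append(weekday)
    trips = []
    for n, (times, weekdays) in enumerate(groups.items(), start=1):
        trips.append({
            "trip_id": f"{journey_id}_v{n}",  # new, dataset-unique ID
            "original_trip_id": journey_id,   # shared journey identity
            "weekdays": weekdays,
            "times": times,
        })
    return trips
```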

ue71603 · 2025-02-03T11:58:04Z

So what the PR really does is accommodate both "worlds".

leonardehrenfried · 2025-02-03T12:00:14Z

Can you not do the same "split" when converting to GTFS-RT?

BTW, I don't doubt that it would be useful to your consumers but I doubt that it's GTFS's responsibility to deal with other representations of public transport.

skinkie · 2025-02-03T12:07:05Z

From a GTFS standpoint it can also be interesting. For example aggregating all the "truly unique trips". It is a very specific use case, therefore I hope some more examples can be provided.

ue71603 · 2025-02-03T12:15:37Z

Non sequitur, @leonardehrenfried: with your argument we could say: why should we produce GTFS at all? HRDF and VDV 454 contain all the necessary information. It can't be the responsibility of Switzerland to facilitate work for others?

We believe this PR is a simple way to simplify interactions between the different formats. In an ideal world, you could combine a realtime stream and a timetable stream of your choice.

leonardehrenfried · 2025-02-03T12:21:54Z

I'll grant you that: it's not a complicated proposal.

miklcct · 2025-02-03T12:21:59Z

> It is not just HRDF. As Stefan mentions, there is a core problem in NeTEx and GTFS: the uniqueness of a trip within the file. Traditional public transport has uniqueness of the ServiceJourney per operating day. Even when the trip is slightly different, e.g. on Wednesday, it is still the trip that starts at 08:01 to Zürich. This can be expressed by the global ID. We have to split the trip into different ones for GTFS, but for many other use cases (and systems) it is still useful, even crucial, to know that this is indeed the same trip.

Also, there is currently no facility in GTFS (unlike NeTEx) to specify that the modified Wednesday 08:01 is the same trip as the regular Wednesday 08:01. If the timetable has been modified from the base timetable, a client with a saved pre-planned journey currently cannot know that the timetable has changed. It will just fail to find the trip in the updated timetable.

I have a use case where a traveller can plan journeys up to months beforehand and save them on the user's device. If the timetable is changed, requiring a new ID in the GTFS, it is impossible for the client to find the new ID (however, I think this is worth another PR to associate a trip with a calendar exception, as that is not the purpose of original_trip_id).

Roughly speaking, it can be described in the following way:
Calendar Base: Mon - Fri, 1 Jan to 30 Jun 2025, with exception on Good Friday and Easter Monday

In such case, a trip running on a modified timetable on Easter can associate with the original Mon-Fri timetable by specifying another new field (NOT original_trip_id) with the ID for the base trip.
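The separate field mentioned here (explicitly not original_trip_id) could let a client re-find a saved leg after the Easter modification. The sketch below assumes a hypothetical field name, base_trip_id, and hypothetical trip IDs; neither exists in GTFS today. The dates are Good Friday (18 April) and Easter Monday (21 April) in 2025.

```python
from datetime import date
from typing import Optional

# Hypothetical updated feed: trip_id -> (base_trip_id, active service days).
# "base_trip_id" is an assumed name for the proposed new field.
NEW_FEED = {
    "base_0801": (None, {date(2025, 4, 17), date(2025, 4, 22)}),
    "easter_0801": ("base_0801", {date(2025, 4, 18), date(2025, 4, 21)}),
}


def recheck_saved_leg(trip_id: str, day: date) -> Optional[str]:
    """Return the trip_id serving a saved (trip_id, day) leg in the new feed."""
    entry = NEW_FEED.get(trip_id)
    if entry is not None and day in entry[1]:
        return trip_id  # the saved trip still runs unchanged on that day
    for candidate, (base, days) in NEW_FEED.items():
        if base == trip_id and day in days:
            return candidate  # a modified trip that points back to the saved one
    return None  # without such a link, the client can only report "not found"
```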

skinkie · 2025-02-03T12:33:11Z

> Also, there is currently no facility in GTFS (unlike NeTEx) to specify that the modified Wednesday 08:01 is the same trip as the regular Wednesday 08:01. If the timetable has been modified from the base timetable, a client with a saved pre-planned journey currently cannot know that the timetable has changed. It will just fail to find the trip in the updated timetable.

I honestly think NeTEx did not standardise a "global trip id" either. And yes, there is PrivateCode, but that is not the "concept" we mean here, is it?

davidr1234 · 2025-02-20T07:55:33Z

I don't see the confusion, @miles-grant-ibigroup: if you don't need it, then you don't consume it. It is the same with other fields that are in the spec and in the real data. If you don't know what to do with transfers.txt or flex, then don't consume it. It does not break the rest, but if you need it, you really need it.

The difference here is that there is one standardized way to interpret what transfers.txt means. It seems like this field will be used differently by every feed producer. Therefore, it's almost impossible to make use of in an abstract and predictable way.

This heterogeneity is what reflects mobility in Europe. When looking at CEN (European) standards, you see that they allow a certain amount of interpretation and flexibility, which is the only way to accommodate the many local differences. Standards that are restrictive will (most likely) not change what we do in Europe (or at least in Switzerland). More likely, we will not adopt those standards.

miklcct · 2025-02-20T12:33:01Z

The flexibility of standards in interpretation means that it is not possible for implementations to guarantee correct desired behaviour across different feeds in different places.

ue71603 · 2025-02-20T13:02:27Z

@abyrd

> Why from the passenger perspective is it an "important requirement of public transit" that trips have persistent existence across multiple days?
It is, in my view, a prerequisite to good passenger information. You seem to think it is technical and aloof. Naturally, it will not be presented to the passenger in this way; you don't do this with trip_id either. Having only the local and artificially created trip_id means that it does not mesh with other items in the data space. Perhaps in your world this is something people don't do, and if it can't be done in GTFS then it should not be done. Here it is a major concern of people using our data (especially for additional fields that are not available in GTFS, but are in NeTEx and HRDF). We didn't do this because we like architecture, but because we were asked to do something about the problem. We wanted people to be able to stay in the GTFS bubble if they want, and still get the additional data in an easy way.

> Granted "It is virtually impossible to regroup [trips] when the information is lost" but why would a passenger want to "regroup" trips?
Again, they usually won't see it directly. If you don't want to do start/stop matching, then you need references. And most of Europe's public transport is organised with operating day + key as the unique ID, not just an ID as in GTFS.
We are slowly working towards master data management, meaning that things like stops, lines and journeys will have a master identification. For the journey, it will be something of the form operating day + ID.

> Why would you prefer a semantic-free identifier for arbitrary grouping, instead of a field with a specific purpose?
I don't understand what you are saying.

derhuerst · 2025-02-20T13:12:37Z

> We plan in the future to have the functionality to save a "regular commute" in our app, which will allow you to pick a certain trip, show that the service days the trip will run, and it will check for update daily before you leave. Therefore we need a stable ID in this case even if the timetable is changed.

I'd argue that it depends a lot on the passenger what is considered their regular commute. Some examples:

Some passengers will have to know if their commute is wheelchair-accessible. If, one day, a different trip with a different wheelchair_accessible value (due to a different physical vehicle being used) runs, is this "new" trip (with the same Swiss Journey ID) still truly their commute?
For most people that I have talked to, their commute is inherently intermodal, as they always use some form of mobility (e.g. walking) before boarding anything that's modelled in GTFS. Therefore, an ID identifying their commute wouldn't refer to the PT trips only! IMHO the use case "identifying my commute" is something both more complex and more abstract than the use cases GTFS is designed for, and needs to be handled in a layer/spec on top of GTFS.

TLDR: The planning perspective's (of large European rail-focused organisations with fixed schedules) notion of a "commute" doesn't match the passengers' notions.

Often, GTFS trips are either a) referenced by multiple (relevant) companies or even b) operated by multiple companies. Most of them will likely have their own "original" trip ID for their planning purposes for the same GTFS trip, so effectively for one GTFS trip ID we end up with >1 "original" trip IDs.

Once we consider a one-to-many mapping, I see chaos coming. While it's technically feasible to just include all known "original" trip IDs and hope that consumers will recognise any >1 among them, I think this approach needs more thinking and coordination. This is why, in addition to being sceptical about the use case as explained above, I think such a field should stay in an extension for now. (An extension can still be widely agreed upon and developed in an open and backwards-compatible manner!)

miklcct · 2025-02-20T13:19:45Z

> I'd argue that it depends a lot on the passenger what is considered their regular commute. Some examples:

> Some passengers will have to know if their commute is wheelchair-accessible. If, one day, a different trip with a different wheelchair_accessible value (due to a different physical vehicle being used) runs, is this "new" trip (with the same Swiss Journey ID) still truly their commute?

Of course it is still his commute. My app would tell the user that the wheelchair value is changed from the saved journey and asks the user how to proceed (e.g. picking up another departure).

> For most people that I have talked to, their commute is inherently intermodal, as they always use some form of mobility (e.g. walking) before boarding anything that's modelled in GTFS. Therefore, an ID identifying their commute wouldn't refer to the PT trips only! IMHO the use case "identifying my commute" is something both more complex and more abstract than the use cases GTFS is designed for, and needs to be handled in a layer/spec on top of GTFS.

What I have saved is the complete itinerary, with all the access / transfer / egress legs in addition to all the PT legs. All the step-by-step details, from how to cycle from my home to the bike parking, walk to the platform, take the train, and get to the office, are saved as part of the regular commute, and can be shown to the user if the refetched PT leg is different.

doconnoronca · 2025-02-20T13:36:38Z

It seems to me it was a mistake to require the trip_id to be unique. In retrospect, it should have been unique per schedule day. It would have solved quite a few problems I have seen people complain about over the years.

Adding a new way to link trips across days and schedule changes could be useful in unexpected ways.

davidr1234 · 2025-02-21T08:08:53Z

> The flexibility of standards in interpretation means that it is not possible for implementations to guarantee correct desired behaviour across different feeds in different places.

That is true. However, the reason is not only that the same transport behaviour is represented differently in different countries (although it could be represented alike), but also that transport is simply done differently. In the latter case, the limitations of a standard won't trigger a change in the complete operational infrastructure of a country (costing millions).
At most, they result in extensions, adaptations, and "profiles" of the standard in those countries. These local changes tend to stay even after the modifications have been included in the "main" standard, which amplifies the problem you describe.

miklcct · 2025-02-21T09:17:24Z

> > The flexibility of standards in interpretation means that it is not possible for implementations to guarantee correct desired behaviour across different feeds in different places.

> That is true. However, the reason is not only that the same transport behaviour is represented differently in different countries (although it could be represented alike), but also that transport is simply done differently. In the latter case, the limitations of a standard won't trigger a change in the complete operational infrastructure of a country (costing millions).
> At most, they result in extensions, adaptations, and "profiles" of the standard in those countries. These local changes tend to stay even after the modifications have been included in the "main" standard, which amplifies the problem you describe.

But from the point of view of a passenger, the concepts are applicable everywhere in the world. We are talking about GTFS, a standard for passenger information, not Transmodel.

The concept which we want to introduce is the notion of "the same departure" across timetable changes, which for technical reasons necessitates a change of trip_id.

abyrd · 2025-02-21T09:38:48Z

> It seems to me it was a mistake to require the trip_id to be unique. In retrospect, it should have been unique per schedule day. It would have solved quite a few problems I have seen people complain about over the years.
> Adding a new way to link trips across days and schedule changes could be useful in unexpected ways.

I don't clearly see how linking trips across schedule changes is related to linking trips across days. How does knowing that an operator considers a certain trip on Wednesdays to be "the same" as a certain trip on Tuesdays, help me know that a new replacement trip on Wednesday 19 February corresponds to a specific original trip on Wednesday 19 February?

The idea of a dataset being built around a relation called trips, but the identifier of those trips (trip_id) not being dataset-unique is quite odd to me. I feel like we're mixing together a lot of different topics here. I suppose I can imagine the trips relation having a compound key of (service_id, trip_id) but it's not clear to me what this would improve.

It may be worth noting that a single trip (i.e. with a single trip_id) can already be instantiated across many days. This is the purpose of calendar.txt and calendar_dates.txt. The system is geared toward simpler schedules recurring regularly on certain days of the week, but is relatively flexible. The same trip_id can appear on many different service days.
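For reference, the sketch below shows how calendar.txt and calendar_dates.txt already expand a single service_id (and so every trip_id that uses it) into concrete service days. In calendar_dates.txt, exception_type 1 adds service on a date and 2 removes it. The sample reuses the Mon - Fri base calendar with the Good Friday and Easter Monday exceptions described earlier in the thread. The function names are hypothetical.

```python
from datetime import date, datetime, timedelta

WEEKDAY_COLUMNS = ["monday", "tuesday", "wednesday", "thursday",
                   "friday", "saturday", "sunday"]


def parse_gtfs_date(value: str) -> date:
    """GTFS dates are written as YYYYMMDD."""
    return datetime.strptime(value, "%Y%m%d").date()


def active_dates(calendar_row: dict, calendar_dates: list) -> set:
    """Expand one calendar.txt row plus its calendar_dates.txt exceptions."""
    days = set()
    day = parse_gtfs_date(calendar_row["start_date"])
    end = parse_gtfs_date(calendar_row["end_date"])
    while day <= end:
        # date.weekday() is 0 for Monday, matching WEEKDAY_COLUMNS.
        if calendar_row[WEEKDAY_COLUMNS[day.weekday()]] == "1":
            days.add(day)
        day += timedelta(days=1)
    for exception in calendar_dates:
        if exception["service_id"] != calendar_row["service_id"]:
            continue
        exception_day = parse_gtfs_date(exception["date"])
        if exception["exception_type"] == "1":    # service added
            days.add(exception_day)
        elif exception["exception_type"] == "2":  # service removed
            days.discard(exception_day)
    return days


BASE = {"service_id": "base", "monday": "1", "tuesday": "1", "wednesday": "1",
        "thursday": "1", "friday": "1", "saturday": "0", "sunday": "0",
        "start_date": "20250101", "end_date": "20250630"}
EXCEPTIONS = [
    {"service_id": "base", "date": "20250418", "exception_type": "2"},  # Good Friday
    {"service_id": "base", "date": "20250421", "exception_type": "2"},  # Easter Monday
]
```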

miklcct · 2025-02-21T11:11:56Z

> > It seems to me it was a mistake to require the trip_id to be unique. In retrospect, it should have been unique per schedule day. It would have solved quite a few problems I have seen people complain about over the years.
> > Adding a new way to link trips across days and schedule changes could be useful in unexpected ways.

> I don't clearly see how linking trips across schedule changes is related to linking trips across days. How does knowing that an operator considers a certain trip on Wednesdays to be "the same" as a certain trip on Tuesdays, help me know that a new replacement trip on Wednesday 19 February corresponds to a specific original trip on Wednesday 19 February?

> The idea of a dataset being built around a relation called trips, but the identifier of those trips (trip_id) not being dataset-unique is quite odd to me. I feel like we're mixing together a lot of different topics here. I suppose I can imagine the trips relation having a compound key of (service_id, trip_id) but it's not clear to me what this would improve.

> It may be worth noting that a single trip (i.e. with a single trip_id) can already be instantiated across many days. This is the purpose of calendar.txt and calendar_dates.txt. The system is geared toward simpler schedules recurring regularly on certain days of the week, but is relatively flexible. The same trip_id can appear on many different service days.

You are now conflating two issues here.

Within the same dataset, I want to know that a trip on Wednesday runs on the same timetable as a trip on Tuesday. This is already supported.
Across different versions of the dataset, I want to know that a trip on Wednesday in the new timetable corresponds to a certain trip on Wednesday in the old timetable. This is not possible due to the constraint that a trip_id must be unique in a dataset, so if in the old timetable, the Wednesday trip is the same trip as the Tuesday trip, but in the new timetable, the Wednesday trip runs a minute different from the Tuesday trip, it is not possible to associate them.

The use cases are:

I plan my regular commute. The system can tell me on which days my regular commute is valid.
I save a one-off journey in the future. The system can tell me that the timetable has changed.
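Both use cases can be sketched if the producer keeps some journey key stable across feed versions while trip_id remains free to change. GTFS does not currently require this. The feeds, the key "J0801" and the function names below are hypothetical. On 5 March the trip_id changes from t17 to t93, and only the stable key links the two versions.

```python
from datetime import date

# Hypothetical feed versions keyed by (service day, stable journey key).
OLD = {
    (date(2025, 3, 5), "J0801"): {"trip_id": "t17", "times": ("08:01", "08:40")},
    (date(2025, 3, 12), "J0801"): {"trip_id": "t17", "times": ("08:01", "08:40")},
}
NEW = {
    (date(2025, 3, 5), "J0801"): {"trip_id": "t93", "times": ("08:02", "08:41")},
    (date(2025, 3, 12), "J0801"): {"trip_id": "t17", "times": ("08:01", "08:40")},
}


def commute_days(journey_key: str, feed: dict) -> list:
    """Use case 1: the service days on which a saved regular journey runs."""
    return sorted(day for day, key in feed if key == journey_key)


def check_saved_journey(key: tuple, old: dict, new: dict) -> str:
    """Use case 2: classify a saved one-off journey after a feed update."""
    if key not in new:
        return "not found"
    if old.get(key, {}).get("times") != new[key]["times"]:
        return "timetable changed"
    return "unchanged"
```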

ue71603 · 2025-02-21T11:16:01Z

@abyrd you say it is geared towards simpler schedules, and that is fine. However, some schedules are not simple, even if simplifying them for GTFS is often fine for easier consumption. If somebody needs to see the nuances, they need to get the information from somewhere, and for that they need the necessary reference. I agree that this reference will probably not always look the same, due to the lack of homogenisation and master data management. But that's the path forward, in my opinion.

Let's assume the following scenario:

The different variants of the train with the journey id ch:1:sjyid:111011:12341 result in 10 different trips in the export process to GTFS.
GTFS has some information about the accessibility and none about the formation of the train.
Suppose I want that. I can't fetch it with the trip_id. What I need is the operating day 2025-04-03 and the journey ID, to hand to the formation service or to look it up in the HRDF data that you obtained as well.

How do you suggest this should be done in the current situation?
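The following minimal sketch shows the lookup this scenario calls for. It assumes the producer exposes the journey ID for each trip, for example via original_trip_id. The formation service is treated as hypothetical, so the sketch only builds the (operating day, journey ID) key that such a service would need. The trip IDs are made up; the journey ID is the one from the scenario.

```python
from datetime import date

# Hypothetical: ten GTFS trips exported from one Swiss Journey ID.
SJYID = "ch:1:sjyid:111011:12341"
TRIP_TO_JOURNEY = {f"trip_{n}": SJYID for n in range(1, 11)}


def formation_lookup_key(trip_id: str, operating_day: date):
    """Build the (operating day, journey ID) key an external service needs.

    Returns None when the trip carries no journey ID.
    """
    journey_id = TRIP_TO_JOURNEY.get(trip_id)
    if journey_id is None:
        return None
    return (operating_day.isoformat(), journey_id)
```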

abyrd · 2025-02-21T11:27:42Z

> It is in my view a prerequisite to good passenger information. You seem to think it is technical and aloof. Naturally it will not be presented to the passenger in this way...
> Perhaps in your world this is something people don't do and if it can't be done in GTFS then it should not be done. Here it is a major concern of people using our data...

I am not sure what you mean by "my world", and I am making no statements that things "should not be done". I think I should clarify here that I am not strongly invested in a particular outcome for this PR. I was just trying to encourage communication between commenters who seemed to have different perspectives.

I think most or all commenters are sympathetic to your situation and want to help transform your proposal into something that fits well into GTFS and will be well-received and well-understood by the community.

My understanding is that some commenters would prefer a field defined by its concrete use cases, focusing on relationships within GTFS, and remaining silent on any relationship to data sets outside GTFS.

My intent was to encourage communication about whether the only known use case is rediscovering trips (for example, that someone has bookmarked or purchased a ticket for) when they have been altered by realtime data or other small schedule modifications. That is: are we functionally just talking about a realtime_trip_id?

> Granted "It is virtually impossible to regroup [trips] when the information is lost" but why would a passenger want to "regroup" trips?

> Again, they usually won't see it directly.

I understand that passengers will not look directly at ID strings. I was trying to encourage clear description of concrete situations encountered by the passenger where this ID would be used by software. The example provided by @miklcct would require an ID that maintains identity across small modifications to a trip on a particular day. I was asking: do we have any analogous examples of concrete use cases requiring an ID that groups trips across multiple days?

> Why would you prefer a semantic-free identifier for arbitrary grouping, instead of a field with a specific purpose?

> I don't understand what you are saying.

This PR seems to be about adding a field that can contain any "original" identifier for the trip, without attaching any meaning or usage context to that identifier. My understanding of comments from other contributors is that they prefer to see fields that have a specific meaning and use case.

To rephrase my question: Why would you prefer a field that can contain any arbitrary external identifier with no stated purpose, instead of a field or fields that contain identifiers with specific purposes, such as matching realtime messages to trips that have been altered?

miklcct · 2025-02-21T11:33:40Z

Unfortunately although it was my original idea, I am now skeptical about this PR.

It is impossible to define a precise semantic about what "upstream" data the original_trip_id refers to, especially if a feed combines multiple sources or if the same trip appears in multiple upstream systems. We don't want to add any fields that have unclear meaning into the standard.

I think we should disregard the Google extension and define fields which cater for the use cases I mentioned above.

abyrd · 2025-02-21T11:37:10Z

> You are now conflating two issues here.

I also consider these two different cases; I agree with your perspective. It seemed to me that they were being conflated in previous comments about the importance of original_trip_id. My intent was to acknowledge the validity of your example use case, and try to encourage discussion on whether the proposed original_trip_id was needed to solve or communicate any inter-trip relationships other than a sort of realtime_trip_id (your use case number 2).

miklcct · 2025-02-21T11:43:49Z

I think that we can define a field, called semantic_id, in the trip model to group trips which are "semantically" considered the same by operators and passengers.

Then we can attach a specific requirement of that ID:

Two trips with the same semantic_id on different days should be considered the approximate equivalent from the passenger's view, for example, if there is a planned diversion on a certain Sunday, the diverted trip should share the same semantic_id with the original trip on different Sundays.

Use case: I want to save my regular commute. The consumer can tell me immediately that my commute will be affected by planned engineering works on a certain day, and the equivalent journey will be a certain replacement.

semantic_id must be consistent across different versions of the dataset. If a new version of the dataset has a trip which has the same semantic_id on a service day, it is considered the same trip as the previous version which may have different timetables. For example, if there is a planned diversion, consumers can notify the passenger upon receiving the new version of the feed.

semantic_id should be unique within a service day. An exception can be made if a logical trip is truncated into multiple parts, for example, when a train, a replacement bus and another train form a through journey replacing the original train journey.
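A producer or validator could check the uniqueness rule above mechanically. The minimal sketch below flags semantic_ids that are used by more than one trip on the same service day. A flagged entry is not necessarily an error, since the proposal allows this for a logical trip split into parts. semantic_id is the field proposed here and is not yet part of GTFS. The trip IDs and dates are hypothetical; 9 March 2025 is a Sunday with an assumed train, bus, train replacement.

```python
from collections import defaultdict
from datetime import date


def duplicate_semantic_ids(instances) -> dict:
    """Report semantic_ids used by more than one trip on the same service day.

    instances: iterable of (service_day, semantic_id, trip_id) tuples.
    """
    seen = defaultdict(set)
    for day, semantic_id, trip_id in instances:
        seen[(day, semantic_id)].add(trip_id)
    return {key: sorted(trips) for key, trips in seen.items() if len(trips) > 1}


INSTANCES = [
    (date(2025, 3, 9), "S1", "train_a"),   # diverted Sunday: first train part
    (date(2025, 3, 9), "S1", "bus"),       # replacement bus
    (date(2025, 3, 9), "S1", "train_b"),   # second train part
    (date(2025, 3, 16), "S1", "regular"),  # the usual Sunday trip
]
```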

abyrd · 2025-02-21T12:26:27Z

> GTFS has some information about the accessibility and none about the formation of the train.

> Suppose I want that. I can't fetch it with the trip_id. What I need is the operating day 2025-04-03 and the journey ID, to hand to the formation service or to look it up in the HRDF data that you obtained as well.
> How do you suggest this should be done in the current situation?

I can think of a few possibilities. I am not promoting these or saying they're desirable, just brainstorming:

Extension field (proprietary or shared) for referencing very specific external data sources or formats (hafas_journey_id)
Extension field (clearly and intentionally proprietary) (sbb_journey_id)
An extension table to GTFS to capture train composition (specific aspects relevant to passenger information)

Keeping in mind that GTFS consumers should be tolerant of unofficial extensions. If something is clearly specific to your dataset, you could add sbb_field_a or sbb_field_b to any table, or sbb_table_a.txt to the archive, and it should not break anything.

[edit: I should have listed perhaps the most obvious, an extension field sjyid or ch_sjyid on trips.]
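A minimal sketch of the extension-field option: a consumer reads an unofficial ch_sjyid column from trips.txt when it is present and falls back cleanly when it is not. Unknown columns are never accessed, so feeds with or without the extension parse the same way. The column values and the function name read_trips are hypothetical.

```python
import csv
import io

# Hypothetical trips.txt carrying an unofficial ch_sjyid extension column.
TRIPS_TXT = """trip_id,route_id,service_id,ch_sjyid
t1,R1,S1,ch:1:sjyid:111011:12341
t2,R1,S1,
"""


def read_trips(text: str) -> dict:
    """Parse trips.txt, picking up ch_sjyid only if the column exists."""
    trips = {}
    for row in csv.DictReader(io.StringIO(text)):
        trips[row["trip_id"]] = {
            "route_id": row["route_id"],
            "service_id": row["service_id"],
            # .get() keeps this working for feeds without the column;
            # empty values are normalised to None.
            "sjyid": row.get("ch_sjyid") or None,
        }
    return trips
```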

stevenmwhite · 2025-02-21T15:09:53Z

These most recent comments from @abyrd and @miklcct are getting much closer to something that seems acceptable to me. The ideas have specific, definable purposes.

Would semantic_id be the first field in GTFS where consistency across datasets is required? I know it's a best practice for things like stop_id and route_id to remain consistent, but I don't think that's actually a requirement for any field currently. As long as IDs are internally consistent within the dataset, they are technically allowed to change with each dataset update, because they're not meant to be referenced outside of that dataset.

If that's true, just keep in mind that this would be a new way of thinking.
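To make that new way of thinking concrete: if a semantic id were kept stable across dataset versions, a consumer could pair trips from an old and a new feed even when their trip_ids change. This is a hypothetical sketch (match_across_versions and the sample ids are invented); ids that are missing or appear more than once in either version are skipped rather than guessed.

```python
from collections import Counter


def match_across_versions(old_trips, new_trips):
    """Pair trip_ids from two feed versions through a shared semantic id.

    old_trips and new_trips map trip_id -> semantic id (or None).
    Returns {old_trip_id: new_trip_id}; semantic ids that are missing or
    occur more than once in either version are skipped.
    """
    old_counts = Counter(sid for sid in old_trips.values() if sid)
    new_counts = Counter(sid for sid in new_trips.values() if sid)
    new_by_sid = {sid: trip_id for trip_id, sid in new_trips.items() if sid}
    return {
        trip_id: new_by_sid[sid]
        for trip_id, sid in old_trips.items()
        if sid and old_counts[sid] == 1 and new_counts[sid] == 1
    }


old = {"A1": "x", "A2": "y", "A3": "z", "A4": "z"}
new = {"B1": "x", "B2": "y", "B3": "y", "B4": None}
print(match_across_versions(old, new))  # {'A1': 'B1'}
```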

westontrillium · 2025-02-25T20:06:53Z

I had hoped to contribute to this discussion with something useful from a (North American) producer perspective, but honestly I am finding it challenging to gain a technical understanding of what’s being discussed. It sounds like the conversation may be moving beyond what was originally proposed anyway, but I wanted to point out that even after reading through the original PR and this entire comment thread, I didn’t find any comprehensive examples using actual data or tables (not just isolated trips.txt snippets) that clearly show the proposed file/field relationships.

As such, I’m still unclear on whether there are any restrictions on an original_trip_id (or equivalent) referring to multiple trip_ids: Can any trips be grouped together? If not, how identical do the trips have to be? Same stops? Do arrival/departure_times have to differ? What about pickup/drop_off_type? I would really love for someone to share a few simple tables showing the flow from route/calendar to original_trip_id+trip_ids to stop_times (for all trips in the example) so I can see what sort of “grouping” is possible/intended.

I’m also having trouble wrapping my head around some of the language used to describe the temporal restrictions for original_trip_id without concrete examples:

“… This ID is valid for one operating day and across different days of a scheduled year…” “… only unique on a single day…” “… Unique per operating day…” “… Can reoccur across different days of a scheduled year…”

These statements all seem to describe the same thing, though slightly differently, and potentially contradict one another. What exactly are we saying? That a single original_trip_id occurs on literally one calendar day (e.g., 2024-01-01), so calendar.txt/calendar_dates.txt must be structured accordingly? Or on one day of the week across multiple dates of a scheduled year, e.g., Wednesdays on a service_id where wednesday=1? Does that mean it can’t be attached to multiple trips occurring on a single day? Again, examples would be very useful!
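Much of this ambiguity comes down to which dates a trip actually runs on. As a sketch, a consumer might expand calendar.txt and calendar_dates.txt (weekday flags between the inclusive start_date and end_date, then exception_type 1 adds a date and 2 removes one) and count trip_ids per (date, original_trip_id). The final check assumes the "unique per operating day" reading; the helper names and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# calendar.txt weekday columns, indexed like date.weekday() (Monday == 0).
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def parse_gtfs_date(s):
    """Parse a GTFS YYYYMMDD string into a datetime.date."""
    return date(int(s[:4]), int(s[4:6]), int(s[6:8]))


def active_dates(calendar_rows, calendar_dates_rows):
    """Return {service_id: set of YYYYMMDD strings} on which each service runs."""
    result = {}
    for row in calendar_rows:
        dates = set()
        day = parse_gtfs_date(row["start_date"])
        end = parse_gtfs_date(row["end_date"])
        while day <= end:  # start_date and end_date are both inclusive
            if row[WEEKDAYS[day.weekday()]] == "1":
                dates.add(day.strftime("%Y%m%d"))
            day += timedelta(days=1)
        result[row["service_id"]] = dates
    for row in calendar_dates_rows:
        dates = result.setdefault(row["service_id"], set())
        if row["exception_type"] == "1":    # service added on this date
            dates.add(row["date"])
        elif row["exception_type"] == "2":  # service removed on this date
            dates.discard(row["date"])
    return result


def trips_per_day(trips, service_dates):
    """Map (date, original_trip_id) -> sorted trip_ids that run on that date."""
    grouped = defaultdict(list)
    for trip in trips:
        for day in service_dates.get(trip["service_id"], ()):
            grouped[(day, trip["original_trip_id"])].append(trip["trip_id"])
    return {key: sorted(ids) for key, ids in grouped.items()}


calendar = [{"service_id": "WED", "monday": "0", "tuesday": "0",
             "wednesday": "1", "thursday": "0", "friday": "0",
             "saturday": "0", "sunday": "0",
             "start_date": "20250101", "end_date": "20250115"}]
calendar_dates = [
    {"service_id": "WED", "date": "20250108", "exception_type": "2"},
    {"service_id": "EXTRA", "date": "20250108", "exception_type": "1"},
]
trips = [
    {"trip_id": "T1", "service_id": "WED", "original_trip_id": "X"},
    {"trip_id": "T2", "service_id": "EXTRA", "original_trip_id": "X"},
]
per_day = trips_per_day(trips, active_dates(calendar, calendar_dates))
# Under the "unique per operating day" reading, each key maps to one trip_id.
print(all(len(ids) == 1 for ids in per_day.values()))  # True
```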

davidr1234 · 2025-02-28T11:11:13Z

Here's a real-world example from our implementation.

GTFS Schedule URL.

We have the following situation. The train S6 goes from Basel Bad Bf to Basel SBB.

This is a train that many commuters and other regular passengers use, and whose stops they know. They keep its "pattern" in mind, especially because in Switzerland we have a "Taktfahrplan" (a regular-interval, clock-face timetable), so the trains are part of that pattern.

Occasionally, "their S6" stops at a different quay. This is what we need to be able to show to commuters, and we do so as described below using the "original_trip_id". (Note: we named the field after the one in the Google Transit GTFS reference, where it is used for the described use case, at least based on what we understood from an exchange with data providers for Google.)

Swiss Journey ID (SJYID) aka original_trip_id: ch:1:sjyid:100308:87865-001
Trip IDs: ['105.TA.91-6-E-j25-1.14.H', '44.TA.91-6-E-j25-1.8.H', '53.TA.91-6-E-j25-1.10.H', '74.TA.91-6-E-j25-1.13.H']

routes.txt
-- route_id,agency_id,route_short_name,route_long_name,route_desc,route_type
-- "91-6-E-j25-1","351","S6","","S","109"
trips.txt
-- route_id,service_id,trip_id,trip_headsign,trip_short_name,direction_id,block_id,original_trip_id,hints
-- "91-6-E-j25-1","TA+ep500","105.TA.91-6-E-j25-1.14.H","Basel SBB","87865","0","","ch:1:sjyid:100308:87865-001",""
//
-- "91-6-E-j25-1","TA+h8000","53.TA.91-6-E-j25-1.10.H","Basel SBB","87865","0","","ch:1:sjyid:100308:87865-001",""
stop_times.txt
-- trip_id,arrival_time,departure_time,stop_id,stop_sequence,pickup_type,drop_off_type
-- "105.TA.91-6-E-j25-1.14.H","21:45:00","21:45:00","8500090","1","0","0"
-- "105.TA.91-6-E-j25-1.14.H","21:50:00","21:50:00","8500010:0:5","2","0","0"
//
-- "53.TA.91-6-E-j25-1.10.H","21:45:00","21:45:00","8500090","1","0","0"
-- "53.TA.91-6-E-j25-1.10.H","21:50:00","21:50:00","8500010:0:3","2","0","0"
calendar.txt
-- service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
-- "TA+ep500","1","1","1","1","0","0","0","20241215","20251213"
-- "TA+h8000","0","0","0","1","0","0","0","20241215","20251213"
calendar_dates.txt
-- service_id,date,exception_type
-- "TA+ep500","20241216","2" ...(dates in range with exception_type=2)... "TA+ep500","20251211","2"
//
-- "TA+h8000","20241219","2" ...(dates in range with exception_type=2)... "TA+h8000","20251211","2"
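Using only the two trips shown in the example (the others are elided), a small sketch can group trips by original_trip_id and report the stop_sequence positions where their stop_id differs; here that is sequence 2 (8500010:0:5 vs 8500010:0:3), the quay change described above. The stop_differences helper is hypothetical, the CSV is trimmed to the columns it needs, and it assumes grouped trips share the same stop_sequence numbering.

```python
import csv
import io
from collections import defaultdict

# Trimmed copies of the example rows (only the columns used below).
TRIPS = """trip_id,original_trip_id
105.TA.91-6-E-j25-1.14.H,ch:1:sjyid:100308:87865-001
53.TA.91-6-E-j25-1.10.H,ch:1:sjyid:100308:87865-001
"""

STOP_TIMES = """trip_id,stop_id,stop_sequence
105.TA.91-6-E-j25-1.14.H,8500090,1
105.TA.91-6-E-j25-1.14.H,8500010:0:5,2
53.TA.91-6-E-j25-1.10.H,8500090,1
53.TA.91-6-E-j25-1.10.H,8500010:0:3,2
"""


def stop_differences(trips_text, stop_times_text):
    """Return {original_trip_id: {stop_sequence: {trip_id: stop_id}}},
    keeping only the stop_sequences where the grouped trips disagree.

    Assumes trips in a group use the same stop_sequence numbering.
    """
    group_of = {row["trip_id"]: row["original_trip_id"]
                for row in csv.DictReader(io.StringIO(trips_text))}
    # stops[original_trip_id][stop_sequence][trip_id] = stop_id
    stops = defaultdict(lambda: defaultdict(dict))
    for row in csv.DictReader(io.StringIO(stop_times_text)):
        group = group_of[row["trip_id"]]
        stops[group][int(row["stop_sequence"])][row["trip_id"]] = row["stop_id"]
    return {
        group: {seq: ids for seq, ids in by_seq.items()
                if len(set(ids.values())) > 1}
        for group, by_seq in stops.items()
    }


print(stop_differences(TRIPS, STOP_TIMES))
```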

davidr1234 · 2025-03-13T09:02:45Z

@westontrillium @stevenmwhite @abyrd @miklcct - any thoughts/feedback concerning our example?

miklcct · 2025-03-13T09:12:58Z

Whether I am going to support the proposal will depend on how it is defined.

If the definition is "an internal identifier which is used to group related trips across different dates or across different versions of feeds", I'll support it.

If the definition is "an ID which is used to reference external information systems", I'll oppose it.

github-actions · 2025-06-12T04:05:24Z

This pull request has been automatically marked as stale because of lack of recent activity. It may be closed manually after one month of inactivity. Thank you for your contributions.

eliasmbd · 2025-06-12T13:03:50Z

This conversation is still active, not stale.

github-actions · 2025-09-11T04:01:39Z

This pull request has been automatically marked as stale because of lack of recent activity. It may be closed manually after one month of inactivity. Thank you for your contributions.

david-sbb added 2 commits January 30, 2025 13:15

Addition of original_trip_id

c69e45c

The original_trip_id is added to both GTFS Schedule and GTFS Realtime. This field allows the association of trips across different realtime and schedule standards, e.g., NeTEx and SIRI. It also allows matching between schedule and realtime.

Added newline

75f7422

Amending .proto

48b5218

According to discussions in google#534 to reflect the documentation in reference.md

This was referenced Feb 20, 2025

motis: Enable duplicate matching public-transport/transitous#54

Merged

related discussions derhuerst/stable-public-transport-ids#3

Open

miklcct mentioned this pull request Mar 23, 2025

Trip display names show route name instead of trip name for long distance trains public-transport/transitous#1015

Open

github-actions bot added the Status: Stale Issues and Pull Requests that have remained inactive for 30 calendar days or more. label Jun 12, 2025

eliasmbd removed the Status: Stale Issues and Pull Requests that have remained inactive for 30 calendar days or more. label Jun 12, 2025

github-actions bot added the Status: Stale Issues and Pull Requests that have remained inactive for 30 calendar days or more. label Sep 11, 2025
