|
2 | 2 | slug: decentralized-coordination |
3 | 3 | title: "Consistency and Availability Challenges with Decentralized Coordination" |
4 | 4 | authors: [fra-p, eal, rcakella] |
5 | | -tags: [lingua franca, federation, decentralized] |
| 5 | +tags: [lingua franca, federation, decentralized, STA] |
6 | 6 | --- |
7 | 7 |
|
8 | | -The design of [distributed applications](/docs/writing-reactors/distributed-execution) in Lingua Franca requires care, particularly if the coordination of the federation is [decentralized](/docs/writing-reactors/distributed-execution#decentralized-coordination). |
| 8 | +The design of [distributed applications](/docs/writing-reactors/distributed-execution) in Lingua Franca requires care, particularly if the coordination of the federation is [decentralized](/docs/writing-reactors/distributed-execution#decentralized-coordination). This post illustrates the consistency and availability challenges that arise in such applications and shows how to handle them, using a realistic automotive use case.
| 9 | + |
| 10 | +## Automatic emergency braking use case |
| 11 | + |
9 | 12 |
|
10 | 13 | Consider the above Lingua Franca implementation of an automatic emergency braking system, one of the most critical ADAS systems which modern cars are equipped with. |
11 | | -The controller system reads data coming from two sensors, a lidar and a radar, and uses both to detect if objects or pedestrians cross the path of the car, thus performing sensor fusion. |
12 | | -When either of the two signals the presence of a close object, the controller triggers the brake to stop the car and avoid crashing into it. |
| 14 | +The controller system modeled by the `AutomaticEmergencyBraking` reactor reads data coming from two sensors, a lidar and a radar, and uses both to detect if objects or pedestrians cross the trajectory of the car, thus performing _sensor fusion_.
| 15 | +When one of the two sensors signals the presence of an object at a distance shorter than a configurable threshold, the controller triggers the brake to stop the car and avoid crashing into it. |
| 16 | + |
| 17 | +Each sensor is modeled with its own timer that triggers the generation of data. The clocks of all federates are automatically synchronized by the [clock synchronization algorithm](/docs/writing-reactors/distributed-execution#clock-synchronization) of the Lingua Franca runtime.
| 18 | +Typically, in a real use case of this kind, the clocks of the sensor devices cannot be controlled by Lingua Franca. A way to work around this limitation is to resample the data collected by the sensors with the timing given by a clock that the runtime can control.
| 19 | +The sensor reactors of our application therefore model this resampling of sensor data, which fits well with the Lingua Franca semantics for time determinism.
| 20 | + |
| 21 | +The lidar sensor has a sampling frequency that is twice that of the radar, and this is reflected by the timers in the corresponding reactors: the lidar timer has a period of 50ms, while the radar timer has a period of 100ms.
| 22 | +Their deadline is equal to their period and is enforced using the dedicated `DeadlineCheck` reactors, following the guidelines of how to [work with deadlines](/blog/deadlines). |
13 | 23 |
|
14 | | -The lidar sensor has a higher sampling frequency, while the radar is slower, and this is reflected by the timer in the corresponding reactors. |
15 | | -Their deadline is equal to their period and is enforced using dedicated deadline checking reactors, following the guidelines of how to [work with deadlines](/blog/deadlines). |
| 24 | +The sensor behavior in the application is simulated so that each sensor constantly produces distance values above the threshold (i.e., no objects in the way) and then, at a random time, sends a distance value below the threshold, indicating the presence of a close object. When the `AutomaticEmergencyBraking` reactor receives that message, it signals the `BrakingSystem` reactor to brake the car, and the whole system shuts down.
| 25 | + |
| 26 | +### Desired system properties |
16 | 27 | Availability is a crucial property of this application, because we want the automatic emergency braking system to brake as fast as possible when a close object is detected. Consistency is also necessary: sensor fusion happens with sensor data produced at the same logical time, so in-order data processing is critical. |
17 | 28 |
|
| 29 | +### Challenges of decentralized coordination |
18 | 30 | The application is implemented as a federated program with decentralized coordination, which means that the advancement of logical time in each single federate is not subject to approval from any centralized entities, but it is done locally based on the input it receives from the other federates. |
19 | | -Consistency problems may arise when a federate receives data from two or more federates, as it is the case of the automatic emergency braking reactor. |
20 | | -As an example, the controller expects to receive input from both sensors at times 0ms, 100ms, 200ms, etc. Let's consider the case where the remote connection between the controller and the radar has a slightly larger delay than that between the controller and the lidar. The lidar input will arrive slightly earlier than the radar one. When the controller receives the lidar input, should it process the data immediately, or should it wait for the radar input to come? Sensor fusion requires consistency: if the controller processes the input from the lidar and then the radar data comes, the elaborated control action did not take into account both sensors even though it should have. |
21 | 31 |
|
22 | | -The desired behavior with simultaneous inputs is highly dependent on the application under analysis, and Lingua Franca lets you customize it. Each federate has a parameter called [STA (safe-to-advance)](/docs/writing-reactors/distributed-execution#safe-to-advance-sta) that controls how long the federate should wait for inputs from other federates before processing an input it has just received. |
23 | | -More precisely, the STA is how much time a federate waits before advancing its tag to that of the just received event, when it is not known if the other input ports will receive data at the same or an earlier tag. At the expiration of the STA, the federate assumes that those unresolved ports will not receive data at earlier tags, and advances its logical time to the tag of the received event. |
| 32 | +#### Consistency challenge |
| 33 | +Consistency problems may arise when a federate receives data from two or more federates, as is the case for the `AutomaticEmergencyBraking` reactor.
| 34 | +The controller expects to receive input from both sensors at times 0ms, 100ms, 200ms, etc. As an example, consider the case where the remote connection between the controller and the radar has a slightly larger delay than that between the controller and the lidar. The lidar input will then always arrive slightly earlier than the radar one. When the controller receives the lidar input, should it process the data immediately, or should it wait for the radar input to come? Sensor fusion requires consistency: if the controller processes the input from the lidar and the radar data arrives afterwards, the control action computed upon the arrival of the lidar data does not take both sensors into account, even though it should. Hence, in our use case, the `AutomaticEmergencyBraking` reactor needs to wait for both inputs before processing new data.
| 35 | + |
| 36 | +In general, the desired behavior with simultaneous inputs and decentralized coordination is highly dependent on the application under analysis, and Lingua Franca lets you customize it. Each federate has a parameter called [`STA` (safe-to-advance)](/docs/writing-reactors/distributed-execution#safe-to-advance-sta) that controls how long the federate should wait for inputs from other federates before processing an input it has just received. |
| 37 | +More precisely, the `STA` is how much time a federate waits before advancing its tag to that of the just received event, when it is not known if the other input ports will receive data at the same or an earlier tag. At the expiration of the `STA`, the federate assumes that those unresolved ports will not receive any data at earlier tags, and advances its logical time to the tag of the received event. |
| 38 | + |
| 39 | +When a reactor commits to a tag after the `STA` expires, it may happen that one of the unresolved ports receives new data at an earlier logical time. |
| 40 | +Since the current tag is greater than the just received one, this event cannot be processed, as it would result in out-of-order handling of messages, thus violating the Lingua Franca semantics. |
| 41 | +In such cases, a safe-to-process (`STP`) violation occurs, the received event is dropped and a [fault handler](/docs/writing-reactors/distributed-execution#safe-to-process-stp-violation-handling) is executed instead: consistency is then preserved. |
24 | 42 |
|
25 | | -The maximum consistency guarantee is given by indefinitely waiting for the radar input before processing the radar, i.e., STA = forever, but this is viable only if the following two conditions are always satisfied: |
| 43 | +In our application, we aim to avoid `STP` violations and process all incoming data for sensor fusion. The maximum consistency guarantee is given by _indefinitely waiting_ for the radar input before processing the lidar input, i.e., `STA = forever`, but this is viable only if the following two conditions are always satisfied:
26 | 44 | * the communication medium between the sensors and the controller is perfectly reliable; and |
27 | 45 | * none of the three federates is subject to faults. |
28 | 46 |
|
29 | | -These conditions guarantee that all expected data will be generated, sent and correctly received by the communication parties. |
| 47 | +These conditions guarantee that all expected data will be generated, sent and correctly received by the communicating parties. If either of the two does not hold, the application may block indefinitely.
30 | 48 |
|
31 | | -However, setting the STA to forever creates problems when only the lidar input is expected (50ms, 150ms, 250ms, etc): the controller cannot process that input until an input from the radar comes, because the STA will never expire. For example, if the single lidar input comes at 50ms, it has to wait until time 100ms before being processed. If that input was signaling the presence of a close object, the detection would be delayed by 50ms, which may potentially mean crashing into the object. The automatic emergency braking system must be available, otherwise it might not brake in time to avoid collisions. |
32 | | -The ideal STA value for maximum availability in the time instants with only the lidar input is 0, because if a single input is expected, no wait is necessary. |
| 49 | +#### Availability challenge |
| 50 | +However, setting the `STA` to `forever` creates problems when only the lidar input is expected (50ms, 150ms, 250ms, etc.): the controller cannot process that input until an input from the radar comes, because the `STA` will never expire. For example, if the single lidar input comes at time 50ms, it has to wait until time 100ms before being processed. If that input was signaling the presence of a close object, the detection would be delayed by 50ms, which could mean crashing into the object. The automatic emergency braking system must be available, otherwise it might not brake in time to avoid collisions.
| 51 | +The ideal `STA` value for maximum availability in the time instants with only the lidar input is 0, because if a single input is expected, no wait is necessary. |
33 | 52 |
|
34 | | -Summing up, consistency for sensor fusion requires STA=forever when inputs from both sensors are expected, while availability calls for STA=0 when only the lidar input is coming. The two values are at odds, and any value in between would mean sacrificing both properties at the same time. |
| 53 | +Summing up, consistency for sensor fusion requires `STA = forever` when inputs from both sensors are expected, while availability calls for `STA = 0` when only the lidar input is coming. The two values are at odds, and any value in between would mean sacrificing both properties at the same time. |
35 | 54 |
|
36 | | -The knowledge of the timing properties of the application under analysis enables the a priori determination of the time instants when both inputs are expected and those when only the lidar has new data available. |
37 | | -Lingua Franca allows to dynamically change the STA in the reaction body using the lf_set_maxwait API, that takes as input parameter the new STA value to set. |
| 55 | +### Dynamic adjustment of STA |
| 56 | +The knowledge of the timing properties of the application under analysis enables the _a priori_ determination of the time instants when both inputs are expected and those when only the lidar has new data available. |
| 57 | +Lingua Franca allows the `STA` to be changed dynamically in the reaction body using the `lf_set_sta` API, which takes the new `STA` value as its input parameter.
38 | 58 | This capability of the language permits the automatic emergency braking federate to: |
39 | | -* start with the STA statically set to forever, because at time 0 (startup) both sensors produce data; |
40 | | -* set the STA to 0 after processing both inputs arrived at the same logical time, because the next data will be sent by the lidar only; |
41 | | -* set the STA back to forever after processing the radar input alone, because the next data will be sent by both sensors. |
| 59 | +* start with the `STA` statically set to `forever`, because at time 0 (startup) both sensors produce data; |
| 60 | +* set the `STA` to 0 after processing both inputs that arrived at the same logical time, because the next data will be sent by the lidar only;
| 61 | +* set the `STA` back to `forever` after processing the lidar input alone, because the next data will be sent by both sensors.
42 | 62 |
|
43 | 63 | This dynamic solution guarantees both consistency and availability in all input cases. |
| 64 | +The implementation of the `AutomaticEmergencyBraking` reactor is shown below: |
44 | 65 |
|
45 | | -Knowing the LF decentralized coordination: |
46 | | -- consistency = in-order processing of events even with multiple events |
47 | | -- availability = the system is responsive even with a single input |
| 66 | +```lf-c |
| 67 | +reactor AutomaticEmergencyBraking(dist_thld: float = 20.0) { |
| 68 | + input lidar_in: float |
| 69 | + input radar_in: float |
| 70 | + output brake: int |
| 71 | + state n_invocs: int = 0 |
48 | 72 |
|
49 | | -Oh, maybe mention that the clock of the two sensors is synced because we're resampling the data |
| 73 | + reaction (lidar_in, radar_in) -> brake {= |
| 74 | + if (lf_is_present(lidar_in) && lidar_in->value < self->dist_thld) { |
| 75 | + printf("Lidar has detected close object -> signaling braking\n"); |
| 76 | + lf_set(brake, 1); |
| 77 | + lf_request_stop(); |
| 78 | + } else if (lf_is_present(radar_in) && radar_in->value < self->dist_thld) { |
| 79 | + printf("Radar has detected close object -> signaling braking\n"); |
| 80 | + lf_set(brake, 1); |
| 81 | + lf_request_stop(); |
| 82 | + } |
50 | 83 |
|
51 | | -I might also say that forever does not work when one of the sensors is delayed too much or when the medium fails for too much time, in which cases a finite STA is better (like a period or something) (this is gonna be the topic of a new blog post) |
| 84 | + self->n_invocs++; |
| 85 | + if (self->n_invocs % 2) { |
| 86 | + lf_set_sta(0); |
| 87 | + } else { |
| 88 | + lf_set_sta(FOREVER); |
| 89 | + } |
| 90 | + =} deadline(100ms) {= |
| 91 | + printf("AEB deadline violated\n"); |
| 92 | + =} STA(forever) {= |
| 93 | + printf("STP violation on AEB\n"); |
| 94 | + =} |
| 95 | +} |
| 96 | +``` |
52 | 97 |
|
53 | | --maybe a little bit of what happens when out-of-order msg.s are received? (not sure this is really needed though) |
| 98 | +The `dist_thld` parameter is the distance threshold from detected objects below which the `AutomaticEmergencyBraking` reactor activates the brakes. |
| 99 | +The reaction body reads the distance reported by the lidar and the radar, and if either of these is less than the threshold, it sends a signal to the `BrakingSystem` reactor.
| 100 | +The `n_invocs` integer state variable counts the number of times the reaction of the `AutomaticEmergencyBraking` reactor is invoked. This variable is used to determine how many inputs the reaction will see at the next invocation and set the `STA` accordingly. Even invocation numbers mean that the next reaction invocation will happen with both sensor inputs present, so the `STA` is set to `forever`; with odd invocation numbers, the next reaction invocation will see new data from the lidar only, and the `STA` is then set to 0. |