Huge library problems  #162

@antoniocaso

Hello everyone, and hello @pablogs9. This time I’m writing about a problem that is probably much bigger than the last one. It has kept me busy for weeks and has pushed my thesis well beyond the expected timeline. First, some context: I’m using microROS on my STM Nucleo H723ZG because I need to interface with a vehicle whose onboard computer already uses ROS2. The vehicle operates on a 20 ms cycle, so on my MCU I need to set up two subscriptions, whose callbacks should ideally also run every 20 ms. I also need to publish messages from the MCU to the onboard computer.

Well, my goal is to achieve real-time timing guarantees: publications every 50 ms and reception callbacks every 20 ms. The problem is that using rclc_executor_spin_some() throws everything into disarray, and something even stranger happens: the reception callbacks on the MCU sometimes fire every 15-16 ms, inexplicably even before the onboard computer (simulated by a publishing node on my personal PC with a 20 ms timer) can publish! Moreover, these cycle times aren’t even stable: they range from 15 ms to 53 ms (sometimes 15, then 16, 17, 18, 19, then 25, 21, 23, and so on), and this affects not just the callbacks but all the other tasks and cycles, including publication. However, I must point out that when I disable spin_some(), the pubber task runs precisely every 15 ms and NEVER varies, so the problem is definitely tied to spin_some().

To describe the problem further: it arose when I (foolishly) tried to use TRANSIENT_LOCAL instead of VOLATILE. Before this change (applied on both the Ubuntu side and the MCU side, of course), everything was going great: precise, punctual, and with reasonable timing (callbacks were triggered every 25 ms, which is fair because the code needs time to execute). After the change, all this madness started, and going back to VOLATILE didn’t solve anything, as if the agent had been permanently corrupted by my reckless move. The reason I made the change in the first place was that I had noticed the subscription task was cycling every 2-3 ms, far faster than the 20 ms I had set as the second parameter of spin_some(), and my investigation into that had turned up nothing.
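One possibly relevant detail about that second parameter: in rclc, `rclc_executor_spin_some(executor, timeout_ns)` takes a timeout in nanoseconds, meaning the longest it will wait for new data rather than a cycle period, so it can return as soon as something is ready. If a fixed period is wanted, the loop has to enforce it separately. The following is only a minimal sketch of that idea, assuming a millisecond tick counter; `ms_to_ns` and `next_release` are hypothetical helper names, not part of rclc or of the attached code. On FreeRTOS, `vTaskDelayUntil()` provides this kind of fixed-period release.

```c
#include <stdint.h>

/* Convert milliseconds to nanoseconds, the unit a spin_some() timeout uses. */
static inline uint64_t ms_to_ns(uint32_t ms)
{
    return (uint64_t)ms * 1000000ULL;
}

/*
 * Hypothetical helper: compute the next release time of a fixed-period loop.
 * Releases stay on the grid last_release + k * period, so an early or late
 * return from the loop body does not shift later cycles. If the body overran,
 * missed releases are skipped instead of being executed in a burst.
 * Unsigned arithmetic keeps this correct across tick counter wrap-around.
 */
static inline uint32_t next_release(uint32_t last_release,
                                    uint32_t period,
                                    uint32_t now)
{
    if (period == 0u) {
        return now; /* degenerate case: no period requested */
    }
    uint32_t next = last_release + period;
    while ((int32_t)(now - next) >= 0) {
        next += period; /* skip releases that are already in the past */
    }
    return next;
}
```

With a 20 ms period, a loop body that finishes at tick 105 after a release at tick 100 would wait until tick 120, whatever the timeout passed to the executor was.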

To try to solve it, I vainly attempted to use spin_period() instead of for(;;) in the subscription task. However, the problem is that when I compile, I get this error:

I tried including unistd.h, but to no avail. I then tried adding this to the code (freertos.c):
#ifdef usleep
#undef usleep
#endif

void usleep(uint32_t useconds) {
    HAL_Delay(useconds / 1000);
}

But it still didn’t work because it conflicted with the usleep declaration in unistd.h. At first, the compiler said it couldn’t see the usleep from unistd.h, and now it can, just to mess with me! After removing the unistd.h include, I ran it again, but the result is perhaps even more depressing than before. Here’s an example, where I count how many times the cycle times exceed a threshold (but mostly I use it to see how much the values fluctuate without ever staying stable):
Category: PUBBER
Total numbers found: 8505
1 instances of number 1551.0
1 instances of number 17.0
1736 instances of number 16.0
5741 instances of number 15.0
1026 instances of number 14.0

Category: SUBBER
Total numbers found: 1549
1 instances of number 1545.0
3 instances of number 21.0
1 instances of number 20.0
22 instances of number 19.0
1025 instances of number 18.0
478 instances of number 17.0
6 instances of number 16.0
13 instances of number 15.0

Category: Arbiter Callback
Total numbers found: 8505
2 instances of number 1556.0
5 instances of number 25.0
12 instances of number 24.0
538 instances of number 23.0
655 instances of number 22.0
825 instances of number 21.0
4958 instances of number 20.0
305 instances of number 19.0
777 instances of number 18.0
413 instances of number 17.0
11 instances of number 16.0
3 instances of number 15.0
1 instances of number 13.0

Category: FDCAN Task
Total numbers found: 8506
7 instances of number 44.0
20 instances of number 43.0
47 instances of number 42.0
1669 instances of number 41.0
6397 instances of number 40.0
304 instances of number 39.0
33 instances of number 38.0
26 instances of number 37.0
3 instances of number 35.0

Category: Info Callback
Total numbers found: 8505
1 instances of number 1559.0
1 instances of number 40.0
1 instances of number 29.0
8 instances of number 25.0
10 instances of number 24.0
547 instances of number 23.0
609 instances of number 22.0
843 instances of number 21.0
4897 instances of number 20.0
327 instances of number 19.0
848 instances of number 18.0
402 instances of number 17.0
8 instances of number 16.0
2 instances of number 15.0
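For anyone wanting to reproduce this kind of summary on the MCU side, here is a minimal sketch of how such cycle-time statistics could be gathered. It assumes timestamps are taken in milliseconds at the start of each cycle; `cycle_stats_t` and `cycle_stats` are hypothetical names and not part of the attached freertos.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Summary of the intervals between consecutive cycle timestamps. */
typedef struct {
    uint32_t min_ms;         /* shortest observed cycle time */
    uint32_t max_ms;         /* longest observed cycle time */
    size_t   over_threshold; /* cycles strictly longer than the threshold */
    size_t   count;          /* number of intervals measured */
} cycle_stats_t;

/*
 * Hypothetical helper: walk an array of timestamps (in ms) and summarise
 * the interval between each pair of neighbours. Unsigned subtraction keeps
 * the interval correct if the tick counter wraps between two samples.
 */
static cycle_stats_t cycle_stats(const uint32_t *stamps_ms,
                                 size_t n,
                                 uint32_t threshold_ms)
{
    cycle_stats_t s = { UINT32_MAX, 0u, 0u, 0u };
    for (size_t i = 1; i < n; ++i) {
        uint32_t dt = stamps_ms[i] - stamps_ms[i - 1];
        if (dt < s.min_ms) { s.min_ms = dt; }
        if (dt > s.max_ms) { s.max_ms = dt; }
        if (dt > threshold_ms) { s.over_threshold++; }
        s.count++;
    }
    return s;
}
```

Reporting the minimum and maximum alongside the over-threshold count makes a single long outlier, like the ones above 1500 in the listings, stand out from ordinary jitter.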

So, in conclusion, all the problems converge on two things: a spin_some() function that doesn’t give me a moment of peace, and perhaps an agent that’s gone crazy.
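Regarding the usleep conflict described earlier: POSIX declares the function as `int usleep(useconds_t usec)`, so a definition that returns `void` and takes a `uint32_t` would plausibly clash with the prototype in unistd.h. Separately, `HAL_Delay()` polls the HAL tick counter instead of yielding to other tasks. The sketch below shows the shape of a replacement that returns `int` and rounds up to whole RTOS ticks. It is an assumption-laden illustration only: `TICK_RATE_HZ`, `delay_ticks`, and `usleep_compat` are hypothetical names, and `delay_ticks` is a stub standing in for a real call such as `vTaskDelay()`.

```c
#include <stdint.h>

/* Hypothetical tick rate; on FreeRTOS this would be configTICK_RATE_HZ. */
#define TICK_RATE_HZ 1000u

/*
 * Hypothetical delay hook. In a FreeRTOS task this would wrap vTaskDelay();
 * here it only records the request so the sketch builds on its own.
 */
static uint32_t last_delay_ticks;

static void delay_ticks(uint32_t ticks)
{
    last_delay_ticks = ticks;
}

/* Round microseconds up to whole ticks so a short sleep is never cut short. */
static uint32_t us_to_ticks(uint32_t usec)
{
    uint64_t ticks = ((uint64_t)usec * TICK_RATE_HZ + 999999u) / 1000000u;
    return (uint32_t)ticks;
}

/*
 * Same shape as POSIX usleep(): returns int, 0 on success. It is named
 * usleep_compat here so it does not collide with a libc symbol; in a real
 * port the name and parameter type would have to match unistd.h exactly.
 */
int usleep_compat(unsigned long usec)
{
    delay_ticks(us_to_ticks((uint32_t)usec));
    return 0;
}
```

With a 1 kHz tick, a request for 1500 microseconds becomes a 2-tick delay in this sketch.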

Please help me, and I apologize for any inaccuracies. Thank you in advance, and if you believe in it, may God bless you. Attached below are the publishing file on the Ubuntu side, the subscription file on the Ubuntu side, and the code running on the MCU (both subscription and publication, all in freertos.c).

freertos.txt
ubuntu_pubber.txt
ubuntu_subber.txt
