Skip to content

fix(thread_configurator): apply scheduling to all reentrant callback group threads#39

Draft
atsushi421 wants to merge 4 commits intomainfrom
fix/reentrant-thread-config
Draft

fix(thread_configurator): apply scheduling to all reentrant callback group threads#39
atsushi421 wants to merge 4 commits intomainfrom
fix/reentrant-thread-config

Conversation

@atsushi421
Copy link
Copy Markdown
Collaborator

@atsushi421 atsushi421 commented Apr 2, 2026

Description

Previously, ThreadConfiguratorNode had two issues with reentrant callback groups:

  1. Only the first thread was configured: Reentrant callback groups spawn multiple threads sharing the same callback_group_id. The node skipped subsequent threads as "already configured".
  2. No re-application on app restart: Once all configs were applied, the node exited. Restarting the target application required restarting the configurator too.

This PR fixes both by:

  • Distinguishing between duplicate messages (same thread_id → skip) and new threads (different thread_id → re-apply)
  • Making the node persistent via executor->spin() with automatic state reset after all configs are applied
  • Storing SCHED_DEADLINE configs by value (not pointer) so each thread's tid is preserved independently
  • Automatically applying SCHED_DEADLINE configs when unapplied_num_ reaches 0, removing the interactive stdin prompt

Related links

How was this PR tested?

Ran thread_configurator_node with a YAML config (SCHED_OTHER, nice=5) alongside reentrant_node_main (4 reentrant threads). Verified:

  • All 4 threads received the configuration (1 initial + 3 re-applied via "Detected a new thread" log)
  • nice values confirmed via /proc/<tid>/stat for all threads
  • Re-application after app restart works (second run also gets configured)
  • Mandatory reentrant verification passed (Timer1: ~90-100, Timer2: ~9-10, TIDs: >=2)

Notes for reviewers

Thread configurator behavior changes

…estart

Previously, ThreadConfiguratorNode skipped scheduling configuration for
duplicate callback_group_ids. Since reentrant callback groups spawn
multiple threads sharing the same callback_group_id, only the first
thread received scheduling attributes.

Fix by distinguishing between duplicate messages (same thread_id -> skip)
and new threads (different thread_id -> re-apply). Additionally:
- Make the node persistent via executor->spin() with automatic state
  reset, enabling re-application when target applications restart
- Store SCHED_DEADLINE configs by value so each thread's tid is
  preserved independently
- Apply SCHED_DEADLINE configs automatically when unapplied_num_ reaches
  0, removing the interactive stdin prompt

Signed-off-by: Atsushi Watanabe <atsushi.w@autoware.org>
Signed-off-by: atsushi421 <atsushi.yano.2@tier4.jp>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates thread_configurator_node to correctly (re-)apply scheduling/affinity settings for reentrant callback groups that publish multiple thread IDs under the same callback group ID, and to keep the configurator running persistently so configurations can be re-applied when the target application restarts.

Changes:

  • Distinguish duplicate thread notifications (same thread_id) from new threads (different thread_id) and re-apply configuration for newly detected threads.
  • Make the configurator run persistently (executor->spin()), remove interactive stdin prompts, and auto-apply SCHED_DEADLINE once registration completes.
  • Store SCHED_DEADLINE configs by value to preserve per-thread thread_id, and reset internal state after applying.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
README.md Updates user-facing behavior docs (auto-apply SCHED_DEADLINE, persistent node, Ctrl+C exit).
cie_thread_configurator/src/thread_configurator_node.cpp Implements per-thread reapplication logic, value-stored deadline configs, auto-apply + state reset.
cie_thread_configurator/src/main.cpp Switches to persistent spinning and updates shutdown-time reporting.
cie_thread_configurator/include/cie_thread_configurator/thread_configurator_node.hpp Updates public API (remove old apply/all_applied helpers, add has_configured_once, change deadline storage type).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +366 to +371

// Reset for re-application when target applications restart
unapplied_num_ = thread_configs_.size();
for (auto &config : thread_configs_) {
config.applied = false;
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply_deadline_configs() resets unapplied_num_ and clears all config.applied immediately after the first time unapplied_num_ reaches 0. Because ROS messages are processed sequentially, it’s possible to hit 0 and reset while additional CallbackGroupInfo/NonRosThreadInfo messages are still queued (e.g., reentrant callback groups publishing multiple thread IDs). Those later messages will start a new “cycle”, and if the other callback groups don’t republish, unapplied_num_ may never reach 0 again—leaving newly queued SCHED_DEADLINE configs unapplied. It also re-enables re-applying configs for duplicate messages with the same thread_id after a cycle completes (since applied is forced back to false).

Consider separating “cycle reset” from apply_deadline_configs() (e.g., debounce deadline application, reset only on confirmed app restart, or keep a per-(id,thread_id) configured set so duplicates are skipped across cycles) so late-arriving thread notifications still get configured.

Suggested change
// Reset for re-application when target applications restart
unapplied_num_ = thread_configs_.size();
for (auto &config : thread_configs_) {
config.applied = false;
}

Copilot uses AI. Check for mistakes.
- Apply SCHED_DEADLINE immediately after first cycle for late-arriving
  reentrant threads, avoiding configs stuck in deferred queue when other
  groups won't re-report
- Track failures in apply_deadline_configs() and log accurate
  success/failure message instead of unconditional "Success"
- Clarify README that cgroup cleanup on exit is best-effort and may
  require manual removal if target app threads are still attached

Signed-off-by: atsushi421 <atsushi.yano.2@tier4.jp>
Accept PR review suggestion: cgroup removal on exit only
succeeds if the target application has already stopped and
no threads remain attached. Users may need to stop the app
first or manually remove remaining cgroups.

Signed-off-by: atsushi421 <atsushi.yano.2@tier4.jp>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants