Signal to socket refactor#502
Merged
Merged
Conversation
The Ruby driver coordinated with its child daemons (sync_daemon_mpi, sync_daemon_fs, thapi_sampling_daemon) via POSIX real-time signals (SIGRTMIN+0..3) and a busy-spin on a Signal.trap-toggled @@ready flag. That flag was shared across SyncDaemon and SamplingDaemon, which was a latent race. Switch to one SOCK_SEQPACKET UNIX socketpair per daemon. The parent sends short ASCII tokens (INIT, LOCAL_BARRIER, GLOBAL_BARRIER, FINISH) and blocks on recvmsg for the daemon's READY reply. No signal handlers, no busy waits, no shared state. The argv contract for the daemon binaries changes: they now take a single <fd> argument (parent_pid is gone). These are internal binaries shipped under BINDIR, but worth noting. sync_daemon_mpi.c is renamed to .cpp so MPI session setup and the message loop share the same C++ idioms (string_view dispatch); the build switches to MPICXX, and configure.ac wraps AX_MPI/AX_MPI_SESSION in AC_LANG_PUSH([C++]) so MPICXX is set. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Remove manual = alignment on constexpr/using declarations, collapse the dlsym reinterpret_cast lines to fit clang-format's continuation indent, and break the if-then-return chains onto two lines (LLVM default disallows short ifs on one line). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
C++ forbids a goto from jumping over a variable initialization. CHECK_MPI expands to goto fn_exit, so const int fd must be declared before the first CHECK_MPI call. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ruby creates UNIX sockets with O_NONBLOCK on Linux by default. The
inherited fd carried that flag into the daemon, so its blocking read()
returned EAGAIN ("Resource temporarily unavailable") and the daemon
exited before processing any message. Clear the flag on the child end
before spawn in both SyncDaemon and SamplingDaemon.
Replace integration_tests/light_iprof_only_sync.sh with a Ruby driver
that speaks the new socketpair protocol. The old shell driver spawned
the daemon directly using the signal protocol (passing PARENT_PID as
argv) and is incompatible with the new <fd> argv contract.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The setter is not loaded by default in Ruby 3+; CI hit NoMethodError. Reproduced locally and confirmed the require resolves it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extract the socketpair protocol shared by iprof and its daemons: - Ruby side: new DaemonClient module holds MSG_INIT/FINISH/READY, lazy_exec, finalize, wait_ready, and the spawn boilerplate (incl. the nonblock= dance). SyncDaemon and SamplingDaemon each include it and supply only their own gating predicate and per-instance state. - C++ side: new utils/include/daemon_proto.hpp holds the MSG_* string_view constants, send_msg, and recv_expect. Both sync_daemon_mpi and thapi_sampling_daemon use it. - Also: shrink the SyncDaemon daemon-path case statement, drop the send_ready lambda in sync_daemon_fs, and add the new header to utils/Makefile.am noinst_HEADERS. Net: ~80 fewer lines across the daemon code (header included). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
These LOGGER.debug { "spawn(...)" } lines predated the socketpair
refactor when they included Process.pid (needed for kill-based IPC).
Now that spawn is gated by lazy_exec, the wrapping
LOGGER.with_info_block("Initialize ...Daemon ...") already logs entry
and exit; the inner debug is dead.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ctive - Lift daemon spawn boilerplate into DaemonClient#spawn (mirrors Process.spawn signature + log:); fd insertion and `fd => fd` cloexec clearing now hidden from callers. - Both SyncDaemon and SamplingDaemon early-return on `unless active?`, so bogus THAPI_SYNC_DAEMON values and missing binaries no longer raise on runs that don't actually need a daemon. - Replace top-level `sampling?` with `SamplingDaemon.active?` (class method); add SyncDaemon.active? for symmetry; hoist the instance `active?` delegate into DaemonClient. - SyncDaemon#initialize daemon-path selection rewritten as case/when, with a better error pointing at --disable-sync-daemon-mpi. - Fix SAMPLING_DAEMON_PATH copy-paste bug (was pointing at sync_daemon_mpi); switch sampling spawn to argv-form to drop the shell middleman. - LocalMaster.setup guards SamplingDaemon against bt_analysis raising before teardown can run. - Add integration test: bogus THAPI_SYNC_DAEMON is ignored when not under MPI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Was discarding $? from Process.wait. If the daemon crashes during teardown (e.g., MPI_Finalize segfault — see THAPI_SYNC_DAEMON_MPI_NO_FINALIZE workaround in parallel_execution.bats), the failure was silent. Switch to wait2 and warn on non-success. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull '-Wall -Wextra $(WERROR)' (plus '-std=c++17' for C++) out of each per-target line into AM_CFLAGS / AM_CXXFLAGS. Per-target rules that already set _CFLAGS / _CXXFLAGS now reference $(AM_*FLAGS) explicitly, since automake target-specific flags replace rather than extend AM_*. Side fix: sampling/thapi_sampling_daemon was missing -std=c++17 entirely. It now picks it up via AM_CXXFLAGS. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
clang-format-18 on daemon_proto.hpp (CI runs the same version) and strip an over-aligned 'then nil' in sync_daemon_fs's case arm. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.