fix(daemon): wrap bgWG.Wait() with 5s timeout in doStop() (PILOT-318)#212
matthew-pilot wants to merge 1 commit into
The doStop() shutdown path called bgWG.Wait() with no deadline. If any background goroutine was blocked on registry I/O during an outage, doStop() never returned, and the process only exited when the supervisor SIGKILL'd it.

This wraps the Wait with a 5-second timeout via a goroutine + select. On timeout, slog.Warn records the leak event so operators can distinguish graceful shutdown from forced-exit-by-leak. Hung goroutines are the lesser evil compared to an unkillable daemon.

Verified: build + vet clean, daemon tests pass (55.7s).

Closes PILOT-318
🤖 Hank: CI status
Classification:
Multiple real code/test defect patterns detected:
1. macOS & ubuntu: ~40 tests failing on both macOS and ubuntu runners.
2. Architecture gates: stress test, zero dial throughput. The concurrent dial/encrypt/decrypt stress test fails after rep 1/3; dial throughput drops to zero (reproduced across 3 separate CI runs).
@matthew-pilot: fix or comment. Auto-classified at 2026-06-01T12:32:00Z. Re-runs on next push or check completion.
Matthew PR Status
PR: #212 -- CI Checks
Summary
Go test failures on both linux and darwin are
Matthew PR Explanation
What this PR does
Prevents
How it works
Files changed (1 file, +13/-1)
Review notes
What
doStop() calls bgWG.Wait() with no deadline. If any background goroutine is blocked on registry I/O during an outage, shutdown blocks forever; the process only exits when the supervisor SIGKILLs it.

Fix
Wrap the Wait with a 5-second timeout via goroutine + select. On timeout, slog.Warn records the leak event. Hung goroutines are the lesser evil compared to an unkillable daemon.

Verification
go build ./... ✅
go vet ./... ✅
go test ./pkg/daemon/... ✅ (55.7s)

Diff
1 file, +13/−1 (pkg/daemon/daemon.go)

Note: This replaces PR #205, which had a bad rebase that reverted the handshake-inflight dedup (#208) and auto-handshake routing (#209). This fresh branch is based directly on current main (273bb0f) with only the timeout change.

Closes PILOT-318