fix(coreapi): per-plugin panic tracking with unhealthy signal (PILOT-254) #9
matthew-pilot wants to merge 1 commit into
…254)

RecoverPlugin now tracks per-plugin panic counts (sync.Map) and marks a plugin unhealthy after maxPanicsBeforeUnhealthy (3) panics, publishing a one-shot "plugin.<name>.unhealthy" event on the bus.

New exported API:
- PluginPanicCount(name): per-plugin panic count
- IsPluginHealthy(name): false when threshold exceeded
- ResetPluginHealthForTest(): test cleanup

The daemon supervisor (web4) can react to the unhealthy event by restarting or unloading the plugin. The TODO in recover.go is resolved for the tracking/signaling layer.

Closes PILOT-254
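For readers without the diff open, the tracking described in the commit message could look roughly like the sketch below. It is not the code from `coreapi/recover.go`: the `pluginHealth` struct, the `entry` helper, the `publish` hook standing in for the bus, and the `runOnce` driver are all hypothetical, and the sketch assumes the plugin turns unhealthy on its third panic (`>=`), which the PR text does not pin down.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Threshold named in the commit message.
const maxPanicsBeforeUnhealthy = 3

// pluginHealth is a hypothetical per-plugin record; the real layout may differ.
type pluginHealth struct {
	panics    atomic.Uint64 // recovered panics for this plugin
	unhealthy atomic.Bool   // flips to true exactly once
}

var (
	health sync.Map // plugin name -> *pluginHealth
	// publish stands in for the event bus, whose API is not shown in the PR.
	publish = func(topic string) { fmt.Println("event:", topic) }
)

// entry returns the record for name, creating it on first use.
func entry(name string) *pluginHealth {
	v, _ := health.LoadOrStore(name, &pluginHealth{})
	return v.(*pluginHealth)
}

// RecoverPlugin must be deferred directly so that recover() sees the panic.
func RecoverPlugin(name string) {
	if r := recover(); r == nil {
		return
	}
	h := entry(name)
	// CompareAndSwap makes the unhealthy event one-shot under concurrency.
	// Assumption: the third panic trips the threshold.
	if h.panics.Add(1) >= maxPanicsBeforeUnhealthy && h.unhealthy.CompareAndSwap(false, true) {
		publish("plugin." + name + ".unhealthy")
	}
}

// PluginPanicCount reports recovered panics; unknown plugins count as zero.
func PluginPanicCount(name string) uint64 {
	if v, ok := health.Load(name); ok {
		return v.(*pluginHealth).panics.Load()
	}
	return 0
}

// IsPluginHealthy is true until the plugin has been marked unhealthy.
func IsPluginHealthy(name string) bool {
	if v, ok := health.Load(name); ok {
		return !v.(*pluginHealth).unhealthy.Load()
	}
	return true
}

// ResetPluginHealthForTest clears all records between tests.
func ResetPluginHealthForTest() {
	health.Range(func(k, _ any) bool { health.Delete(k); return true })
}

// runOnce simulates one plugin goroutine body that panics.
func runOnce(name string) {
	defer RecoverPlugin(name)
	panic("boom")
}

func main() {
	for i := 0; i < 4; i++ {
		runOnce("demo")
	}
	fmt.Println(PluginPanicCount("demo"), IsPluginHealthy("demo"))
}
```

The counter keeps growing after the threshold, but the compare-and-swap on the flag means the event is published only once per plugin until the state is reset.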
Codecov Report

✅ All modified and coverable lines are covered by tests.
🤖 matthew-pilot Status

PR #9: PILOT-254
📋 matthew-pilot Explain: PR #9 (PILOT-254)

What this does

Adds per-plugin panic tracking to

Changes

Risk / Tier

Jira
🦾 Matthew PR Status: #9

Overview

Tickets

None detected in title

Labels

matthew-fix-larger

Files Changed

PR Description

Next Actions

🦾 Auto-generated status check by matthew-pr-worker
📊 Matthew Status: PR #9 (PILOT-254)

State: OPEN · MERGEABLE ✅ · No merge conflicts

📅 Tick: 2026-06-01T02:13Z
What failed

`RecoverPlugin` caught panics but had no per-plugin tracking or unhealthy signal. A plugin whose goroutines panicked repeatedly kept running with inconsistent state, and no supervisor could detect the degradation.

What changed

`coreapi/recover.go` now tracks per-plugin panic counts via `sync.Map` and emits a one-shot `plugin.<name>.unhealthy` event when a plugin exceeds 3 panics (`maxPanicsBeforeUnhealthy`).

New public API:

- `PluginPanicCount(name string) uint64`: per-plugin panic counter
- `IsPluginHealthy(name string) bool`: false when threshold exceeded
- `ResetPluginHealthForTest()`: test cleanup

The daemon supervisor (web4 daemon) can subscribe to `plugin.*.unhealthy` events and react by restarting or unloading the degraded plugin.

Verification

- `go build ./...` ✅
- `go vet ./...` ✅
- `go test ./...` ✅ (all 13 packages, 32s)
- `TestL11PerPluginUnhealthy` validates: count tracking, healthy→unhealthy transition, one-shot event

Diff stat
Closes PILOT-254
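
The supervisor side is out of scope for this PR, but the reaction to `plugin.*.unhealthy` described above might be sketched as follows. Everything here is an assumption for illustration: the `Bus` type is a toy synchronous bus rather than the project's real one, and `Supervisor`, its `Restart` hook, and `unhealthyPlugin` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Bus is a toy synchronous event bus used only for this sketch.
type Bus struct {
	handlers []func(topic string)
}

// Subscribe registers a handler that sees every published topic.
func (b *Bus) Subscribe(h func(topic string)) { b.handlers = append(b.handlers, h) }

// Publish delivers topic to all handlers in registration order.
func (b *Bus) Publish(topic string) {
	for _, h := range b.handlers {
		h(topic)
	}
}

// unhealthyPlugin extracts <name> from "plugin.<name>.unhealthy".
// It reports false for any other topic or for an empty name.
func unhealthyPlugin(topic string) (string, bool) {
	const prefix, suffix = "plugin.", ".unhealthy"
	if len(topic) <= len(prefix)+len(suffix) ||
		!strings.HasPrefix(topic, prefix) || !strings.HasSuffix(topic, suffix) {
		return "", false
	}
	return topic[len(prefix) : len(topic)-len(suffix)], true
}

// Supervisor is a hypothetical consumer of unhealthy events.
type Supervisor struct {
	Restart func(name string) // hook that restarts or unloads a plugin
}

// Watch subscribes the supervisor and filters for unhealthy topics.
func (s *Supervisor) Watch(b *Bus) {
	b.Subscribe(func(topic string) {
		if name, ok := unhealthyPlugin(topic); ok {
			s.Restart(name)
		}
	})
}

func main() {
	bus := &Bus{}
	sup := &Supervisor{Restart: func(name string) { fmt.Println("restarting", name) }}
	sup.Watch(bus)
	bus.Publish("plugin.auth.unhealthy")
}
```

Because the event is one-shot, a supervisor built this way would see at most one restart request per plugin until its health state is cleared.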