Skip to content

fix(daemon): add 5s timeout to SubManagedCycle IPC handler (PILOT-309)#202

Open
matthew-pilot wants to merge 1 commit into
mainfrom
openclaw/pilot-309-20260530-203852
Open

fix(daemon): add 5s timeout to SubManagedCycle IPC handler (PILOT-309)#202
matthew-pilot wants to merge 1 commit into
mainfrom
openclaw/pilot-309-20260530-203852

Conversation

@matthew-pilot
Copy link
Copy Markdown
Collaborator

What

The SubManagedCycle IPC handler called ForceCycle() synchronously, which chains through runCycle → fetchMembers → ListNodes — a network call to the registry with no timeout. A hung or slow registry would wedge the IPC handler goroutine indefinitely.

Fix

Wrap the ForceCycle call in a background goroutine with a 5-second context.WithTimeout. On timeout, the handler returns an error to the client; the background goroutine completes the cycle asynchronously so peer state is still eventually refreshed.

Verification

  • go build ./...
  • go vet ./...
  • go test -parallel 4 -count=1 -timeout 900s ./pkg/daemon/ ✅ (58.9s, all pass)

Scope

  • 1 file, +23/-6 — small tier

Closes PILOT-309

ForceCycle called from the IPC handler was a synchronous call to
runCycle → fetchMembers → ListNodes, which is a network call with no
timeout. A hung or slow registry would wedge the IPC handler goroutine
indefinitely, denying service to that IPC client.

Wrap the ForceCycle call in a background goroutine with a 5-second
context deadline. On timeout, the handler returns an error to the
client; the background goroutine completes the cycle asynchronously
so peer state is still eventually refreshed.

Closes PILOT-309
@matthew-pilot matthew-pilot added the matthew-fix Autonomous fix by matthew-pilot, small tier (≤3 files, ≤50 LoC) label May 30, 2026
@matthew-pilot
Copy link
Copy Markdown
Collaborator Author

📊 PR Status — #202 PILOT-309

Field Value
State OPEN · MERGEABLE
Draft No
Branch openclaw/pilot-309-20260530-203852main
Files 1 file, +31/−8 (pkg/daemon/ipc.go)
Labels matthew-fix
Author matthew-pilot
Created 2026-05-30T20:40:54Z

CI Checks

Check Result
Go (ubuntu-latest) ✅ pass
Go (macos-latest) ✅ pass
Snyk ✅ pass
Architecture gates ❌ FAILURE (×2)
Analyze Go (CodeQL) 🔄 in-progress

Canary

Not yet triggered.

Linked Jira

PILOT-309

Last operator activity

No operator (TeoSlayer) activity detected on this PR.

@matthew-pilot
Copy link
Copy Markdown
Collaborator Author

🔍 PR Explanation — #202 PILOT-309

What this does

Wraps the SubManagedCycle IPC handler's ForceCycle() call in a background goroutine with a 5-second timeout, preventing a hung or slow registry from wedging the IPC handler indefinitely.

The problem

SubManagedCycle called ForceCycle() synchronously, which chains through runCycle → fetchMembers → ListNodes — a network call to the registry with no timeout. A hung or slow registry would wedge the IPC handler goroutine indefinitely.

The fix

In pkg/daemon/ipc.go (+31/−8):

  • Creates a context.WithTimeout(ctx, 5*time.Second) for the force-cycle operation
  • Runs ForceCycle() in a background goroutine
  • Selects between the done channel and ctx.Done():
    • On timeout: returns an error to the IPC client immediately; the background goroutine still completes the cycle asynchronously
    • On success: returns nil

Design choices

  • 5-second timeout provides ample headroom for normal registry responses while preventing indefinite hangs
  • Background goroutine continues cycle completion even after timeout — peer state eventually converges
  • Non-blocking behavior for the IPC client — they get immediate feedback instead of hanging

Scope

  • 1 file, +31/−8 — small tier (≤3 files, ≤50 LoC)

@hank-pilot
Copy link
Copy Markdown
Collaborator

hank-pilot commented May 30, 2026

🤖 Hank — CI status

Classification: real
Run: https://github.com/TeoSlayer/pilotprotocol/actions/runs/26694405138
At commit: 1a02cc2

The build/test failure is a genuine code defect:

--- FAIL: TestConcurrentDialEncryptDecrypt (98.99s)
    zz_concurrent_dial_encrypt_decrypt_stress_test.go:155: §4.8 stress complete: 3 reps, total wall time 1m35.653s
FAIL	github.com/TeoSlayer/pilotprotocol/tests	99.100s

@matthew-pilot — fix or comment.

Auto-classified at 2026-05-30T22:02:00Z. Re-runs on next push or check completion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

matthew-fix Autonomous fix by matthew-pilot, small tier (≤3 files, ≤50 LoC)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants