
feat(cloud): detect dead local daemon in cloud status and document launchd unit #337

Open
jlsevillano wants to merge 1 commit into Gentleman-Programming:main from jlsevillano:feat/cloud-status-detect-dead-daemon


🔗 Linked Issue

Closes #279


🏷️ PR Type

  • type:bug — Bug fix
  • type:feature — New feature
  • type:docs — Documentation only
  • type:refactor — Code refactoring (no behavior change)
  • type:chore — Maintenance, dependencies, tooling
  • type:breaking-change — Breaking change

📝 Summary

  • engram cloud status now probes the local engram serve daemon at 127.0.0.1:7437 (respects ENGRAM_PORT) with a 1s timeout and prints a Local daemon: running | not running | unreachable line so users can detect a silently dead autosync after brew upgrade engram or any other binary replacement.
  • Adds a launchd template to DOCS.md "Running as a Service" so macOS users can supervise engram serve the same way Linux users do with systemd. With KeepAlive=true, autosync now survives brew upgrade automatically.
  • Probe is informational: exit code is unchanged. Probe is suppressed when cloud is not configured to avoid noisy output for users who don't use cloud sync.

📂 Changes

| File | Change |
| --- | --- |
| `cmd/engram/cloud_daemon_probe.go` | New: `cloudDaemonProbe` variable function (1s timeout `GET /health`), port resolution (`ENGRAM_PORT` → 7437), and `printCloudStatusDaemonProbe` writer with recovery hint when the daemon is down. |
| `cmd/engram/cloud_daemon_probe_test.go` | New: unit tests covering all four probe outcomes (running, refused, non-2xx, timeout) and the three printer states. |
| `cmd/engram/cloud.go` | `cmdCloudStatus` calls `printCloudStatusDaemonProbe` in each cloud-configured branch (token, token+insecure, no-token) before the existing sync diagnostic. No behavior change in the "not configured" branch. |
| `cmd/engram/main_extra_test.go` | `stubRuntimeHooks` now stubs `cloudDaemonProbe` so existing tests stay deterministic; new `TestCmdCloudStatusEmitsLocalDaemonLine` verifies the line is printed when configured (and suppressed when not). |
| `DOCS.md` | Renames "Using systemd" → "Using systemd (Linux)". Adds "Using launchd (macOS)" with full plist template (`KeepAlive=true` so `brew upgrade` does not break autosync), load/unload steps, and verification via `engram cloud status`. Updates the `engram cloud status` reference bullet to describe the new `Local daemon:` line. |
| `docs/INSTALLATION.md` | Homebrew section gains a tip pointing macOS users at the launchd template so autosync survives `brew upgrade`. |
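
For readers unfamiliar with the "variable function" pattern mentioned for `cloudDaemonProbe`, here is a small sketch of how a package-level function variable lets tests swap in a deterministic result. The signature of `cloudDaemonProbe` and the helper `stubDaemonProbe` are assumptions made for illustration; the real `stubRuntimeHooks` may look quite different.

```go
package main

import "fmt"

// cloudDaemonProbe is declared as a variable so tests can replace it.
// The signature (port in, state string out) is an assumption.
var cloudDaemonProbe = func(port int) string {
	return "unreachable" // placeholder standing in for the real HTTP probe
}

// stubDaemonProbe (hypothetical helper) swaps in a fixed result and
// returns a function that restores the original probe.
func stubDaemonProbe(result string) (restore func()) {
	orig := cloudDaemonProbe
	cloudDaemonProbe = func(int) string { return result }
	return func() { cloudDaemonProbe = orig }
}

func main() {
	restore := stubDaemonProbe("running")
	fmt.Println(cloudDaemonProbe(7437)) // prints "running"
	restore()
	fmt.Println(cloudDaemonProbe(7437)) // prints "unreachable"
}
```

In a real test the restore function would typically be registered with `t.Cleanup`, so the stub cannot leak into other tests.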

🧪 Test Plan

  • Unit tests pass locally: go test ./... (passes in the standard CI environment. My local environment had ENGRAM_CLOUD_SERVER set, which leaks into pre-existing tests; to reproduce the CI isolation, run unset ENGRAM_CLOUD_SERVER ENGRAM_CLOUD_TOKEN ENGRAM_CLOUD_INSECURE_NO_AUTH ENGRAM_CLOUD_AUTOSYNC ENGRAM_PORT before the tests).
  • E2E tests pass locally: go test -tags e2e ./internal/server/....
  • Manually verified all three states with cloud config --server ...:
    • daemon down on port 7777 → Local daemon: not running on port 7777 + recovery hint mentioning engram serve and the launchd template
    • daemon up on port 7777 → Local daemon: running on port 7777
    • ENGRAM_PORT=9000 while daemon stays on 7777 → probe targets 9000 and reports not running on port 9000 (env override honored)

🤖 Automated Checks

These run automatically and all must pass before merge:

| Check | What it verifies | Status |
| --- | --- | --- |
| Check Issue Reference | PR body contains `Closes #N` / `Fixes #N` / `Resolves #N` | |
| Check Issue Has status:approved | Linked issue has `status:approved` label | |
| Check PR Has type:* Label | PR has exactly one `type:*` label | |
| Unit Tests | `go test ./...` passes | |
| E2E Tests | `go test -tags e2e ./internal/server/...` passes | |

✅ Contributor Checklist

  • I linked an approved issue above (Closes #279)
  • I added exactly one type:* label to this PR (type:feature)
  • I ran unit tests locally: go test ./...
  • I ran e2e tests locally: go test -tags e2e ./internal/server/...
  • Docs updated (DOCS.md service section + cloud status reference + INSTALLATION.md tip)
  • Commits follow conventional commits format
  • No Co-Authored-By trailers in commits

💬 Notes for Reviewers

  • The Homebrew formula post-install hook approach mentioned in the issue lives in homebrew-tap, so it is intentionally out of scope for this PR. The two in-repo mitigations (status probe + launchd template) close the gap on this side.
  • The probe deliberately distinguishes not_running (TCP dial error to 127.0.0.1) from unreachable (timeout / non-2xx / unexpected error) so the recovery hint only fires when restarting engram serve is the right action.
  • Probe timeout is exposed as a var daemonProbeTimeout so tests can shorten it; default in production stays at 1s.
  • Plist uses literal <HOME> placeholders because launchd does not expand $HOME/~ inside plist values; the docs explicitly call this out.

…unchd unit

`engram cloud status` now probes the local engram serve daemon at
127.0.0.1:7437 (respects ENGRAM_PORT) with a 1s timeout and prints a
`Local daemon:` line so users can detect a silently dead autosync after
brew upgrade engram, log out, or any binary replacement. Exit code is
unchanged (informational) and the probe is only run when cloud is
configured.

DOCS.md "Running as a Service" gains a launchd (macOS) subsection with a
KeepAlive plist template that survives brew upgrade by relaunching
engram serve automatically. The Homebrew section in docs/INSTALLATION.md
links to the new template so macOS users hit the supervisor guidance
right after install.

Closes Gentleman-Programming#279


Development

Successfully merging this pull request may close these issues.

engram serve dies on brew upgrade engram and is not auto-restarted, causing silent autosync downtime
