feat(updater): tier 2 — manual-click update from /admin/update (#7607)#7704
JohnMcLear wants to merge 21 commits into ether:develop
20-task TDD plan for shipping the manual-click update flow on top of the Tier 1 (notify) work merged in ether#7601. Covers UpdateExecutor, RollbackHandler, SessionDrainer, lock + trustedKeys, four admin endpoints (apply / cancel / acknowledge / log), admin UI updates, integration tests against a tmp git repo, and a manual smoke runbook for the spec's "before each tier ships" gate. Plan deliberately scopes signature verification to an opt-in stub (updates.requireSignature: false default) to avoid blocking on a separate release-signing project.

Plan: docs/superpowers/plans/2026-05-08-auto-update-pr2-manual-click.md
Spec: docs/superpowers/specs/2026-04-25-auto-update-design.md
Issue: ether#7607
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds ExecutionStatus discriminated union, bootCount, and lastResult to UpdateState, plus the preApplyGraceMinutes/drainSeconds/diskSpaceMinMB/ requireSignature/trustedKeysPath knobs that Tier 2's executor needs. loadState backfills the new fields on Tier 1 state files so existing installs keep working. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-flight guard for Tier 2's UpdateExecutor. Atomic O_CREAT|O_EXCL acquire; on EEXIST, sends signal 0 to the recorded PID and reaps if dead. Unparseable / partially-written lock files are treated as stale rather than fatal so a half-written lock from a SIGKILL'd parent doesn't lock the install out forever. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
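A minimal sketch of the acquire path described above, assuming a lock file that holds only the owner's PID. The `AcquireResult` shape and the two-attempt retry are illustrative assumptions, not the PR's actual `lock.ts`.

```typescript
import * as fs from 'node:fs';

// Hypothetical result shape for illustration; the PR's lock.ts may differ.
export type AcquireResult =
  | {acquired: true}
  | {acquired: false; heldByPid: number | null};

// Signal 0 runs the existence/permission check without delivering a signal.
const isAlive = (pid: number): boolean => {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but is owned by another user.
    return (err as {code?: string}).code === 'EPERM';
  }
};

export const acquireLock = (lockPath: string, pid: number = process.pid): AcquireResult => {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      // The 'wx' flag maps to O_CREAT | O_EXCL, so creation is atomic.
      fs.writeFileSync(lockPath, `${pid}\n`, {flag: 'wx'});
      return {acquired: true};
    } catch (err) {
      if ((err as {code?: string}).code !== 'EEXIST') throw err;
    }
    let raw = '';
    try {
      raw = fs.readFileSync(lockPath, 'utf8').trim();
    } catch {
      // The holder released the lock between the two calls; retry below.
    }
    if (/^[1-9]\d*$/.test(raw) && isAlive(Number(raw))) {
      return {acquired: false, heldByPid: Number(raw)};
    }
    // Dead PID or unparseable content: treat the lock as stale and reap it.
    fs.rmSync(lockPath, {force: true});
  }
  return {acquired: false, heldByPid: null};
};
```

Treating garbage content the same way as a dead PID is what keeps a half-written lock from a killed parent from blocking every later attempt.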
Default updates.requireSignature=false: log a warning and return ok with reason=signature-not-required. Set true to make preflight refuse a tag whose signature does not verify under the system keyring (or trustedKeysPath via GNUPGHOME). Etherpad's release process does not yet sign tags consistently; turning the check on by default would break Tier 2 for every admin and forcing a release-signing change is out of scope for this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure orchestrator over injected probes for install-method, working tree, disk space, pnpm presence, lock state, remote tag existence and signature verification. Cheap-and-definitive checks run first; first failure short-circuits with a typed reason that the route layer will surface in the preflight-failed admin banner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
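The short-circuit behaviour can be sketched roughly as below. Only `install-method-not-writable` is a reason name that appears in this PR; the other reason strings, the `Check` shape, and the synchronous probes are assumptions made to keep the sketch small (real probes that shell out would be async).

```typescript
// Reason names other than 'install-method-not-writable' are hypothetical.
type PreflightReason =
  | 'install-method-not-writable'
  | 'working-tree-dirty'
  | 'disk-space-low'
  | 'pnpm-missing'
  | 'lock-held'
  | 'tag-not-found'
  | 'signature-invalid';

type PreflightResult = {ok: true} | {ok: false; reason: PreflightReason};

// Each check pairs an injected probe with the typed reason reported on failure.
interface Check {
  reason: PreflightReason;
  probe: () => boolean; // true means the check passed
}

// Checks run in the order given; the caller lists cheap, definitive ones first.
export const runPreflight = (checks: readonly Check[]): PreflightResult => {
  for (const check of checks) {
    if (!check.probe()) return {ok: false, reason: check.reason};
  }
  return {ok: true};
};
```

Because the probes are injected, a unit test can record which ones ran and confirm that nothing after the first failure was touched.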
Direct file-append + size-based rotation rather than a log4js appender — avoids re-configuring log4js on top of the user's existing logconfig. appendLine creates parents, rotates at 10MB (configurable), keeps 5 backups by default. tailLines reads the last N lines for /admin/update/log. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
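A rough sketch of size-based rotation plus tailing, under the defaults stated above (10 MB, 5 backups). The numbered-suffix naming (`update.log.1` and so on) and reading the whole file in `tailLines` are assumptions; the PR's `updateLog.ts` may do either differently.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

export const appendLine = (
  file: string,
  line: string,
  maxBytes = 10 * 1024 * 1024,
  backups = 5,
): void => {
  fs.mkdirSync(path.dirname(file), {recursive: true});
  const size = fs.existsSync(file) ? fs.statSync(file).size : 0;
  if (size >= maxBytes && backups > 0) {
    // Shift .4 -> .5, .3 -> .4, ...; the oldest backup is overwritten.
    for (let i = backups - 1; i >= 1; i--) {
      if (fs.existsSync(`${file}.${i}`)) fs.renameSync(`${file}.${i}`, `${file}.${i + 1}`);
    }
    fs.renameSync(file, `${file}.1`);
  }
  fs.appendFileSync(file, `${line}\n`);
};

// Returns the last n lines; acceptable here because rotation caps the file size.
export const tailLines = (file: string, n: number): string[] => {
  if (n <= 0 || !fs.existsSync(file)) return [];
  const lines = fs.readFileSync(file, 'utf8').split('\n');
  if (lines[lines.length - 1] === '') lines.pop(); // drop the trailing newline's empty entry
  return lines.slice(-n);
};
```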
Drainer schedules T-60 / -30 / -10 broadcasts and resolves at T=0; isAcceptingConnections() flips off for the duration. PadMessageHandler consults the flag at the start of CLIENT_READY and disconnects new joiners with reason "updateInProgress" — existing sockets are unaffected. Drains shorter than 30s collapse the early timers to fire ASAP rather than queue past the drain end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d, exit 75

Pure-DI orchestrator: spawnFn, copyFile, readSha, saveState, exit are all injected so unit tests run the full pipeline without spawning real children or mutating the real install. Streams stdout/stderr to update.log via the now-best-effort appendLine helper (swallows fs errors so the executor itself never breaks on read-only / unwritable log dirs). Failure paths transition to rolling-back and return; the route layer hands off to RollbackHandler, which owns the rollback exit, so we don't double-exit and lose tail lines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
checkPendingVerification arms a 60s timer at boot when state is pending-verification and increments bootCount; bootCount>2 forces an immediate rollback (crash-loop guard). markVerified persists the verified state and stops the timer. performRollback restores the backup lockfile, runs git checkout <fromSha> and pnpm install, lands on rolled-back or rollback-failed (terminal) on sub-step failure, exits 75 either way so the supervisor restart brings the new state up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
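The boot-time decision reduces to a small pure function. This is a sketch only: `BootState` is a trimmed stand-in for the PR's `UpdateState`, and `decideBootAction` is a hypothetical name.

```typescript
type BootAction = 'none' | 'rollback-now' | 'arm-timer';

// Trimmed stand-in for UpdateState; only the fields the decision needs.
interface BootState {
  execution: {status: string};
  bootCount: number;
}

// bootCount is checked before it is incremented, so with the default
// threshold the guard trips on the boot that observes a stored count of 3.
export const decideBootAction = (state: BootState, maxBoots = 2): BootAction => {
  if (state.execution.status !== 'pending-verification') return 'none';
  return state.bootCount > maxBoots ? 'rollback-now' : 'arm-timer';
};
```

Keeping the decision separate from the timer and the spawn calls makes the crash-loop threshold testable without fake clocks.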
…rollback-failed

- expressCreateServer now invokes checkPendingVerification before polling starts so a previous boot's pending-verification either re-arms the health-check timer or, when bootCount has climbed past the crash-loop threshold, forces an immediate rollback.
- server.ts calls markBootHealthy after state hits RUNNING so /health-being-up is the implicit happy-path signal that cancels the rollback timer.
- /admin/update/status surfaces execution + lastResult + lockHeld so the admin UI can render the right Apply / Cancel / Acknowledge state.
- UpdatePolicy gains an `executionStatus` input. While it equals 'rollback-failed', canAuto / canAutonomous are denied (reason: rollback-failed-terminal); manual stays on because clicking Apply IS the intervention the terminal state needs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
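The policy change in the last bullet could look roughly like this. The tier ranking (each tier implying the ones below it) and the `evaluatePolicy` name are assumptions; the real UpdatePolicy also weighs inputs such as install method that are left out here.

```typescript
type Tier = 'off' | 'notify' | 'manual' | 'auto' | 'autonomous';

interface PolicyInput {
  tier: Tier;
  executionStatus: string;
}

interface PolicyResult {
  canManual: boolean;
  canAuto: boolean;
  canAutonomous: boolean;
  reason?: string;
}

// Assumed ordering: a higher tier enables everything a lower tier does.
const RANK: Record<Tier, number> = {off: 0, notify: 1, manual: 2, auto: 3, autonomous: 4};

export const evaluatePolicy = ({tier, executionStatus}: PolicyInput): PolicyResult => {
  const atLeast = (t: Tier): boolean => RANK[tier] >= RANK[t];
  const terminal = executionStatus === 'rollback-failed';
  return {
    // Manual stays available: clicking Apply is the intervention.
    canManual: atLeast('manual'),
    canAuto: atLeast('auto') && !terminal,
    canAutonomous: atLeast('autonomous') && !terminal,
    ...(terminal ? {reason: 'rollback-failed-terminal'} : {}),
  };
};
```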
Strict admin-only POSTs that drive Tier 2's manual-click flow:

- POST /admin/update/apply: acquire lock, persist preflight, run preflight, drain $drainSeconds, executeUpdate (which exits 75 on success), or run performRollback on a failure path (also exits 75).
- POST /admin/update/cancel: cancel a pre-execute drain/preflight, write cancelled lastResult, release lock.
- POST /admin/update/acknowledge: clear terminal states (preflight-failed, rolled-back, rollback-failed) back to idle. lastResult is preserved so the admin still sees what happened.
- GET /admin/update/log: tail var/log/update.log (200 lines) for the in-progress UI. Strict admin auth.

Also:

- socketio hook exports getIo() so the apply endpoint can broadcast the drain shoutMessage outside the regular hook surface.
- ep.json registers updateActions after admin/updateStatus.
- 11 mocha integration tests cover auth, policy denial, execution-busy, acknowledge-clears-terminal, log content-type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
UpdatePage renders the right action set based on execution.status:
Apply when idle/verified and policy allows, Cancel during
preflight/draining, Acknowledge on terminal preflight-failed /
rolled-back / rollback-failed. While the executor is in flight
(preflight/draining/executing/rolling-back) the page polls
/admin/update/log + /admin/update/status once a second and shows the
rolling tail; polling stops automatically when the run terminates.
lastResult and policy denial reasons surface localised copy. Buttons
disable themselves while a network round-trip is in flight to dodge
double-clicks. New i18n keys live under update.page.{apply,cancel,
acknowledge,log,execution,policy.*,last_result.*}, update.execution.*,
update.banner.terminal.rollback-failed, and update.drain.{t60,t30,t10}.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
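The button logic described in this commit can be sketched as a pure mapping from status to actions. `actionsFor` and `isInFlight` are hypothetical names, the status list is pieced together from the states named across this PR, and the `lockHeld` rule is taken from the Playwright scenarios in the PR rather than from this commit.

```typescript
type ExecStatus =
  | 'idle' | 'verified'
  | 'preflight' | 'draining' | 'executing' | 'rolling-back'
  | 'pending-verification'
  | 'preflight-failed' | 'rolled-back' | 'rollback-failed';

type Action = 'apply' | 'cancel' | 'acknowledge';

export const actionsFor = (
  status: ExecStatus,
  canManual: boolean,
  lockHeld: boolean,
): Action[] => {
  switch (status) {
    case 'idle':
    case 'verified':
      return canManual && !lockHeld ? ['apply'] : [];
    case 'preflight':
    case 'draining':
      return ['cancel'];
    case 'preflight-failed':
    case 'rolled-back':
    case 'rollback-failed':
      return ['acknowledge'];
    default:
      return []; // executing, rolling-back, pending-verification: nothing to click
  }
};

// The page polls the log and status endpoints only while this is true.
export const isInFlight = (status: ExecStatus): boolean =>
  ['preflight', 'draining', 'executing', 'rolling-back'].includes(status);
```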
broadcastShout now sends {messageKey, values, sticky} so the existing
pad-side shout pipeline can route through html10n.get(). The renderer
gains a values pass-through so update.drain.t60 etc. interpolate
{{seconds}}, and gives updater shouts a different gritter title (the
banner.title localised string) so users know it's a system event
rather than a generic admin message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tmp git repo

RollbackHandler now does git checkout -f <fromSha> BEFORE overlaying the backup lockfile. Without -f, git refuses checkout when there are unstaged modifications to files it would overwrite, which is exactly the case after a partial executor run that mutated the working tree. With -f the partial mutation is discarded and the working tree returns to fromSha cleanly. The backup-lockfile copy is still done (belt-and-braces) but tolerates ENOENT since checkout already restored the right lockfile.

The new integration suite at src/tests/backend/specs/updater-integration.ts exercises the full pipeline against a disposable git repo: happy path, install-fail rollback, build-fail rollback, crash-loop guard, and a target-sha-doesn't-exist rollback-failed terminal case. 5 mocha tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stubs /admin/update/status (and /admin/update/apply for the apply path) at the route level so we can assert UI transitions without actually running an update. Four scenarios:

- Apply button POSTs and re-fetches status (>=2 status fetches total).
- install-method-not-writable hides the button and shows localised denial copy.
- rollback-failed terminal state shows the Acknowledge button and the "Manual intervention required" lastResult copy.
- lockHeld=true hides Apply even when policy.canManual is on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When execution.status === 'rollback-failed' the banner switches to a role=alert with the strong update.banner.terminal.rollback-failed copy and overrides the regular "update available" framing — an admin who left the system in this state needs to fix it before any other admin work matters. Other terminal states (preflight-failed, rolled-back) are informational and surface on the page itself, not the banner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
doc/admin/updates.md gains a full Tier 2 section: prerequisites (git install + process supervisor with sample systemd unit), Apply flow with timings, every failure mode and the resulting state, the four endpoints, and the signature-verification opt-in. Settings table picks up the new updates.* knobs.

docs/superpowers/specs/2026-04-25-auto-update-runbook.md is the manual smoke runbook the design spec calls for: disposable VM, systemd unit, every observable transition (happy path, install/build-fail rollback, crash-loop guard, rollback-failed terminal, cancel during drain) plus a sign-off checklist for the release cut.

CHANGELOG Unreleased section explains the supervisor requirement and points readers at the runbook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review Summary by Qodo (Agentic_describe updated until commit 41d4a42)
Tier 2 manual-click updates with session drain, health-check timer, and crash-loop guard
Walkthroughs
Description
• Implements **Tier 2 (manual-click) updates** allowing admins to click "Apply update" at
/admin/update to trigger a controlled update workflow
• Core update pipeline: git fetch → git checkout → pnpm install → pnpm run build:ui, with
60-second session drain (T-60/T-30/T-10 broadcasts) before restart
• Boot-time health-check timer with **crash-loop guard**: forces automatic rollback if new version
reboots more than twice (bootCount > 2)
• Terminal rollback-failed state surfaces strong red banner; admins click "Acknowledge" after
manual recovery to clear the lock
• New atomic modules under src/node/updater/: lock.ts (PID-based file lock), trustedKeys.ts
(GPG signature verification), preflight.ts (validation pipeline), UpdateExecutor.ts (execution
pipeline), RollbackHandler.ts (health-check + rollback), SessionDrainer.ts (drain broadcasts),
updateLog.ts (rolling logs)
• New Express routes: POST /admin/update/{apply,cancel,acknowledge}, GET /admin/update/log with
strict admin authentication
• Six new settings under updates.*: preApplyGraceMinutes, drainSeconds,
rollbackHealthCheckSeconds, diskSpaceMinMB, requireSignature, trustedKeysPath (all opt-in
with sane defaults)
• Signature verification is opt-in; requireSignature: false logs warning and passes (Etherpad
release process does not yet sign tags consistently)
• Admin UI (UpdatePage.tsx) renders Apply/Cancel/Acknowledge buttons per execution status, polls
/admin/update/log during flight, displays lastResult and policy denial copy
• Pad UI drain announcements routed through i18n with seconds interpolation; new pad connections
refused during drain via SessionDrainer.acceptingConnections flag
• Comprehensive test coverage: integration tests for executor/rollback, unit tests for all modules,
backend route tests, frontend E2E tests
• Process supervisor (systemd/pm2/docker) **required** to apply updates; exit code 75 signals
restart
• Extensive documentation: admin guide, smoke-test runbook with systemd unit template, i18n strings,
changelog entry
Diagram
flowchart LR
Admin["Admin clicks Apply"]
Preflight["Preflight checks<br/>install method, disk space,<br/>lock, signature"]
Drain["Session drain<br/>T-60/T-30/T-10 broadcasts<br/>refuse new connections"]
Execute["Execute pipeline<br/>git fetch/checkout<br/>pnpm install/build"]
PendingVer["pending-verification<br/>exit 75"]
HealthCheck["Boot health-check timer<br/>crash-loop guard"]
Verified["verified<br/>success"]
Rollback["performRollback<br/>restore SHA + lockfile"]
RolledBack["rolled-back"]
RollbackFailed["rollback-failed<br/>terminal state"]
Admin --> Preflight
Preflight -->|ok| Drain
Preflight -->|fail| Idle1["idle"]
Drain --> Execute
Execute -->|success| PendingVer
Execute -->|failure| Rollback
PendingVer --> HealthCheck
HealthCheck -->|health ok| Verified
HealthCheck -->|timeout or bootCount > 2| Rollback
Rollback -->|success| RolledBack
Rollback -->|failure| RollbackFailed
Idle1 -.->|acknowledge| Idle2["idle"]
RollbackFailed -.->|acknowledge| Idle2
File Changes
1. src/node/hooks/express/updateActions.ts
Code Review by Qodo
Tier 2 refuses Apply on installMethod=docker because in-container mutation doesn't survive a container restart. Adds a future-work note covering the two reasonable paths for an in-product docker Apply button (instructions-only vs deploy-webhook) and explicitly rules out mounting /var/run/docker.sock as a footgun. Watchtower gets a pointer for admins who want fully autonomous docker updates today. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. Tier 2 endpoints now gate on tier in {manual, auto, autonomous} —
notify and off return 404 to match the prior PR-1 behaviour. Gate is
evaluated per-request via app.use middleware so a settings.json reload
takes effect without a full restart, and so integration tests can flip
the tier dynamically. Adds a regression test that exercises 404 at
tier=notify across all four endpoints.
2. cancel/apply race fixed: /admin/update/cancel no longer releases the
lock — apply's finally block owns it for the request's lifetime. Apply
now reloads state after preflight and aborts with 409 cancelled-during-
preflight if execution.status is no longer 'preflight' for the same
targetTag. Prevents a second apply from sneaking in while the first is
still running its slow checks, and prevents the post-cancel apply from
continuing into drain/execute.
3. SessionDrainer now restores acceptingConnections=true at drain
completion (not just on cancel). The lock + persisted execution.status
prevent a fresh apply from racing in — the in-memory flag was redundant
safety that turned into a wedge if the executor threw post-drain. Adds
a unit test asserting the flag is restored after natural drain end.
4. PadMessageHandler drain guard switched from socket.json.send (a
socket.io v2/v3 API that may not exist on v4) to socket.emit('message',
...) for consistency with the other disconnect paths in the file.
5. Spawn 'error' handlers added to runStep helpers in UpdateExecutor and
RollbackHandler, plus the gpg verify-tag spawn in trustedKeys. Without
them, a missing/unexecutable binary leaves the promise hanging forever
and the update flow stuck in-flight. SpawnFn type extended to allow
on('error', ...) listeners cleanly. Spawn errors now resolve with code
1 + the error message in stderr, so the existing failure-detection
branches fire normally.
6. executeUpdate body wrapped in try/catch. An exception from readSha,
saveState, copyFile, or any step now lands in a rolling-back persist +
returns failed-checkout, so the route's post-executor rollback path
picks it up. State can no longer wedge at 'executing'. The catch's
inner saveState is itself wrapped in try/catch so a write-after-write
failure doesn't crash the route either.
CI: Playwright update-page-actions strict-mode violation fixed. Both the
banner and the lastResult <p> contain "Manual intervention required";
selector now scopes to p.last-result-rollback-failed for the lastResult
assertion specifically.
129 vitest unit tests + 23 mocha integration tests passing; ts-check clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
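Item 1's per-request gate might be sketched as follows. Express is not imported; the response type is narrowed to the one method used, and `makeTierGate` / `getTier` are hypothetical names.

```typescript
type Tier = 'off' | 'notify' | 'manual' | 'auto' | 'autonomous';

// Tiers for which the four action endpoints exist at all.
const ACTION_TIERS: ReadonlySet<Tier> = new Set<Tier>(['manual', 'auto', 'autonomous']);

// Structural stand-in for the part of Express's Response used here.
interface GateResponse {
  sendStatus(code: number): unknown;
}

// getTier is read on every request, so a settings reload (or a test flipping
// the tier) takes effect without re-registering routes.
export const makeTierGate = (getTier: () => Tier) =>
  (_req: unknown, res: GateResponse, next: () => void): void => {
    if (ACTION_TIERS.has(getTier())) next();
    else res.sendStatus(404);
  };
```

Mounted with something like `app.use('/admin/update/apply', makeTierGate(() => settings.updates.tier))`, where the settings path is also an assumption.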
…in values)

#7. /admin/update/status now redacts diagnostic strings for unauth callers even when requireAdminForStatus is left at its default (false). Status enum + outcome enum are kept (the admin banner / pad-side badge need them to render the right UI) but execution.reason / execution.fromSha / execution.targetTag and the same fields on lastResult are stripped. Authed admin sessions still get the full payload; they're looking at their own server's diagnostics. Two new mocha tests cover both paths: "redacts execution.reason / lastResult.reason for unauth callers" and "returns full diagnostic payload to authed admin sessions".

#8. SessionDrainer no longer schedules T-30 / T-10 broadcasts when the configured drainSeconds can't honour them. Previously, with drainSeconds < 30 the T-30 timer fired at zero remaining but the broadcast still claimed "30 seconds", which was misleading. Now T-30 only schedules when drainSeconds > 30 and T-10 only when > 10. Admins picking a short drain get fewer announcements but each carries an accurate countdown. The opening announcement now reports the configured drain length rather than a hardcoded 60. Two updated unit tests: drainSeconds=15 (skips T-30, still fires T-10) and drainSeconds=5 (skips both).

131 vitest unit + 26 mocha integration tests passing; ts-check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
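The scheduling rule for the drain broadcasts can be written as a pure planner. This is a sketch: `planDrainBroadcasts` is a hypothetical name, and reusing the `update.drain.t60` key for the opening announcement is an assumption (the PR only says the opening message reports the configured length).

```typescript
interface Broadcast {
  atMs: number; // delay from drain start
  messageKey: string;
  values: {seconds: number};
}

export const planDrainBroadcasts = (drainSeconds: number): Broadcast[] => {
  // Opening announcement carries the configured length, not a hardcoded 60.
  const plan: Broadcast[] = [
    {atMs: 0, messageKey: 'update.drain.t60', values: {seconds: drainSeconds}},
  ];
  const later: ReadonlyArray<readonly [number, string]> = [
    [30, 'update.drain.t30'],
    [10, 'update.drain.t10'],
  ];
  for (const [remaining, messageKey] of later) {
    // Strictly greater: a 30s drain does not also announce "30 seconds".
    if (drainSeconds > remaining) {
      plan.push({
        atMs: (drainSeconds - remaining) * 1000,
        messageKey,
        values: {seconds: remaining},
      });
    }
  }
  return plan;
};
```

With `drainSeconds` of 15 the plan holds the opening message and a T-10 entry five seconds in; with 5 it holds only the opening message.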
Qodo review: resolution status
All 8 concerns from the original Qodo review have been addressed across two commits (
Plus the CI Playwright strict-mode selector failure (banner + 131 vitest unit + 26 mocha integration tests passing locally; full CI matrix green on the latest commit before this push.
    if (state.bootCount > 2) {
      // Don't await: fire and forget so the boot sequence proceeds; the rollback
      // path will exit 75 asynchronously and the supervisor restarts on the
      // restored SHA.
      void performRollback(state, deps);
      return {armed: false, markVerified: () => {}};
    }

    const incremented: UpdateState = {...state, bootCount: state.bootCount + 1};
    void deps.saveState(incremented);

    let cleared = false;
    const timer = setTimeout(() => {
      if (cleared) return;
      void performRollback({
        ...incremented,
        execution: {
          status: 'rolling-back',
          reason: 'health-check-timeout',
          targetTag: exec.targetTag,
          fromSha: exec.fromSha,
          at: deps.now().toISOString(),
        },
      }, deps);
    }, deps.rollbackHealthCheckSeconds * 1000);
2. Unhandled rollback rejections 🐞 Bug ☼ Reliability
checkPendingVerification() fires deps.saveState(...) and performRollback(...) with void and no .catch(). If any of those Promises reject (e.g., filesystem error while saving terminal state), the process can hit an unhandled rejection and/or skip the intended rollback/exit behavior.
Agent Prompt
### Issue description
RollbackHandler uses `void someAsyncFn()` without attaching `.catch()`. If those async operations reject (notably `deps.saveState()` inside rollback), the rejection can be unhandled and the rollback/verification flow may not reach the intended terminal state + `exit(75)`.
### Issue Context
These code paths run during boot (pending-verification) and are explicitly intended to be non-blocking, but they still must handle failures deterministically.
### Fix Focus Areas
- src/node/updater/RollbackHandler.ts[79-105]
- src/node/updater/RollbackHandler.ts[160-209]
### What to change
- Replace fire-and-forget calls with guarded variants:
- `void performRollback(...).catch((err) => logger.error(...))`
- `void deps.saveState(...).catch((err) => logger.warn(...))`
- Consider a fallback action on rollback failure (for example: attempt to persist a minimal `rollback-failed` state in a best-effort try/catch, then call `deps.exit(75)`), so the system does not remain in a broken intermediate state.
…ions, state validation
Qodo posted three new concerns after the first fix push.
1. Git tag option injection (security). The release tag from GitHub's
tag_name flowed into `git checkout` / `git verify-tag` as a positional
arg. A tag starting with '-' would be parsed as an option and could
bypass signature verification or change checkout semantics. Mitigated
in three layers:
- New refSafety helper (isValidTag / assertValidTag / refsTagsForm)
enforces a strict subset of git's check-ref-format spec: rejects
leading '-' or '.', whitespace, control chars, and ~ ^ : ? * [ \\
and the '..' sequence.
- VersionChecker validates tag_name before persisting to state, so a
malformed value from a misconfigured githubRepo never lands on disk.
- UpdateExecutor calls assertValidTag and uses the refs/tags/<tag>
form for git checkout. trustedKeys also validates and adds '--' to
git verify-tag for an end-of-options marker. updateActions does an
up-front isValidTag check on state.latest.tag so a corrupt state
file gets a clean 409 instead of a 500.
2. Unhandled rollback rejections. checkPendingVerification was firing
`void deps.saveState(...)` and `void performRollback(...)` without
.catch(), so an fs error during boot's rollback path would bubble out
as an unhandled rejection. Both callsites now go through fireSaveState
/ fireRollback helpers that catch and log; rollback rejections fall
through to a best-effort terminal-state write + exit 75 so the
supervisor can re-try the next boot with bootCount++.
3. Execution state under-validated. isValidExecution previously checked
only that `status` was a known enum value, so a hand-edited state file
with `{execution: {status: 'pending-verification'}}` (missing fromSha
/ targetTag / deadlineAt) would pass validation and reach
RollbackHandler with undefined refs. The validator now consults a
per-status required-fields map mirroring the ExecutionStatus union in
types.ts and rejects empty strings as well as missing fields. Same
tightening applied to lastResult.outcome (must be in the allowed enum,
not just any string). Six new unit tests cover hand-edited corruption.
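A sketch of the per-status validation from item 3. Only the `pending-verification` row (fromSha / targetTag / deadlineAt) is stated in this PR; the `rolling-back` row is inferred from the fields the RollbackHandler snippet sets, and a real map would cover every member of the ExecutionStatus union.

```typescript
// Partial map for illustration; rows other than 'pending-verification' are assumptions.
const REQUIRED_FIELDS: Record<string, readonly string[]> = {
  'idle': [],
  'pending-verification': ['fromSha', 'targetTag', 'deadlineAt'],
  'rolling-back': ['reason', 'fromSha', 'targetTag', 'at'],
};

export const isValidExecution = (value: unknown): boolean => {
  if (typeof value !== 'object' || value === null) return false;
  const exec = value as Record<string, unknown>;
  if (typeof exec.status !== 'string') return false;
  // hasOwnProperty guards against statuses like 'constructor' hitting the prototype.
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_FIELDS, exec.status)) return false;
  // Every required field must be a non-empty string.
  return REQUIRED_FIELDS[exec.status].every(
    (field) => typeof exec[field] === 'string' && exec[field] !== '',
  );
};
```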
145 vitest + 26 mocha tests green; ts-check clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
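The refSafety helper from item 1 could look something like this. The function names come from the commit message; the bodies are a sketch of the listed rules, not the PR's code.

```typescript
// Rejects anything outside a strict subset of git's check-ref-format rules.
export const isValidTag = (tag: unknown): tag is string => {
  if (typeof tag !== 'string' || tag.length === 0) return false;
  // A leading '-' would be parsed by git as an option; a leading '.' is not a valid ref.
  if (tag.startsWith('-') || tag.startsWith('.')) return false;
  if (tag.includes('..')) return false;
  // Whitespace, control characters, and the characters ~ ^ : ? * [ \
  return !/[\s\x00-\x1f\x7f~^:?*[\\]/.test(tag);
};

export const assertValidTag = (tag: unknown): string => {
  if (!isValidTag(tag)) throw new Error('refusing unsafe git tag name');
  return tag;
};

// Fully qualified form, so git cannot read the value as a flag or a branch name.
export const refsTagsForm = (tag: string): string => `refs/tags/${assertValidTag(tag)}`;
```

Validation at the call site plus the `refs/tags/` prefix is deliberately redundant: either one alone stops a tag beginning with `-` from reaching git as an option.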
Qodo follow-up review: resolution status (3 new concerns)
Qodo's persistent review surfaced three additional concerns after my first fix push (
Running tally: all 11 Qodo concerns resolved
Test totals after all fixes: 145 vitest unit tests + 26 mocha integration tests passing locally; ts-check clean. CI rebuilding now. Free-tier Qodo can't auto-re-review again this month, so this comment serves as the audit trail for human reviewers.
      return true;
    };

    const wrapAsync =
That's a lot of any. There are still these fetch calls everywhere in place. Is this setup for the next tier upgrade?
SamTV12345 left a comment
Besides the one comment. Nice addition.
Summary
Ships Tier 2 (manual click) of the four-tier auto-update design at `docs/superpowers/specs/2026-04-25-auto-update-design.md`. Builds on PR #7601 (Tier 1: notify, merged 2026-05-01). Closes part of #7607.

- Admins click "Apply update" at `/admin/update`. Etherpad runs a 60s session drain (with T-60 / T-30 / T-10 broadcasts to every connected pad), `git fetch / checkout / pnpm install --frozen-lockfile / pnpm run build:ui`, and exits with code 75 so a process supervisor restarts it on the new version.
- If `/health` doesn't come up, the previous SHA + lockfile are restored automatically. Crash-loop guard: if the new version reboots more than twice (`bootCount > 2`), RollbackHandler forces a rollback regardless of the timer.
- A terminal `rollback-failed` state surfaces a strong red banner; admins click Acknowledge after manual recovery to clear the lock and re-allow Tier 2 attempts.
- `canAuto` / `canAutonomous` are denied while terminal; manual stays on because clicking Apply is the intervention.
- Six new settings under `updates.*`: `preApplyGraceMinutes`, `drainSeconds`, `rollbackHealthCheckSeconds`, `diskSpaceMinMB`, `requireSignature`, `trustedKeysPath` (all opt-in / sane defaults).
- Signature verification (`updates.requireSignature`) is opt-in and stub-friendly: `false` → log warning and pass; `true` → `git verify-tag <tag>` against the user keyring (or `trustedKeysPath` via `$GNUPGHOME`). Etherpad's release process does not yet sign tags consistently; turning it on by default would block every Tier 2 update, so this is documented as follow-up.

A process supervisor (systemd / pm2 / docker `--restart=unless-stopped`) is required to apply updates. Without one, exit 75 leaves the instance down. The supervisor requirement and a sample systemd unit live in `doc/admin/updates.md`.

Tier 3 (auto with grace window) and Tier 4 (autonomous within maintenance window) are out of scope for this PR.
Architecture
New atomic units under `src/node/updater/`:

- `lock.ts`: PID-based `var/update.lock` with stale-PID reaping.
- `trustedKeys.ts`: `verifyReleaseTag` (GPG via `git verify-tag`).
- `preflight.ts`: sequenced check pipeline with typed reasons.
- `UpdateExecutor.ts`: DI spawn pipeline (snapshot → fetch → checkout → install → build → exit 75).
- `RollbackHandler.ts`: boot-time health-check timer + crash-loop guard + restore-from-backup.
- `SessionDrainer.ts`: timed broadcasts + accept-flag.
- `updateLog.ts`: rolling `var/log/update.log` (10 MB × 5) + `tailLines(n)`.

New routes in `src/node/hooks/express/updateActions.ts`: `POST /admin/update/{apply,cancel,acknowledge}`, `GET /admin/update/log`, with strict admin auth on all four.

`RollbackHandler.checkPendingVerification` wires into boot in `src/node/updater/index.ts`; `markBootHealthy` is called from `src/node/server.ts` once state hits `RUNNING` so `/health`-being-up is the implicit happy-path signal that cancels the rollback timer.

Admin UI (`admin/src/pages/UpdatePage.tsx`): renders Apply / Cancel / Acknowledge per `execution.status`, polls `/admin/update/log` while in flight, surfaces lastResult and policy denial copy. `UpdateBanner.tsx` adds a terminal-state alert variant.

Pad UI: existing shoutMessage pipeline (`src/static/js/pad.ts`) learns `messageKey + values` and routes through `html10n.get(key, values)` so `update.drain.t60/t30/t10` interpolate the seconds remaining and render the localised string. (Avoids the unbound `window._` bug documented in memory.)

Test plan
- `pnpm exec vitest run tests/backend-new/specs/updater`: 128 unit tests across 13 files (lock, trustedKeys, preflight, updateLog, SessionDrainer, UpdateExecutor, RollbackHandler, drainer-handshake, UpdatePolicy, index-boot wiring, state validator backfill).
- `pnpm run test --grep updater-integration`: 5 mocha integration tests against a tmp git repo: happy path lands on pending-verification, install-fail rollback, build-fail rollback, crash-loop forced rollback, target-sha-doesn't-exist terminal `rollback-failed`.
- `pnpm run test --grep updateActions`: 11 mocha API tests for the four new endpoints (auth, policy, terminal-state acknowledge, content-type).
- `pnpm run test --grep updateStatus`: 7 existing PR 1 API tests still pass with the extended status payload.
- `src/tests/frontend-new/admin-spec/update-page-actions.spec.ts`: Apply button posts and re-fetches, install-method-not-writable hides Apply + shows denial copy, rollback-failed shows Acknowledge, lockHeld hides Apply.
- `pnpm run ts-check` clean.
- `pnpm --filter admin run build` clean.
- `docs/superpowers/specs/2026-04-25-auto-update-runbook.md`: pre-merge gate per design spec § "Phased rollout / PR 2". Cover happy path, both rollback variants, crash-loop guard, rollback-failed terminal, cancel-during-drain, drain announcement renders the localised string in a real pad. Reviewer must run this on a disposable VM before merge.

Notes

- Plan: `docs/superpowers/plans/2026-05-08-auto-update-pr2-manual-click.md`.
- Tier 1 state files are backfilled with the `execution` / `bootCount` / `lastResult` fields, regression-tested.

🤖 Generated with Claude Code