feat(engine): stop/resume workflow control with provider-level idle detection#53
Open
feat(engine): stop/resume workflow control with provider-level idle detection#53
Conversation
- Separate kill_event from _stop_event to prevent permanent poisoning - Add disconnect_event for auto-resume when browser closes during pause - Race interrupt signal against _execute_with_parse_recovery (Claude parity) - Copy working_messages in _request_partial_output to avoid mutation - Hoist json import to module level, remove scattered lazy imports - Silently consume stale interrupt in web mode between-agent checks - Remove dead _wait_with_interrupt method (merged into idle detection) - Header: useEffect for state resets, error logging, killing state - Updated docstrings for stop/kill terminology consistency
e6bf69f to
8bbdd1e
Compare
added 4 commits
March 22, 2026 20:06
- Separate kill_event from _stop_event to prevent permanent poisoning - Add disconnect_event for auto-resume when browser closes during pause - Race interrupt signal against _execute_with_parse_recovery (Claude parity) - Copy working_messages in _request_partial_output to avoid mutation - Hoist json import to module level, remove scattered lazy imports - Silently consume stale interrupt in web mode between-agent checks - Remove dead _wait_with_interrupt method (merged into idle detection) - Header: useEffect for state resets, error logging, killing state - Updated docstrings for stop/kill terminology consistency - Extended kill test to verify kill_event and bg_event
- Update github-copilot-sdk dependency to >=0.2.0 - Lower idle timeout from 300s to 90s for faster stall recovery - Update provider and tests for SDK v0.2.0 API changes
- Add --locked flag to all uv tool install commands for reproducible deps - Increase idle recovery max attempts from 3 to 5 (7.5min total runway)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Stop/Resume workflow control for the web dashboard, provider-level idle detection with interrupt support, and dashboard state resilience across page refreshes.
Changes
Provider-level idle detection + interrupt (Copilot)
interrupt_signalcompletely bypassed idle recovery, causing sessions to hang foreverProvider interrupt during API calls (Claude)
_execute_api_calland_execute_with_parse_recoveryfor provider parityworking_messagesin_request_partial_outputto avoid mutating caller's historyWeb dashboard Stop → Resume
POST /api/stop): Aborts the current agent via interrupt event (not kill workflow)POST /api/resume): Re-executes the paused agentPOST /api/kill): Hard-stop with checkpoint saved for CLI resumekill_eventfrom_stop_eventto prevent permanent poisoning across pause cyclesdisconnect_eventfor auto-resume when browser closes during pause (prevents hanging forever)agent_paused/agent_resumedevents for dashboard state trackingDashboard state resilience (page refresh)
workflow_startedhandler uses event timestamp instead ofDate.now()— elapsed timer survives refreshagent_started/script_startedstorestartedAtfrom event timestamp — node elapsed timers survive refreshreplayStatesetslastEventTimefrom last replayed event — idle timer works after refreshuseLiveElapsedhook uses storedstartedAtinstead ofDate.now()on mountWeb mode interrupt support
interrupt_eventin--web/--web-bgmode so dashboard Stop button reaches the engineinterrupt_eventwith dashboard viaset_interrupt_event()Other
/api/revive, revive watcher) — replaced by idle detection + Stop/Resume_wait_with_interruptmethod (merged into idle detection)jsonimport to module level in workflow engineuseEffectfor button state resets, error logging in catch blocks,killingstateTest results