added 4 commits
March 22, 2026 11:07
- Add EventLogSubscriber: writes every workflow event as JSONL to $TMPDIR/conductor/ so diagnostic data is always available, not just when --web or --log-file is passed.
- Always create the WorkflowEventEmitter regardless of --web flag, enabling event-driven diagnostics for all runs.
- Enrich workflow_failed event with timeout-specific fields (elapsed_seconds, timeout_seconds, current_agent) so timeouts are immediately diagnosable.
- Emit new checkpoint_saved event with path, agent, and error type.
- Add /api/logs download endpoint to web dashboard.
- Show "Download Logs" button in dashboard header after workflow ends.
- Enhance error banner with timeout agent name and checkpoint path.
- Add 6 tests for EventLogSubscriber.
Problem
When a workflow hangs or fails, there is no easy way to diagnose what happened. The event history only exists in memory (lost on crash), log files require an explicit --log-file flag, and the dashboard gives no indication of whether the model is thinking or the connection is dead. Timeout errors say "workflow timed out" but not which agent was stuck or for how long.
Solution
Unify all observability through the event emitter — make it always-on, persist events to disk automatically, and surface diagnostic data in the dashboard.
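To make the shape of this design concrete, here is a minimal sketch of an always-on emitter with a JSONL subscriber attached. Only the class names WorkflowEventEmitter and EventLogSubscriber and the file-name pattern come from this description. The subscribe/emit interface, the event dictionary layout ("type", "timestamp", "data"), the timestamp format, and the base_dir parameter are assumptions made for illustration and may not match the actual code.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]


class WorkflowEventEmitter:
    """Hypothetical stand-in: fans each event out to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self._subscribers.append(callback)

    def emit(self, event_type: str, **data: Any) -> None:
        # Assumed event layout; the real schema is not shown in the PR text.
        event = {"type": event_type, "timestamp": time.time(), "data": data}
        for callback in self._subscribers:
            callback(event)


class EventLogSubscriber:
    """Hypothetical stand-in: appends each event as one JSON line on disk."""

    def __init__(self, workflow_name: str, base_dir: Optional[Path] = None) -> None:
        # tempfile.gettempdir() honours $TMPDIR when it is set.
        root = base_dir if base_dir is not None else Path(tempfile.gettempdir())
        log_dir = root / "conductor"
        log_dir.mkdir(parents=True, exist_ok=True)
        stamp = time.strftime("%Y%m%d-%H%M%S")  # assumed timestamp format
        self.path = log_dir / f"conductor-{workflow_name}-{stamp}.events.jsonl"

    def __call__(self, event: Event) -> None:
        # Open, append, and close per event so a crash loses at most
        # the line that was being written.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(event, default=str) + "\n")


if __name__ == "__main__":
    emitter = WorkflowEventEmitter()
    log = EventLogSubscriber("demo")
    emitter.subscribe(log)
    # Illustrative values only; the field names are the ones listed for
    # timeout failures in this PR.
    emitter.emit(
        "workflow_failed",
        elapsed_seconds=312.4,
        timeout_seconds=300,
        current_agent="reviewer",
    )
    print(f"events written to {log.path}")
```

Because every consumer is just a subscriber, a console printer or a web dashboard could attach to the same emitter without the engine knowing about either.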
Always-on structured event logging
- WorkflowEventEmitter is now created for every run, not just --web mode
- EventLogSubscriber writes every event as JSONL to $TMPDIR/conductor/conductor-<name>-<timestamp>.events.jsonl
Engine refactor: events as the single source of truth
- Removed _verbose_log* calls and 13 lazy-import wrappers from workflow.py (~180 lines deleted)
- ConsoleEventSubscriber in run.py subscribes to the emitter and calls the existing verbose_log_* display functions
Richer failure diagnostics
- workflow_failed event now includes elapsed_seconds, timeout_seconds, and current_agent for timeout errors, which directly answers "which agent was stuck?"
- checkpoint_saved event emitted with file path, agent name, and error type
Dashboard: idle detection and log access
- Idle timer shows "Xs idle" after 5s of no events and turns amber at 60s, which immediately tells you whether the model is thinking or the connection stalled
- GET /api/logs endpoint returns full event history as downloadable JSON
- awaiting_model event emitted by both providers right before SDK/API calls, marking the exact start of dead zones
Provider parity
- Added awaiting_model event to Claude provider (was only in Copilot)
Testing
- 6 tests for EventLogSubscriber
- Updated test_for_each_verbose.py to verify events instead of patching removed functions
- Tests for the awaiting_model event sequence
Commits
- aaf42dc: Always-on JSONL event logging, richer timeout/checkpoint events
- 1cedfcc: Consolidate verbose logging into ConsoleEventSubscriber
- b978284: Idle timer, always-on logs button, awaiting_model event (Copilot)
- 24d9e28: Claude provider parity, AGENTS.md provider parity rules
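The idle indicator described under "Dashboard: idle detection and log access" reduces to a small threshold check on the time since the last event. The sketch below illustrates that logic only. The 5s and 60s thresholds come from this description, while the function name, the severity labels, and the choice to treat the thresholds as inclusive are assumptions. The dashboard itself presumably does this in front-end code rather than Python.

```python
from typing import Optional, Tuple

IDLE_SHOW_AFTER_S = 5   # from the description: indicator appears after 5s
IDLE_WARN_AFTER_S = 60  # from the description: indicator turns amber at 60s


def idle_indicator(last_event_ts: float, now: float) -> Optional[Tuple[str, str]]:
    """Return (label, severity), or None while events are still recent.

    Hypothetical helper: the severity names "normal" and "warning" are
    placeholders for the default and amber styles.
    """
    idle = now - last_event_ts
    if idle < IDLE_SHOW_AFTER_S:
        return None
    severity = "warning" if idle >= IDLE_WARN_AFTER_S else "normal"
    return f"{int(idle)}s idle", severity
```

Resetting last_event_ts on every incoming event, including awaiting_model, means the counter measures exactly how long the run has been silent since the last known activity.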