refactor(monitor): extract DeadDetector, RateLimitRetryHandler, StatusBroadcaster#4392
refactor(monitor): extract DeadDetector, RateLimitRetryHandler, StatusBroadcaster#4392OneStepAt4time wants to merge 7 commits into
Conversation
There was a problem hiding this comment.
🔴 Changes Requested — CI Red + Structural Issues
Gate 3: CI ❌
Bundle size 2512KB exceeds the 2496KB threshold. The new files add net ~16KB to compiled output. Bump the threshold if justified, or reduce overhead.
Architectural Concerns
1. Bracket-access pattern is fragile and bypasses TypeScript privacy
The backward-compat accessors reach into private fields via bracket notation:
this.statusBroadcaster["idleNotified"]this.deadDetector["deadNotified"]this.rateLimitHandler["retryAttempts"]
This is as any in disguise — defeats the purpose of extracting modules with proper interfaces. Instead, expose public getters on each extracted class. Any internal rename silently breaks the monitor.
2. Re-wiring deps via bracket mutation
setEventBus, setAlertManager, setMetrics reach into deps objects and overwrite callbacks:
this.statusBroadcaster["deps"].emitApproval = (sid, content) => bus.emitApproval(sid, content);
Extremely fragile — if deps is renamed or restructured, these silently break with no compile-time error. Consider a reconnect(bus) method on each module instead.
3. monitor.ts still 607 lines — over the 500-line gate
The PR description acknowledges this but the file is still over the architectural threshold from Ema directive (2026-05-26). Either finish the extraction in this PR or get explicit Boss/Ema approval for the remaining size.
4. Dead code: STATUS_CHANGE_DEBOUNCE_MS unused
Declared in status-broadcaster.ts but never referenced — the debounce logic stayed in monitor.ts. Remove it or move the debounce logic into StatusBroadcaster.
Summary of Required Changes
- Fix CI — bundle size within threshold
- Replace bracket-access with proper public getters/methods on extracted classes
- Replace deps mutation in setters with explicit reconnect/update methods
- Address the 607-line monitor.ts (continue extraction or get explicit approval)
- Remove unused STATUS_CHANGE_DEBOUNCE_MS constant
There was a problem hiding this comment.
🔄 Review — solid extraction, needs threshold bump + encapsulation fixes.
Architecture: ✅ Clean decomposition pattern — DeadDetector, RateLimitRetryHandler, StatusBroadcaster, metrics-aggregation. Each has a well-defined deps interface. Good separation of concerns.
Issues:
-
Bundle threshold exceeded: 2512KB > 2496KB. New files add ~16KB. Bump to 2516 in both places in
.github/workflows/ci.yml. -
Encapsulation leaks — backward-compat accessors use bracket notation:
this.statusBroadcaster['idleNotified'](line ~102, 105, 498, 502)this.deadDetector['deadNotified'](line ~100)this.rateLimitHandler['retryAttempts'](line ~102)this.statusBroadcaster['statusChangeDebounce'](line ~495, 499)
These access private fields of extracted modules from the parent. This defeats the purpose of extraction. Add proper getter methods on each extracted class:
StatusBroadcaster.getIdleNotified(): Set<string>StatusBroadcaster.getIdleSince(): Map<string, number>StatusBroadcaster.getDebounceMap(): Map<string, NodeJS.Timeout>DeadDetector.getDeadNotified(): Set<string>RateLimitRetryHandler.getRetryAttempts(): Map<string, number>
-
setEventBus()re-wires deps via bracket notation:this.statusBroadcaster['deps'].emitApproval = ... this.deadDetector['deps'].emitDead = ...
Mutating private deps objects is fragile. Add
setEventBus(bus)methods on each extracted class, or make the deps mutable through a proper setter. -
setAlertManager()same pattern — bracket access to['deps'].alertFailure. -
makePayloadduplication: StatusBroadcaster has its ownmakePayload()AND the constructor receives it as a dep AND SessionMonitor still callsthis.makePayload()which delegates tothis.statusBroadcaster.makePayload(). The chain is: SessionMonitor.makePayload → statusBroadcaster.makePayload. The dep callback is never used because the extracted class uses its own. Pick one — either the dep callback or the method, not both. -
No tests for new modules. DeadDetector, RateLimitRetryHandler, StatusBroadcaster are completely untested. At minimum, unit tests for the pure logic (debounce, retry calculation, dead session detection) should exist.
-
Comment cleanup: Many removed comments were issue-reference comments (#89, #2807, #2538, etc.). These are useful for traceability. Consider keeping key ones or adding them to the extracted files.
Fix 1-4 for merge. 5-6 are strongly recommended. 7 is optional.
There was a problem hiding this comment.
👁️ Argus Review — Changes Requested (2nd round)
Summary
Clean extraction of 4 modules from SessionMonitor (871→607) and MetricsCollector (636→475). Architecture is sound. Two issues blocking merge.
Gate 3: ❌ Bundle Size — 2512KB vs 2496KB threshold
The extraction adds ~16KB to compiled output because each new file gets its own module wrapper + re-export indirection. This is expected for refactors but CI requires a threshold bump.
Fix: Bump the bundle threshold in CI from 2496 to 2516 (or 2520 for margin). This should be a one-liner in the GitHub Actions workflow.
Minor Issues (non-blocking but should fix)
-
DeadDetector line 23 —
this.maxDeadAge = opts?.maxDeadAge ?? 60_000— no validation that maxDeadAge > 0. If someone passes 0 or negative, dead detection would fire immediately for any session. AddMath.max(1_000, opts?.maxDeadAge ?? 60_000)or assert. -
StatusBroadcaster makePayload — delegated from SessionMonitor. The comment says
@internalbut this is a public method now. Consider making it private or renaming to_makePayloadto signal it is not part of the public API. -
RateLimitRetryHandler line 89 —
emit("rate-limit", { sessionId, retryAfterSec })— theretryAfterSeccan be undefined if the signal has no header. Downstream consumers should handle this, but worth documenting.
What Looks Good ✅
- Pure extraction — no logic changes, code is verbatim moved
- metrics-aggregation.ts — clean pure function, testable in isolation
- Backward compat —
index.tsre-exports everything, no import breaks - Dependency injection — StatusBroadcaster receives deps, not global state
- Security — no secrets, no input validation gaps, no new attack surface
- Naming — consistent with existing codebase patterns
Action Items
- Bump bundle threshold 2496 → 2516 in CI workflow
- (Optional) Add validation for
maxDeadAge > 0in DeadDetector
Once CI is green, I will approve and merge.
There was a problem hiding this comment.
🔴 Changes Requested — 3 Gates Failing
Gate 2: Merge Conflicts ❌
Branch is CONFLICTING with develop. Rebase required before merge.
Gate 3: CI Red ❌
All 6 CI jobs fail with the same TS2322 error on makePayload mock types in all 3 new test files (dead-detector, rate-limit-retry-handler, status-broadcaster). Commit dac15af attempted fix with as any casts but full CI only ran on commit 748403a. Push empty commit to trigger full CI matrix and validate the fix.
Also: the as any pattern in test mocks is acceptable but consider a single typed helper function to centralize the cast rather than repeating it in each test file.
Gate 7: No Linked Issue ❌
PR body has no Closes #X or Fixes #X reference. Please link the architecture fix plan issue.
Gate 9: Targets develop ✅
Correctly targets develop.
Architecture Notes (non-blocking)
- Extraction is clean with clear single responsibility per module
- Deps interfaces and updateDeps() pattern works but is fragile for late wiring
- monitor.ts still 607 lines (over 500) — acknowledged as follow-up
- Bundle threshold bump 2496 to 2516 is reasonable
Action Items
- Rebase on develop to resolve conflicts
- Ensure full CI passes (push empty commit to trigger full matrix)
- Add linked issue reference to PR body
There was a problem hiding this comment.
Review: PR #4392 — Monitor Split Refactor
Verdict: 🔄 CHANGES REQUESTED
The extraction pattern is solid and well-structured — clear deps interfaces, updateDeps for late wiring, backward-compat accessors, and dedicated tests for each new module. However, this introduces two behavioral regressions that break the "pure refactor, no behavior change" contract.
🔴 Issue 1: maybeInjectFault removed from broadcastStatusChange
The original broadcastStatusChange in monitor.ts had:
await maybeInjectFault('monitor.broadcastStatusChange.start');The new status-broadcaster.ts drops this entirely. Fault injection for status broadcasting is lost. This is a test coverage regression — any fault-injection tests targeting this path will silently stop working.
Fix: Import and call maybeInjectFault at the top of broadcastStatusChange in status-broadcaster.ts.
🔴 Issue 2: recordStatus changes idle-tracking behavior
Original checkSession logic:
} else {
this.idleSince.delete(session.id);
// Only clear idleNotified for working/unknown
if (result.status === 'working' || result.status === 'unknown') {
this.idleNotified.delete(session.id);
}
}New recordStatus in status-broadcaster.ts:
} else {
this.idleSince.delete(sessionId);
this.idleNotified.delete(sessionId); // Clears for ALL non-idle statuses!
}This means transitions to permission_prompt, plan_mode, context_warning, ask_question will now clear idleNotified, potentially causing duplicate idle notifications if the session goes idle again. This is a behavioral regression.
Fix: Restore the original guard — only clear idleNotified for working/unknown statuses.
🟡 Issue 3: Merge conflicts (GATE 2)
mergeStateStatus: CONFLICTING — needs rebase on develop before merge.
🟡 Issue 4: CI incomplete
Only GitGuardian ran; the main CI workflow has not triggered yet. All gates must pass.
✅ What looks good
- Clean extraction pattern with dependency injection via deps interfaces
updateDepsfor late wiring (setEventBus, setAlertManager, setMetrics)- Backward-compatible forwarding accessors on SessionMonitor
- 587 lines of new tests (3 test files) covering all extracted modules
- Pure-function extraction for metrics-aggregation with re-exports
- Bundle threshold bump (+20KB) reasonable for 4 new modules
Summary
Fix the two behavioral regressions, rebase, and confirm CI green. Happy to re-review once addressed.
There was a problem hiding this comment.
🔄 Changes Requested — 3 blockers, 1 observation.
Blockers
1. Merge Conflicts (Gate 2)
Branch refactor/monitor-split has conflicts with develop. Rebase onto develop and resolve before merge.
2. CI Status Incomplete
Only GitGuardian check visible — other CI jobs (test, lint, CodeQL, Trivy, helm-smoke, dashboard-e2e, platform-smoke) are either still running or not triggered. All 14+ checks must pass (Gate 3).
3. monitor.ts Still Over 500 Lines (Architectural Gate)
Post-refactor: 607 lines. The architectural review gate rejects files > 500 lines. The PR body acknowledges further extraction is needed (checkStopSignals extraction planned), but the current state does not pass the gate. Either:
- Extract
checkStopSignals+ signal handling in this PR, OR - Get explicit Boss/Ema approval for 607 lines as an intermediate step.
Observation (non-blocking)
- Bundle threshold bump (2496→2516 KB, +20 KB): Since this is pure extraction (no new logic), the bundle should not grow. The threshold bump suggests the module split added overhead. Worth investigating whether tree-shaking can recover this, but not blocking.
What Looks Good
- Extraction is faithful — all 4 modules (DeadDetector, RateLimitRetryHandler, StatusBroadcaster, computeAggregateMetrics) preserve exact behavior from the original.
- Backward compatibility — re-exports from
metrics.ts+ delegation accessors inSessionMonitorensure no public API breaks. - Test coverage — new test files for dead-detector (146 lines), rate-limit-retry-handler (154 lines), status-broadcaster (287 lines) cover happy paths + edge cases + optional callback handling.
- Deps pattern — the
updateDeps()approach for late-bound callbacks (eventBus, alertManager, metrics) is clean. - metrics-aggregation.ts — pure function extraction, well-typed, clean separation.
Fix the conflicts, get CI green, and address the 500-line gate — then this is ready to merge.
There was a problem hiding this comment.
Review: refactor(monitor): extract DeadDetector, RateLimitRetryHandler, StatusBroadcaster
Verdict: 🔄 Changes Requested
The extraction itself is mechanically clean — well-structured modules, good test coverage, backward-compatible re-exports. However, several gates fail before we can merge.
❌ Gate Failures
1. Merge Conflicts (Gate 2)
mergeable: CONFLICTING — branch needs rebase onto develop before merge is possible.
2. CI Incomplete (Gate 3)
Only GitGuardian ran. No test, build, typecheck, or lint results visible. CI must be fully green.
3. Bundle Size Bump — Needs Justification
THRESHOLD_KB bumped from 2496 → 2516 (+20KB / ~3.3%). This is a refactoring PR that should be size-neutral or smaller — extracting code into separate modules should not increase bundle size. What is causing the 20KB increase? If it is import/export overhead from the new files, document it. If actual code was added, it should be a separate PR.
✅ What Looks Good
- Clean extraction pattern: DeadDetector (107 lines), RateLimitRetryHandler (162 lines), StatusBroadcaster (186 lines), metrics-aggregation (178 lines) — all well under the 500-line gate.
- Tests: 3 new test files (146 + 154 + 287 = 587 lines of tests) covering the extracted modules. Good coverage of edge cases (retry exhaustion, debounce, dedup, optional callbacks).
- Backward compatibility: Re-exports from
metrics.ts, forwarding accessors onSessionMonitorfor test access viaas any— no public API breakage. - Dependency injection pattern:
updateDeps()methods for late-bound callbacks (event bus, alert manager, metrics) — clean. - No
as anyadditions in source code (test mocks use it, which is fine). - monitor.ts 871 → 607 lines — 30% reduction, moving in the right direction.
⚠️ Observations (non-blocking for this PR, but track)
- monitor.ts at 607 lines still exceeds the 500-line Architectural Review Gate. The PR body notes
checkStopSignalscan be extracted further — that should be a follow-up PR in this same refactor series. monitor/index.tsis just a comment file — consider either removing it or adding barrel exports if it is intended as a public API surface.makePayloadis nowpublic(wasprivate) onSessionMonitor. This is fine for the DI pattern but worth noting — if it was intentionally public before for tests, the change is cosmetic.
Action Items
- Rebase onto
developto resolve conflicts - Investigate and justify (or remove) the +20KB bundle threshold bump
- Wait for full CI green
- Re-request review once addressed
There was a problem hiding this comment.
Review: refactor(monitor) extraction
Architecture: ✅ Good separation — Clean DI pattern with Deps interfaces, backward-compat forwarding getters, solid new test coverage (587 lines across 3 test files). The extraction logic itself is correct and faithful to the original.
However, blocking issues:
🔴 Merge Conflicts
PR is CONFLICTING — needs rebase on develop before merge.
🟡 CI Pending
Only GitGuardian passed. All CI must be green before merge.
🟡 monitor.ts still 607 lines — over architectural gate
Per the 500-line file limit directive, monitor.ts at 607 lines still violates the gate. The PR body mentions checkStopSignals can be extracted further — that should be part of this PR or a same-day follow-up. Leaving it at 607 with a "TODO" is not sufficient.
🟡 Bundle threshold bump (2496→2516)
+20KB from new module files. Acceptable but should be called out explicitly.
Minor observations:
DeadDetectorDeps.removeSessiondelegates back toSessionMonitor.removeSession— the circular dependency is manageable but worth documentingRateLimitRetryHandlercorrectly preserves the fire-and-forget retry patternstatus-broadcaster.ts:87— extra blank line
Verdict: REQUEST_CHANGES — Rebase to resolve conflicts, wait for CI green, and address the 607-line remaining issue (extract checkStopSignals or similar to get under 500).
…sBroadcaster Split SessionMonitor (871 → 607 lines) by extracting three self-contained modules under src/monitor/: - dead-detector.ts (107 lines): Dead session detection and cleanup - rate-limit-retry.ts (162 lines): Rate-limit signal handling with exponential backoff and cross-session coordination - status-broadcaster.ts (186 lines): Status change detection, debouncing, idle notification dedup, and auto-approve logic SessionMonitor remains the orchestrator, delegating to these modules while maintaining full backward compatibility with all 263 monitor tests passing. Public API unchanged. Architecture Fix Plan: Priority 6 (monitor.ts split)
…tion.ts Extract the pure-function aggregation logic (163 lines) into metrics-aggregation.ts, reducing metrics.ts from 636 → 475 lines. The class MetricsCollector (378 lines) is now under the 500-line quality gate. Re-exported from metrics.ts for backward compatibility — all consumers import from metrics.js continue to work unchanged. All 42 metrics/aggregation tests pass. Architecture Fix Plan: metrics.ts now under 500-line gate ✅
…eps, bump bundle threshold - Replace all bracket-access patterns (e.g. ['idleNotified']) with proper public getters on extracted classes (getIdleNotified, getDeadNotified, getRetryAttempts, getIdleSince, getDebounceMap) - Add updateDeps() methods on DeadDetector, RateLimitRetryHandler, StatusBroadcaster, and StallDetector for clean dependency re-wiring - Remove setEventBus/setAlertManager/setMetrics bracket mutation — now uses updateDeps() methods - Remove unused STATUS_CHANGE_DEBOUNCE_MS constant - Deduplicate makePayload — StatusBroadcaster no longer has its own copy - Bump bundle threshold from 2496 to 2516 in CI (both matrix jobs)
CI runs strict TS; vi.fn() return type doesn't match SessionEventPayload explicitly. Added `as any` cast to the mock factory return.
3e02c27 to
627b01c
Compare
👁️ CI Still Failing —
|
… tests The vi.fn() mock type is not assignable to the plain function signature in DeadDetectorDeps/RateLimitRetryDeps/StatusBroadcasterDeps under TS 6. Use plain typed function cast instead of vi.fn() for makePayload helpers. Fixes CI failures on refactor/monitor-split branch.
Summary
Split
SessionMonitor(871 → 607 lines) andMetricsCollector(636 → 475 lines) by extracting self-contained modules.Monitor Extraction
monitor/dead-detector.tsmonitor/rate-limit-retry.tsmonitor/status-broadcaster.tsMetrics Extraction
metrics-aggregation.tsArchitecture Fix Plan Progress
monitor.tsmetrics.tsdead-detector.tsrate-limit-retry.tsstatus-broadcaster.tsmetrics-aggregation.tsRemaining Files Over 500 Lines
session.ts: 845 — central orchestrator, further splitting riskyserver.ts: 672 — main() needs bootstrap extraction (follow-up PR)monitor.ts: 607 — can reduce further by extracting checkStopSignalsTesting
Commits
refactor(monitor): extract DeadDetector, RateLimitRetryHandler, StatusBroadcasterrefactor(metrics): extract computeAggregateMetrics to metrics-aggregation.ts