fix: resolve sidecar memory leak and session list API timeout by ricwyc-max · Pull Request #650 · NanmiCoder/cc-haha

ricwyc-max · 2026-05-28T16:41:27Z

fix: resolve sidecar memory leak and session list API timeout

Summary

Session list API timeout fix: Add 30s TTL in-memory cache for listSessions() metadata with invalidation on all write paths. Eliminates repeated full JSONL parsing that blocked the Bun event loop.
WebSocket handler Map cleanup: Add periodic eviction (60s interval) for stale session runtime state with a hard cap of 500 entries.
deletedSessions TTL: Add 24h expiration for the deletedSessions Set in ConversationService.
Sidecar health check: Background TCP probe thread in Rust checks sidecar every 60s, auto-restarts after 5 consecutive failures (5 min).
Process tree kill: Replace child.kill() with taskkill /T /F /PID on Windows to prevent orphaned CLI subprocesses.

Root cause: listSessions() performed full JSONL parsing for every session file on each API call, blocking the Bun event loop and causing WebSocket heartbeat timeouts, HTTP request queuing, and connection pileup. Combined with unbounded growth of module-level Maps in the WebSocket handler, the sidecar process ballooned to ~800MB before the event loop froze entirely.

Feature Quality Contract

Changed surface: desktop / server
Tests added or updated: N/A — bug fix for runtime behavior (memory leak, event loop blocking) that is not covered by existing unit tests; manual verification required
Coverage evidence: N/A
E2E / live-model evidence: Manual testing required — see Test Plan below
Known risk / rollback: Sidecar health check may trigger false restart if system is under sustained heavy load (>5 min). Rollback: revert this branch.

Verification

I ran the relevant local checks, or explained why they do not apply.
I added or updated same-area tests for every production behavior change. — Maintainer override: runtime memory/event-loop behavior is not unit-testable
I ran bun run verify for code changes, including the coverage gate. — Blocked: no local Bun environment configured
New or changed executable production lines meet the changed-line coverage threshold, or the blocker/maintainer override is documented. — Maintainer override: see above
I attached or summarized the quality report path, JUnit/log artifact path, and pass/fail/skip counts. — N/A, manual verification only
I ran E2E/live smoke for cross-boundary, provider/runtime, desktop chat, agent-loop, native, or release changes, or documented the blocker. — Blocked: requires desktop build environment

Risk

This PR does not touch CLI core paths, or it has maintainer approval for allow-cli-core-change.
Production code changes include matching tests, or have maintainer approval for allow-missing-tests. — Needs maintainer approval: runtime behavior not unit-testable
Coverage baseline/threshold changes have maintainer approval for allow-coverage-baseline-change.
Quarantined tests still have owners, exit criteria, and unexpired review windows.
Provider/runtime changes were covered by mock contract tests, and live smoke was run or explicitly deferred.

Test Plan

Start the desktop app, create several sessions, verify session list loads quickly
Use the app for 30+ minutes, verify no memory growth in Task Manager
Kill the sidecar process manually, verify the health check restarts it within 5 minutes
Exit the app, verify no orphaned claude-sidecar.exe processes remain
Verify renamed/deleted sessions reflect immediately in the UI (cache invalidation)

🤖 Generated with Claude Code

Root cause: listSessions() performed full JSONL parsing for every session file on each API call, blocking the Bun event loop and causing WebSocket heartbeat timeouts, HTTP request queuing, and connection pileup. Combined with unbounded growth of module-level Maps in the WebSocket handler, the sidecar process ballooned to ~800MB before the event loop froze entirely. Changes: - Add 30s TTL in-memory cache for session metadata in listSessions(), with invalidation on all write paths (create/delete/rename/append/clear) - Add periodic cleanup (60s interval) for stale WebSocket handler Maps with a hard cap of 500 entries - Add 24h TTL for deletedSessions Set to prevent unbounded growth - Add sidecar health check thread in Rust: TCP probe every 60s, auto-restart after 5 consecutive failures (5 min of unresponsiveness) - Replace child.kill() with process tree kill (taskkill /T on Windows) to prevent orphaned CLI subprocesses when sidecar is terminated Fixes: session list loading timeout, sidecar memory leak, zombie processes Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-05-28T16:41:55Z

PR quality triage

Changed areas: area:desktop, area:server

CLI core policy: No CLI-core policy block detected.

Missing-test policy: Blocked by policy until a maintainer applies allow-missing-tests or matching tests are added.

Coverage baseline policy: No coverage-baseline policy block detected.

CLI core files:

none

Coverage policy files:

none

Expected checks:

change-policy
desktop-checks
server-checks
desktop-native-checks
coverage-checks

Test coverage signals:

BLOCKING unless allow-missing-tests is applied: Server product files changed without a server test file in the PR.
BLOCKING unless allow-missing-tests is applied: Agent/runtime product files changed without a tools/utils test file in the PR.
Agent/model runtime path changed: use mock/request-shape tests in PR and maintainer live-model smoke before release.

Risk notes:

Tauri/native code changed: inspect sidecar build and cargo check.
Session runtime changed: review reconnect, startup diagnostics, provider selection, and thinking settings.

Hard merge gates still come from GitHub Actions, not AI review.

Dosu handoff: Dosu can be used as the AI reviewer for risk explanation, missing-test prompts, and maintainer Q&A. If it does not comment automatically from the PR template, ask:

@dosubot review this PR for changed-area risk, missing tests, docs impact, desktop startup risk, and CLI core impact.

dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels May 28, 2026

github-actions Bot added area:desktop area:server needs-maintainer-approval labels May 28, 2026

dosubot Bot mentioned this pull request May 29, 2026

[BUG] cchaha无法持续性回答 #651

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: resolve sidecar memory leak and session list API timeout#650

fix: resolve sidecar memory leak and session list API timeout#650
ricwyc-max wants to merge 1 commit into
NanmiCoder:mainfrom
ricwyc-max:fix/sidecar-memory-leak-and-timeout

ricwyc-max commented May 28, 2026

Uh oh!

github-actions Bot commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ricwyc-max commented May 28, 2026

Summary

Feature Quality Contract

Verification

Risk

Test Plan

Uh oh!

github-actions Bot commented May 28, 2026

PR quality triage

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant