
fix: drop SQLite DDL from DailyStatsActor.PreStart + surface wizard crash logs (closes #925, #936) #938

Merged: Aaronontheweb merged 1 commit into netclaw-dev:dev from Aaronontheweb:fix/925-936-stats-daemon-race on May 8, 2026


@Aaronontheweb (Collaborator)

Summary

  • #925 (DailyStatsActor: move SQLite EnsureTable off PreStart hot path): DailyStatsActor.PreStart no longer runs synchronous SQLite DDL. The schema is now owned by migration files (the existing 004_daily_stats.sql plus a new 005_daily_skill_usage.sql), which SchemaMigrator applies unconditionally before the actor system starts. Previously, on a Windows CI cold start (AV scan plus first-time fsync), this DDL regularly pushed the first Ask past its 3s timeout.
  • #936 (Bump Netclaw.SkillClient from 0.2.1 to 0.3.0): HealthCheckStepViewModel.StartAndPollDaemonAsync now (a) fails fast when DaemonManager.GetStatus().IsRunning flips to false during the polling loop, and (b) on poll timeout, reads any structured "Daemon startup aborted: …" line from the crash log. The wizard therefore reports the real failure reason instead of a generic "Daemon did not become ready" message after 30s of waiting.
  • Drive-by: HealthCheckStepViewModel now takes TimeProvider (constitution compliance), and the "not ready" message is a single named constant.
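The polling behaviour in the summary is language-neutral, so here is a minimal Python sketch of it. This is not the PR's C# code: `poll_until_ready`, `StartupResult`, `FakeClock`, and the 0.5s poll interval are hypothetical stand-ins for `StartAndPollDaemonAsync`, `DaemonManager`, and `TimeProvider`. The sketch also assumes that the fast-fail path consults the crash log. The PR states this only for the poll-timeout path.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Single named constant for the generic failure text.
NOT_READY_MESSAGE = "Daemon did not become ready"


@dataclass
class StartupResult:
    ok: bool
    message: str


class FakeClock:
    """Test clock: sleeping advances virtual time instead of blocking."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


def poll_until_ready(
    is_ready: Callable[[], bool],
    is_running: Callable[[], bool],
    read_crash_reason: Callable[[], Optional[str]],
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
) -> StartupResult:
    deadline = now() + timeout_s
    while now() < deadline:
        if is_ready():
            return StartupResult(True, "ready")
        if not is_running():
            # Fast-fail: a dead process can never become ready, so stop polling.
            return StartupResult(False, read_crash_reason() or NOT_READY_MESSAGE)
        sleep(interval_s)
    # Budget exhausted: prefer the structured crash-log reason over the generic text.
    return StartupResult(False, read_crash_reason() or NOT_READY_MESSAGE)
```

Injecting `now` and `sleep` plays the same role as `TimeProvider` in the C# code. A test can consume the full 30s budget on a fake clock without any real waiting, and a daemon that dies after one second is reported at the one-second mark.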

Why the test was flaking

Both flakes were Windows-CI-specific:

  • DailyStatsActor: the first Ask raced the cold-start fsync of the SQLite db file, which AV scanning slowed further. The issue body contains the agent's analysis.
  • HealthCheckStep: when WaitForExit(1500) missed a fast-failing fake daemon (cmd.exe startup overhead exceeds 1.5s on a contested Windows runner), Start() returned success and the polling loop spun for 30s before reporting a generic message that hid the crash reason.
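Any process API that waits a fixed window for an early exit can reproduce the `WaitForExit(1500)` race. The following is a minimal Python sketch. It assumes a hypothetical `start_with_exit_window` helper in place of `DaemonManager.Start`, and a fake daemon that crashes just after a shortened 0.1s window closes:

```python
import subprocess
import sys
from typing import List, Tuple


def start_with_exit_window(cmd: List[str], window_s: float) -> Tuple[subprocess.Popen, bool]:
    """Launch cmd and report success unless it exits within window_s seconds."""
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=window_s)
    except subprocess.TimeoutExpired:
        # Still alive when the window closed: reported as a successful start,
        # even though the process may crash moments later.
        return proc, True
    # Exited inside the window: a daemon should not exit, so report failure.
    return proc, False


if __name__ == "__main__":
    # Hypothetical fake daemon that crashes after 0.5 s, outside the 0.1 s window.
    fake_daemon = [sys.executable, "-c", "import time, sys; time.sleep(0.5); sys.exit(3)"]
    proc, started = start_with_exit_window(fake_daemon, window_s=0.1)
    print("start reported success:", started)    # True: the window missed the crash
    print("exit code seen later:", proc.wait())  # 3
```

A launch-time check like this can only catch crashes that happen inside the window. For that reason, the polling loop must also re-check liveness on every iteration.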

The daemon-liveness check is a real production improvement, not just a test-speed tweak: if a user's daemon dies during startup, the wizard now reports it within ~1s instead of waiting 30s.

Test plan

  • dotnet test src/Netclaw.Daemon.Tests/Netclaw.Daemon.Tests.csproj — 504/504 pass
  • dotnet test src/Netclaw.Cli.Tests/Netclaw.Cli.Tests.csproj — 621/621 pass
  • Target tests run fast on Linux (HealthCheckStepViewModelTests.RunWithOrchestrator_PreservesSpecificStartupFailureMessage ~92ms; DailyStatsActorTests.QuerySkillUsageStats… ~490ms)
  • dotnet slopwatch analyze — 0 issues
  • pwsh ./scripts/Add-FileHeaders.ps1 -Verify — clean
  • CI green on Windows (the failure mode this PR targets)

…gs on wizard poll-timeout (closes netclaw-dev#925, netclaw-dev#936)

DailyStatsActor was running CREATE TABLE IF NOT EXISTS in PreStart on a
fresh on-disk SQLite db. On Windows CI cold-start (AV scan + first-time
fsync) this regularly pushed the first Ask past its 3s timeout. The
schema is now owned by the migration files (existing 004_daily_stats.sql
plus new 005_daily_skill_usage.sql), applied unconditionally by
SchemaMigrator before the actor system starts. Test mirrors production
by invoking the migrator before ActorOf.
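The ordering described above (schema first, actors second) can be sketched with Python's built-in sqlite3 module. This illustrates the idea under stated assumptions and is not SchemaMigrator itself. The table columns, the in-memory MIGRATIONS map, and the start and apply_migrations helpers are all hypothetical. The sketch relies on CREATE TABLE IF NOT EXISTS, so running every script on every startup is harmless. A real migrator may also record which versions it has applied.

```python
import sqlite3
from typing import Callable, Dict, List

# Hypothetical stand-ins for 004_daily_stats.sql and 005_daily_skill_usage.sql;
# the real files define their own columns.
MIGRATIONS: Dict[str, str] = {
    "004_daily_stats.sql": """
        CREATE TABLE IF NOT EXISTS daily_stats (
            day TEXT PRIMARY KEY,
            message_count INTEGER NOT NULL DEFAULT 0
        );""",
    "005_daily_skill_usage.sql": """
        CREATE TABLE IF NOT EXISTS daily_skill_usage (
            day TEXT NOT NULL,
            skill TEXT NOT NULL,
            uses INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (day, skill)
        );""",
}


def apply_migrations(conn: sqlite3.Connection, migrations: Dict[str, str]) -> List[str]:
    """Run every script in filename order; IF NOT EXISTS makes reruns no-ops."""
    applied = []
    for name in sorted(migrations):  # zero-padded numeric prefixes sort in order
        conn.executescript(migrations[name])
        applied.append(name)
    return applied


def start(conn: sqlite3.Connection, create_actor_system: Callable[[sqlite3.Connection], object]) -> object:
    apply_migrations(conn, MIGRATIONS)  # schema first, on the startup path
    return create_actor_system(conn)    # actors can assume the tables exist
```

In this sketch, the DDL runs before any actor exists. A slow first fsync therefore delays startup once and cannot race an actor's first Ask.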

HealthCheckStepViewModel's daemon-startup polling loop also flaked on
Windows: when WaitForExit(1500) missed a fast-failing fake daemon,
DaemonManager.Start returned success and the loop spun for the full 30s
before reporting a generic "Daemon did not become ready" message,
hiding the real crash reason. Two fixes:

- Fast-fail when DaemonManager.GetStatus().IsRunning flips to false
  during polling (real production improvement: dead daemons no longer
  cost 30s of wall time before reporting).
- After poll exhaustion, read the structured crash-log line via
  DaemonManager.TryReadStartupFailureFromCrashLog and surface it on
  the wizard, matching the immediate-fail path's UX.
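Reading the structured crash-log line after poll exhaustion might look like the following Python sketch. find_startup_failure and try_read_startup_failure are hypothetical analogues of DaemonManager.TryReadStartupFailureFromCrashLog. The only detail taken from the PR is the "Daemon startup aborted:" prefix. The log layout, the choice of the most recent matching line, and the UTF-8 encoding are assumptions.

```python
from pathlib import Path
from typing import Optional

# Prefix taken from the PR; everything else about the log format is assumed.
STARTUP_ABORT_MARKER = "Daemon startup aborted:"


def find_startup_failure(log_text: str) -> Optional[str]:
    """Return the most recent structured startup-abort line, or None."""
    for line in reversed(log_text.splitlines()):
        idx = line.find(STARTUP_ABORT_MARKER)
        if idx != -1:
            # Drop any timestamp or level prefix the logger may have written.
            return line[idx:].strip()
    return None


def try_read_startup_failure(crash_log: Path) -> Optional[str]:
    """File-based wrapper: a missing or unreadable log simply yields None."""
    try:
        text = crash_log.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return None
    return find_startup_failure(text)
```

Returning None instead of raising lets the caller fall back to the generic "not ready" constant when no structured reason exists.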

Also: HealthCheckStepViewModel now takes TimeProvider (constitution
compliance), and the "not ready" message is a single constant.
Aaronontheweb merged commit dfb4597 into netclaw-dev:dev on May 8, 2026
6 checks passed