Problem
The model currently loads on the first user message instead of at app startup.
This introduces a noticeable delay on first interaction, based on recent local runs:
- ~12.9s from loading model to generating
- ~16.5s from loading model to generating
- ~12.2s from loading model to generating
This impacts first-use UX and makes the app feel unresponsive at the start of a session.
Current behavior
- app starts quickly
- first user request triggers model load
- first response has high latency due to model load + subsequent prefill/generation
Desired behavior
Evaluate whether model initialization should be:
- eagerly loaded at app start
- optionally preloaded
- lazily loaded with clearer user feedback
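To make the three options concrete, here is a minimal sketch of how one wrapper could cover eager loading, optional preloading, and lazy loading with user feedback. It is not taken from the app's code: `ModelHandle`, `loader`, and `on_wait` are hypothetical names, and `loader` stands in for whatever call actually initializes the llama.cpp model.

```python
import threading


class ModelHandle:
    """Hypothetical wrapper: loads a model once, in a background thread."""

    def __init__(self, loader, eager=True):
        self._loader = loader            # assumed: a zero-argument callable returning the model
        self._done = threading.Event()
        self._lock = threading.Lock()
        self._started = False
        self._model = None
        self._error = None
        if eager:
            self.start()                 # eager: begin loading at app start

    def start(self):
        """Begin loading if not already started (usable as an optional preload)."""
        with self._lock:
            if self._started:
                return
            self._started = True
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        try:
            self._model = self._loader()
        except Exception as exc:         # surface load failures to the caller of get()
            self._error = exc
        finally:
            self._done.set()

    def ready(self):
        return self._done.is_set()

    def get(self, on_wait=None):
        """Return the model, blocking if needed; on_wait fires only when we must wait."""
        self.start()                     # lazy path: the first request triggers the load
        if not self._done.is_set():
            if on_wait is not None:
                on_wait()                # e.g. print a "loading model..." notice
            self._done.wait()
        if self._error is not None:
            raise self._error
        return self._model
```

With `eager=True` the load overlaps with startup and the CLI prompt stays responsive; with `eager=False` the behavior matches today's lazy load, except that `on_wait` gives the user explicit feedback. Whether a background thread is compatible with the runtime-ownership constraint is an open question this sketch does not settle.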
Constraints
- must not interfere with runtime ownership or tool flow
- must not complicate CLI responsiveness unnecessarily
- must remain compatible with local models (llama.cpp)
Why this is deferred
This is a performance / UX improvement, not a correctness bug.
It is intentionally out of scope for Phase 9 investigation quality work and should be addressed separately.
Evidence
Recent logs show:
loading model at +876ms → generating at +13784ms
loading model at +24576ms → generating at +41069ms
loading model at +1968ms → generating at +14161ms
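The delays quoted in the Problem section follow directly from these timestamps. A small sketch of the arithmetic, assuming the lines have exactly the shape quoted above (the raw log format may differ, and `load_to_generate_ms` is a hypothetical helper):

```python
import re

# Lines as quoted in this issue; the real log format is an assumption.
LOG_LINES = [
    "loading model at +876ms → generating at +13784ms",
    "loading model at +24576ms → generating at +41069ms",
    "loading model at +1968ms → generating at +14161ms",
]

PATTERN = re.compile(r"loading model at \+(\d+)ms → generating at \+(\d+)ms")


def load_to_generate_ms(line):
    """Return the milliseconds between 'loading model' and 'generating'."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    start_ms, end_ms = (int(value) for value in match.groups())
    return end_ms - start_ms


deltas_ms = [load_to_generate_ms(line) for line in LOG_LINES]
print(deltas_ms)  # [12908, 16493, 12193], i.e. ~12.9s, ~16.5s, ~12.2s
```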