Lazy vs eager model loading strategy at app startup

## Problem

The model currently loads on the first user message instead of at app startup.

Recent local runs show a delay of roughly 12 to 17 seconds on the first interaction:
- ~12.9s from `loading model` to `generating`
- ~16.5s from `loading model` to `generating`
- ~12.2s from `loading model` to `generating`

This impacts first-use UX and makes the app feel unresponsive at the start of a session.

## Current behavior

- app starts quickly
- first user request triggers model load
- first response has high latency due to model load + subsequent prefill/generation

## Desired behavior

Evaluate whether model initialization should be:
- eagerly loaded at app start
- optionally preloaded
- lazily loaded with clearer user feedback
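
To make the three options easier to compare, here is a minimal Python sketch of a handle that supports all of them. It is not the app's actual code: `ModelHandle`, `preload`, `get`, `on_wait`, and `fake_loader` are hypothetical names, and the loader is a stand-in for whatever really initializes the llama.cpp model. Calling `preload()` at app start gives eager loading in a background thread, skipping it keeps the load lazy, and `on_wait` is a hook for user feedback while the load is still running.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ModelHandle(Generic[T]):
    """Hypothetical wrapper that loads a model at most once."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._ready = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self._model: Optional[T] = None
        self._error: Optional[Exception] = None

    def preload(self) -> None:
        """Start loading in a background thread and return immediately."""
        with self._lock:
            if self._thread is None:
                self._thread = threading.Thread(target=self._load, daemon=True)
                self._thread.start()

    def _load(self) -> None:
        try:
            self._model = self._loader()
        except Exception as exc:  # surfaced to the caller in get()
            self._error = exc
        finally:
            self._ready.set()

    def get(self, on_wait: Optional[Callable[[], None]] = None) -> T:
        """Return the model, starting the load if needed and blocking until done."""
        self.preload()  # no-op if a load has already been started
        if not self._ready.is_set() and on_wait is not None:
            on_wait()  # e.g. show a "loading model..." status line
        self._ready.wait()
        if self._error is not None:
            raise self._error
        return self._model  # type: ignore[return-value]


def fake_loader() -> str:
    """Stand-in for the real model load (assumption, not the real API)."""
    time.sleep(0.05)
    return "model"


handle = ModelHandle(fake_loader)
handle.preload()  # eager: kick off the load at app start
model = handle.get(on_wait=lambda: print("loading model..."))
```

With this shape the CLI stays responsive during startup, because `preload()` never blocks; only the first `get()` waits, and only for whatever part of the load has not finished yet. Whether a background thread is compatible with the runtime ownership constraint would still need to be checked.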

## Constraints

- must not interfere with runtime ownership or tool flow
- must not complicate CLI responsiveness unnecessarily
- must remain compatible with local models (llama.cpp)

## Why this is deferred

This is a performance / UX improvement, not a correctness bug.
It is intentionally out of scope for Phase 9 investigation quality work and should be addressed separately.

## Evidence

Recent logs show:
- `loading model` at `+876ms` → `generating` at `+13784ms`
- `loading model` at `+24576ms` → `generating` at `+41069ms`
- `loading model` at `+1968ms` → `generating` at `+14161ms`
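
The durations quoted in the Problem section follow from these offsets. As a quick check, a small Python snippet (variable names are illustrative) subtracts each `loading model` offset from the matching `generating` offset:

```python
# (loading_model_ms, generating_ms) offsets copied from the log lines above
runs = [(876, 13784), (24576, 41069), (1968, 14161)]

durations_ms = [end - start for start, end in runs]
for d in durations_ms:
    print(f"{d / 1000:.1f}s")  # prints 12.9s, 16.5s, 12.2s

mean_s = sum(durations_ms) / len(durations_ms) / 1000
print(f"mean: {mean_s:.1f}s")  # prints mean: 13.9s
```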
