Lazy vs eager model loading strategy at app startup #1

@brendanddev

Problem

The model currently loads on the first user message instead of at app startup.

This adds a noticeable delay to the first interaction. Recent local runs measured:

  • ~12.9s from loading model to generating
  • ~16.5s from loading model to generating
  • ~12.2s from loading model to generating

This impacts first-use UX and makes the app feel unresponsive at the start of a session.

Current behavior

  • app starts quickly
  • first user request triggers model load
  • first response has high latency due to model load + subsequent prefill/generation

Desired behavior

Evaluate whether model initialization should be:

  • eagerly loaded at app start
  • optionally preloaded
  • lazily loaded with clearer user feedback
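The three options above can be compared with a minimal Python sketch. Everything here is hypothetical: `ModelHandle`, `preload`, `is_ready`, `get`, and the `loader` callable are illustrative names, not part of this project or of llama.cpp, and the real load call would replace `fake_loader`. The idea is that a single wrapper can cover all three strategies: call `preload()` at startup for a background eager load, skip it for a lazy load, and use `is_ready()` to decide whether to show a "loading model" message.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


class ModelHandle:
    """Wraps a slow loader so it can run in the background or on demand.

    `loader` is a hypothetical stand-in for whatever actually loads the model.
    """

    def __init__(self, loader: Callable[[], object]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._future: Optional[Future] = None

    def preload(self) -> None:
        """Start loading on a worker thread and return immediately."""
        with self._lock:
            # Submit the loader at most once, however often preload() is called.
            if self._future is None:
                self._future = self._executor.submit(self._loader)

    def is_ready(self) -> bool:
        """True once the load has finished (successfully or with an error)."""
        return self._future is not None and self._future.done()

    def get(self) -> object:
        """Return the model, starting the load if needed and blocking until done."""
        self.preload()
        assert self._future is not None
        return self._future.result()


if __name__ == "__main__":
    def fake_loader() -> str:
        time.sleep(0.1)  # simulated load time, not a real measurement
        return "model"

    handle = ModelHandle(fake_loader)
    handle.preload()  # at app start: the prompt stays responsive while loading
    if not handle.is_ready():
        print("Loading model...")  # clearer feedback if the user is faster
    print(handle.get())
```

With this shape, the startup path never blocks, and the first request only waits for whatever part of the load has not already finished. Whether a background thread is compatible with the runtime-ownership constraint below would need to be checked against the actual runtime.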

Constraints

  • must not interfere with runtime ownership or tool flow
  • must not complicate CLI responsiveness unnecessarily
  • must remain compatible with local models (llama.cpp)

Why this is deferred

This is a performance / UX improvement, not a correctness bug.
It is intentionally out of scope for Phase 9 investigation quality work and should be addressed separately.

Evidence

Recent logs show:

  • loading model at +876ms, generating at +13784ms
  • loading model at +24576ms, generating at +41069ms
  • loading model at +1968ms, generating at +14161ms
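The delays quoted in the Problem section follow from these timestamps by subtraction. A small Python sketch of that arithmetic is below; the line format is assumed from the bullets above rather than taken from the real log output, and `load_to_generate_ms` is a hypothetical helper name.

```python
import re

# Timestamps copied from the evidence bullets; the exact log format is assumed.
LOG_LINES = [
    "loading model at +876ms, generating at +13784ms",
    "loading model at +24576ms, generating at +41069ms",
    "loading model at +1968ms, generating at +14161ms",
]

PATTERN = re.compile(r"loading model at \+(\d+)ms,?\s*generating at \+(\d+)ms")


def load_to_generate_ms(line: str) -> int:
    """Return the milliseconds elapsed between 'loading model' and 'generating'."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    start_ms, end_ms = (int(value) for value in match.groups())
    return end_ms - start_ms


deltas = [load_to_generate_ms(line) for line in LOG_LINES]
print(deltas)  # [12908, 16493, 12193]
```

Rounded to one decimal, these are the ~12.9s, ~16.5s, and ~12.2s figures listed under Problem.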


Labels: enhancement
