Conversation


@lstein lstein commented Jan 21, 2026

Summary

This PR adds the ability to configure standalone text encoder models to run exclusively on the CPU, freeing up VRAM that would otherwise compete with the denoiser and other large models. Users can set a text encoder to run on the CPU from the Model Manager by clicking a new toggle in the details area shown below:

(Screenshot: Model Manager details panel showing the new "Run text encoder model on CPU only" toggle.)

All the text encoders are supported, including CLIPEmbed, T5Encoder, Qwen3Encoder, CLIPVision, SigLIP, and LlavaOnevision. However, Invoke only exposes the option to change the text encoder for some of the more recent main models, chiefly Flux.1, Flux.2, and Z-Image.

In most cases it does not make sense to run the text encoder on the CPU, as execution speed suffers greatly (up to 5x slower for Qwen3 encoders). However, for users with very low VRAM (e.g. 8 GB), this may allow them to run encoder models that would otherwise be inaccessible.
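For context, here is a minimal sketch of how the toggle could persist a per-model CPU-only flag from the frontend. The field name (`cpu_only`), the endpoint path, and the helper function are assumptions for illustration only, not the actual InvokeAI schema or API:

```ts
// Hypothetical shape of the setting stored on the encoder's model record.
type EncoderModelSettings = {
  cpu_only: boolean; // assumption: run this encoder on CPU exclusively
};

// Sketch of persisting the toggle via a model-update request.
// The endpoint path is an assumption; the real route may differ.
async function setEncoderCpuOnly(modelKey: string, cpuOnly: boolean): Promise<void> {
  const res = await fetch(`/api/v2/models/i/${modelKey}`, {
    method: 'PATCH',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ cpu_only: cpuOnly }),
  });
  if (!res.ok) {
    throw new Error(`Failed to update model ${modelKey}: ${res.status}`);
  }
}
```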

Related Issues / Discussions

Brief discussion on Discord regarding Comfy's use of a similar strategy: https://discord.com/channels/1020123559063990373/1020123559831539744/1462795385469796591

QA Instructions

CPU mode on

  1. Go to the model manager and select one of the standalone encoders, e.g. Z-Image Qwen3 Text Encoder (for Z-Image).
  2. The details panel will show a new setting, "Run text encoder model on CPU only". Turn it on and click "Save".
  3. Go to a generation pane (linear, canvas, or workflow), select a main model that uses this encoder, and then, under Advanced, select the text encoder you modified.
  4. Run a generation and look at the log messages. You should not see any messages about the text encoder being loaded onto the CUDA device.
  5. When the generation is finished, examine the performance statistics. The text encoder should have taken an unusually long time to run.

CPU mode off

  1. Repeat the instructions above, but this time turn the CPU toggle off.
  2. You should see log messages about the text encoder being loaded onto CUDA.
  3. Execution speed should be fast.

Repeat this with other text encoders and main models.

Merge Plan

Simple merge.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>

Add frontend UI for CPU-only model execution toggle

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
@github-actions github-actions bot added the python (PRs that change python files), invocations (PRs that change invocations), backend (PRs that change backend files), services (PRs that change app services), and frontend (PRs that change frontend files) labels Jan 21, 2026

JPPhoto commented Jan 21, 2026

@lstein This is failing a frontend check. Once you resolve that, I'll do a deeper dive.

I think you need `import type { FormField } from 'features/modelManagerV2/subpanels/ModelPanel/MainModelDefaultSettings/MainModelDefaultSettings';` at the top of `invokeai/frontend/web/src/features/modelManagerV2/subpanels/ModelPanel/EncoderModelSettings/EncoderModelSettings.tsx`. Put it right before `import { toast } from 'features/toast/toast';` to avoid an import-ordering error. A sketch of the resulting import order is shown below.
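For reference, a minimal sketch of the relevant part of that file's import block after the fix (only the two imports mentioned above are shown; the rest of the file's imports are omitted):

```ts
// Type-only import added first so the import-ordering lint rule is satisfied.
import type { FormField } from 'features/modelManagerV2/subpanels/ModelPanel/MainModelDefaultSettings/MainModelDefaultSettings';
import { toast } from 'features/toast/toast';
```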


@JPPhoto JPPhoto left a comment


After making the changes above, I was able to build and run and it worked as advertised. Approval is pending that fix and successful tests.
