
feat: add Anima model support#8961

Open
4pointoh wants to merge 14 commits into invoke-ai:main from kappacommit:anima-feature

Conversation


@4pointoh 4pointoh commented Mar 12, 2026

Summary

Add Anima Model Support

Adds full support for the Anima model architecture — a 2B parameter anime-focused text-to-image model built on
the Cosmos Predict2 DiT backbone with an integrated LLM Adapter.

https://huggingface.co/circlestone-labs/Anima

Backend

  • Model Manager — New BaseModelType.Anima with model config, detection, and auto-classification for
    single-file checkpoints. VAE config for the QwenImage VAE (fine-tuned Wan 2.1).

  • Model Loader — Loads the AnimaTransformer (Cosmos DiT + LLM Adapter) from safetensors.

  • Text Encoder — Dual-conditioning pipeline: Qwen3 0.6B + T5-XXL tokenizer. Both are fused by the LLM Adapter during denoising.

  • Denoise — Supports txt2img, img2img, and inpainting with a custom AnimaInpaintExtension that accounts for the
    shift-corrected schedule.

  • VAE — Encode/decode using AutoencoderKLWan with latent normalization/denormalization. 16 latent channels, 8x
    spatial compression.

  • LoRA — Full support for Kohya, diffusers PEFT, LoKR, and DoRA+LoKR formats. Handles both transformer and
    Qwen3 text encoder LoRA layers with proper key conversion and prefix mapping.

  • Regional Prompting — Runs the LLM Adapter separately per region with cross-attention masking (alternating
    masked/unmasked blocks).
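
The VAE encode/decode step described above can be sketched roughly as follows. The per-channel statistics here are randomly generated stand-ins (the real AutoencoderKLWan config carries its own `latents_mean` / `latents_std` values), and the helper names are illustrative, not the PR's actual code:

```python
import numpy as np

LATENT_CHANNELS = 16       # stated in the PR
SPATIAL_COMPRESSION = 8    # stated in the PR

# Randomly generated stand-ins; the real AutoencoderKLWan config carries
# its own per-channel latents_mean / latents_std values.
rng = np.random.default_rng(0)
latents_mean = rng.normal(size=(LATENT_CHANNELS, 1, 1)).astype(np.float32)
latents_std = rng.uniform(0.5, 2.0, size=(LATENT_CHANNELS, 1, 1)).astype(np.float32)

def normalize(z: np.ndarray) -> np.ndarray:
    """Map raw VAE latents into the distribution the DiT expects."""
    return (z - latents_mean) / latents_std

def denormalize(z: np.ndarray) -> np.ndarray:
    """Invert normalize() before handing latents back to the VAE decoder."""
    return z * latents_std + latents_mean

def latent_shape(height: int, width: int) -> tuple:
    """Latent tensor shape for an image: 16 channels, 8x spatial compression."""
    return (LATENT_CHANNELS, height // SPATIAL_COMPRESSION, width // SPATIAL_COMPRESSION)
```

Normalization must be inverted exactly on the way back to the decoder, which is why the round trip is a useful sanity check.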

Frontend

  • Graph Builder — Full buildAnimaGraph supporting all four generation modes (txt2img, img2img, inpaint,
    outpaint) with LoRA and regional guidance integration.

  • Queue Readiness — Validates that required VAE and Qwen3 encoder models are selected before generation.

  • Validators — ControlNet and IP Adapter marked as unsupported for Anima. (Looks like diffusers 0.37 adds ControlNet support, but I didn't want to include a diffusers version bump in this PR.)

  • Starter Models — Anima Preview 2, Qwen3 0.6B encoder, and QwenImage VAE

Not Included (future work)

  • ControlNet / IP Adapter support
  • Diffusers and GGUF format loaders (no Anima models distributed in these formats yet)

Regional Guidance mostly does not work. Anecdotally it seems to give a very slight "push" in the direction of the guidance, but I may be imagining that. I worked with Claude to try to understand why, and to compare against references elsewhere such as Comfy, and the conclusion we've come to is that regional guidance in Anima is a novel problem with no existing reference. So it doesn't error out, but it doesn't really seem to work.
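
For reference, the cross-attention masking idea (alternating masked/unmasked blocks) can be sketched as below. The shapes and helper name are made up, and this shows the general technique rather than this PR's exact code:

```python
import numpy as np

# Hypothetical sketch of regional cross-attention masking (the general
# technique, not the PR's exact code). Image tokens may only attend to the
# text tokens of regions whose spatial mask covers them; alternating
# transformer blocks run unmasked so global context still mixes in.
def regional_attn_mask(region_masks: list,
                       tokens_per_region: int,
                       block_idx: int):
    if block_idx % 2 == 1:
        return None  # unmasked block: full cross-attention
    flat = [m.reshape(-1).astype(bool) for m in region_masks]
    n_img = flat[0].shape[0]
    mask = np.zeros((n_img, tokens_per_region * len(region_masks)), dtype=bool)
    for r, region in enumerate(flat):
        # region[:, None] broadcasts over that region's text-token columns
        mask[:, r * tokens_per_region:(r + 1) * tokens_per_region] = region[:, None]
    return mask
```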

Prompt

ye-pop
For Sale: Others by Arun Prem
Abstract, oil painting of three faceless, blue-skinned figures. Left: white, draped figure; center: yellow-shirted, dark-haired figure; right: red-veiled, dark-haired figure carrying another. Bold, textured colors, minimalist style.
[generated image]

Prompt

year 2025, newest, masterpiece, score_9, an orange tabby cat sleeps inside of a bottle. The bottle is a small portion of the image relative to the landscape. The bottle rests on a wonderful sprawling landscape behind, vibrant green hills and deep blue skies
[generated image]

With LoRA: https://civitai.com/models/366289/legend-of-korra-series-style-anima-and-il-and-nai-and-pony

[three example images]

Related Issues / Discussions

QA Instructions

Gen some images with the new model

Merge Plan

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions github-actions bot added api python PRs that change python files Root invocations PRs that change invocations backend PRs that change backend files frontend PRs that change frontend files labels Mar 12, 2026
@joshistoast
Contributor

Haven't looked too closely at the code yet, but maybe pink isn't a great color since it's very similar to what errors look like (unrecognized models in particular). Maybe try purple?

@4pointoh
Author

> Haven't looked too closely at the code yet, but maybe pink isn't a great color since it's very similar to what errors look like (unrecognized models in particular). Maybe try purple?

Purpleified it - SD3 actually uses 'purple', so I used 'invokePurple', which is a slightly different color from standard purple. We have orange, yellow, gray, and blue left as options that don't overlap. I could probably have used orange or yellow, but they seem to imply 'warning', and gray is used for the other tags.

Easy change if we wanna change it, made it purple for now.

@4pointoh 4pointoh marked this pull request as ready for review March 12, 2026 09:23
@github-actions github-actions bot added the python-tests PRs that change python tests label Mar 12, 2026
@4pointoh 4pointoh marked this pull request as draft March 12, 2026 21:47
@4pointoh
Author

4pointoh commented Mar 12, 2026

EDIT - resolved as of next commit

Can't merge as-is, Claude pulled about 200 lines from Comfy's transformer implementation, which isn't kosher because Comfy is GPL-3. Most of the transformer can be pulled from Nvidia's official cosmos implementation (Apache 2 license) - in fact Comfy itself uses this as well - but there's about 200 lines of code around the LLMAdapter where there is no equivalent Nvidia code to reference (nor, to my knowledge, any sort of public documentation or reference material on how to implement it myself.)

There is an Apache 2 licensed diffusers implementation here: https://github.com/hdae/diffusers-anima?tab=readme-ov-file

I will see if I can get it working with that as a reference. If not, we will need to wait for the official diffusers implementation. (Honestly there's not much harm in waiting, I've found this model to be overall very early stage and probably needs another couple releases before it's really worth playing with.)

If an official diffusers implementation is released, it should be quick plug & play with the code in this PR.

@4pointoh
Author

I was able to use the Apache 2 diffusers implementation as a reference. It required a few structural changes, but seems to work just fine. This resolves the previously mentioned issue.

@4pointoh 4pointoh marked this pull request as ready for review March 13, 2026 03:46
@4pointoh
Author

Anima requires both the T5-XXL and Qwen encoders. I noticed that the current implementation was quietly downloading the T5-XXL tokenizer during execution, which wasn't noticeable at first because the tokenizer is only 2 MB.

However, a side-effect of this was that Anima would no longer be able to generate when offline if you didn't have the T5 already downloaded/cached via the huggingface_hub library.

I updated the implementation to add T5-XXL as part of the Starter Model pack for Anima, and switched the implementation to load the encoder from there, rather than downloading it internally. This makes it follow how all other models work. The only weird quirk is that Anima requires selecting 2 encoders in the Advanced tab - but that's just how the model is built.

Technically speaking, Anima only requires the T5-XXL tokenizer, not the T5-XXL encoder. The tokenizer is 2 MB; the encoder is around 9 GB. I did try to add only the tokenizer to the model manager / starter pack, but it revealed a cascading tree of issues within files that I certainly did not want to modify.

Namely:

  • The tokenizer is not a model. The model manager generally only handles models. This broke certain parts of the system.
  • There is no huggingface repo for just the tokenizer. It would require some hackery to directly target the tokenizer file within the T5-XXL encoder HF repo and download only that file. I was not convinced this would be robust.

So, I decided we will just accept downloading the full 9 GB encoder model and extract the tokenizer out of it. The only downside is wasting about 9 GB of space; the upside is a far simpler implementation. (Side note: if you download any of the Flux 1 family of models, you already download the full 9 GB encoder, so there is no wasted space in that case since the model will be shared.)
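
Extracting the tokenizer from the already-installed encoder can be sketched as a local file lookup. The file names below are the usual HF T5 tokenizer artifacts, and the helper itself is hypothetical, not this PR's actual code:

```python
from pathlib import Path

# Hypothetical helper (not the PR's actual code): locate the usual HF T5
# tokenizer artifacts inside the already-installed encoder directory, so
# the tokenizer loads locally instead of being downloaded at runtime.
TOKENIZER_FILES = ("spiece.model", "tokenizer.json", "tokenizer_config.json")

def find_tokenizer_dir(encoder_dir: str) -> Path:
    root = Path(encoder_dir)
    candidates = [root, *sorted(p for p in root.rglob("*") if p.is_dir())]
    for candidate in candidates:
        if any((candidate / f).exists() for f in TOKENIZER_FILES):
            return candidate
    raise FileNotFoundError(f"No tokenizer files found under {root}")
```

Loading from the installed model directory is what keeps offline generation working, since nothing has to hit the Hugging Face Hub at runtime.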

