Git commit
3d6064b
Operating System & Version
Windows 11 25H2
GGML backends
CUDA
Command-line arguments used
sd-cli.exe --diffusion-model "D:\AI\anima\split_files\diffusion_models\anima-preview3-base.safetensors" --vae "D:\AI\anima\split_files\vae\qwen_image_vae.safetensors" --llm "D:\AI\anima\split_files\text_encoders\qwen_3_06b_base.safetensors" -p "a lovely cat holding a sign says 'anima.cpp'" --cfg-scale 4.5 --fa -H 1024 -W 1024 --steps 20 --sampling-method euler_a --scheduler sgm_uniform -v
Steps to reproduce
Just clone and build with -DSD_CUDA=ON
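For completeness, the clone-and-build steps are roughly the following (the repository URL is assumed to be the upstream stable-diffusion.cpp; substitute your fork if different):

```shell
# Assumed upstream repo; adjust if building from a fork.
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
# Configure with the CUDA backend enabled (Ninja generator, Release build).
cmake -B build -G Ninja -DSD_CUDA=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
```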
What you expected to happen
Inference with high GPU usage.
I also run Linux on the same machine. I built sd.cpp there and used the same command.
On Linux, GPU usage stays above 80% until inference finishes, at 4 s/it by default or 1 s/it with --type f16.
What actually happened
On Windows, every step triggers "graph has different number of nodes" followed by "reallocating buffers automatically".
As a result, the GPU runs at ~90% usage for about 1 second, then sits idle for about 3 seconds waiting for the reallocation.
The first step takes 2 s/it, then 6 s/it, and eventually more than 10 s/it.
Logs / error messages / stack trace
[DEBUG] ggml_extend.hpp:1883 - anima compute buffer size: 206.05 MB(VRAM)
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_reserve_n_impl: reallocating CUDA0 buffer from size 206.05 MiB to 206.06 MiB
|==> | 1/20 - 2.07s/it[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
|=====> | 2/20 - 6.88s/it[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
|=======> | 3/20 - 8.73s/it[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
|==========> | 4/20 - 9.64s/it[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
|============> | 5/20 - 10.16s/it[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
|===============> | 6/20 - 10.49s/it[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
|=================> | 7/20 - 10.72s/it[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
|====================> | 8/20 - 11.01s/it[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
|======================> | 9/20 - 11.15s/it[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
|=========================> | 10/20 - 11.21s/it[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
|===========================> | 11/20 - 11.30s/it[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_needs_realloc: graph has different number of nodes
[DEBUG] ggml_extend.hpp:58 - ggml_gallocr_alloc_graph: reallocating buffers automatically
^C
Additional context / environment details
CPU: Intel Ivy Bridge, which lacks AVX2 (the pre-built binaries crash on it).
GPU: NVIDIA RTX 2080Ti with 22GB VRAM
The behavior is the same with no quantization, f16, or q8_0.
Running with or without --fa makes no difference either.
Compiled with CUDA 13.2, VS Build Tools 2026, CMake 4.3.2, and Ninja 1.13.2.