feat(ltx2): LTX-2.3 video generation — conversion script, ltx2.h C API, CPU VAE fix by 64johnlee · Pull Request #7 · tetherto/qvac-ext-stable-diffusion.cpp

64johnlee · 2026-05-30T09:02:25Z

Summary

Adds LTX-2.3 (14B DiT, Gemma 3 text encoder, spatiotemporal Video-VAE) video generation support to the fork, plus M1 deliverables for the Tether LTX-2 bounty.

Changes

Sync upstream — merges 133 commits from `leejet/stable-diffusion.cpp` master, including all LTX-2.3 work (transformer, VAE, temporal upscaler, FLF2V, TAE support, Vulkan/Metal backends).

`script/convert_ltx2.py`, a Python conversion script (safetensors → GGUF):

- Quantisation levels: `f16`, `q4_0`, `q5_1`, `q8_0`
- Selective F16 preservation for norms, biases, and embeddings
- Optionally bundles the VAE into the same GGUF
- Validates each tensor's buffer size against its shape on load (raises `ValueError` on mismatch)
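A minimal sketch of a per-tensor type policy of this kind, in Python like the script itself. The regex patterns, the `choose_type` helper and the block-size rule are illustrative assumptions, not the contents of `convert_ltx2.py`; only the four output types and the F16 carve-out for norms, biases and embeddings come from this PR. ggml's Q4_0, Q5_1 and Q8_0 formats all pack 32 values per block, so the sketch also keeps in F16 any 1-D tensor or any row whose length is not a multiple of 32.

```python
import re

# Hypothetical patterns; the real KEEP_F16_PATTERNS in convert_ltx2.py may differ.
KEEP_F16_PATTERNS = [
    re.compile(r"norm"),      # layer / RMS norm weights
    re.compile(r"\.bias$"),   # biases
    re.compile(r"embed"),     # token / positional / timestep embeddings
]

QUANT_BLOCK = 32  # Q4_0, Q5_1 and Q8_0 all pack 32 values per block in ggml


def choose_type(name: str, shape: tuple[int, ...], requested: str) -> str:
    """Return the output type for one tensor under an illustrative policy."""
    if any(p.search(name) for p in KEEP_F16_PATTERNS):
        return "f16"
    # 1-D tensors and rows that don't split into whole blocks stay F16.
    if len(shape) < 2 or shape[-1] % QUANT_BLOCK != 0:
        return "f16"
    return requested
```

Under this policy a 4096×4096 attention projection is quantised to the requested type, while tensors whose names contain `norm` or `embed`, or end in `.bias`, stay in F16.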

`include/ltx2.h`, a focused public C API for LTX-2 consumers:

- `ltx2_new_ctx()`: create a context from individual model paths
- `ltx2_generate_t2v()`: text-to-video in under 10 lines of C
- `ltx2_generate_i2v()`: image-to-video in under 10 lines of C
- Thin façade over `stable-diffusion.h` that adds no work of its own beyond forwarding to the wrapped calls

Bug fix: CPU VAE im2col assertion crash (upstream leejet#1577)

- `ggml_ext_conv_3d`: when the weight type is not F16/F32 (i.e. quantised), fall back to the explicit `im2col + mul_mat` path instead of `ggml_conv_3d`
- Fixes `GGML_ASSERT(src0->type == GGML_TYPE_F16) failed` in `ggml-cpu/ops.cpp:6260` when running the LTX-2.3 VAE on CPU with quantised weights
- Scope note: `ggml_ext_conv_3d` is a shared helper used by all model paths (SD, Flux, Wan, etc.), not just LTX-2. The fallback only triggers for non-F16/F32 weights, which already crashed on this path before this change, so it cannot regress a previously working configuration.
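For reviewers unfamiliar with the fallback, the numpy sketch below shows the im2col + matrix-multiply formulation of a 3-D convolution: each output position's receptive field is unrolled into one row, and the whole convolution becomes a single matmul against the flattened kernel. It assumes one batch item, stride 1, no padding and a (C, D, H, W) layout; it is not the ggml code, and ggml's own im2col layout differs. The formulation matters here because ggml's `mul_mat` accepts quantised weight types, which is presumably why routing quantised VAE weights through it avoids the F16-only im2col assertion.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_im2col(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """3-D convolution (stride 1, no padding) as im2col + one matrix multiply.

    x: (C, D, H, W) input volume; w: (OC, C, KD, KH, KW) kernel.
    Returns (OC, D-KD+1, H-KH+1, W-KW+1).
    """
    oc, c, kd, kh, kw = w.shape
    # im2col: a strided view exposing every (KD, KH, KW) window per channel.
    win = sliding_window_view(x, (kd, kh, kw), axis=(1, 2, 3))  # (C, OD, OH, OW, KD, KH, KW)
    od, oh, ow = win.shape[1:4]
    # One row per output position, ordered (C, KD, KH, KW) to match the kernel.
    cols = win.transpose(1, 2, 3, 0, 4, 5, 6).reshape(od * oh * ow, c * kd * kh * kw)
    # mul_mat step: (OC, K) @ (K, N) -> (OC, N).
    out = w.reshape(oc, -1) @ cols.T
    return out.reshape(oc, od, oh, ow)
```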

Test plan

- `cmake -B build && cmake --build build -j$(nproc)`: clean compile on Linux x86-64
- `python script/convert_ltx2.py --model ltx-2.3.safetensors --output ltx-2.3_Q8_0.gguf --type q8_0`
- `sd-cli -M vid_gen --diffusion-model ltx-2.3_Q8_0.gguf ... -p "a lovely cat"`: T2V on CPU
- `sd-cli -M vid_gen ... -i init.png`: I2V on CPU (verifies the fix for leejet/stable-diffusion.cpp#1577, "[Bug] LTX2.3 VAE does not work on CPU")

Related

- Bounty spec: https://tether.dev/grants/bounties/2885262943
- Upstream LTX-2.3 issues: leejet/stable-diffusion.cpp#1577 ("[Bug] LTX2.3 VAE does not work on CPU") and leejet/stable-diffusion.cpp#1579 ("[Bug] LTX 2.3 dev produces motion distortion (works fine in comfyui). Distill is fine.")
- M2 (Bare addon): https://github.com/64johnlee/bare-ltx2

🤖 Generated with Claude Code


* feat: add support for the eta parameter to ancestral samplers
* feat: Euler Ancestral sampler implementation for flow models
* refine flow ancestral sampling and normalize eta defaults

Co-authored-by: leejet <leejet714@gmail.com>


…ol crash

- script/convert_ltx2.py: safetensors → GGUF at Q4_0/Q5_1/Q8_0/F16 with selective F16 preservation for norms, biases, and embeddings
- include/ltx2.h: focused public C API for LTX-2 T2V and I2V inference, wrapping stable-diffusion.h with ltx2_new_ctx / ltx2_generate_t2v / ltx2_generate_i2v helpers
- fix(ggml_ext_conv_3d): fall back to explicit im2col+mul_mat when the weight type is not F16/F32, fixing the assertion crash in ggml_compute_forward_im2col_f16 on CPU with quantized VAE weights (upstream issue leejet#1577)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The previous element-wise loop ran in pure Python, so per-element interpreter overhead made it too slow for the tensors of a 14B-parameter model. Replace it with a numpy byte copy: write the two BF16 bytes into positions [2] and [3] of each little-endian uint32 word (BF16 is float32 with the low 16 bits zeroed), then reinterpret as float32.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
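A numpy sketch of that byte copy, written as a standalone function with an assumed `(raw, shape)` signature rather than the script's actual `bf16_to_fp32`. Working from the flat byte buffer avoids any multi-dimensional `.view(uint8)` pitfall, and viewing the output as `"<f4"` pins the byte order, so bytes [2] and [3] of each word are its high half on any host.

```python
import numpy as np


def bf16_to_fp32(raw: bytes, shape: tuple[int, ...]) -> np.ndarray:
    """Widen raw little-endian BF16 bytes (e.g. a safetensors buffer) to float32."""
    src = np.frombuffer(raw, dtype=np.uint8)       # 2 bytes per BF16 value, already 1-D
    out = np.zeros(src.size * 2, dtype=np.uint8)   # 4 bytes per float32 value
    # Little-endian float32: bytes [0] and [1] hold the low mantissa bits (left zero),
    # bytes [2] and [3] hold the high half, which is exactly the BF16 value.
    out[2::4] = src[0::2]
    out[3::4] = src[1::2]
    return out.view("<f4").reshape(shape)
```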

Three jobs run on every push to ltx2-video-generation and on PRs to master:

- build-linux: cmake + Ninja on ubuntu-22.04; asserts that the vid_gen / embeddings-connectors / diffusion-fa flags are present in sd-cli --help
- convert-script: syntax check + --help + two synthetic GGUF round-trips (F32→Q8_0 and BF16→F16 via KEEP_F16_PATTERNS)
- build-macos-arm64: cmake + Metal on macos-14 (ARM64); uploads the sd-cli artifact for 7 days

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
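For context on what the F32→Q8_0 round-trip exercises, here is a numpy sketch of ggml's Q8_0 block format: 32 values per block, one F16 scale `d = max|x| / 127` and 32 int8 quants. It mirrors the reference scheme only approximately (numpy's `rint` rounds ties to even), and the helper names and the 0.6·d tolerance are assumptions, not what the CI job actually asserts. Each dequantised value should land within about half a quantisation step of the original, plus a little slack for storing `d` as F16.

```python
import numpy as np

QK8_0 = 32  # values per Q8_0 block in ggml


def quantize_q8_0(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (per-block F16 scales, int8 quants); x.size must be a multiple of 32."""
    blocks = x.astype(np.float32).reshape(-1, QK8_0)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    d = amax / 127.0                                              # one scale per block
    inv = np.divide(1.0, d, out=np.zeros_like(d), where=d != 0)   # all-zero block -> 0
    q = np.rint(blocks * inv).astype(np.int8)                     # values in [-127, 127]
    return d.astype(np.float16), q                                # scale stored as F16


def dequantize_q8_0(d: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Inverse of quantize_q8_0, returned as a flat float32 array."""
    return (q.astype(np.float32) * d.astype(np.float32)).ravel()
```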

…f16_to_fp32

safe_open(framework="numpy") doesn't support BF16 tensors because numpy has no bfloat16 dtype. Replace it with a hand-rolled parser (_iter_safetensors) that reads the safetensors binary format directly (8-byte little-endian header size + JSON metadata + raw tensor bytes), eliminating the torch/safetensors dependency.

Also fix bf16_to_fp32: calling .view(uint8) on a multi-dimensional array gives a multi-dimensional byte array whose [0::2] slice has the wrong shape. Flatten to 1-D first with .ravel() so the byte interleaving works correctly.

CI: drop safetensors from pip install since it is no longer imported.

Both round-trips (F32→Q8_0 and BF16→F16) verified locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
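The format the hand-rolled parser reads is small enough to sketch in full. This illustrative `iter_safetensors` stands in for the script's `_iter_safetensors`, whose exact behaviour may differ: it reads the 8-byte little-endian header length, parses the JSON header mapping each tensor name to `dtype`, `shape` and `data_offsets`, and then seeks to each tensor's bytes. Offsets are relative to the start of the data section (byte `8 + header_len` of the file), and the optional `__metadata__` entry is skipped. Raw bytes are returned untouched, so BF16 data never needs a numpy bfloat16 dtype.

```python
import json
import struct
from typing import BinaryIO, Iterator


def iter_safetensors(f: BinaryIO) -> Iterator[tuple[str, str, list[int], bytes]]:
    """Yield (name, dtype, shape, raw_bytes) for each tensor in a .safetensors stream."""
    (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte little-endian header size
    header = json.loads(f.read(header_len))          # name -> {dtype, shape, data_offsets}
    header.pop("__metadata__", None)                 # optional string-to-string map
    data_start = 8 + header_len                      # offsets count from here
    # Visit tensors in file order so reads move forward through the file.
    for name, info in sorted(header.items(), key=lambda kv: kv[1]["data_offsets"][0]):
        begin, end = info["data_offsets"]
        f.seek(data_start + begin)
        yield name, info["dtype"], info["shape"], f.read(end - begin)
```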


Adds **/*.sh plus explicit test_m2.sh, test_*.sh, and .github/test_*.sh to the on.push paths filter so test scripts (like the recently added test_m2.sh that didn't trigger CI on commit 259b7ad) participate in the CI gating cycle. The wildcard alone would suffice; the explicit entries are kept as documentation of which scripts we specifically care about.

Mismatched data_offsets previously produced wrong tensor data silently. Loading now raises ValueError with the tensor name, expected bytes, shape, dtype, and actual bytes for fast diagnosis.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
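One way such a check can be written, as a sketch: the element-size table and `check_tensor_size` are illustrative, not the script's code, and only a handful of safetensors dtypes are covered. The expected size is the product of the shape times the element size, compared against the span given by `data_offsets`; an empty shape (a scalar) correctly yields one element because `math.prod([])` is 1.

```python
import math

# Bytes per element for the safetensors dtypes this sketch handles.
DTYPE_SIZE = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2, "I8": 1, "U8": 1}


def check_tensor_size(name: str, dtype: str, shape: list[int], offsets: list[int]) -> None:
    """Raise ValueError if data_offsets disagree with shape x element size."""
    begin, end = offsets
    expected = math.prod(shape) * DTYPE_SIZE[dtype]  # unknown dtypes raise KeyError here
    actual = end - begin
    if actual != expected:
        raise ValueError(
            f"{name}: expected {expected} bytes for shape {shape} dtype {dtype}, "
            f"but data_offsets span {actual} bytes"
        )
```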

- ltx2_ctx_params_set_defaults: remove schedule/sample_method/cfg_scale, which do not exist on sd_ctx_params_t (they live on sd_sample_params_t)
- Add ltx2_vid_params_set_defaults() to set LTX-2 sample defaults on sd_vid_gen_params_t.sample_params, where they actually belong
- Call ltx2_vid_params_set_defaults() in both generate_t2v and generate_i2v
- Fix typo: embeddings_connector_path -> embeddings_connectors_path

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

rmatif and others added 30 commits March 10, 2026 00:35

feat: add spectrum caching method (leejet#1322)

dea4980

refactor: remove ununsed encode_video (leejet#1332)

d6dd6d7

docs: add Anima2 gguf download link to anima.md (leejet#1335)

6fa7ca9

feat: add generic DiT support to spectrum cache (leejet#1336)

adfef62

chore: remove SD_FAST_SOFTMAX build flag (leejet#1338)

f6968bc

refactor: move all cache parameter defaults to the library (leejet#1327)

630ee03

ci: add CUDA Dockerfile (leejet#1314)

83eabd7

refactor: optimize the VAE architecture (leejet#1345)

acc3bf1

ci: avoid cuda docker build timeout by using -j16

61d8331

feat: add embedded WebUI (leejet#1207)

862a658

fix: correct encoder channels for flux2 (leejet#1346)

997bb11

style: remove redundant struct qualifiers for consistent C/C++ type usage (leejet#1349)

84cbd88

perf(z-image): switch to fused SwiGLU kernel (leejet#1302)

5265a5e

refactor: simplify sample cache flow (leejet#1350)

545fac4

docs: update Spectrum info about DiT models (leejet#1360)

6293ab5

refactor: simplify f8_e5m2_to_f16 function a little bit (leejet#1358)

ed88e21

refactor: migrate generation pipeline to sd::Tensor (leejet#1373)

f16a110

sync: update ggml

8f2967c

refactor: move VAE tiling parameters to SDGenerationParams (leejet#1261)

02dd5e5

fix: disable extra T5 mask padding for Wan (leejet#1375)

8d87887

refactor(server): split server endpoint registration (leejet#1376)

83e8f6f

refactor: split and simplify sample_k_diffusion samplers (leejet#1377)

1d6cb0f

chore(server): link winsock2 for non-MSVC windows (leejet#1378)

4d52320

feat(server): add generation metadata to png images (leejet#1217)

4fe7a35

feat: show tensor loading progress in MB/s or GB/s (leejet#1380)

bf02167

fix: use resolved image size in embedded metadata (leejet#1382)

6dfe945

feat(cli): add metadata inspection mode (leejet#1381)

09b12d5

feat: add webp support (leejet#1384)

87ecb95

chore: make libwebp optional and support system libwebp (leejet#1387)

687a81f

Co-authored-by: leejet <leejet714@gmail.com>

leejet and others added 30 commits May 20, 2026 22:27

feat: add LTX spatial latent upscale hires support (leejet#1533)

b3374e6

feat: add graph cut markers for LTXAV transformer (leejet#1534)

ef92a00

feat: add taeltx2_3_wide support (leejet#1535)

47d8198

perf: run LTX audio VAE decode in one ggml graph (leejet#1538)

2e35146

Feat: Temporal tile custom size with overlap (leejet#1510)

adaa599

* Temporal tile size + overlap
* add --extra-tiling-args support

Co-authored-by: leejet <leejet714@gmail.com>

feat: stream LTX VAE temporal tile decoding (leejet#1539)

449165c

refactor: unify extra argument parsing (leejet#1540)

3a8788c

fix: load TAESD preview-only model correctly (leejet#1547)

8cf55a3

fix: strip trailing latent channels for preview decode (leejet#1548)

cbf9219

feat: add LTX rational latent upscaler (leejet#1549)

645e6e9

feat: add LTX temporal latent upscaler support (leejet#1551)

0baf721

fix: make macOS binaries use relocatable rpaths (leejet#1552)

72e512a

feat: add Longcat-Image / Longcat-Image-Edit support (leejet#1053)

a397e03

Co-authored-by: leejet <leejet714@gmail.com>

fix: use flux flow prediction for LTXAV (leejet#1561)

202c615

fix: package ROCm BLAS runtime in Windows artifacts (leejet#1562)

1ceb5bd

fix: skip permission denied errors in recursive_directory_iterator (leejet#1564)

07b2b18

Co-authored-by: Serge F. Chirik <s.chirik@timbel.info>

feat: add microsoft lens support (leejet#1560)

92dc726

fix: preserve frontend tooling in ROCm CI build (leejet#1568)

8eded49

refactor: simplify diffusion model runner params (leejet#1569)

55c2aed

fix: resolve LLM norm tensor names by architecture (leejet#1570)

29ab511

fix: correct tae for models that use the flux2 vae (leejet#1571)

0e4ee04

ci: opt into Node.js 24 for actions to silence deprecation warning

69c118d

GitHub is forcing Node 24 as default on June 16; set FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 at workflow level to adopt it now. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: fix M2 script to expect .avi suffix from sd-cli output

259b7ad

sd-cli appends .avi to the -o path unconditionally; update the results ls check to match the actual filenames produced. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

64johnlee wants to merge 142 commits into tetherto:master from 64johnlee:ltx2-video-generation


20 participants
