feat(ltx2): LTX-2.3 video generation — conversion script, ltx2.h C API, CPU VAE fix #7

Open: 64johnlee wants to merge 142 commits into tetherto:master from 64johnlee:ltx2-video-generation


64johnlee commented May 30, 2026

Summary

Adds LTX-2.3 (14B DiT, Gemma 3 text encoder, spatiotemporal Video-VAE) video generation support to the fork, plus M1 deliverables for the Tether LTX-2 bounty.

Changes

Sync upstream — merges 133 commits from `leejet/stable-diffusion.cpp` master, including all LTX-2.3 work (transformer, VAE, temporal upscaler, FLF2V, TAE support, Vulkan/Metal backends).

`script/convert_ltx2.py` — Python conversion script (safetensors → GGUF):

  • Quantisation levels: `f16`, `q4_0`, `q5_1`, `q8_0`
  • Selective F16 preservation for norms, biases, embeddings
  • Optionally bundles VAE into the same GGUF
  • Validates tensor buffer size vs shape on load (raises ValueError on mismatch)
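The selective F16 rule in the list above can be sketched as a per-tensor type choice. This is a minimal illustration, not the script's actual code: the pattern strings, the tensor names, and the function `choose_tensor_type` are assumptions made for the example.

```python
# Hypothetical name fragments; the real KEEP_F16_PATTERNS in the script may differ.
KEEP_F16_PATTERNS = ("norm", "bias", "embed")

# Quantisation levels listed in the PR description.
SUPPORTED_TYPES = ("f16", "q4_0", "q5_1", "q8_0")


def choose_tensor_type(name: str, requested: str) -> str:
    """Return the storage type for one tensor (illustrative helper)."""
    if requested not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported type {requested!r}")
    lowered = name.lower()
    # Norms, biases, and embeddings stay in F16 regardless of the request.
    if any(pattern in lowered for pattern in KEEP_F16_PATTERNS):
        return "f16"
    return requested


if __name__ == "__main__":
    # Tensor names below are made up for the demonstration.
    for tensor_name in (
        "blocks.0.attn.to_q.weight",
        "blocks.0.norm1.weight",
        "blocks.0.attn.to_q.bias",
    ):
        print(tensor_name, "->", choose_tensor_type(tensor_name, "q8_0"))
```

Matching on substrings keeps the rule easy to audit; a real converter might prefer anchored regular expressions to avoid accidental matches.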

`include/ltx2.h` — focused public C API for LTX-2 consumers:

  • `ltx2_new_ctx()` — create context from individual model paths
  • `ltx2_generate_t2v()` — text-to-video in <10 lines of C
  • `ltx2_generate_i2v()` — image-to-video in <10 lines of C
  • Thin façade over `stable-diffusion.h`; zero overhead

Bug fix: CPU VAE im2col assertion crash (upstream leejet#1577)

  • `ggml_ext_conv_3d`: when weight type is not F16/F32 (i.e. quantised), fall back to the explicit `im2col + mul_mat` path instead of `ggml_conv_3d`
  • Fixes `GGML_ASSERT(src0->type == GGML_TYPE_F16) failed` in `ggml-cpu/ops.cpp:6260` when running LTX-2.3 VAE on CPU with quantised weights
  • Scope note: `ggml_ext_conv_3d` is a shared helper used by all model paths (SD, Flux, Wan, etc.), not just LTX-2. The fallback is unconditionally safe for any quantised weight type — non-F16/F32 weights were already broken on this path before this change.
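As background for the fallback described above, the following NumPy sketch shows why an explicit im2col followed by a matrix multiply computes the same result as a direct 3D convolution. It is not the ggml implementation: it assumes stride 1, no padding, and float32 data, and both function names are made up for the illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_direct(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Reference 3D convolution (stride 1, no padding). x: (C,D,H,W), w: (O,C,kd,kh,kw)."""
    o, _, kd, kh, kw = w.shape
    _, d, h, wd = x.shape
    od, oh, ow = d - kd + 1, h - kh + 1, wd - kw + 1
    out = np.zeros((o, od, oh, ow), dtype=np.float32)
    for z in range(od):
        for y in range(oh):
            for c in range(ow):
                patch = x[:, z:z + kd, y:y + kh, c:c + kw]
                # Contract every kernel axis against the matching patch axis.
                out[:, z, y, c] = np.tensordot(w, patch, axes=([1, 2, 3, 4], [0, 1, 2, 3]))
    return out


def conv3d_im2col(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Same convolution as one im2col step plus one matrix multiply."""
    o, c, kd, kh, kw = w.shape
    # windows: (C, od, oh, ow, kd, kh, kw)
    windows = sliding_window_view(x, (kd, kh, kw), axis=(1, 2, 3))
    od, oh, ow = windows.shape[1:4]
    # im2col: one row per output position, one column per (C, kd, kh, kw) entry.
    cols = windows.transpose(1, 2, 3, 0, 4, 5, 6).reshape(od * oh * ow, c * kd * kh * kw)
    w_mat = w.reshape(o, c * kd * kh * kw)
    return (cols @ w_mat.T).T.reshape(o, od, oh, ow)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 5, 6, 7)).astype(np.float32)
    w = rng.standard_normal((4, 3, 2, 3, 3)).astype(np.float32)
    print(np.allclose(conv3d_direct(x, w), conv3d_im2col(x, w), atol=1e-4))
```

Once the convolution is phrased as a matrix multiply, the weight matrix is the only operand whose storage type matters, which is presumably why the explicit path tolerates quantised weights where the fused operator asserts.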

Test plan

  • `cmake -B build && cmake --build build -j$(nproc)` — clean compile on Linux x86-64
  • `python script/convert_ltx2.py --model ltx-2.3.safetensors --output ltx-2.3_Q8_0.gguf --type q8_0`
  • `sd-cli -M vid_gen --diffusion-model ltx-2.3_Q8_0.gguf ... -p "a lovely cat"` — T2V on CPU
  • `sd-cli -M vid_gen ... -i init.png` — I2V on CPU (verifies the fix for leejet/stable-diffusion.cpp#1577, "[Bug] LTX2.3 VAE does not work on CPU")

Related

🤖 Generated with Claude Code

rmatif and others added 30 commits March 10, 2026 00:35
* feat: add support for the eta parameter to ancestral samplers

* feat: Euler Ancestral sampler implementation for flow models

* refine flow ancestral sampling and normalize eta defaults

---------

Co-authored-by: leejet <leejet714@gmail.com>
leejet and others added 30 commits May 20, 2026 22:27
* Temporal tile size + overlap

* add --extra-tiling-args support

---------

Co-authored-by: leejet <leejet714@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
…eejet#1564)

Co-authored-by: Serge F. Chirik <s.chirik@timbel.info>
…ol crash

- script/convert_ltx2.py: safetensors → GGUF at Q4_0/Q5_1/Q8_0/F16 with
  selective F16 preservation for norms, biases, and embeddings
- include/ltx2.h: focused public C API for LTX-2 T2V and I2V inference,
  wrapping stable-diffusion.h with ltx2_new_ctx / ltx2_generate_t2v /
  ltx2_generate_i2v helpers
- fix(ggml_ext_conv_3d): fall back to explicit im2col+mul_mat when weight
  type is not F16/F32, fixing assertion crash in ggml_compute_forward_im2col_f16
  on CPU with quantized VAE weights (upstream issue leejet#1577)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous element-wise loop ran entirely in the Python interpreter,
which is too slow for the tensors of a 14B-parameter model. Replace it
with a numpy byte-copy: write the two BF16 bytes into byte positions [2]
and [3] of each little-endian uint32 word (a BF16 value is the upper 16
bits of a float32, so the lower 16 bits stay zero), then reinterpret the
words as float32.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
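The byte-copy described in this commit message can be sketched as follows. This is an illustration under assumptions, not the script's code: it assumes the BF16 bit patterns arrive as a uint16 array, uses explicit little-endian dtypes, and flattens the input first so that multi-dimensional tensors are handled.

```python
import numpy as np


def bf16_to_fp32(raw: np.ndarray) -> np.ndarray:
    """Expand BF16 bit patterns (stored as uint16) to float32 of the same shape."""
    shape = raw.shape
    # Flatten to 1D, then view as bytes: two bytes per BF16 value, little-endian.
    src = np.ascontiguousarray(raw, dtype="<u2").ravel().view(np.uint8)
    # Four bytes per float32 word; bytes 0 and 1 (the low 16 bits) stay zero.
    dst = np.zeros(src.size * 2, dtype=np.uint8)
    dst[2::4] = src[0::2]  # low BF16 byte  -> byte 2 of the word
    dst[3::4] = src[1::2]  # high BF16 byte -> byte 3 of the word
    return dst.view("<f4").reshape(shape)


if __name__ == "__main__":
    # 0x3F80 is 1.0, 0xC000 is -2.0, 0x4049 is 3.140625 in BF16.
    bits = np.array([[0x3F80, 0xC000], [0x4049, 0x0000]], dtype=np.uint16)
    print(bf16_to_fp32(bits))
```

An equivalent formulation is `(raw.astype(np.uint32) << 16).view(np.float32)`, which trades the explicit byte placement for a shift.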
Three jobs on every push to ltx2-video-generation and on PRs to master:
- build-linux: cmake + Ninja on ubuntu-22.04, asserts vid_gen /
  embeddings-connectors / diffusion-fa flags present in sd-cli --help
- convert-script: syntax check + --help + two synthetic GGUF round-trips
  (F32→Q8_0 and BF16→F16 via KEEP_F16_PATTERNS)
- build-macos-arm64: cmake + Metal on macos-14 (ARM64), uploads sd-cli
  artifact for 7 days

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…f16_to_fp32

safe_open(framework="numpy") doesn't support BF16 tensors because numpy
has no bfloat16 dtype. Replace with a hand-rolled parser (_iter_safetensors)
that reads the safetensors binary format directly (8-byte LE header size +
JSON metadata + raw tensor bytes), eliminating the torch/safetensors dep.

Also fix bf16_to_fp32: calling .view(uint8) on a multi-dimensional array
gives a multi-dim byte array whose [0::2] slice has the wrong shape. Flatten
to 1D first with .ravel() so the byte interleaving works correctly.

CI: drop safetensors from pip install since it is no longer imported.
Both round-trips (F32→Q8_0 and BF16→F16) verified locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
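The file layout named in this commit message (8-byte little-endian header size, JSON header, raw tensor bytes) can be read with the standard library alone. The sketch below is not the script's `_iter_safetensors`; the function names and the in-memory example file are made up for the illustration.

```python
import io
import json
import struct


def iter_safetensors_sketch(stream):
    """Yield (name, dtype, shape, raw_bytes) for each tensor in a safetensors stream."""
    (header_len,) = struct.unpack("<Q", stream.read(8))  # 8-byte LE header size
    header = json.loads(stream.read(header_len).decode("utf-8"))
    data_start = 8 + header_len  # data_offsets are relative to this position
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        begin, end = info["data_offsets"]
        stream.seek(data_start + begin)
        yield name, info["dtype"], tuple(info["shape"]), stream.read(end - begin)


def build_example_file() -> bytes:
    """Build a tiny one-tensor file in memory (hypothetical tensor name 'w')."""
    payload = struct.pack("<2f", 1.0, 2.0)
    header = json.dumps(
        {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, len(payload)]}}
    ).encode("utf-8")
    return struct.pack("<Q", len(header)) + header + payload


if __name__ == "__main__":
    for name, dtype, shape, raw in iter_safetensors_sketch(io.BytesIO(build_example_file())):
        print(name, dtype, shape, struct.unpack("<2f", raw))
```

Because the parser hands back raw bytes plus the dtype string, BF16 tensors need no numpy dtype support at read time; conversion can happen afterwards.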
GitHub is forcing Node 24 as default on June 16; set
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 at workflow level to adopt it now.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sd-cli appends .avi to the -o path unconditionally; update the
results ls check to match the actual filenames produced.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds **/*.sh plus explicit test_m2.sh, test_*.sh, and .github/test_*.sh
to the on.push paths filter so test scripts (like the recently-added
test_m2.sh that didn't trigger CI on commit 259b7ad) participate in
the CI gating cycle. The wildcard alone would suffice; the explicit
entries are kept as documentation of which scripts we specifically
care about.
Mismatched data_offsets previously produced wrong tensor data without
raising an error. The parser now raises ValueError with the tensor name,
expected bytes, shape, dtype, and actual bytes for fast diagnosis.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
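A size check of the kind this commit message describes might look like the sketch below. The function name, the dtype table, and the message wording are assumptions for illustration; only the reported fields (tensor name, expected bytes, shape, dtype, actual bytes) come from the commit message.

```python
import math

# Bytes per element for common safetensors dtype strings.
DTYPE_SIZES = {
    "F64": 8, "F32": 4, "F16": 2, "BF16": 2,
    "I64": 8, "I32": 4, "I16": 2, "I8": 1, "U8": 1, "BOOL": 1,
}


def check_tensor_size(name, dtype, shape, data_offsets):
    """Raise ValueError when the byte range does not match shape and dtype."""
    begin, end = data_offsets
    actual = end - begin
    expected = math.prod(shape) * DTYPE_SIZES[dtype]  # prod(()) == 1 for scalars
    if actual != expected:
        raise ValueError(
            f"tensor {name!r}: expected {expected} bytes for shape {list(shape)} "
            f"dtype {dtype}, got {actual} bytes"
        )
    return actual


if __name__ == "__main__":
    print(check_tensor_size("w", "F32", (2, 3), (0, 24)))
    try:
        check_tensor_size("w", "F32", (2, 3), (0, 20))
    except ValueError as err:
        print(err)
```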
- ltx2_ctx_params_set_defaults: remove schedule/sample_method/cfg_scale
  which do not exist on sd_ctx_params_t (they live on sd_sample_params_t)
- Add ltx2_vid_params_set_defaults() to set LTX-2 sample defaults on
  sd_vid_gen_params_t.sample_params where they actually belong
- Call ltx2_vid_params_set_defaults() in both generate_t2v and generate_i2v
- Fix typo: embeddings_connector_path -> embeddings_connectors_path

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>