ci(linux): build fat package with GGML_BACKEND_DL + GGML_CPU_ALL_VARIANTS #5
Open
Replace the AVX2/FMA/F16C portable baseline (#3) with a fat-package build that produces one libstable-diffusion.so plus one libggml-cpu-*.so per CPU variant: sandybridge, haswell, skylakex (AVX-512F), icelake (AVX-512 + VNNI), alderlake (AVX2 + AVX-VNNI), and a pure-x64 fallback. At runtime ggml dlopens the variants and picks the highest-tier one the host CPU supports. AVX-512 hosts get AVX-512 performance and older machines fall back gracefully, with no -march=native runner lottery and no SIGILL.

Tradeoff: the zip grows from ~12 MB to ~50–80 MB. That is acceptable for a one-time download, especially since downstream consumers (Lemonade) cache the extracted directory across model loads.

Applied to ubuntu-latest-cmake (CPU) and ubuntu-latest-rocm (HIP), since the HIPBLAS build still uses ggml CPU ops for parts of the pipeline. The Windows AVX2 build already pins GGML_NATIVE=OFF with AVX2 only, and macOS arm64 shares a uniform NEON+DOTPROD+i8mm+bf16 baseline across all Apple Silicon generations, so neither needs the same treatment.

Upstream PR leejet#1448 (commit b8079e2) already wired the runtime backend discovery code into libstable-diffusion.so; this change only enables the build flags that produce the variant .so files.
What
Switch the Linux x86_64 `ubuntu-latest-cmake` (CPU) and `ubuntu-latest-rocm` (HIP) builds from the AVX2/FMA/F16C portable baseline (#3) to a fat-package build:
```
-DGGML_NATIVE=OFF
-DGGML_BACKEND_DL=ON
-DGGML_CPU_ALL_VARIANTS=ON
```
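The build now produces:
- one `libstable-diffusion.so`
- one `libggml-cpu-*.so` per CPU variant: sandybridge, haswell, skylakex, icelake, alderlake, and a pure-x64 fallback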
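To confirm that an extracted package contains the main library plus the per-variant CPU backends, a small check along these lines could be used. This is a minimal sketch: the helper name `check_package` is hypothetical, and it assumes only the `libstable-diffusion.so` and `libggml-cpu-<variant>.so` file names described in this PR.

```python
from pathlib import Path

PREFIX = "libggml-cpu-"
SUFFIX = ".so"


def check_package(root):
    """Return (has_main_lib, sorted variant names) for an extracted package dir."""
    root = Path(root)
    # The main library may sit at the top level or in a subdirectory.
    has_main = any(root.rglob("libstable-diffusion.so"))
    # Strip the common prefix and suffix to recover each variant name.
    variants = sorted(
        p.name[len(PREFIX):-len(SUFFIX)]
        for p in root.rglob(PREFIX + "*" + SUFFIX)
    )
    return has_main, variants
```

Calling `check_package("path/to/extracted/zip")` returns whether the main library was found and which variant names are present, which makes a missing variant easy to spot in CI logs.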
At runtime, `ggml_backend_load_all_from_path` (already wired by upstream PR leejet#1448) dlopens each variant, checks which instruction sets the host CPU supports, and picks the highest-tier match. The same zip works on a Sandy Bridge-era laptop and on a current AVX-512 server, without the runner-of-the-day AVX-512 lottery that crashed master-593.
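The selection rule can be illustrated with a small model. This is a sketch of the idea only, not ggml's actual loader code: the `VARIANTS` table, the feature-name strings, and `pick_variant` are all hypothetical, and the per-variant feature sets are simplified assumptions covering a subset of the variants.

```python
# Illustrative model of tiered variant selection; not ggml's implementation.
# Entries are ordered from lowest tier to highest. Feature sets are
# simplified assumptions made for this sketch.
VARIANTS = [
    ("x64", set()),
    ("sandybridge", {"avx"}),
    ("haswell", {"avx", "avx2", "fma", "f16c"}),
    ("skylakex", {"avx", "avx2", "fma", "f16c", "avx512f"}),
    ("icelake", {"avx", "avx2", "fma", "f16c", "avx512f", "avx512_vnni"}),
]


def pick_variant(host_features):
    """Return the highest-tier variant whose requirements the host meets."""
    best = None
    for name, required in VARIANTS:
        # A variant is usable only if every required feature is present.
        if required <= host_features:
            best = name  # later entries are higher tiers, so keep the last hit
    return best
```

Because the pure-x64 entry requires nothing, every host matches at least one variant, which is what removes the SIGILL risk: a CPU never runs code built for instructions it lacks.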
Tradeoff
Linux x86_64 zip: ~12 MB → ~50–80 MB. Acceptable IMO — Lemonade and similar consumers cache the extracted dir across model loads, so this is paid once at install. In exchange you stop choosing between portability and AVX-512 perf — you get both.
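Why not Windows / macOS
The Windows AVX2 build already pins `GGML_NATIVE=OFF` with AVX2 only, and macOS arm64 shares a uniform NEON+DOTPROD+i8mm+bf16 baseline across all Apple Silicon generations, so neither needs the same treatment.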
Verification plan
Lineage
Replaces #3's portable AVX2 baseline. #3 fixed the SIGILL but left AVX-512-class hosts running AVX2 code; this gets full perf on those hosts.