ci: build and publish the parakeet-cli container image to ghcr by mudler · Pull Request #7 · mudler/parakeet.cpp

mudler · 2026-06-02T20:56:18Z

Add a multi-stage Dockerfile and a docker workflow that builds the parakeet-cli image and pushes it to ghcr.io/<owner>/parakeet.cpp-cli.

- Dockerfile: fat build stage compiles parakeet-cli plus the ggml backends, a slim runtime stage carries only the binary and the ggml .so files. One Dockerfile, CPU and CUDA variants selected via BUILD_BASE / RUNTIME_BASE / CMAKE_EXTRA_ARGS build args. GGML_NATIVE=OFF so the image is portable across x86-64 hosts. The ggml submodule is re-inited as a throwaway git repo in the build stage so the CMake-driven patch step (git apply) works regardless of how the submodule arrived in the context.
- docker.yml: matrix over cpu/cuda, builds on every push/PR (build-only gate on PRs), pushes to ghcr on master + tags + dispatch. Tags via metadata-action: latest / sha / vX.Y.Z, with a -cuda suffix for the CUDA variant. Uses GITHUB_TOKEN, gha build cache.
- .dockerignore keeps the context small (excludes .git, build dirs, models, benchmark media) while keeping the ggml source.
- README: Docker section with CPU and CUDA run examples.
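
The tagging scheme in the docker.yml item (latest / sha / vX.Y.Z, with a -cuda suffix) can be sketched roughly as below. This is an illustration only, not the workflow's metadata-action configuration: the `image_tags` helper, the `sha-<7 hex>` shape, and the rule that `latest` is emitted only for master are all assumptions.

```python
import re


def image_tags(variant: str, git_ref: str, sha: str) -> list[str]:
    """Return illustrative tags for one variant ("cpu" or "cuda").

    Assumptions (not taken from the workflow): the commit tag looks like
    "sha-<first 7 hex chars>", "latest" is emitted only for master, and the
    "-cuda" suffix is appended to every tag of the CUDA variant.
    """
    suffix = "-cuda" if variant == "cuda" else ""
    tags = [f"sha-{sha[:7]}"]
    if git_ref == "refs/heads/master":
        tags.append("latest")
    # Release tags of the form vX.Y.Z are passed through unchanged.
    match = re.fullmatch(r"refs/tags/(v\d+\.\d+\.\d+)", git_ref)
    if match:
        tags.append(match.group(1))
    return [tag + suffix for tag in tags]


if __name__ == "__main__":
    print(image_tags("cpu", "refs/heads/master", "b11fe5b"))
    print(image_tags("cuda", "refs/tags/v1.2.3", "b11fe5b"))
```

Keeping the suffix as a single per-variant value means both variants share one tag rule set and differ only in that one string.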

Verified the CPU image end to end: builds at 127 MB, parakeet-cli runs, and transcribing tests/fixtures/speech.wav with a mounted q5_k 110m model yields the exact NeMo reference transcript.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]


Publish each variant (cpu, cuda) as a multi-arch manifest covering linux/amd64 and linux/arm64. The arm64 CUDA image runs natively on Grace / GB10-class hosts.

Every arch is built natively, no QEMU: amd64 on ubuntu-24.04, arm64 on the ubuntu-24.04-arm hosted runner (free for public repos). Emulated nvcc builds would be far too slow. The per-arch images are pushed by digest and a merge job stitches them into one manifest per variant, tagged via metadata-action.

Verified the arm64 CPU image builds and runs (aarch64) under emulation locally, and confirmed the ubuntu and nvidia/cuda base images all ship arm64.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
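
The push-by-digest plus merge step described in this commit could look roughly like the following sketch, which only assembles the command line. `docker buildx imagetools create` is a real buildx subcommand, but whether the merge job calls it in this form is an assumption, and `merge_command` and the placeholder digests are hypothetical.

```python
def merge_command(image: str, tags: list[str], digests: list[str]) -> list[str]:
    """Build an argv that joins per-arch image digests into one tagged manifest."""
    if not digests:
        raise ValueError("need at least one per-arch digest")
    argv = ["docker", "buildx", "imagetools", "create"]
    for tag in tags:
        argv += ["--tag", f"{image}:{tag}"]
    # Each source is referenced by digest, e.g. one for amd64 and one for arm64.
    argv += [f"{image}@{digest}" for digest in digests]
    return argv


if __name__ == "__main__":
    # Placeholder digests, for illustration only.
    amd64 = "sha256:" + "a" * 64
    arm64 = "sha256:" + "b" * 64
    print(" ".join(merge_command(
        "ghcr.io/<owner>/parakeet.cpp-cli", ["latest-cuda"], [amd64, arm64])))
```

Pushing by digest first means no per-arch tag is ever visible in the registry. Only the merged manifest is tagged.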

…ark)

CUDA 12.6 tops out at sm_90, so the CUDA images would not run on GB10 / Grace-Blackwell. The vendored ggml's CUDA CMake adds 120a-real at CUDA >= 12.8 and 121a-real (GB10 / DGX Spark / Thor) at CUDA >= 12.9, all under our GGML_NATIVE=OFF default. Bumping both arches to nvidia/cuda:13.0.1 therefore compiles Turing through Blackwell with no manual arch list: amd64 picks up Hopper / Ada / RTX 50, arm64 picks up GH200 (sm_90 PTX) and GB10 (sm_121).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
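
The version gating described above can be restated as a small sketch. It paraphrases only the two thresholds quoted in this commit message and is not ggml's actual CMake logic. The `blackwell_archs` helper is hypothetical, and the base (pre-Blackwell) arch list is deliberately left out because the message does not spell it out.

```python
def blackwell_archs(cuda_version: str) -> list[str]:
    """Return the Blackwell arch entries added for a given CUDA toolkit version.

    Thresholds are the ones stated in the commit message:
    120a-real at CUDA >= 12.8, 121a-real at CUDA >= 12.9.
    """
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    archs = []
    if (major, minor) >= (12, 8):
        archs.append("120a-real")
    if (major, minor) >= (12, 9):
        archs.append("121a-real")
    return archs


if __name__ == "__main__":
    for version in ("12.6", "12.8", "13.0.1"):
        print(version, blackwell_archs(version))
```

With 12.6 the list is empty, which matches the observation that the earlier images could not target GB10.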

…nly PR gate

The CUDA builds failed at link: libggml-cuda.so had undefined references to the CUDA driver API (cuMemCreate, cuMemMap, cuDeviceGet, ...). Those come from ggml's VMM memory pool, which links libcuda -- a lib a GPU-less build container does not have. Build with -DGGML_CUDA_NO_VMM=ON: every cuMem* call is under #if defined(GGML_USE_VMM), which this flag disables, so the symbols and the libcuda link dependency both go away. Verified locally: the amd64 CUDA image now links clean, ships libggml-cuda.so, and resolves libcudart / libcublas from the CUDA 13 runtime base.

Also cut build time, which had blown out to 43 min on the arm64 CUDA job:
- arm64 CUDA targets only Grace GPUs now (CUDA_ARCHS=90;121-real -> GH200 + GB10/Spark) instead of ggml's full 7-arch list. Added a dedicated quoted CUDA_ARCHS build-arg so the ';' list separator survives the shell (the unquoted CMAKE_EXTRA_ARGS would split it as a command separator).
- pull_request now builds the CPU variant only (fast Dockerfile gate) via a dynamic matrix from a setup job. CUDA builds only on push / tag / dispatch, which also publish. Use workflow_dispatch to exercise CUDA before merging.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
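
The ';' problem mentioned in the first bullet can be illustrated with Python's `shlex`, which approximates POSIX shell tokenization when `punctuation_chars=True`. This is a sketch of the failure mode, not the Dockerfile's actual RUN line: passing the list through `-DCMAKE_CUDA_ARCHITECTURES` is an assumption about how CUDA_ARCHS reaches CMake, and `shell_tokens` is a hypothetical helper.

```python
import shlex


def shell_tokens(command: str) -> list[str]:
    """Tokenize roughly the way a POSIX shell would, keeping ';' as its own token."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


# Unquoted: the ';' ends the cmake command and '121-real' starts a new one.
unquoted = "cmake -DCMAKE_CUDA_ARCHITECTURES=90;121-real .."
# Quoted: the whole arch list stays inside a single argument.
quoted = 'cmake "-DCMAKE_CUDA_ARCHITECTURES=90;121-real" ..'

if __name__ == "__main__":
    print(shell_tokens(unquoted))
    print(shell_tokens(quoted))
```

In the unquoted form the shell would run cmake with only `90` and then try to execute `121-real` as a separate command, which is why a dedicated quoted build arg is the safer route.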

mudler added 3 commits June 2, 2026 20:46

mudler force-pushed the add-docker-ghcr branch from d9c6e65 to e56ccd9 Compare June 2, 2026 21:10

mudler merged commit b11fe5b into master Jun 2, 2026
8 checks passed

mudler merged 4 commits into master from add-docker-ghcr
