ci: build and publish the parakeet-cli container image to ghcr #7
Merged
Add a multi-stage Dockerfile and a docker workflow that builds the parakeet-cli image and pushes it to ghcr.io/<owner>/parakeet.cpp-cli.

- Dockerfile: a fat build stage compiles parakeet-cli plus the ggml backends; a slim runtime stage carries only the binary and the ggml .so files. One Dockerfile, with CPU and CUDA variants selected via the BUILD_BASE / RUNTIME_BASE / CMAKE_EXTRA_ARGS build args. GGML_NATIVE=OFF so the image is portable across x86-64 hosts. The ggml submodule is re-inited as a throwaway git repo in the build stage so the CMake-driven patch step (git apply) works regardless of how the submodule arrived in the context.
- docker.yml: matrix over cpu/cuda, builds on every push/PR (build-only gate on PRs), pushes to ghcr on master + tags + dispatch. Tags via metadata-action: latest / sha / vX.Y.Z, with a -cuda suffix for the CUDA variant. Uses GITHUB_TOKEN and the gha build cache.
- .dockerignore: keeps the context small (excludes .git, build dirs, models, benchmark media) while keeping the ggml source.
- README: Docker section with CPU and CUDA run examples.

Verified the CPU image end to end: builds at 127 MB, parakeet-cli runs, and transcribing tests/fixtures/speech.wav with a mounted q5_k 110m model yields the exact NeMo reference transcript.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
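The tag scheme described above (latest / sha / vX.Y.Z, plus a -cuda suffix) can be sketched in plain shell. This is only an illustration of the naming convention, not the metadata-action implementation: the function name `image_tags`, the `sha-` prefix with a 7-character short SHA, and the rule that `latest` applies only to master are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the image tags for one build.
#   $1 = variant (cpu | cuda), $2 = git ref, $3 = full commit SHA
image_tags() {
  variant=$1
  ref=$2
  sha=$3

  # CUDA images share the tag names but carry a -cuda suffix.
  suffix=""
  if [ "$variant" = "cuda" ]; then
    suffix="-cuda"
  fi

  case "$ref" in
    refs/tags/v*)      echo "${ref#refs/tags/}${suffix}" ;;  # vX.Y.Z
    refs/heads/master) echo "latest${suffix}" ;;             # assumed: master only
  esac

  # Short-SHA tag (assumed format: sha-<first 7 characters>).
  echo "sha-$(printf '%s' "$sha" | cut -c1-7)${suffix}"
}

image_tags cuda refs/tags/v1.2.3 0123456789abcdef
image_tags cpu refs/heads/master 0123456789abcdef
```

Under these assumptions, a CUDA build of a release tag and a CPU build of master never collide on a tag name, which is what lets both variants live in the same ghcr repository.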
Publish each variant (cpu, cuda) as a multi-arch manifest covering linux/amd64 and linux/arm64. The arm64 CUDA image runs natively on Grace / GB10-class hosts.

Every arch is built natively, with no QEMU: amd64 on ubuntu-24.04, arm64 on the ubuntu-24.04-arm hosted runner (free for public repos). Emulated nvcc builds would be far too slow.

The per-arch images are pushed by digest, and a merge job stitches them into one manifest per variant, tagged via metadata-action.

Verified that the arm64 CPU image builds and runs (aarch64) under emulation locally, and confirmed that the ubuntu and nvidia/cuda base images all ship arm64.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
…ark)

CUDA 12.6 tops out at sm_90, so the CUDA images would not run on GB10 / Grace-Blackwell. The vendored ggml's CUDA CMake adds 120a-real at CUDA >= 12.8 and 121a-real (GB10 / DGX Spark / Thor) at CUDA >= 12.9, all under our GGML_NATIVE=OFF default.

Bumping both arches to nvidia/cuda:13.0.1 therefore compiles Turing through Blackwell with no manual arch list: amd64 picks up Hopper / Ada / RTX 50, arm64 picks up GH200 (sm_90 PTX) and GB10 (sm_121).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
…nly PR gate

The CUDA builds failed at link: libggml-cuda.so had undefined references to the CUDA driver API (cuMemCreate, cuMemMap, cuDeviceGet, ...). Those come from ggml's VMM memory pool, which links libcuda, a library that a GPU-less build container does not have.

Build with -DGGML_CUDA_NO_VMM=ON: every cuMem* call is under #if defined(GGML_USE_VMM), which this flag disables, so the symbols and the libcuda link dependency both go away.

Verified locally: the amd64 CUDA image now links clean, ships libggml-cuda.so, and resolves libcudart / libcublas from the CUDA 13 runtime base.

Also cut build time, which had blown out to 43 min on the arm64 CUDA job:

- arm64 CUDA targets only Grace GPUs now (CUDA_ARCHS=90;121-real -> GH200 + GB10/Spark) instead of ggml's full 7-arch list. Added a dedicated quoted CUDA_ARCHS build-arg so the ';' list separator survives the shell (the unquoted CMAKE_EXTRA_ARGS would split it as a command separator).
- pull_request now builds the CPU variant only (fast Dockerfile gate) via a dynamic matrix from a setup job. CUDA builds only on push / tag / dispatch, which also publish. Use workflow_dispatch to exercise CUDA before merging.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
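The ';' hazard mentioned for CUDA_ARCHS can be reproduced outside Docker. The sketch below contrasts a quoted expansion with one common way the split can arise: the value being spliced textually into a command string (as eval or template interpolation would do). Mapping CUDA_ARCHS onto CMAKE_CUDA_ARCHITECTURES is an assumption here; the commit does not show how the Dockerfile consumes the build-arg, and `count_args` is a hypothetical helper.

```shell
#!/bin/sh
# Arch list taken from the commit message above.
CUDA_ARCHS='90;121-real'

# Hypothetical helper: report how many arguments a command would receive.
count_args() { echo "$#"; }

# Quoted expansion: the whole list arrives as one argument, ';' intact.
quoted=$(count_args "-DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCHS}")

# Textual splice into a command string: the shell that parses the string
# sees ';' as a command separator, so printf only receives "...=90" and
# "121-real" is attempted as a second command (its error is discarded).
spliced=$(sh -c "printf '%s\n' -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCHS}" 2>/dev/null) || true

echo "quoted argument count: $quoted"
echo "spliced command saw:   $spliced"
```

In the spliced case the build would silently compile for a single architecture, which is why a dedicated, quoted argument is the safer channel for a semicolon-separated list.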