
ci: build and publish the parakeet-cli container image to ghcr #7

Merged: mudler merged 4 commits into master from add-docker-ghcr on Jun 2, 2026

@mudler (Owner) commented on Jun 2, 2026

Add a multi-stage Dockerfile and a docker workflow that builds the parakeet-cli image and pushes it to ghcr.io/<owner>/parakeet.cpp-cli.

  • Dockerfile: a fat build stage compiles parakeet-cli plus the ggml backends, and a slim runtime stage carries only the binary and the ggml .so files. A single Dockerfile covers both the CPU and CUDA variants, selected via the BUILD_BASE / RUNTIME_BASE / CMAKE_EXTRA_ARGS build args. GGML_NATIVE=OFF keeps the image portable across x86-64 hosts. The ggml submodule is re-initialized as a throwaway git repo in the build stage so the CMake-driven patch step (git apply) works regardless of how the submodule arrived in the build context.
  • docker.yml: matrix over cpu/cuda; builds on every push and PR (build-only gate on PRs) and pushes to ghcr on master, tags, and workflow_dispatch. Tags come from metadata-action (latest / sha / vX.Y.Z), with a -cuda suffix for the CUDA variant. Authenticates with GITHUB_TOKEN and uses the gha build cache.
  • .dockerignore keeps the context small (excludes .git, build dirs, models, benchmark media) while keeping the ggml source.
  • README: Docker section with CPU and CUDA run examples.
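To make the tagging scheme concrete, here is a small model of which tags each variant would receive. It is a sketch of the scheme as described, not metadata-action's actual rules: the `sha-<7 hex>` form (metadata-action's default for `type=sha`), emitting `latest` only for pushes to master, publishing a version tag under its git tag name, and applying the `-cuda` suffix to every tag are all assumptions, and `image_tags` is a hypothetical helper.

```python
"""Hypothetical model of the image tags described for docker.yml.

Assumptions (not taken from the workflow itself): the commit tag uses the
"sha-<7 hex chars>" form, "latest" is emitted only for pushes to master,
a version tag is published under its git tag name, and the CUDA suffix is
applied to every tag.
"""

IMAGE = "ghcr.io/<owner>/parakeet.cpp-cli"  # owner left as a placeholder


def image_tags(ref: str, sha: str, variant: str) -> list[str]:
    """Return the full image references one variant would be pushed as."""
    suffix = "-cuda" if variant == "cuda" else ""
    tags = []
    if ref == "refs/heads/master":
        tags.append("latest")
    elif ref.startswith("refs/tags/v"):
        tags.append(ref.removeprefix("refs/tags/"))  # e.g. "v1.2.3"
    tags.append(f"sha-{sha[:7]}")  # every publish also gets a commit tag
    return [f"{IMAGE}:{tag}{suffix}" for tag in tags]


if __name__ == "__main__":
    for variant in ("cpu", "cuda"):
        print(image_tags("refs/heads/master", "0123456789abcdef", variant))
```

Keeping the suffix on every tag means the CPU and CUDA variants never compete for the same tag name.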

Verified the CPU image end to end: it builds to a 127 MB image, parakeet-cli runs, and transcribing tests/fixtures/speech.wav with a mounted q5_k 110m model yields the exact NeMo reference transcript.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

mudler added 3 commits June 2, 2026 20:46
Publish each variant (cpu, cuda) as a multi-arch manifest covering
linux/amd64 and linux/arm64. The arm64 CUDA image runs natively on Grace /
GB10-class hosts.

Every arch is built natively, without QEMU: amd64 on ubuntu-24.04 and arm64 on
the ubuntu-24.04-arm hosted runner (free for public repos), since emulated nvcc
builds would be far too slow. The per-arch images are pushed by digest, and a
merge job stitches them into one manifest per variant, tagged via metadata-action.
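The build-then-merge layout above can be sketched as follows. Only the runner labels come from this commit; the job dict keys, `build_jobs`, and `merge_command` are hypothetical, and using `docker buildx imagetools create` for the merge is an assumption (it is one common way to assemble a multi-arch manifest from images pushed by digest).

```python
"""Sketch of the per-arch build + merge layout (names are illustrative)."""
from itertools import product

IMAGE = "ghcr.io/<owner>/parakeet.cpp-cli"

# Native runner per platform, so no arch is built under QEMU emulation.
RUNNERS = {
    "linux/amd64": "ubuntu-24.04",
    "linux/arm64": "ubuntu-24.04-arm",
}


def build_jobs(variants):
    """One native build job per (variant, platform) pair."""
    return [
        {"variant": v, "platform": p, "runs_on": RUNNERS[p]}
        for v, p in product(variants, RUNNERS)
    ]


def merge_command(tags, digests):
    """argv that stitches per-arch digests into one multi-arch manifest."""
    cmd = ["docker", "buildx", "imagetools", "create"]
    for tag in tags:
        cmd += ["-t", tag]
    cmd += [f"{IMAGE}@{d}" for d in digests]
    return cmd


if __name__ == "__main__":
    for job in build_jobs(["cpu", "cuda"]):
        print(job)
```

Pushing by digest first means no tag points at a single-arch image; the tag only appears once the merge job publishes the combined manifest.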

Verified locally that the arm64 CPU image builds and runs (aarch64) under
emulation, and confirmed that the ubuntu and nvidia/cuda base images all ship arm64.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
…ark)

CUDA 12.6 tops out at sm_90, so the CUDA images would not run on GB10 /
Grace-Blackwell. The vendored ggml's CUDA CMake adds 120a-real at CUDA >= 12.8
and 121a-real (GB10 / DGX Spark / Thor) at CUDA >= 12.9, both under our
GGML_NATIVE=OFF default. Bumping both arches to nvidia/cuda:13.0.1 therefore
compiles Turing through Blackwell without a manual arch list: amd64 picks up
Hopper / Ada / RTX 50, and arm64 picks up GH200 (sm_90 PTX) and GB10 (sm_121).
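The version gating described above can be modelled in a few lines. This is a sketch, not ggml's CMake: it covers only the two conditional additions named here and deliberately leaves out ggml's base architecture list. `parse_cuda_version` and `extra_blackwell_archs` are hypothetical helpers.

```python
"""Sketch of the toolkit-version gating described above (not ggml's CMake)."""


def parse_cuda_version(text):
    """'13.0.1' -> (13, 0); only major.minor matter for the gates."""
    major, minor = text.split(".")[:2]
    return (int(major), int(minor))


def extra_blackwell_archs(cuda_version):
    """Extra architectures added for a given (major, minor) toolkit."""
    archs = []
    if cuda_version >= (12, 8):
        archs.append("120a-real")  # sm_120 (RTX 50-class)
    if cuda_version >= (12, 9):
        archs.append("121a-real")  # sm_121 (GB10 / DGX Spark / Thor)
    return archs


if __name__ == "__main__":
    for tag in ("12.6.3", "12.8.0", "13.0.1"):
        print(tag, extra_blackwell_archs(parse_cuda_version(tag)))
```

Under this model a 12.6 base image adds neither Blackwell target, which matches the reason for bumping the base image.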

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
@mudler force-pushed the add-docker-ghcr branch from d9c6e65 to e56ccd9 on June 2, 2026 21:10
…nly PR gate

The CUDA builds failed at link time: libggml-cuda.so had undefined references
to the CUDA driver API (cuMemCreate, cuMemMap, cuDeviceGet, ...). Those come
from ggml's VMM memory pool, which links against libcuda, a library that a
GPU-less build container does not have. Build with -DGGML_CUDA_NO_VMM=ON: every
cuMem* call sits under #if defined(GGML_USE_VMM), which this flag disables, so
both the symbols and the libcuda link dependency go away. Verified locally: the
amd64 CUDA image now links cleanly, ships libggml-cuda.so, and resolves
libcudart / libcublas from the CUDA 13 runtime base.

Also cut build time, which had blown out to 43 min on the arm64 CUDA job:
- arm64 CUDA now targets only Grace GPUs (CUDA_ARCHS=90;121-real -> GH200 +
  GB10/Spark) instead of ggml's full 7-arch list. Added a dedicated, quoted
  CUDA_ARCHS build-arg so the ';' list separator survives the shell (left
  unquoted inside CMAKE_EXTRA_ARGS, it would be parsed as a command separator).
- pull_request now builds the CPU variant only (fast Dockerfile gate) via a
  dynamic matrix from a setup job. CUDA builds only on push / tag / dispatch,
  which also publish. Use workflow_dispatch to exercise CUDA before merging.
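The setup job's dynamic matrix can be sketched as below. The entry keys (`variant`, `platform`, `cuda_archs`) and helper names are assumptions; only the event split (PRs build CPU only) and the arm64 CUDA list `90;121-real` come from the commit. `arch_outputs` follows CMake's documented CUDA_ARCHITECTURES semantics: a bare number emits both SASS and PTX, `-real` emits SASS only, and `-virtual` emits PTX only. This is why sm_90 also ships PTX (GH200) while 121 is SASS only (GB10).

```python
"""Sketch of the setup job's dynamic matrix (names are assumptions)."""
import json

GRACE_ARCHS = "90;121-real"  # CMake list: ';' separates entries


def build_matrix(event_name):
    """PRs build CPU only; push / tag / dispatch build both variants."""
    variants = ["cpu"] if event_name == "pull_request" else ["cpu", "cuda"]
    include = []
    for variant in variants:
        for platform in ("linux/amd64", "linux/arm64"):
            entry = {"variant": variant, "platform": platform}
            if variant == "cuda" and platform == "linux/arm64":
                entry["cuda_archs"] = GRACE_ARCHS  # Grace GPUs only
            include.append(entry)
    return {"include": include}


def cmake_list(value):
    """Split a simple (unbracketed) CMake list."""
    return value.split(";")


def arch_outputs(entry):
    """(arch, emits_sass, emits_ptx) for one CUDA_ARCHITECTURES entry."""
    if entry.endswith("-real"):
        return (entry[: -len("-real")], True, False)
    if entry.endswith("-virtual"):
        return (entry[: -len("-virtual")], False, True)
    return (entry, True, True)


if __name__ == "__main__":
    print(json.dumps(build_matrix("pull_request")))
```

In a workflow, a setup job could write this JSON to `$GITHUB_OUTPUT`, and the build job could consume it with `fromJSON(...)` in its `strategy.matrix`.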

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
@mudler merged commit b11fe5b into master on Jun 2, 2026
8 checks passed