A walk-through for the people who'll touch this code: contributors to CppInterOp, clad, and cppyy on one side, and people who maintain the recipe definitions on the other. Read this before reaching for the README — the README is a quick reference, this is the why.
Every CI run on CppInterOp / clad / cppyy spends most of its wall clock building LLVM. That cost is mostly redundant: the LLVM tree is the same across most matrix rows, and even when it isn't, the same config is rebuilt across every PR for every project, every push, on every runner. apt-llvm.org and Homebrew solve this for vanilla LLVM, but the variants we actually need — sanitizer-instrumented LLVM, LLVM cross-compiled to run inside wasm, the cling fork, eventually MSan stacks and sanitizer-CPython — aren't redistributed by anyone upstream. So we end up rebuilding them.
This repository caches those variants. The contract is small: a
recipe is a directory under recipes/ with two files
(recipe.yaml for metadata, build.sh for the build), and the
cache is a content-addressed store of tarballs keyed by a hash of
that directory plus (version, os, arch). Same inputs → same key
→ same artifact, regardless of which CI run produced it.
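To make the key contract concrete, here is a minimal Python sketch of a content-addressed key. The hashing scheme (SHA-256 over sorted relative paths and file contents, then over the directory digest plus version, os, and arch) and the function names are assumptions for illustration; the real computation lives in setup-recipe and may differ in detail.

```python
import hashlib
from pathlib import Path


def recipe_dir_digest(recipe_dir: Path) -> str:
    """Hash every file under recipe_dir by relative path and content."""
    files = sorted(
        (p.relative_to(recipe_dir).as_posix(), p)
        for p in recipe_dir.rglob("*")
        if p.is_file()
    )
    h = hashlib.sha256()
    for rel, path in files:
        # NUL separators keep "a" + "bc" distinct from "ab" + "c".
        h.update(rel.encode() + b"\0")
        h.update(hashlib.sha256(path.read_bytes()).digest())
    return h.hexdigest()


def cache_key(recipe_dir: Path, version: str, os_name: str, arch: str) -> str:
    """Combine the directory digest with (version, os, arch) into one key."""
    h = hashlib.sha256()
    for part in (recipe_dir_digest(recipe_dir), version, os_name, arch):
        h.update(part.encode() + b"\0")
    return h.hexdigest()
```

Under this sketch, editing any file in the recipe directory or changing one field of the tuple yields a different key, and nothing else feeds the hash.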
Nothing about the cache is magical. A cached recipe is a
<key>.tar.zst plus a <key>.manifest.json, attached to a GitHub
Release on this repository. setup-recipe is a thin Action that
knows how to compute the key and HEAD-probe the asset.
publish-recipe is its inverse — runs the recipe's build.sh,
tar/zstd's the result, uploads it.
The recipe directory's content is the only knob. Edit
recipes/llvm-asan/build.sh — the key changes for every cell that
recipe produces, the next push to main rebuilds them, the cache
repopulates. Bump the LLVM version in your client repo's matrix
from '22' to '23' — the key changes (different version
input), publish-recipe builds the new cell, leaves the old one
alone until prune-cache garbage-collects it after caps.grace_days.
Things that don't move the key, deliberately:
- The runner image SHA. GitHub bumps these often; invalidating every cell on every bump would mean rebuilding LLVM on every Tuesday. The runner image is recorded in the manifest for forensics, so when something does break post-bump you can correlate.
- External action versions (ccache-action, checkout). Pinned to floating tags during iteration; sha-pin before the v1 contract freezes.
- Wall clock. Reproducibility outweighs freshness here.
The honest summary: the key tracks inputs we control inside the recipe directory. Everything else gets logged but doesn't invalidate.
You'll touch the cache in one of three ways depending on what you're doing.
Add a setup-recipe step to your matrix row:

    - uses: compiler-research/ci-workflows/actions/setup-recipe@<sha>
      with:
        recipe: llvm-asan
        version: '22'
        os: ${{ matrix.os }}
        arch: x86_64

On a hit you get the LLVM tree at $GITHUB_WORKSPACE/llvm-project
in seconds — a curl | tar | zstd pipe, nothing else. On a miss
the action falls through to building inline so your job doesn't
break before the cache is warmed; expect ~30 min for a full
asan-LLVM build, less for cached partial work via ccache.
The cache-base input controls where to look. By default it's
this repository's Releases. Set it to file:///abs/path/ to point
at a local directory (under act) or to
http://lab.example.org/recipes/ to point at a team-internal
HTTP cache. The same key works against all of them.
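As an illustration of "the same key works against all of them", the sketch below probes a cache base for a key. It covers only the file:// and plain HTTP forms mentioned above; how the default GitHub Releases backend maps a key to an asset URL is not shown here, and the function names are hypothetical.

```python
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def asset_url(cache_base: str, key: str) -> str:
    """Join a cache base and a key into the tarball URL."""
    return cache_base.rstrip("/") + "/" + key + ".tar.zst"


def is_cached(cache_base: str, key: str, timeout: float = 10.0) -> bool:
    """Return True if <key>.tar.zst exists under cache_base."""
    url = asset_url(cache_base, key)
    parsed = urlparse(url)
    if parsed.scheme == "file":
        # Local backend: the probe is just a file-existence check.
        return Path(unquote(parsed.path)).is_file()
    # HTTP backend: a HEAD request avoids downloading the tarball.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False
```

A miss and an unreachable host both return False here, which matches the fall-through-to-inline-build behaviour described above.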
Two triggers feed publish-recipe.yml:
- Push to main that touches recipes/, actions/setup-recipe/, actions/publish-recipe/, or the workflow file. Iterates the cell matrix automatically; skip-if-exists keeps it idempotent so a no-op push costs one HEAD probe per cell.
- Manual workflow_dispatch for one-off cell warming: retrying a flaky build, populating a cell that just got added.
You almost never invoke publish-recipe directly. Most of the
time, when you change a recipe, the next push to main does the
right thing.
This is the part worth knowing about even if you never push to
ci-workflows. The bin/recipe-cache CLI is a self-contained
shell script that wraps the same code paths as the actions.
It defaults the backend to file:// in ~/.cache/recipe-cache and
exposes the same operations:

    # Run the recipe end-to-end. Real ~30-min asan build.
    bin/recipe-cache build llvm-asan 22 ubuntu-24.04 x86_64

    # Treat an existing build as if a recipe had produced it. Useful
    # when you want to test the cache layer without paying for a
    # fresh build.
    bin/recipe-cache pack llvm-asan 22 darwin arm64 \
        --from /Users/me/work/builds/llvm-22-release

    # Fetch + extract.
    bin/recipe-cache get llvm-asan 22 ubuntu-24.04 x86_64 --out /tmp/llvm

    bin/recipe-cache list      # show what's cached
    bin/recipe-cache key llvm-asan 22 ubuntu-24.04 x86_64
    bin/recipe-cache rm <full-key>

The cache directory is ~/.cache/recipe-cache (override with
RECIPE_CACHE_DIR). It's just <key>.tar.zst and
<key>.manifest.json files — no database, no daemon, no lock
file.
Mockups aren't safe to share. A recipe-cache pack tarball bears the same key shape as a real publish, so setup-recipe will happily download and trust it. The manifest's kind: mockup field is documentation only, not enforced. Treat ~/.cache/recipe-cache as machine-local; don't rsync mockup entries to a shared cache. (The publish path on the action side won't accept a mockup, but a hand-crafted upload could.)
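Since nothing enforces the mockup marker, a pre-sync filter is one way to keep mockups out of a shared cache. The sketch below assumes kind is a top-level string field in <key>.manifest.json; the helper name is hypothetical and is not part of the CLI.

```python
import json
from pathlib import Path

MANIFEST_SUFFIX = ".manifest.json"


def shareable_keys(cache_dir: Path) -> list[str]:
    """Return keys whose manifest is not marked kind: mockup."""
    keys = []
    for manifest_path in sorted(cache_dir.glob("*" + MANIFEST_SUFFIX)):
        manifest = json.loads(manifest_path.read_text())
        # Assumption: mockups carry a top-level "kind": "mockup" entry.
        if manifest.get("kind") == "mockup":
            continue
        keys.append(manifest_path.name[: -len(MANIFEST_SUFFIX)])
    return keys
```

Feeding only these keys to rsync (or any other copy step) leaves mockup entries on the machine that produced them.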
To point a CppInterOp / clad / cppyy job at your local cache
when you run it under act:

    env:
      RECIPE_CACHE_BASE: file:///root/.cache/recipe-cache/

The same content addressing means "if your local cache holds this key, the workflow will see a hit", without ever touching GitHub. Useful for testing recipe changes before pushing, for working offline, for reproducing a CI failure on bare metal.
The cache works because it isn't trying to be too clever. There are four limits worth knowing about up front.
Build trees aren't relocatable. LLVMConfig.cmake stores
absolute paths to its imported targets. When you rsync your local
cache to a colleague whose home directory differs, cmake will
configure cleanly but ninja will fail at link time. For the
GHA case this is invisible — every runner extracts to
$GITHUB_WORKSPACE/llvm-project. For local use on the same
machine, also invisible. Cross-machine cache sharing is the case
that doesn't work today; we'll revisit if it actually matters.
The first miss after a flag bump pays the build cost. When
you edit build.sh, the key changes for every cell that uses
that recipe. The push to main triggers publish-recipe to refill
them all in parallel, typically ~30 min end-to-end, with ccache
making most of it cheap. Until that finishes, downstream PRs that
hit the new key fall through to an inline build (build-on-miss: true is the default). You may want to wait for the ci-workflows
merge to settle before merging downstream PRs touching the same
recipe.
There is no auth on https:// reads. A team-internal HTTP
cache without TLS or with basic auth needs a wrapper. The lib's
curl invocation is bare; we'll add RECIPE_CACHE_AUTH_HEADER
env-var support when someone deploys a private host. Not a
priority until then.
Recipe builds aren't host-portable for free. The first cell
of a new recipe needs verification on each platform you intend to
publish for — cmake flag differences, ninja target name
differences, available libraries. cells.yaml enumerates which
combinations are first-class; every cell expansion is a manual
integration step done by adding a row to cells.yaml and either
dispatching publish-recipe once for that cell or letting the
push trigger pick it up.
A recipe is a directory under recipes/. Two files:
- recipe.yaml: metadata read by build-manifest.sh. Keep it minimal. Today only recipe, description, and source.{repo, branch_template} are read; the verify workflow's recipe-yaml-no-dead-fields check enforces this.
- build.sh: the imperative build. Receives RECIPE_VERSION, WORK_DIR, OUT_DIR env vars; writes its result to $OUT_DIR/llvm-project/ (or whatever subdirectory tree your recipe wants; setup-recipe and the CLI both surface the tarball root verbatim).
Verify locally with bin/recipe-cache build before pushing.
The verify workflow will catch the rest at PR time:
- actionlint over your edits to action / workflow files.
- compute-key-parity: your new key is stable across invocation contexts.
- manifest-schema: emitting valid JSON.
- tar-zstd-round-trip: the publish/consume pipelines round-trip bytewise.
- end-to-end-fixture: the CLI builds + caches + extracts a synthetic recipe.
When the recipe lands, add a row to publish-recipe.yml's
push-trigger matrix so main warms it on every relevant push.
For now, edit the matrix in publish-recipe.yml. Add a row
matching the new (version, os, arch) tuple. The push trigger
takes care of the build on the next merge that touches the
recipe directory or the workflow file.
Change the version input on the setup-recipe call in your
client repo's matrix. The key changes; on the first PR run after
the bump, the recipe builds inline (build-on-miss: true does
the right thing); on the next push to ci-workflows main the new
cell gets warmed. The previous version's cell stays cached until
either it ages past caps.grace_days or you remove it from
cells.yaml, at which point prune-cache drops it. To evict
orphans before they age out — e.g. when storage is over hard_gb
and the grace window is holding too much back — dispatch
prune-cache manually with the force input enabled; this
bypasses grace_days for that one run and may break in-flight PRs
that referenced the dropped keys (they fall back to building from
source).
If your matrix targets a [self-hosted, ...] runner that isn't
always on (a Dell box on someone's desk, a workstation that
sleeps), actions/wake-on-lan sends the magic packet from a
spotter runner and waits for SSH (TCP port 22) on the target:

    jobs:
      wake-runner:
        # Spotter runner shares a LAN with the dell so the magic packet
        # reaches it via subnet broadcast.
        runs-on: [self-hosted, spotter]
        steps:
          - uses: compiler-research/ci-workflows/actions/wake-on-lan@<sha>
            with:
              mac: <hardware address>
              target-host: <ip address>
              # target-port: 22              # default; SSH = "ready"
              # broadcast: 192.168.100.255   # default derived from IPv4 target
              # port: 9                      # UDP WoL port; some old routers use 7
              # timeout-seconds: 240         # 4 minutes, checking every 10 s
      build:
        needs: wake-runner
        runs-on: [self-hosted, dell]
        ...

The action makes no assumptions about act -- it just sends the packet. Consumers whose self-hosted runner is unreachable from act (the typical case) don't need any guarding; act-only repro paths target hosted-runner jobs that don't need the wake at all.
What the action does:
- Masks MAC/IP/broadcast in the run log (::add-mask::).
- Pre-checks the target via bash /dev/tcp/$host/$port -- skips the magic packet if the host is already responsive on the readiness port.
- Sends the magic packet via pure-stdlib Python UDP broadcast (no apt-get install wakeonlan, no sudo -- UDP sendto doesn't require privileges).
- Waits for the readiness port to start accepting TCP connects.
bash /dev/tcp is the portable readiness probe across GHA images
that lack nc or ping. Default port 22 corresponds to SSH being
up, which is the strongest signal that the runner is ready to
register itself with GitHub.
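The send-then-wait sequence can be sketched with the Python standard library alone. The magic packet layout (six 0xFF bytes followed by the MAC repeated sixteen times) is the standard Wake-on-LAN format; the function names and the Python readiness probe are assumptions for illustration, since the action itself probes via bash /dev/tcp.

```python
import socket
import time


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN packet: 6 x 0xFF, then the MAC 16 times."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    return b"\xff" * 6 + bytes.fromhex(digits) * 16


def send_magic_packet(mac: str, broadcast: str, port: int = 9) -> None:
    """Broadcast the packet over UDP; no elevated privileges needed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


def port_is_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connect to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_ready(host: str, port: int = 22,
                     timeout_seconds: float = 240, interval: float = 10) -> bool:
    """Poll the readiness port until it accepts or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if port_is_open(host, port):
            return True
        time.sleep(interval)
    return False
```

A caller would check port_is_open first and skip send_magic_packet when the host already answers, mirroring the pre-check described above.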
The manifest sibling tells you what produced any given tarball:

    gh release view cache -R compiler-research/ci-workflows \
      | grep manifest.json
    gh release download cache -R compiler-research/ci-workflows \
      -p '<key>.manifest.json'
    jq . <key>.manifest.json

Manifests record: the source repository and commit, the recipe file content hashes, the runner image and version, the ci-workflows commit that built it, the build timestamp. If a cached binary surprises you in the field, the manifest is where you start.
End-to-end:
- CI fails on a downstream PR. You'd rather not push another branch every iteration.
- From the failing-PR repo: bin/repro --list. Failed rows on the current branch are tagged red; pick the row you care about.
- bin/repro <row-name> runs that exact row inside docker via nektos/act. The shortcut handles workflow / job / matrix / container-arch / pre-flight collision detection.
- The post-run shell drops you inside the container. Edit code, recompile, rerun the tests. On shell exit you're prompted to dump git diff HEAD to /tmp/repro-<row>.patch on the host; git apply <patch> brings the edits back to your working tree.
- To test changes to ci-workflows itself (this repo) without pushing a branch, pass --ci-workflows <local-path> -- bin/repro overlays your local actions/ on the workflow.
- Iterate. Push when green.
- Copies every actions/<name>/ from the local checkout to <downstream>/.github/act-ci-workflows-stage/<name>/ (a copy rather than a symlink, because act doesn't follow directory symlinks for local actions).
- Writes a temp workflow beside the original with each uses: compiler-research/ci-workflows/actions/<name>@<ref> rewritten to uses: ./.github/act-ci-workflows-stage/<name>.
- Runs act on the temp workflow; removes the stage and temp file at exit.
~/.cache/act/ is untouched, so you can keep multiple
ci-workflows checkouts on different branches and switch which one
bin/repro consumes via --ci-workflows <path>.
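The uses: rewrite in the overlay steps above amounts to a single text substitution. This is a minimal sketch assuming a regex-based rewrite; the pattern and the function name are illustrative and may not match how bin/repro actually does it.

```python
import re

# Matches "uses: compiler-research/ci-workflows/actions/<name>@<ref>".
USES_RE = re.compile(
    r"(?P<prefix>uses:\s*)"
    r"compiler-research/ci-workflows/actions/(?P<name>[\w.-]+)@\S+"
)


def localize_workflow(text: str,
                      stage_dir: str = "./.github/act-ci-workflows-stage") -> str:
    """Point every ci-workflows action reference at the staged local copy."""
    return USES_RE.sub(
        lambda m: f"{m.group('prefix')}{stage_dir}/{m.group('name')}", text
    )
```

References to other actions, such as actions/checkout, do not match the pattern and pass through unchanged.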
- The downstream's runs-on: slugs need to dispatch under act (Linux containers; macOS / Windows rows skip).
- <row-name> resolves via fnmatch against what act -n --json enumerates; ambiguous matches print the candidates instead of running.
- act bind-mounts the consumer working tree, so workflow side effects (build/, llvm-project/, __ci_workflows__/) persist on the host after the container is removed. The workspace-clash pre-flight catches these on the next run; clean them up by hand for a pristine tree. Stage and temp workflow ARE cleaned at exit; if a run is killed hard, remove .github/act-ci-workflows-stage/ and .github/workflows/act-*-localized-*.yml by hand.
bin/repro <cell> --devshell is a different mode: it doesn't run
a workflow. It downloads the cell's published install +
sibling-ccache + manifest, shallow-clones llvm-project at the
manifest's pinned SRC_COMMIT, and drops you into a long-lived
container ready for incremental rebuilds against the producer's
ccache.
Use it when:
- A workflow ran clean in CI but you want to edit something in LLVM itself and rebuild fast (the bin/repro <row> shell only reproduces the row's own build, which doesn't iterate well).
- You're triaging a cppyy / CppInterOp issue that needs a patched LLVM.
- You want to verify a recipe's published install actually compiles the next dependent layer (CppInterOp, cling) before relying on it.
Either form works:
- A matrix-row name from a consumer repo (bin/repro --list from that repo enumerates them). Looked up against act -n --json, which gives bin/repro the recipe coord to download.
- A direct recipe/version/os/arch coord, e.g. llvm-release/22/ubuntu-24.04/x86_64. Use this when no consumer matrix references the cell yet (e.g. you just published it and haven't migrated downstream setup-llvm callers).
The cell is validated against cells.yaml; a typo fails fast
rather than 404'ing on Releases.
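A fail-fast check of a direct coord might look like the sketch below. The schema of cells.yaml is not described here, so the known cells are passed in as a plain set; the Cell type and both function names are hypothetical.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    recipe: str
    version: str
    os: str
    arch: str


def parse_cell(coord: str) -> Cell:
    """Split a recipe/version/os/arch coord into its four fields."""
    parts = coord.split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected recipe/version/os/arch, got {coord!r}")
    return Cell(*parts)


def validate_cell(coord: str, known_cells: set[Cell]) -> Cell:
    """Reject coords that are well-formed but not listed as a known cell."""
    cell = parse_cell(coord)
    if cell not in known_cells:
        raise ValueError(f"{coord!r} is not a known cell")
    return cell
```

Validating before any download means a mistyped os or arch surfaces as a local error instead of a failed fetch.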

    ~/.cache/ci-workflows/devshell/<cell>/
      _recipe_out/llvm-project/     install tree (LLVM_PREFIX)
      .ccache/                      producer's sibling ccache
      _recipe_work/llvm-project/    shallow llvm-project @ SRC_COMMIT
      manifest.json                 producer manifest

The container (devshell-<cell>) bind-mounts this directory at the
recipe's runner workspace path (read from
manifest.build_env.ccache.base_dir), so ccache's recorded paths
match. Re-invoking bin/repro <cell> --devshell re-uses the
container; --devshell-rm deletes it but keeps the workdir.
| flag | effect |
|---|---|
| --devshell-rm | remove the container; workdir is kept |
| --devshell-refetch | re-download install/ccache/manifest |
| --devshell-script PATH | run host PATH inside the container, exit with its rc (batch mode) |
Idempotent — runs once per fetch, no-ops on rebuild:
- apt deps: same set as the install-build-deps Linux step (clang, cmake, ninja, ccache, libedit-dev, ...).
- libstdc++ auto-detect: reads manifest.cmake_state.CMakeCXXCompiler.cmake, extracts the CMAKE_CXX_IMPLICIT_INCLUDE_DIRECTORIES path, and apt-installs the matching libstdc++-N-dev if it isn't local. Catches the catthehacker libstdc++-13 vs GHA libstdc++-14 drift that makes every C++ TU's preprocessed output diverge, a 100% ccache miss against the producer cache. For pre-cmake_state manifests, defaults to libstdc++-14-dev (matches the GHA ubuntu-24.04 runner the recipes target).
- package-drift warning: diffs the producer's manifest.build_env.installed_packages against local dpkg-query output, filtered to dev / clang / cmake / ninja / ccache / lld packages. Surfaces a ::warning:: line per divergent package; no auto-install.
- ccache compiler_check: applies the producer's value verbatim (exported by bin/repro from manifest.build_env.ccache.compiler_check). Warns when the consumer's $CC --version diverges.
- cmake configure: replays the recipe's own cmake invocation from manifest.cmake_args, substituting CMAKE_INSTALL_PREFIX and the source path. Pre-cmake_args manifests fall back to llvm_build.base_cmake_args() + LLVM_ENABLE_PROJECTS=clang.
- smoke compile: builds lib/Support/CMakeFiles/LLVMSupport.dir/Allocator.cpp.o. Zero ccache hits ⇒ producer cache isn't reaching the consumer (drift the earlier checks didn't catch); surfaces a ::warning:: rather than aborting.
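The libstdc++ auto-detect step in the list above can be illustrated with a small parser. It assumes the manifest carries the text of CMakeCXXCompiler.cmake and that the libstdc++ major version appears as a trailing /include/c++/N directory in CMAKE_CXX_IMPLICIT_INCLUDE_DIRECTORIES; the function name is hypothetical, and the real detection may use different rules.

```python
import re

IMPLICIT_DIRS_RE = re.compile(
    r'CMAKE_CXX_IMPLICIT_INCLUDE_DIRECTORIES\s+"([^"]*)"'
)


def libstdcxx_package(cmake_cxx_compiler_text: str,
                      default: str = "libstdc++-14-dev") -> str:
    """Derive the libstdc++-N-dev package name from recorded include dirs."""
    match = IMPLICIT_DIRS_RE.search(cmake_cxx_compiler_text)
    if match:
        # CMake stores the list as one semicolon-separated string.
        for directory in match.group(1).split(";"):
            version = re.search(r"/include/c\+\+/(\d+)/?$", directory)
            if version:
                return f"libstdc++-{version.group(1)}-dev"
    # Older manifests without cmake_state fall back to the default.
    return default
```

The resulting name is what a provisioning script would hand to apt-get install when the package is not already present.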
- Linux Ubuntu cells only (ubuntu-22.04, ubuntu-24.04); other cell OSes refuse with a clear error.
- macOS hosts work via the Linux container, paying Rosetta emulation overhead on Apple Silicon.
- Pre-portable-ccache manifests (no build_env.ccache) provision correctly but miss on the first compile until a republish writes the portable-hashing config alongside.