ci: Replace QEMU with native ARM64 runners for release builds #1952

Open
wrn14897 wants to merge 3 commits into main from warren/speed-up-ci-build-time

@wrn14897
Member

Summary

  • Replace QEMU-emulated multi-platform builds with native ARM64 runners for both release.yml and release-nightly.yml, significantly speeding up CI build times
  • Each architecture (amd64/arm64) now builds in parallel on native hardware, then a manifest-merge job combines them into a multi-arch Docker tag using docker buildx imagetools create
  • Migrate from raw Makefile docker buildx build commands to docker/build-push-action@v6 for better GHA integration
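
For readers unfamiliar with the manifest-merge step, here is a minimal sketch of what a publish job could look like. It is not copied from the diff: the workflow name, the `workflow_dispatch` trigger, the `version` input, and the secret names are assumptions for illustration. In the real workflow the job would depend on the per-arch build jobs through `needs:`. The per-arch tag suffixes follow the `-amd64` / `-arm64` convention mentioned in the Notes.

```yaml
# Sketch only: trigger, input, and secret names are hypothetical.
name: publish-manifest-sketch
on:
  workflow_dispatch:
    inputs:
      version:
        description: Version whose per-arch tags were already pushed
        required: true

jobs:
  publish-app:
    runs-on: ubuntu-latest
    steps:
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }} # hypothetical secret name
          password: ${{ secrets.DOCKERHUB_TOKEN }}    # hypothetical secret name
      - name: Merge per-arch tags into one multi-arch tag
        env:
          IMAGE: hyperdx/hyperdx
          VERSION: ${{ inputs.version }}
        run: |
          # Combine the two single-arch images into one manifest list.
          docker buildx imagetools create \
            --tag "${IMAGE}:${VERSION}" \
            "${IMAGE}:${VERSION}-amd64" \
            "${IMAGE}:${VERSION}-arm64"
          # Print the result so the log shows which platforms the tag covers.
          docker buildx imagetools inspect "${IMAGE}:${VERSION}"
```

Because `imagetools create` only assembles manifests on the registry side, this job does not need a checkout or a native ARM runner.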

Changes

.github/workflows/release.yml

  • Removed QEMU setup entirely
  • Replaced single release matrix job with per-image build+publish job pairs:
    • build-otel-collector / publish-otel-collector (runners: ubuntu-latest / ubuntu-latest-arm64)
    • build-app / publish-app (runners: Large-Runner-x64-32 / Large-Runner-ARM64-32)
    • build-local / publish-local (runners: Large-Runner-x64-32 / Large-Runner-ARM64-32)
    • build-all-in-one / publish-all-in-one (runners: Large-Runner-x64-32 / Large-Runner-ARM64-32)
  • Added check_version job to centralize skip-if-exists logic (replaces per-image docker manifest inspect in Makefile)
  • Removed check_release_app_pushed artifact upload/download — publish-app now outputs app_was_pushed directly
  • Scoped GHA build cache per image+arch (e.g. scope=app-amd64) to avoid collisions
  • All 4 images build in parallel (8 build jobs total), then 4 manifest-merge jobs, then downstream notifications
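
The release.yml bullets above can be sketched roughly as follows for a single image. This is an illustration under assumptions, not the actual workflow: the hard-coded `VERSION`, the `workflow_dispatch` trigger, the default Dockerfile location, and the secret names are hypothetical, and the use of `docker manifest inspect` inside `check_version` is a guess based on the Makefile logic it replaces. The runner labels, the `should_release` output, and the `scope=app-<arch>` cache naming come from this PR.

```yaml
# Sketch only: version source, Dockerfile path, and secrets are hypothetical.
name: release-build-sketch
on:
  workflow_dispatch:

env:
  IMAGE: hyperdx/hyperdx
  VERSION: 2.21.0 # placeholder; the real workflow derives the version elsewhere

jobs:
  check_version:
    runs-on: ubuntu-latest
    outputs:
      should_release: ${{ steps.check.outputs.should_release }}
    steps:
      - id: check
        run: |
          # Skip the release when the final multi-arch tag already exists.
          if docker manifest inspect "${IMAGE}:${VERSION}" > /dev/null 2>&1; then
            echo "should_release=false" >> "$GITHUB_OUTPUT"
          else
            echo "should_release=true" >> "$GITHUB_OUTPUT"
          fi

  build-app:
    needs: check_version
    if: needs.check_version.outputs.should_release == 'true'
    strategy:
      matrix:
        include:
          - arch: amd64
            runner: Large-Runner-x64-32
          - arch: arm64
            runner: Large-Runner-ARM64-32
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }} # hypothetical secret name
          password: ${{ secrets.DOCKERHUB_TOKEN }}    # hypothetical secret name
      - uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/${{ matrix.arch }}
          push: true
          tags: ${{ env.IMAGE }}:${{ env.VERSION }}-${{ matrix.arch }}
          # One cache scope per image+arch so the two builds never overwrite each other.
          cache-from: type=gha,scope=app-${{ matrix.arch }}
          cache-to: type=gha,mode=max,scope=app-${{ matrix.arch }}
```

Each matrix entry builds only its own platform on matching hardware, which is what removes the need for QEMU.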

.github/workflows/release-nightly.yml

  • Same native runner pattern (no skip logic since nightly always rebuilds)
  • 8 build jobs run in parallel; each of the 4 publish jobs starts once its image's two builds finish
  • Slack failure notification and OTel trace export now depend on publish jobs
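
As a rough illustration of the last bullet, a failure notification that hangs off a publish job might look like the sketch below. The job names, the stand-in publish step, the `SLACK_WEBHOOK_URL` secret, and the plain `curl` call are all assumptions; the actual workflow may use a dedicated Slack action instead.

```yaml
# Sketch only: job names, secret name, and the curl-based notification are hypothetical.
name: nightly-notify-sketch
on:
  workflow_dispatch:

jobs:
  publish-app:
    runs-on: ubuntu-latest
    steps:
      - run: echo "stand-in for the manifest-merge step"

  notify_failure:
    needs: [publish-app]
    # failure() is true when any job in the needs chain failed.
    if: ${{ failure() }}
    runs-on: ubuntu-latest
    steps:
      - name: Post to Slack
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }} # hypothetical secret name
        run: |
          RUN_URL="${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}"
          curl -sS -X POST -H 'Content-Type: application/json' \
            --data "{\"text\":\"Nightly release failed: ${RUN_URL}\"}" \
            "${SLACK_WEBHOOK_URL}"
```

Since the publish jobs themselves need the build jobs, depending on the publish jobs should also surface build failures to the notifier.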

Makefile

  • Removed release-* and release-*-nightly targets (lines 203-361) — build logic moved into workflow YAML
  • Local build-* targets preserved for developer use

Architecture

Follows the same pattern as release-ee.yml in the EE repo:

check_changesets → check_version
                        │
    ┌───────────────────┼───────────────────┬───────────────────┐
    v                   v                   v                   v
build-app(x2)   build-otel(x2)    build-local(x2)    build-aio(x2)
    │                   │                   │                   │
publish-app      publish-otel       publish-local      publish-aio
    │                   │                   │                   │
    └─────────┬─────────┴───────────────────┴───────────────────┘
              v
     notify_helm_charts / notify_clickhouse_clickstack
              │
     otel-cicd-action

Notes

  • --squash flag dropped — it's an experimental Docker feature incompatible with build-push-action in multi-platform mode. sbom and provenance are preserved via action params.
  • Per-arch intermediate tags (e.g. hyperdx/hyperdx:2.21.0-amd64) remain visible on DockerHub — this is standard practice.
  • Dual DockerHub namespace tagging (hyperdx/* + clickhouse/clickstack-*) preserved.
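
To make the first and third notes concrete, here is a hedged sketch of a single per-arch build step with attestations and dual tagging. The `sbom` and `provenance` inputs are real `docker/build-push-action` parameters, but their exact values in this PR are not shown in the description, so `true` is an assumption. The repository name `clickhouse/clickstack-example` is a placeholder standing in for the real `clickhouse/clickstack-*` names, and the secrets and Dockerfile location are hypothetical.

```yaml
# Sketch only: tag names, secrets, and attestation values are hypothetical.
name: attestation-sketch
on:
  workflow_dispatch:

jobs:
  build-app-amd64:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }} # hypothetical secret name
          password: ${{ secrets.DOCKERHUB_TOKEN }}    # hypothetical secret name
      - uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64
          push: true
          # Attestations are requested through action inputs rather than CLI flags.
          sbom: true
          provenance: true
          # One build, pushed under both namespaces.
          tags: |
            hyperdx/hyperdx:2.21.0-amd64
            clickhouse/clickstack-example:2.21.0-amd64
```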

Build each architecture (amd64/arm64) on native runners in parallel
instead of emulating ARM via QEMU on x86, then merge into multi-arch
manifests using docker buildx imagetools create.

- Replace QEMU-based multi-platform builds with per-arch matrix jobs
  on native runners (ubuntu-latest/ubuntu-latest-arm64 for otel-collector,
  Large-Runner-x64-32/Large-Runner-ARM64-32 for app/local/all-in-one)
- Switch from Makefile release targets to docker/build-push-action@v6
- Add manifest-merge publish jobs for each image
- Scope GHA build cache per image+arch to avoid collisions
- Centralize skip-if-exists check in a single check_version job
- Remove release-* and release-*-nightly Makefile targets (local
  build-* targets preserved)
- Apply same changes to both release.yml and release-nightly.yml

@vercel

vercel bot commented Mar 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: hyperdx-oss · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Mar 20, 2026 7:39pm


@changeset-bot

changeset-bot bot commented Mar 20, 2026

⚠️ No Changeset found

Latest commit: dcdaee3

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions
Contributor

github-actions bot commented Mar 20, 2026

PR Review

  • ⚠️ docker/build-push-action@v6 and other third-party actions are pinned by tag, not SHA → Pin to commit SHAs (e.g., docker/build-push-action@v6 → docker/build-push-action@<sha>) to prevent supply-chain risk from a compromised tag.

  • ⚠️ Intermediate per-arch tags (e.g., hyperdx/hyperdx:2.21.0-amd64) are permanently pushed to DockerHub — PR notes this is "standard practice" but these unqualified tags may confuse users pulling them directly on the wrong arch. Consider a less prominent naming scheme (e.g., using a -tmp- prefix) or cleaning them up post-merge if this is a concern.

  • ✅ The check_version/should_release output propagation, the publish-app → app_was_pushed output, and all needs: dependency chains look structurally correct based on the described architecture.

  • ✅ Dockerfile common-utils-builder stage and --mount=type=cache for Yarn are valid BuildKit patterns; no issues with the caching approach.

  • ✅ Dropping --squash (experimental flag incompatible with multi-platform manifests via imagetools create) is correct and expected.

  • ✅ Removal of Makefile release-* targets is safe — local build-* targets are preserved for developer use.

Large runners have sufficient disk space, so the jlumbroso/free-disk-space
action is unnecessary overhead (~30s per job).

@github-actions
Contributor

github-actions bot commented Mar 20, 2026

E2E Test Results

All tests passed • 90 passed • 3 skipped • 986s

Status Count
✅ Passed 90
❌ Failed 0
⚠️ Flaky 3
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

…dent caching

Extract common-utils build into a dedicated Docker stage
(common-utils-builder) in all three Dockerfiles so its build layer is
cached independently of API/App source changes. When only API or App
source changes, the common-utils build layer is served directly from
Docker cache.

Also optimizes the base stages to only copy common-utils/package.json
(instead of the full source) for yarn install, preventing common-utils
source changes from invalidating the yarn install layer.

Adds BuildKit cache mounts for Yarn download cache so unchanged
packages aren't re-downloaded even when yarn.lock changes and the
layer is invalidated.