Skip to content

BUILD-10724: Import GitHub cache to S3 (migration fallback)#45

Open
julien-carsique-sonarsource wants to merge 1 commit intomasterfrom
feat/jcarsique/BUILD-10724-migrationGh2s3
Open

BUILD-10724: Import GitHub cache to S3 (migration fallback)#45
julien-carsique-sonarsource wants to merge 1 commit intomasterfrom
feat/jcarsique/BUILD-10724-migrationGh2s3

Conversation

@julien-carsique-sonarsource
Copy link
Contributor

@julien-carsique-sonarsource julien-carsique-sonarsource commented Mar 18, 2026

Summary

When migrating from GitHub cache to S3, the S3 bucket starts empty. This causes all runners to re-download dependencies from scratch until their first S3 cache save completes. This PR addresses that by automatically importing existing GitHub cache entries into S3 during the migration window.

Changes

action.yml — new import-github-cache input + migration steps

New input: import-github-cache (default: enabled)

Resolution order (mirrors the existing CACHE_BACKEND / backend pattern):

  1. Action input import-github-cache
  2. Environment variable CACHE_IMPORT_GITHUB (can be set from a repo variable via ${{ vars.CACHE_IMPORT_GITHUB }})
  3. Default: true

New steps (S3 path only):

Step Description
Determine GitHub cache import mode Resolves the effective import mode from input → env var → default
Import GitHub cache to S3 (migration fallback) Runs actions/cache/restore with the original unprefixed key when S3 misses and import mode is active. The S3 post-job save then persists the restored content to S3.
Enforce fail-on-cache-miss after GitHub import fallback Fails explicitly only after both S3 and GitHub have been tried, so fail-on-cache-miss: true is still respected.

fail-on-cache-miss and lookup-only are correctly propagated to the GitHub fallback step.

.github/workflows/check-cache-migration.yml — migration completion detector

Manually triggered (workflow_dispatch) workflow to determine when the migration is complete and automatically opt out of the import fallback.

It:

  1. Lists GitHub cache entries (paginated), filtering to long-lived branches (main, master, branch-*, dogfood-on-*, feature/long/*) and excluding transient keys (build-number-*, mise-*)
  2. Lists S3 objects (paginated via AWS CLI)
  3. Compares: for each included GitHub entry, checks for {ref}/{key} in S3
  4. If 100% of entries are present in S3: creates or updates repository variable CACHE_IMPORT_GITHUB=false, disabling the fallback automatically

Behaviour matrix

S3 hit Import mode GH hit Result fail-on-cache-miss
active S3 content used, no GH attempt per input
active GH content restored → saved to S3 post-job pass
active miss fail if flag set
inactive S3 content used per input
inactive miss per input (unchanged)

Testing

Dogfood PR: SonarSource/sonar-dummy-js#125

Jira

BUILD-10724 — child of BUILD-10684

@julien-carsique-sonarsource julien-carsique-sonarsource requested a review from a team as a code owner March 18, 2026 17:13
@sonar-review-alpha
Copy link

sonar-review-alpha bot commented Mar 18, 2026

Summary

Adds automatic migration of GitHub Actions cache entries to S3 during the transition period. When using the S3 backend and an entry isn't found in S3, the action falls back to restoring from GitHub Actions cache (using the original unprefixed key), then saves the restored content to S3 post-job. This is enabled by default but can be disabled via import-github-cache: 'false', the CACHE_IMPORT_GITHUB environment variable, or automatically once migration is complete. Also includes a manual detection workflow that compares GitHub and S3 entries and auto-disables the fallback when migration finishes.

What reviewers should know

Start here: Review the three new steps in action.yml (lines 129–161): they implement the core logic. The key insight is that fail-on-cache-miss is conditionally suppressed on the S3 step when import mode is active (line 172), then re-enforced only after both S3 and GitHub have been tried (lines 174–188). This ensures the flag is still respected but only after the fallback is attempted.

Test scenarios: The test-cache-migration-gh2s3.yml workflow covers four cases: import enabled (success), import disabled, disabling via env var, and S3 hit (verifies GitHub is NOT tried). Run through these to confirm the behavior matrix.

Migration detector: check-cache-migration.yml is per-repository and must be copied to each repo using S3. It filters branches (includes long-lived ones, excludes transients) and auto-disables the fallback by setting CACHE_IMPORT_GITHUB=false once migration is 100% complete. The comparison logic checks for {ref}/{key} pattern in S3.

Non-obvious decisions: (1) The output resolver now includes steps.github-import.outputs.cache-hit (action.yml line 49), so downstream steps see hits from either backend. (2) GitHub fallback uses the original unprefixed key—it doesn't know about S3 path prefixes. (3) The resolution order (input → env var → true) mirrors the existing CACHE_BACKEND pattern for consistency.

Gotcha: The CACHE_IMPORT_GITHUB repository variable only takes effect if referenced in the workflow with ${{ vars.CACHE_IMPORT_GITHUB }}; it won't auto-apply. See README for the recommended pattern.


  • Generate Walkthrough
  • Generate Diagram

🗣️ Give feedback

@julien-carsique-sonarsource julien-carsique-sonarsource marked this pull request as draft March 18, 2026 17:13
Copy link

@sonar-review-alpha sonar-review-alpha bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conclusion: The migration logic is sound and fits the existing patterns well, but there is one security issue that needs fixing before merge.

SonarCloud recommendations: Even though action.yml:191 is not failing the quality gate, I strongly recommend fixing it because ${{ inputs.key }} is interpolated directly into the shell run: block inside double quotes — a key value containing a " character can break out of the string and execute arbitrary shell commands. Fix by passing the key through an environment variable:

env:
  CACHE_KEY: ${{ inputs.key }}
run: |
  echo "::error::Cache miss: no cache found in S3 or GitHub for key '${CACHE_KEY}'"
  exit 1

🗣️ Give feedback

@julien-carsique-sonarsource julien-carsique-sonarsource force-pushed the feat/jcarsique/BUILD-10724-migrationGh2s3 branch 3 times, most recently from 74563fd to 261d1d0 Compare March 23, 2026 18:04
@julien-carsique-sonarsource julien-carsique-sonarsource marked this pull request as ready for review March 23, 2026 18:08
Copy link

@sonar-review-alpha sonar-review-alpha bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conclusion: The migration logic in action.yml is sound and handles the fail-on-cache-miss edge cases correctly, but the completion-detection workflow has a pagination bug that would prevent it from ever auto-disabling the fallback in production.

🗣️ Give feedback

@julien-carsique-sonarsource julien-carsique-sonarsource force-pushed the feat/jcarsique/BUILD-10724-migrationGh2s3 branch from 261d1d0 to c0f18eb Compare March 23, 2026 18:20
Copy link

@sonar-review-alpha sonar-review-alpha bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conclusion: The new commit adds solid test coverage (content verification, the S3-hit scenario, and key fixes like dropping ${{ runner.os }} from keys). However, the S3 pagination bug flagged in the previous review remains unaddressed.

🗣️ Give feedback

@julien-carsique-sonarsource julien-carsique-sonarsource force-pushed the feat/jcarsique/BUILD-10724-migrationGh2s3 branch from c0f18eb to 2afad67 Compare March 23, 2026 18:34
Copy link

@sonar-review-alpha sonar-review-alpha bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! ✅

🗣️ Give feedback

…tion mode)

When the S3 backend is used and the S3 cache misses, automatically attempt to restore the cache from GitHub using the original unprefixed key.
The S3 post-job step will then save the restored content to S3, pre-provisioning it for subsequent runs.

The feature is enabled by default. Resolution order to disable it:
  1. Action input `import-github-cache: 'false'`
  2. Environment variable `CACHE_IMPORT_GITHUB=false`
  3. Default: true

`fail-on-cache-miss` and `lookup-only` are propagated to the GitHub fallback step.
When `fail-on-cache-miss` is set and import mode is active, failure is deferred until both S3 and GitHub have been tried.

Also adds `.github/workflows/check-cache-migration.yml`: a manually-triggered workflow that compares GitHub cache entries to S3 objects across
target branches (main, master, branch-*, dogfood-on-*, feature/long/*), ignoring transient keys (build-number-*, mise-*). When 100% of entries
are found in S3, it automatically sets the CACHE_IMPORT_GITHUB=false repository variable to disable the import fallback (this requires the
`CACHE_IMPORT_GITHUB` environment variable to be set from the repository variable via `${{ vars.CACHE_IMPORT_GITHUB }}`).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@julien-carsique-sonarsource julien-carsique-sonarsource force-pushed the feat/jcarsique/BUILD-10724-migrationGh2s3 branch from 2afad67 to 0d796ab Compare March 24, 2026 09:13
@sonarqube-cloud-us
Copy link

@sonarqubecloud
Copy link

Copy link

@sonar-review-alpha sonar-review-alpha bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conclusion: The action.yml logic is solid and the fail-on-cache-miss deferral pattern is correctly implemented. However, the migration detector has a key-format mismatch that will cause it to permanently misreport migration status for any caches saved from PR-triggered workflows on dogfood-on-* or feature/long/* branches.

🗣️ Give feedback

REF=$(echo "$ENTRY" | jq -r '.ref')
KEY=$(echo "$ENTRY" | jq -r '.key')
# Expected S3 key: {ref}/{key} (e.g. refs/heads/main/my-gradle-abc123)
S3_KEY="${REF}/${KEY}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The S3 key comparison assumes all S3 keys have a refs/heads/ prefix, but that depends on how the cache was originally saved.

prepare-keys.sh derives the branch prefix as ${GITHUB_HEAD_REF:-$GITHUB_REF}. For push events GITHUB_HEAD_REF is unset, so the prefix is GITHUB_REF = refs/heads/main → S3 key: refs/heads/main/{key}. For PR events GITHUB_HEAD_REF is just the short branch name (e.g. feature/long/my-branch), so the S3 key is feature/long/my-branch/{key}no refs/heads/ prefix.

The GitHub cache API always returns .ref in full form (refs/heads/feature/long/my-branch), so S3_KEY = "refs/heads/feature/long/my-branch/{key}" will never match feature/long/my-branch/{key} in the S3 listing. Any GitHub cache entry for dogfood-on-* or feature/long/* that was originally saved from a PR workflow will be permanently reported as "missing", and the detector will never declare migration complete as long as those entries exist.

Fix: strip the refs/heads/ prefix from REF before building S3_KEY, and also try the prefix-less form:

REF_SHORT=$(echo "$REF" | sed 's|^refs/heads/||')
S3_KEY_FULL="${REF}/${KEY}"
S3_KEY_SHORT="${REF_SHORT}/${KEY}"
if grep -qxF "$S3_KEY_FULL" /tmp/s3_keys.txt || grep -qxF "$S3_KEY_SHORT" /tmp/s3_keys.txt; then

Or, simpler: always strip the prefix when building the S3 key, since prepare-keys.sh only uses the full refs/heads/ form when GITHUB_HEAD_REF is absent (push events), and in those cases the S3 key is refs/heads/main/{key} — which would also match a prefix-stripped comparison of main/{key}. The safest approach is to check both forms.

  • Mark as noise

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant