[CI] Add parity auto-trigger workflow #3231
Open
ethanwee1 wants to merge 1 commit into
Add a scheduled scanner that dispatches one parity report per ready upstream PyTorch main commit, with PR dry-runs to validate readiness without creating reports.
Jenkins build for f4dfbd8845f2d05dd28225ca78af48c1926d9e31 commit finished as FAILURE
Pull request overview
Adds a new GitHub Actions workflow to automatically scan recent completed trunk.yml push runs on main, determine when the relevant ROCm + CUDA check-runs for a given upstream SHA are fully complete, and then dispatch parity.yml (with a PR-only dry-run mode).
Changes:
- Introduces a scheduled (every 10 minutes) + manual + PR dry-run “parity auto-trigger” workflow.
- Implements SHA deduplication by checking existing parity.yml run titles in the current repo.
- Gates dispatch on completion of ROCm arch shard check-runs (only for arch workflows detected as having run) plus specific CUDA check-runs used by parity.
Comment on lines +130 to +140
COMMITS=$(gh api \
  "repos/$UPSTREAM/actions/workflows/trunk.yml/runs?branch=$BRANCH&event=push&status=completed&per_page=$MAX_COMMITS" \
  --jq '
    reduce .workflow_runs[] as $run ({seen:{}, rows:[]};
      if .seen[$run.head_sha] then .
      else .seen[$run.head_sha] = true | .rows += [$run]
      end
    )
    | .rows[]
    | "\(.head_sha) \(.created_at)"
  ')
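The `reduce` above keeps only the first workflow run seen for each `head_sha`. As a rough, jq-free sketch of the same first-occurrence deduplication (the helper name `dedupe_by_sha` and the sample lines are hypothetical, not part of the PR), the behaviour can be reproduced on `<head_sha> <created_at>` lines like this:

```shell
# Illustrative only: mirrors the "first run per head_sha wins" logic of the jq reduce.
# Reads "<head_sha> <created_at>" lines on stdin, in the order the API returned them.
dedupe_by_sha() {
  # seen[$1]++ is 0 (false) the first time a SHA appears, so only that line is printed.
  # Lines with fewer than two fields (e.g. blank lines) are skipped.
  awk 'NF >= 2 && !seen[$1]++ { print $1, $2 }'
}

# Hypothetical sample input: the SHA "aaa" appears twice, only its first line survives.
printf '%s\n' \
  'aaa 2024-01-02T00:00:00Z' \
  'bbb 2024-01-01T12:00:00Z' \
  'aaa 2024-01-01T00:00:00Z' | dedupe_by_sha
```

Because the first occurrence wins, which `created_at` is kept for a SHA depends on the order in which the runs are listed.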
Comment on lines +55 to +63
  description: 'JSON: arch -> PCRE regex that matches the check-run names of that arch''s ROCm test shards on pytorch/pytorch. An arch is considered "ready" only when every check-run whose name matches has status=completed (so we wait for all test shards, not just workflow completion).'
  required: false
  default: '{"mi355":"rocm.*mi355.*/ test [(](default|distributed|inductor),","mi300":"rocm.*mi300.*/ test [(](default|distributed|inductor),","mi200":"(rocm.*(mi200|mi210).*/ test [(](default|distributed|inductor),|linux-jammy-rocm-py3[.]10 / test [(](default|distributed|inductor),)","navi31":"rocm.*navi31.*/ test [(]default,","nightly":"rocm-nightly.*/ test [(](default|distributed|inductor),"}'
  type: string
arch_workflow_regex_map:
  description: 'JSON: arch -> PCRE regex that matches workflow file paths for upstream ROCm workflows that mean this arch ran on the SHA. Missing workflows mean the arch is not expected for that commit.'
  required: false
  default: '{"mi355":"(^|/)(trunk|rocm-mi355|periodic-rocm-mi355|inductor-rocm-mi355)[.]yml$","mi300":"(^|/)(rocm-mi300|periodic-rocm-mi300|inductor-rocm-mi300)[.]yml$","mi200":"(^|/)(trunk-rocm-sandbox|rocm-mi200|periodic-rocm-mi200|inductor-rocm-mi200)[.]yml$","navi31":"(^|/)(rocm-navi31|periodic-rocm-navi31|inductor-rocm-navi31)[.]yml$","nightly":"(^|/)rocm-nightly[.]yml$"}'
  type: string
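To make the readiness rule in the description concrete, here is a minimal sketch of "every check-run whose name matches the arch regex has status=completed". It is not the PR's implementation: the helper `arch_ready`, the tab-separated `<status><TAB><name>` input format, and the sample check-run names are all assumptions for illustration. The sketch also treats "no matching check-runs" as not ready, which the description does not specify. The default regexes avoid backslashes, so they can be passed to `awk` unchanged.

```shell
# Illustrative only. Usage: arch_ready REGEX < lines of "<status>\t<check-run name>"
# Exit status 0 means: at least one name matched REGEX and all matches are completed.
arch_ready() {
  awk -F '\t' -v re="$1" '
    $2 ~ re { matched++; if ($1 != "completed") pending++ }
    END { exit (matched > 0 && pending == 0) ? 0 : 1 }
  '
}

# The mi355 entry from the default map above.
re='rocm.*mi355.*/ test [(](default|distributed|inductor),'

# Hypothetical check-runs: the build job does not match the regex, so its
# in_progress status does not block readiness; the test shard is completed.
if printf 'completed\trocm-mi355 / test (default, 1, 6)\nin_progress\trocm-mi355 / build\n' \
    | arch_ready "$re"; then
  echo "mi355 ready"
else
  echo "mi355 not ready"
fi
```

Matching on check-run names rather than workflow conclusion is what lets the gate wait for all test shards, as the description notes.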
Comment on lines +148 to +162
# Pull recent parity runs. Run titles look like:
#   "<csv_name or SHA> · mi355, mi300, mi200"
# Once any parity run exists for a SHA, we do not dispatch another
# report for that SHA. This keeps the dashboard to one report per
# upstream commit.
EXISTING=$(gh run list \
  --repo "$GITHUB_REPOSITORY" \
  --workflow parity.yml \
  --limit 1000 \
  --json displayTitle 2>/dev/null || echo '[]')

sha_already_dispatched() {
  local sha="$1"
  echo "$EXISTING" | jq -e --arg sha "$sha" \
    'any(.[]; .displayTitle | contains($sha))' >/dev/null
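The deduplication check is a plain substring test of the SHA against each run title. A jq-free sketch of the same idea, operating on one title per line, might look as follows; the helper `sha_in_titles` and the sample titles are hypothetical and only illustrate the matching behaviour.

```shell
# Illustrative only. Usage: sha_in_titles SHA < one run title per line
# Succeeds if any title contains SHA as a fixed substring (no regex interpretation).
sha_in_titles() {
  # Guard: an empty pattern would match every title with grep -F.
  [ -n "$1" ] || return 1
  grep -qF -- "$1"
}

# Hypothetical titles in the "<csv_name or SHA> · <arch list>" shape.
titles='autoparity-20240101-0123456789abcdef0123456789abcdef01234567 · mi355
some-manual-report · mi355, mi300, mi200'

if printf '%s\n' "$titles" | sha_in_titles 0123456789abcdef0123456789abcdef01234567; then
  echo "skip: parity run already exists for this SHA"
fi
```

Since this is a substring match, a full 40-character SHA is effectively unambiguous, whereas a very short prefix could match an unrelated title.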
Summary
- […] pytorch/pytorch main trunk.yml pushes and dispatches parity.yml once per ready upstream SHA.
- […] pull_request dry-run path with a smaller scan window to validate the scanner without creating parity reports from PR CI.

How it works
- […] pytorch/pytorch trunk.yml push runs on main. Those trunk runs provide the candidate upstream SHAs to evaluate.
- […] ROCm/pytorch parity.yml run titles. If any existing parity run already contains that SHA, the SHA is skipped so we keep one report per upstream commit.
- […] main branch of pytorch/pytorch in any 10-minute interval
- […] status=completed. It also waits for the CUDA default, distributed, and inductor check-runs consumed by parity.
- […] mem_leak_check and rerun_disabled_tests are ignored because the parity report does not consume them.
- […] parity.yml with the ready arch list and a CSV prefix containing the upstream SHA, for example autoparity-YYYYMMDD-<sha>.
- […] dry_run=true, so they exercise the scanner and log would-be dispatches without creating reports. Scheduled and manually dispatched runs can create real parity reports.

Test plan
- […] yaml.BaseLoader and bash -n.
- […] dry_run=false, scanned 20 recent upstream trunk runs, skipped SHAs with pending parity check-runs, dispatched 5 ready SHAs, and stopped at max_dispatches=5.
  - d76e83ef / mi355: https://github.com/ROCm/pytorch/actions/runs/26041518406
  - 457e1890 / mi355: https://github.com/ROCm/pytorch/actions/runs/26041528996
  - 60f38508 / mi355: https://github.com/ROCm/pytorch/actions/runs/26041541647
  - d1d96569 / mi355: https://github.com/ROCm/pytorch/actions/runs/26041551854
  - 6e3cf2e4 / mi355, mi300, mi200: https://github.com/ROCm/pytorch/actions/runs/26041618237

Dispatch cadence note
- […] max_dispatches=5 only to avoid flooding ROCm/pytorch during manual testing.
- […] max_dispatches=50, max_commits=200, and max_age_hours=72 unless manually overridden.
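The interaction between the dispatch cap and dry-run mode described above can be sketched as a small loop. This is an assumption-laden illustration, not the workflow's code: the helper `plan_dispatches` and the `STAMP` variable are hypothetical, the sketch only prints what it would do instead of dispatching parity.yml, and it assumes the date in the `autoparity-YYYYMMDD-<sha>` prefix is the current UTC date, which the description does not state.

```shell
# Illustrative only. Usage: plan_dispatches MAX_DISPATCHES DRY_RUN < ready SHAs, one per line
# Prints one line per SHA and stops once MAX_DISPATCHES SHAs have been handled.
plan_dispatches() {
  max="$1"
  dry_run="$2"
  count=0
  # Assumption: the prefix date is today's UTC date unless STAMP is set by the caller.
  stamp="${STAMP:-$(date -u +%Y%m%d)}"
  while read -r sha; do
    [ -n "$sha" ] || continue
    if [ "$count" -ge "$max" ]; then
      echo "stop: reached max_dispatches=$max"
      break
    fi
    prefix="autoparity-$stamp-$sha"
    if [ "$dry_run" = "true" ]; then
      echo "would dispatch $prefix"   # PR dry-run path: log only
    else
      echo "dispatch $prefix"         # a real run would trigger parity.yml here
    fi
    count=$((count + 1))
  done
}

# Hypothetical ready SHAs; with a cap of 2 the third SHA is left for a later scan.
printf 'aaa\nbbb\nccc\n' | STAMP=20240101 plan_dispatches 2 true
```

A SHA left over when the cap is hit is not lost: it has no parity run title yet, so a later scan can pick it up while it is still within the scan window.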