[CI] Add parity auto-trigger workflow by ethanwee1 · Pull Request #3231 · ROCm/pytorch

ethanwee1 · 2026-05-18T14:26:03Z

Summary

Add a scheduled parity auto-trigger that scans completed pytorch/pytorch main trunk.yml pushes and dispatches parity.yml once per ready upstream SHA.
Gate dispatch on the ROCm arch workflows that actually ran for a SHA, plus the CUDA jobs consumed by parity, so partial reports are avoided.
Add a pull_request dry-run path with a smaller scan window to validate the scanner without creating parity reports from PR CI.

How it works

The workflow runs every 10 minutes and queries recent completed pytorch/pytorch trunk.yml push runs on main. Those trunk runs provide the candidate upstream SHAs to evaluate.
For each candidate SHA, it first checks recent ROCm/pytorch parity.yml run titles. If any existing parity run already contains that SHA, the SHA is skipped so we keep one report per upstream commit.
Each run dispatches parity.yml at most 50 times (max_dispatches=50), which is comfortably above the maximum number of commits landing on the pytorch/pytorch main branch in any 10-minute interval.
It then lists all upstream workflow runs for that SHA and determines which ROCm arches actually ran. Missing periodic arch workflows are not treated as pending work; only arches with matching workflow files are expected in that report.
For the arches that did run, it lists upstream check-runs and waits for the matching ROCm test shards to reach status=completed. It also waits for the CUDA default, distributed, and inductor check-runs consumed by parity.
Auxiliary shards such as mem_leak_check and rerun_disabled_tests are ignored because the parity report does not consume them.
Once all relevant ROCm and CUDA check-runs are complete, it dispatches parity.yml with the ready arch list and a CSV prefix containing the upstream SHA, for example autoparity-YYYYMMDD-<sha>.
Pull request runs are forced to dry_run=true, so they exercise the scanner and log would-be dispatches without creating reports. Scheduled and manually dispatched runs can create real parity reports.
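The last two steps above (naming the report and forcing dry-run on PRs) can be sketched as follows. This is a minimal illustration in Python rather than the workflow's embedded shell. The function names are hypothetical, and the description does not say which date fills YYYYMMDD or whether the SHA is abbreviated, so both are passed in as-is.

```python
def effective_dry_run(event_name: str, requested_dry_run: bool) -> bool:
    """Pull request runs are always dry runs; other events honor the input."""
    if event_name == "pull_request":
        return True
    return requested_dry_run


def csv_prefix(sha: str, yyyymmdd: str) -> str:
    """Build the report prefix in the autoparity-YYYYMMDD-<sha> shape."""
    # Assumption: the caller decides which date and which SHA form to use.
    return f"autoparity-{yyyymmdd}-{sha}"


def act_on_ready_sha(sha, arches, event_name, requested_dry_run, yyyymmdd):
    """Return the log line a run would emit for one ready upstream SHA."""
    prefix = csv_prefix(sha, yyyymmdd)
    arch_list = ", ".join(arches)
    if effective_dry_run(event_name, requested_dry_run):
        # Dry run: log the would-be dispatch, create no report.
        return f"DRY RUN: would dispatch parity.yml for {prefix} ({arch_list})"
    return f"DISPATCH: parity.yml for {prefix} ({arch_list})"
```

With this shape, a `pull_request` event can never create a report, even if `dry_run=false` is requested, while `schedule` and `workflow_dispatch` events follow the requested value.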

Test plan

Validated workflow YAML and embedded shell locally with yaml.BaseLoader and bash -n.
PR dry-run workflow succeeded: https://github.com/ROCm/pytorch/actions/runs/26039732579
Full non-dry-run workflow_dispatch succeeded: https://github.com/ROCm/pytorch/actions/runs/26041358738
The full run used dry_run=false, scanned 20 recent upstream trunk runs, skipped SHAs with pending parity check-runs, dispatched 5 ready SHAs, and stopped at max_dispatches=5.
Dispatched parity reports all completed successfully:
- d76e83ef / mi355: https://github.com/ROCm/pytorch/actions/runs/26041518406
- 457e1890 / mi355: https://github.com/ROCm/pytorch/actions/runs/26041528996
- 60f38508 / mi355: https://github.com/ROCm/pytorch/actions/runs/26041541647
- d1d96569 / mi355: https://github.com/ROCm/pytorch/actions/runs/26041551854
- 6e3cf2e4 / mi355, mi300, mi200: https://github.com/ROCm/pytorch/actions/runs/26041618237

Dispatch cadence note

The full validation run used max_dispatches=5 only to avoid flooding ROCm/pytorch during manual testing.
The production scheduled workflow runs every 10 minutes and defaults to max_dispatches=50, max_commits=200, and max_age_hours=72 unless manually overridden.
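One plausible reading of how these limits interact is sketched below. It assumes max_age_hours filters candidates by the trunk run's created_at timestamp and that max_dispatches truncates the list of ready SHAs in scan order. Neither detail is spelled out in this description, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_github_time(stamp: str) -> datetime:
    """Parse a GitHub API timestamp such as 2026-05-18T14:26:03Z as UTC."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def within_window(candidates, now, max_age_hours=72):
    """Keep SHAs whose trunk run was created within the last max_age_hours."""
    # Assumption: age is measured from the run's created_at value.
    cutoff = now - timedelta(hours=max_age_hours)
    return [sha for sha, created in candidates
            if parse_github_time(created) >= cutoff]


def cap_dispatches(ready_shas, max_dispatches=50):
    """Stop after max_dispatches ready SHAs in a single run."""
    return ready_shas[:max_dispatches]
```

SHAs that are cut off by the cap stay undispatched, so under these assumptions a later scheduled run would pick them up as long as they are still inside the age window.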

Add a scheduled scanner that dispatches one parity report per ready upstream PyTorch main commit, with PR dry-runs to validate readiness without creating reports.

rocm-repo-management-api · 2026-05-18T14:37:05Z

Jenkins build for f4dfbd8845f2d05dd28225ca78af48c1926d9e31 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

Copilot

Pull request overview

Adds a new GitHub Actions workflow to automatically scan recent completed trunk.yml push runs on main, determine when the relevant ROCm + CUDA check-runs for a given upstream SHA are fully complete, and then dispatch parity.yml (with a PR-only dry-run mode).

Changes:

Introduces a scheduled (every 10 minutes) + manual + PR dry-run “parity auto-trigger” workflow.
Implements SHA deduplication by checking existing parity.yml run titles in the current repo.
Gates dispatch on completion of ROCm arch shard check-runs (only for arch workflows detected as having run) plus specific CUDA check-runs used by parity.

+          COMMITS=$(gh api \
+            "repos/$UPSTREAM/actions/workflows/trunk.yml/runs?branch=$BRANCH&event=push&status=completed&per_page=$MAX_COMMITS" \
+            --jq '
+              reduce .workflow_runs[] as $run ({seen:{}, rows:[]};
+                if .seen[$run.head_sha] then .
+                else .seen[$run.head_sha] = true | .rows += [$run]
+                end
+              )
+              | .rows[]
+              | "\(.head_sha) \(.created_at)"
+            ')
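For readers less familiar with jq, the `reduce` in the hunk above keeps only the first run seen for each head_sha and preserves the API's ordering. A rough Python equivalent, operating on an already-fetched `workflow_runs` list (the function name is hypothetical):

```python
def unique_runs_by_sha(workflow_runs):
    """Return "<head_sha> <created_at>" lines, one per distinct head_sha.

    The first occurrence of each SHA wins, matching the jq reduce.
    """
    seen = set()
    rows = []
    for run in workflow_runs:
        sha = run["head_sha"]
        if sha in seen:
            continue  # a later run for the same commit, e.g. a re-run
        seen.add(sha)
        rows.append(f'{sha} {run["created_at"]}')
    return rows
```

Separately, the GitHub REST API caps `per_page` at 100, so a single request made with a larger value returns at most 100 runs.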


+        description: 'JSON: arch -> PCRE regex that matches the check-run names of that arch''s ROCm test shards on pytorch/pytorch. An arch is considered "ready" only when every check-run whose name matches has status=completed (so we wait for all test shards, not just workflow completion).'
+        required: false
+        default: '{"mi355":"rocm.*mi355.*/ test [(](default|distributed|inductor),","mi300":"rocm.*mi300.*/ test [(](default|distributed|inductor),","mi200":"(rocm.*(mi200|mi210).*/ test [(](default|distributed|inductor),|linux-jammy-rocm-py3[.]10 / test [(](default|distributed|inductor),)","navi31":"rocm.*navi31.*/ test [(]default,","nightly":"rocm-nightly.*/ test [(](default|distributed|inductor),"}'
+        type: string
+      arch_workflow_regex_map:
+        description: 'JSON: arch -> PCRE regex that matches workflow file paths for upstream ROCm workflows that mean this arch ran on the SHA. Missing workflows mean the arch is not expected for that commit.'
+        required: false
+        default: '{"mi355":"(^|/)(trunk|rocm-mi355|periodic-rocm-mi355|inductor-rocm-mi355)[.]yml$","mi300":"(^|/)(rocm-mi300|periodic-rocm-mi300|inductor-rocm-mi300)[.]yml$","mi200":"(^|/)(trunk-rocm-sandbox|rocm-mi200|periodic-rocm-mi200|inductor-rocm-mi200)[.]yml$","navi31":"(^|/)(rocm-navi31|periodic-rocm-navi31|inductor-rocm-navi31)[.]yml$","nightly":"(^|/)rocm-nightly[.]yml$"}'
+        type: string
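A small sketch of how the two maps could be applied, using Python's `re` in place of PCRE (the default patterns shown use no PCRE-only syntax). The regexes are copied from the defaults above for two arches. The function names and any check-run names fed to them are hypothetical, and treating "no matching check-runs yet" as not ready is an assumption, not something the input descriptions state.

```python
import json
import re

# Two entries copied verbatim from the arch_workflow_regex_map default.
ARCH_WORKFLOW_REGEX_MAP = json.loads(
    '{"mi355":"(^|/)(trunk|rocm-mi355|periodic-rocm-mi355|inductor-rocm-mi355)[.]yml$",'
    '"mi300":"(^|/)(rocm-mi300|periodic-rocm-mi300|inductor-rocm-mi300)[.]yml$"}'
)

# Two entries copied verbatim from the shard regex map default.
ARCH_SHARD_REGEX_MAP = json.loads(
    '{"mi355":"rocm.*mi355.*/ test [(](default|distributed|inductor),",'
    '"mi300":"rocm.*mi300.*/ test [(](default|distributed|inductor),"}'
)


def arches_that_ran(workflow_paths, workflow_map=ARCH_WORKFLOW_REGEX_MAP):
    """Arches with at least one matching workflow file for the SHA."""
    return [arch for arch, pattern in workflow_map.items()
            if any(re.search(pattern, path) for path in workflow_paths)]


def arch_is_ready(arch, check_runs, shard_map=ARCH_SHARD_REGEX_MAP):
    """True when every matching test shard has status == "completed"."""
    matching = [run for run in check_runs
                if re.search(shard_map[arch], run["name"])]
    # Assumption: zero matching shards means they have not been created yet.
    return bool(matching) and all(run["status"] == "completed"
                                  for run in matching)
```

An arch whose workflow files are absent for a SHA never appears in `arches_that_ran`, so it is not waited on, which is the "missing workflows mean the arch is not expected" behavior described in the input.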


+          # Pull recent parity runs. Run titles look like:
+          #   "<csv_name or SHA> · mi355, mi300, mi200"
+          # Once any parity run exists for a SHA, we do not dispatch another
+          # report for that SHA. This keeps the dashboard to one report per
+          # upstream commit.
+          EXISTING=$(gh run list \
+            --repo "$GITHUB_REPOSITORY" \
+            --workflow parity.yml \
+            --limit 1000 \
+            --json displayTitle 2>/dev/null || echo '[]')
+
+          sha_already_dispatched() {
+            local sha="$1"
+            echo "$EXISTING" | jq -e --arg sha "$sha" \
+              'any(.[]; .displayTitle | contains($sha))' >/dev/null
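The jq filter in this hunk is a plain substring test over run titles. An equivalent Python sketch, taking the parsed `gh run list --json displayTitle` output as a list of dicts (the sample titles used to exercise it are hypothetical):

```python
def sha_already_dispatched(existing_runs, sha):
    """True if any existing parity run title contains the SHA as a substring."""
    return any(sha in run["displayTitle"] for run in existing_runs)
```

Because it is a substring match, a title carrying either the bare SHA or a CSV prefix ending in that SHA counts as already dispatched, but only as far back as the fetched titles go (`--limit 1000` in the hunk). The match also depends on the title containing the SHA in exactly the form being searched for.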


[CI] Add parity auto-trigger workflow

f4dfbd8

Add a scheduled scanner that dispatches one parity report per ready upstream PyTorch main commit, with PR dry-runs to validate readiness without creating reports.

jithunnair-amd requested a review from Copilot May 20, 2026 00:23

Copilot started reviewing on behalf of jithunnair-amd May 20, 2026 00:25

Copilot AI reviewed May 20, 2026


ethanwee1 wants to merge 1 commit into develop from ethanwee/parity-auto-every-commit
