[AUTOGENERATED] release/2.12_IFU_20260521#3248

Open
rocm-repo-management-api-6[bot] wants to merge 88 commits into release/2.12 from release/2.12_IFU_20260521

Conversation

@rocm-repo-management-api-6

rocm_base: 26872de

atalman and others added 30 commits February 16, 2026 13:13
* [RELEASE 2.11] Release only changes

* remove_file

* Trigger rebuild
Update inductor expected accuracy files (pytorch#175041)

## Summary

This PR updates the expected accuracy CSV files for inductor benchmarks based on CI results from PyTorch commit 93dd774.

These files serve as reference points for dynamo/inductor CI to track:
- Graph breaks
- Model accuracy

## Changes

- Updated CUDA expected accuracy files in `benchmarks/dynamo/ci_expected_accuracy/`
- Updated ROCm expected accuracy files in `benchmarks/dynamo/ci_expected_accuracy/rocm/`

## Test Plan

- [ ] Verify that the CI jobs pass with the updated expected accuracy files
- [ ] Review the diff to ensure changes are reasonable and expected
- [ ] Check that no unexpected regressions are being marked as "expected"

Pull Request resolved: pytorch#175041
Approved by: https://github.com/atalman

(cherry picked from commit f90c091)
…ch#172373)" (pytorch#175094)

This reverts commit 7072636.

Reverted pytorch#172373 on behalf of https://github.com/jeffdaily due to PR claims to fix ROCm DISABLED issue but it did not ([comment](pytorch#172373 (comment)))

Co-authored-by: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
…ytorch#174596)" (pytorch#175095)

This reverts commit 781b5d1.

Reverted pytorch#174596 on behalf of https://github.com/jeffdaily due to This broke ROCm dynamo benchmarks.  Lots of permission denied errors. ([comment](pytorch#174596 (comment)))

Co-authored-by: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
Fix macOS arm64 libtorch release upload failure (pytorch#175100)

**Summary**

  Failures introduced by following PR: pytorch#173541

  The change from RENAME_WHEEL=true to RENAME_WHEEL=false as the default in
  build_wheel.sh (landed in the 2026-01-31 nightly) broke libtorch builds on
   macOS arm64. The elif branch at line 220 was missing a BUILD_PYTHONLESS
  guard, so libtorch builds (BUILD_PYTHONLESS=1) entered the wheel-copy path
   instead of the libtorch zip-packaging path. This caused the build to
  produce a .whl artifact instead of the expected .zip files, and the upload
   script then failed because it looks for *.zip files.

  The fix adds -z "$BUILD_PYTHONLESS" to the elif condition, matching the
  guard already present on the if branch.
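
The branch selection described above can be modelled in a few lines of Python. This is a simplified sketch, not the actual `build_wheel.sh` logic: the function name `packaging_path`, the path labels, the `fixed` switch, and the exact shape of the pre-fix `elif` condition are assumptions made for illustration.

```python
def packaging_path(rename_wheel: bool, build_pythonless: bool, fixed: bool) -> str:
    """Return which packaging path a build takes (hypothetical model)."""
    # The `if` branch already carried the BUILD_PYTHONLESS guard.
    if rename_wheel and not build_pythonless:
        return "rename-wheel"
    # Before the fix the `elif` had no guard, so libtorch builds fell in here
    # once RENAME_WHEEL defaulted to false.
    if not rename_wheel and (not build_pythonless or not fixed):
        return "copy-wheel"
    # Libtorch (pythonless) builds should end up packaging .zip files.
    return "libtorch-zip"
```

In this model, a libtorch build with `rename_wheel=False` takes the `copy-wheel` path when `fixed=False` and the `libtorch-zip` path when `fixed=True`, which matches the failure and the fix described in the summary.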

  Failures can be seen here: https://hud.pytorch.org/hud/pytorch/pytorch/nightly/1?per_page=50&name_filter=macos-arm64-binary-libtorch-release%20%2F%20libtorch-cpu
  Failing run: https://github.com/pytorch/pytorch/actions/runs/21541142799/job/62076418921
  Successful run (previous nightly): https://github.com/pytorch/pytorch/actions/runs/21508411052/job/61971405484

**Test plan**
	In CI, run ciflow/binaries. Make sure the Rename/Copy log is the same as in the successful run above.
Pull Request resolved: pytorch#175100
Approved by: https://github.com/huydhn, https://github.com/isuruf

(cherry picked from commit bad1df7)

Co-authored-by: atalman <atalman@fb.com>
pytorch#175299)

[benchmark] Skip pytorch_CycleGAN_and_pix2pix from inductor benchmarks (pytorch#175066)

## Summary

Skip the `pytorch_CycleGAN_and_pix2pix` benchmark model from the inductor benchmark suite.

This legacy 2017 model has been failing with `eager_fail_to_run` on 100%
of commits since mid-2025, providing zero CI signal while consuming
~5.3M GPU-seconds/week across 7+ benchmark jobs on CUDA, CPU, and ROCm.

**Estimated savings: ~310 GPU-hours/week (~1,240 GPU-hours/month)**

Skip it in `torchbench.yaml` and remove its entries from all 31 expected
accuracy CSV files. Also remove it from the `higher_fp16` tolerance list.

See P2188981399 for the full CI workflow analysis.

## Test Plan

- CI should pass with CycleGAN skipped (it was already failing 100% of the time)
- No other benchmark models affected

Pull Request resolved: pytorch#175066
Approved by: https://github.com/huydhn, https://github.com/malfet

(cherry picked from commit 688c943)

Co-authored-by: Eli Uriegas <eliuriegas@meta.com>
…rch#175300)

[CI] Move CUDA 12.8 GPU tests from per-commit trunk to periodic (pytorch#175067)

## Summary

Move CUDA 12.8 GPU tests from per-commit trunk CI to periodic (~3x/day on weekdays).

Both CUDA 12.8 and 13.0 are shipping wheel targets (nightly ships cu126, cu128, cu129, cu130), but their trunk CI test suites have **85-90% failure correlation** -- they almost always fail together. Over a 30-day analysis window covering 97 reverts and 38 significant regression events, **CUDA 12.8 never uniquely caught a regression that 13.0 missed**.

CUDA 13.0 is kept per-commit because:
- It is the **newest** shipping CUDA version
- Most likely to surface **novel breakage** from new CUDA runtime behavior
- Forward-looking CI should protect what's coming, not what's already stable

CUDA 12.8 is moved to periodic because:
- It is **mature and well-understood** -- breakage is less likely and less urgent
- The rare 12.8-only regression can tolerate the ~8-hour periodic detection window
- The 12.8 build job **remains in trunk** because `cross-compile-linux-test` depends on its artifacts

**Estimated savings: ~1,270 GPU-hours/week (~5,080 GPU-hours/month)**

This is the #2 savings opportunity from a broader CI workflow analysis (P2188981399) covering 128 PR+trunk jobs over 30 days. Combined with pytorch#175066 (CycleGAN skip, ~310 GPU-hours/week), total savings from this stack: **~1,580 GPU-hours/week (~6,320 GPU-hours/month)**.
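
The combined figure can be checked with a few lines of arithmetic. The sketch below assumes a flat factor of 4 weeks per month, which is inferred from the quoted pairs (310 to 1,240 and 1,270 to 5,080) and is not stated explicitly in the source.

```python
# Weekly GPU-hour estimates quoted in the two PR descriptions.
cyclegan_weekly = 310
cuda128_weekly = 1270

# Assumption: the summaries convert weeks to months with a flat factor of 4.
WEEKS_PER_MONTH = 4

total_weekly = cyclegan_weekly + cuda128_weekly
total_monthly = total_weekly * WEEKS_PER_MONTH

print(total_weekly, total_monthly)  # 1580 6320
```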

### Changes
- `trunk.yml`: remove CUDA 12.8 test job (5 default + 3 distributed + 1 pr_time_benchmarks + 1 libtorch shards) and no-ops build
- `periodic.yml`: add default (5 GPU shards on g6.4xlarge) and distributed (3 multi-GPU shards on g4dn.12xlarge) to existing CUDA 12.8 periodic entry

## Test Plan

- CUDA 12.8 GPU tests continue to run in periodic (3x/day weekdays)
- CUDA 13.0 per-commit coverage is unchanged
- Cross-compile-linux-test continues to work (12.8 build job kept)

Pull Request resolved: pytorch#175067
Approved by: https://github.com/malfet
ghstack dependencies: pytorch#175066

(cherry picked from commit ef0353f)

Co-authored-by: Eli Uriegas <eliuriegas@meta.com>
[BE] Remove cuda 12.4 periodic tests (pytorch#175170)

These tests have been either timing out or failing for a couple of months now, so there is no reason to keep them around:
https://hud.pytorch.org/hud/pytorch/pytorch/main/2?per_page=50&name_filter=12.4

Failures go back as far as 9.29.2025 : https://hud.pytorch.org/pytorch/pytorch/commit/efd7fd5ed5ac7ec03201a546a09fb19ec59de431
Pull Request resolved: pytorch#175170
Approved by: https://github.com/malfet

(cherry picked from commit 174157a)

Co-authored-by: atalman <atalman@fb.com>
[CI] Add CUDA 13 periodic tests (pytorch#174850)

pytorch#173950
To prepare for moving CUDA 13 wheels to stable, we need to add CUDA 13 periodic tests.
Pull Request resolved: pytorch#174850
Approved by: https://github.com/atalman


(cherry picked from commit 7cdd4b1)

Co-authored-by: Ting Lu <tingl@nvidia.com>
Co-authored-by: Andrey Talman <atalman@fb.com>
[ROCm] forward fix pytorch#174087, take 4 (pytorch#175098)

vllm build broke due to missing getCurrentHIPStreamMasqueradingAsCUDA.

Though it existed in the header aten/src/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h, this header was not included directly or indirectly by vllm.  PR pytorch#174087 subtly broke this even when trying to be backward compatible.  Moving the declarations of these Masquerading functions into c10/cuda/CUDAStream.h (c10/hip/HIPStream.h when hipified) fixes the vllm build.  Any external projects that had included the HIPStreamMasqueradingAsCUDA.h forward to c10/hip/HIPStream.h anyway.

Pull Request resolved: pytorch#175098
Approved by: https://github.com/atalman

(cherry picked from commit e6d6f04)

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
…pytorch#175580)

[MPS] Fix 2-pass SDPA memory corruption by forcing float accumulators (pytorch#174945)

Ensure `sums` and `maxs` buffers in `sdpa_vector_2pass_mps` are allocated as `kFloat` instead of inheriting the input dtype. This fixes out-of-bounds memory access and nondeterministic/corrupt results, as reported in pytorch#174861 (reproducible with bf16/fp16 and GQA, seq_len > 1023).

Adds a regression test covering bf16/fp16/fp32 and relaxes tolerance for bf16 to validate numerical correctness and determinism on MPS.

Fixes pytorch#174861
Pull Request resolved: pytorch#174945
Approved by: https://github.com/malfet

(cherry picked from commit c68a1d2)

Co-authored-by: Roy Hvaara <roy@lightyear.no>
Disable einops 0.8.2 check on PyTorch (pytorch#175351)

Partially revert pytorch#173611 and fallback to the previous behavior on einops, which uses `allow_in_graph`.

**Context**

* Dynamo does not trace into `@lru_cache` and warns on any usage.
* einops uses `@lru_cache` as part of `_prepare_transformation_recipe`.
* Every einops op goes through this function.
* Dynamo warns on every einops op trace and this creates a logspam
  problem.
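
To make the context concrete, here is a small stand-alone sketch of the pattern in question. It uses neither einops nor Dynamo; `prepare_recipe` and `rearrange_like` are hypothetical stand-ins for a cached recipe builder and an op that calls it on every invocation.

```python
from functools import lru_cache


@lru_cache(maxsize=256)
def prepare_recipe(pattern: str) -> tuple:
    # Stand-in for an expensive, cacheable parsing step.
    return tuple(pattern.split())


def rearrange_like(values: list, pattern: str) -> list:
    # Every call routes through the cached function, which is why a tracer
    # that warns on lru_cache usage would warn once per op.
    recipe = prepare_recipe(pattern)
    return values[: len(recipe)]


rearrange_like([1, 2, 3], "b c")
rearrange_like([4, 5, 6], "b c")
info = prepare_recipe.cache_info()  # one miss, then one hit
```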

Pull Request resolved: pytorch#175351
Approved by: https://github.com/Lucaskabela

(cherry picked from commit 1fe0f51)

Co-authored-by: Guilherme Leobas <gleobas@quansight.com>
…erator[] access (pytorch#175579)

[CPUBLAS] Fix UB: use vector::resize() instead of reserve() before operator[] access (pytorch#175315)

Fixes pytorch#175302

## Summary
`reserve(1)` → `resize(1)`. See issue for details.

Pull Request resolved: pytorch#175315
Approved by: https://github.com/zou3519, https://github.com/malfet

(cherry picked from commit f08aafa)

Co-authored-by: mulatta <67085791+mulatta@users.noreply.github.com>
Remove python constraint on setuptools (pytorch#175577)

Fixes pytorch#173823
Dependency on setuptools was added 8 years ago here: pytorch#5207
This issue remained hidden because we run the smoke test in a conda env, and creating a conda env installs setuptools by default. It became apparent when testing with uv.

Pull Request resolved: pytorch#175577
Approved by: https://github.com/malfet, https://github.com/seemethere

(cherry picked from commit eaa0221)

Co-authored-by: atalman <atalman@fb.com>
Supports custom empty tensor in InputObserver (pytorch#174964)

When running an LLM that handles both images and text (Gemma3), the first call to the forward method has input_ids and pixel_values but no past_key_values. Subsequent calls have past_key_values but no pixel_values. The InputObserver knows the whole list of inputs, but since there is only one example of pixel_values (and the batch dimension is usually constant across all calls), we need a way to tell the InputObserver what an empty tensor for pixel_values looks like when it is missing.
Pull Request resolved: pytorch#174964
Approved by: https://github.com/titaiwangms, https://github.com/justinchuby



(cherry picked from commit bc9adaa)

Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Bump transformers version to 5.2.0 (pytorch#175274)

Take over the Dependabot PR from pytorch#175147 to fix the failures there

Pull Request resolved: pytorch#175274
Approved by: https://github.com/xmfan, https://github.com/malfet

(cherry picked from commit 268cfa7)

Co-authored-by: Huy Do <huydhn@gmail.com>
…75781)

[CI] Switch vLLM test and benchmark workflows to CUDA 13.0 (pytorch#175393)

We should run vLLM test and benchmark on CUDA 13.0 now
Pull Request resolved: pytorch#175393
Approved by: https://github.com/zou3519

(cherry picked from commit 72d0e64)

Co-authored-by: Huy Do <huydhn@gmail.com>
Two tweaks:

* Move some tests around to match what they are in vLLM.  I'll work on a proper fix for this later to avoid the need to do this manually
* Fix 12.8 build. See vllm-project/vllm#34791

Pull Request resolved: pytorch#175238
Approved by: https://github.com/angelayi, https://github.com/zou3519

Co-authored-by: PyTorch UpdateBot <pytorchupdatebot@users.noreply.github.com>
[ROCm][CI] Upgrade ROCm CI to 7.2 - 4/N (pytorch#173188)

In parallel with pytorch#173187

Pull Request resolved: pytorch#173188
Approved by: https://github.com/jeffdaily



(cherry picked from commit 8301e14)

Co-authored-by: Jithun Nair <jithun.nair@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Jack Taylor <jack.taylor@amd.com>
[ROCm] Added CUDA check to test_pattern_matcher (pytorch#175092)

Forward fix to pytorch#173856.

Pull Request resolved: pytorch#175092
Approved by: https://github.com/jeffdaily, https://github.com/Skylion007

(cherry picked from commit f6dcaa3)

Co-authored-by: Arash Pakbin <arash.pakbin@amd.com>
…ytorch#175672)

* [WINDOWS][cuDNN] Fix cuDNN version mismatch in Windows (pytorch#175547)

Authored with claude code
Previous PRs such as pytorch#174310 updated cuDNN versions for Linux builds but neglected to do so for Windows.

Claude wrote all of the lintrunner additions for consistency checking
Pull Request resolved: pytorch#175547
Approved by: https://github.com/Skylion007, https://github.com/atalman, https://github.com/malfet

* [cuDNN] Upgrade cuDNN to 9.19 for 12.8 and 13.0 wheels (pytorch#174310)

Currently being tested internally, currently looks OK

also needed for pytorch#172108

Pull Request resolved: pytorch#174310
Approved by: https://github.com/Skylion007, https://github.com/ngimel, https://github.com/malfet
Fix pep517 release handling (pytorch#175635)

Fix pep517 release handling

Fix sdist upload: correct PEP 440 version and file path
PYTORCH_BUILD_VERSION was being set unconditionally to the raw tag/branch
name (including 'v' prefix for tags), which fails PEP 440 validation in
get_torch_version(), and was not exported so Python subprocesses couldn't
see it anyway.

Fix both issues: set and export PYTORCH_BUILD_VERSION only for release/RC
tags, stripping the 'v' prefix and converting '-rc' to 'rc' for PEP 440
compliance. For branch pushes and PRs, leave it unset so get_torch_version
falls back to version.txt.
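
The tag handling described above can be sketched in Python. The helper name `build_version_from_ref` and the assumed tag shape (`v<major>.<minor>.<patch>` with an optional `-rc<N>` suffix) are illustrative assumptions, not the workflow's actual code.

```python
import re
from typing import Optional

# Assumed tag shape: v<major>.<minor>.<patch> with an optional -rc<N> suffix.
_RELEASE_TAG = re.compile(r"^v(\d+\.\d+\.\d+)(?:-rc(\d+))?$")


def build_version_from_ref(ref_name: str) -> Optional[str]:
    """Return a PEP 440 version for release/RC tags, else None."""
    match = _RELEASE_TAG.match(ref_name)
    if match is None:
        # Branch pushes and PRs: leave the version unset so the caller can
        # fall back to version.txt.
        return None
    base, rc = match.groups()
    # Drop the 'v' prefix and turn '-rcN' into 'rcN'.
    return f"{base}rc{rc}" if rc is not None else base
```

Under these assumptions, `build_version_from_ref("v2.11.0-rc3")` gives `"2.11.0rc3"`, while a branch name such as `"release/2.11"` gives `None`.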

Also fix the sdist upload path: python -m build places the sdist in dist/,
so move it to the workspace root for consistency with all upload steps
(release, GHA artifact, and S3).

These fixes are tested/verified in the second PR in this stack.

This commit was created with the help of Claude Sonnet 4.6.

Pull Request resolved: pytorch#175635
Approved by: https://github.com/atalman, https://github.com/malfet

(cherry picked from commit 11eba5b)

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@quansight.com>
…75955)

1. Docker image switch — All workflows that used pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-inductor-benchmarks now use the cuda13.0 variant. The
  unused CUDA 12.8 image definition was removed from .ci/docker/build.sh and its duplicate entry dropped from docker-builds.yml.
  3. Duplicate cleanup — Five workflows previously had both a CUDA 12.8 build and a separate -cuda13 build. After migrating the main build to CUDA 13.0, the
   -cuda13 duplicates were removed:
    - inductor-periodic.yml — removed periodic-dynamo-benchmarks-build-cuda13 + test
    - inductor-micro-benchmark.yml — removed build-cuda13 + test-cuda13
    - inductor-perf-compare.yml — removed build-cuda13 + test-cuda13
    - inductor-perf-test-nightly.yml — removed build-cuda13 + 3 test jobs
    - trunk.yml — removed inductor-build-cuda13
Pull Request resolved: pytorch#175826
Approved by: https://github.com/atalman

Signed-off-by: Huy Do <huydhn@gmail.com>
…76408)

update previous version 2.10 installation in get start xpu  (pytorch#176141)

Update the installation instructions for the previous version (2.10) in the XPU getting-started guide for release 2.11.
Pull Request resolved: pytorch#176141
Approved by: https://github.com/EikanWang

(cherry picked from commit 14f828c)

Co-authored-by: ZhaoqiongZ <106125927+ZhaoqiongZ@users.noreply.github.com>
…6783)

[inductor] Fix Identity comparability and evalf recursion (pytorch#175975)

Fixes pytorch#175856

## Summary

This PR adds a narrow `Identity._eval_evalf(self, prec)` override in
`torch/utils/_sympy/functions.py` to fix the SymPy recursion/comparison failure
seen in Inductor simplification (e.g. `Max(0, Identity(-6))`).

The implementation only unwraps comparable integer constants:

```python
def _eval_evalf(self, prec):
    arg = self.args[0]
    if arg.is_Integer and arg.is_comparable:
        return arg
    return None
```

This keeps the fix minimal for the index-math path involved in the bug.

Tests
Added targeted tests in test/inductor/test_utils.py:

`testIdentityComparisonNoRecursion`
`testIdentityComparableNumbersInMinMax`
`testIdentityEvalfIntegerOnly`

Validation
Repro fails on unpatched builds in the same SymPy/Inductor path.
Repro passes with this fix applied.

Pull Request resolved: pytorch#175975
Approved by: https://github.com/azahed98, https://github.com/laithsakka

(cherry picked from commit cea64de)

Co-authored-by: bhack <bhack@users.noreply.github.com>
…nge (pytorch#175333)

[XPU] Fix SyclExtension Windows build for oneAPI 2025.3+ breaking change (pytorch#170701)

## Summary
Fixes SyclExtension compilation on Windows when using oneAPI 2025.3 or higher.

## Problem
oneAPI 2025.3 introduced a breaking change in how include paths are ordered to align with MSVC behavior. This causes build failures when compiling SyclExtension on Windows.

The issue occurs because MSVC include directories are explicitly passed on the compiler command line. With the new include path ordering in oneAPI 2025.3, this causes the wrong std headers to be included.

These MSVC directories are already added as correctly-ordered implicit include paths by the compiler, so they should not need to be passed explicitly on the command line. Passing them explicitly disrupts the intended include order.

## Solution
When building SYCL extensions on Windows with oneAPI version >= 2025.3, filter out Microsoft Visual Studio paths from the compiler's include directories.

The fix is version-gated to only apply for oneAPI 2025.3+ to avoid affecting users on older oneAPI versions.
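
A minimal sketch of such a version-gated filter is shown below. The function name `filter_include_dirs`, the tuple representation of the oneAPI version, and the substring used to recognise Visual Studio paths are assumptions for illustration and may differ from the actual change.

```python
from typing import List, Tuple


def filter_include_dirs(
    include_dirs: List[str],
    oneapi_version: Tuple[int, int],
    is_windows: bool,
) -> List[str]:
    """Drop MSVC include paths for oneAPI >= 2025.3 on Windows (sketch)."""
    if not is_windows or oneapi_version < (2025, 3):
        # Older oneAPI versions and non-Windows builds are left untouched.
        return list(include_dirs)
    # Assumption: Visual Studio paths are identified by this substring.
    marker = "microsoft visual studio"
    return [d for d in include_dirs if marker not in d.lower()]
```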

Fixes:
intel/torch-xpu-ops#2574

Pull Request resolved: pytorch#170701
Approved by: https://github.com/dvrogozh, https://github.com/EikanWang, https://github.com/atalman

(cherry picked from commit a09b29e)

Co-authored-by: astachowiczhabana <adam.stachowicz@intel.com>
…n. (pytorch#176410)

[Inductor] Reject non-contiguous subnode fusion in mix-order reduction. (pytorch#176131)

We observed an assert error on XPU after PR pytorch#174947, reported in intel/torch-xpu-ops#2932.
The assert fails at line 2125:
https://github.com/pytorch/pytorch/blob/f99ab991dcd3719ee25dd3377a53ea12e518308e/torch/_inductor/scheduler.py#L2122-L2125

which is caused by:
https://github.com/pytorch/pytorch/blob/f99ab991dcd3719ee25dd3377a53ea12e518308e/torch/_inductor/scheduler.py#L2200-L2203

Root cause:
- MixOrderReduction.can_fuse is a pre-fusion heuristic; it only checks static conditions (both reductions, reversed orders, common reads, one contiguous pre-fusion, size/heuristics). It cannot see access-pattern changes introduced by backend.fuse.
- In the failing case, self.node1=op1115 (reduction, contiguous=True) is fused with other=op1123 (pointwise, contiguous=False), producing fused_node=op1115_op1123 (non-contiguous). self.node2=op1117_op1119 is already non-contiguous. The mix-order reduction invariant (at least one side contiguous) is violated, so FusedMixOrderReductions would assert.
```
self.node1 = op1115  (SchedulerNode, reduction, contiguous=True)
other      = op1123  (SchedulerNode, pointwise, contiguous=False)

backend.fuse(self.node1, other)
        |
        v
fused_node = op1115_op1123 (FusedSchedulerNode, reduction+pointwise, contiguous=False)

self.node2 = op1117_op1119 (FusedSchedulerNode, reduction+reduction, contiguous=False)

mix-order reduction attempt:
fused_node  +  self.node2  ->  FusedMixOrderReductions  (assert fails)
```

Fix:
- Add a general post-fusion validation in FusedMixOrderReductions.fuse_with: after backend.fuse, re-check the contiguity invariant and reject the fusion if both sides are non-contiguous.
- Implement a FusionRejected signal and catch it in Scheduler.fuse_two_nodes to keep nodes unfused.
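
The control flow of the fix can be sketched with a toy model. Everything below is hypothetical and heavily simplified: the `Node` class, the `backend_fuse` contiguity rule, and the function bodies only mirror the names in the description and are not the scheduler's real code.

```python
from dataclasses import dataclass


class FusionRejected(Exception):
    """Signal that a fusion must be abandoned (hypothetical model)."""


@dataclass(frozen=True)
class Node:
    name: str
    contiguous: bool


def backend_fuse(a: Node, b: Node) -> Node:
    # Toy rule: the fused node is contiguous only if both inputs are.
    return Node(f"{a.name}_{b.name}", a.contiguous and b.contiguous)


def fuse_with(node1: Node, node2: Node, other: Node) -> tuple:
    fused = backend_fuse(node1, other)
    # Post-fusion validation: at least one side must stay contiguous.
    if not fused.contiguous and not node2.contiguous:
        raise FusionRejected(f"{fused.name} + {node2.name}")
    return fused, node2


def fuse_two_nodes(node1: Node, node2: Node, other: Node) -> tuple:
    try:
        return fuse_with(node1, node2, other)
    except FusionRejected:
        # Keep the nodes unfused instead of hitting an assert later.
        return node1, node2, other


result = fuse_two_nodes(
    Node("op1115", True), Node("op1117_op1119", False), Node("op1123", False)
)
```

With the node flags taken from the failing case, `result` holds the three original nodes, meaning the fusion was rejected rather than asserting.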

Test:
- Added a regression test that reproduces the assert error on **cuda/xpu** and passes with this PR.
-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Pull Request resolved: pytorch#176131
Approved by: https://github.com/shunting314

(cherry picked from commit 5a6d6b3)

Co-authored-by: xinan.lin <xinan.lin@intel.com>
…6228) (pytorch#176495)

The default of num_stages > 1 has been causing multiple out-of-shared-memory issues. Make it 1 by default.

We could explore other alternatives:
1. Always add a config with num_stages=1 while keeping the current heuristics. This could increase compilation time.
2. Dynamically scale down num_stages if all configs fail to compile due to running out of shared memory.
3. Mimic Triton's logic to estimate the amount of shared memory needed per stage and set num_stages accordingly, based on smem capacity.
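
As a rough illustration of the third alternative, the sketch below picks the largest stage count whose estimated footprint fits in shared memory. The helper `pick_num_stages`, its parameters, and the linear bytes-per-stage model are assumptions; they do not reproduce Triton's or Inductor's actual estimation logic.

```python
def pick_num_stages(bytes_per_stage: int, smem_capacity: int, max_stages: int = 3) -> int:
    """Largest stage count whose estimated footprint fits (hypothetical helper)."""
    if bytes_per_stage <= 0:
        raise ValueError("bytes_per_stage must be positive")
    # Assumed model: shared memory use grows linearly with the stage count.
    fitting = smem_capacity // bytes_per_stage
    # Never go below 1, which is the default this PR switches to.
    return max(1, min(max_stages, fitting))
```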

Pull Request resolved: pytorch#176228
Approved by: https://github.com/eellison, https://github.com/drisspg, https://github.com/jansel

(cherry picked from commit ab17a38)
Fix the torch.Stream context manager reentrance (pytorch#176568)

# Motivation
This PR fixes nested and reentrant use of `torch.Stream` as a context manager. `torch.cuda.stream` and `torch.xpu.stream` can support these usages.

The following scenario would be fixed with this PR:
```python
import torch
s0 = torch.Stream()
with s0, s0:
    pass
```
```python
import torch
s0 = torch.Stream()
s1 = torch.Stream()
with s0, s1:
    with s0, s1:
        pass
```
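
One common way to make a context manager safe to re-enter is to keep a stack of saved states rather than a single slot. The sketch below illustrates that idea with a toy class; `StreamLike` and its `current` attribute are hypothetical and do not describe how `torch.Stream` is implemented.

```python
class StreamLike:
    """Toy stand-in for a stream object; not the torch.Stream implementation."""

    current = "default"  # hypothetical global "current stream" slot

    def __init__(self, name: str) -> None:
        self.name = name
        # A stack, rather than a single saved value, lets the same object
        # be entered again before it has been exited.
        self._saved = []

    def __enter__(self) -> "StreamLike":
        self._saved.append(StreamLike.current)
        StreamLike.current = self.name
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Restore whatever was current when this particular entry happened.
        StreamLike.current = self._saved.pop()


s0, s1 = StreamLike("s0"), StreamLike("s1")
trace = []
with s0, s1:
    with s0, s1:
        trace.append(StreamLike.current)
    trace.append(StreamLike.current)
trace.append(StreamLike.current)
```

After the nested blocks exit, `trace` is `["s1", "s1", "default"]`: each exit restores the state saved by its matching entry.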

# Additional Context
Fix pytorch#176560

Pull Request resolved: pytorch#176568
Approved by: https://github.com/albanD

(cherry picked from commit d43570c)

Co-authored-by: Yu, Guangye <guangye.yu@intel.com>
ethanwee1 and others added 25 commits April 3, 2026 07:14
…eCUDA::test_flash_attn_backward_mixed_strides_cuda#179086 (#3127)

`dv` tensor should be created with `empty_like(v)` rather than
`empty_like(k)`.

This fixes pytorch#168540, pytorch#168541, and supersedes pytorch#178499

This is cherry-picked from upstream PR
pytorch#179086
Build validation: 
http://rocm-ci.amd.com/job/pytorch2.11-manylinux-wheels_rel-7.2/7/ :
Connection issues

https://github.com/ROCm/TheRock/actions/runs/23953043418/job/69864879059
: Build succeeded

---------

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
## Motivation

Fix numpy compatibility for Python 3.14 for release/2.11

## Technical Details

- `numpy==2.1.2` has no cp314 wheels on PyPI, causing Python 3.14 builds
in TheRock CI to fail with a meson/sccache error when pip falls back to
building numpy from source
- Add `python_version` markers to use `numpy==2.4.3` for Python 3.14+,
while keeping the existing `numpy==2.1.2` pin for older Python versions
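
The effect of the version markers can be mirrored in plain Python. The function `numpy_pin` is a hypothetical illustration of the selection rule described above, not the contents of the requirements file.

```python
from typing import Tuple


def numpy_pin(python_version: Tuple[int, int]) -> str:
    """Mirror the marker-based selection described above (illustrative)."""
    # Python 3.14+ needs a numpy release that ships cp314 wheels.
    if python_version >= (3, 14):
        return "numpy==2.4.3"
    # Older interpreters keep the existing pin.
    return "numpy==2.1.2"
```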

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Subodh Dubey <Subodh.Dubey@amd.com>
….sh before sourcing (#3163)

## Summary

Fixes the `pytorch_ut` failure introduced in PyTorch 2.11 where
`test.sh` exits immediately with code 1 before any tests run.

**Root cause:** PR pytorch#168377 added `source
/etc/rocm_env.sh` to `.ci/pytorch/common.sh` targeting AMD's internal
Jenkins CI, which provisions this file. When cherry-picked into
`release/2.11`, this line breaks all TheRock Docker-based CI
environments that do **not** provision `/etc/rocm_env.sh`. Since `set
-e` is active in `test.sh`, the script exits before a single test runs —
causing 0-pass, 1-fail on every host.

**The fix:** Add a `[[ -f /etc/rocm_env.sh ]]` existence check so
environments without the file skip sourcing it gracefully, while Jenkins
CI (which does provision the file) continues working as before. This
matches the fix already present on `pytorch/pytorch main`.

```bash
# Before (broken):
if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]]; then
  source /etc/rocm_env.sh
fi

# After (fixed):
if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]] && [[ -f /etc/rocm_env.sh ]]; then
  source /etc/rocm_env.sh
fi
```

**Impact without this fix:**
- 86/97 `pytorch_ut` runs failed on TheRock build 7.13.0-1208
- Affects all GFX variants and Python versions (3.11, 3.12, 3.13)
- PyTorch 2.10 is unaffected (does not have `source /etc/rocm_env.sh`)

**References:**
- Jira: ROCM-21809
- Upstream issue: pytorch#170983
- Regression introduced by: pytorch#168377
) (#3164)

On Windows with HIP/ROCm, std::memcpy is a __host__ function and cannot
be called from __device__ code. Use raw memcpy (which the HIP compiler
provides as a device builtin) when building on Windows.

This will allow PyTorch builds for gfx942 on Windows. gfx950 has yet to be
tested, but it should likely build as well.

Pull Request resolved: pytorch#175410
Approved by: https://github.com/jeffdaily

Co-authored-by: Aaryaman Vasishta <aaryaman.vasishta@amd.com>
…ch#178195) (#3169)

Cherry-pick of upstream pytorch#178195
into `release/2.11`.

Related PR:
- #3168

## Motivation

For MI350, FP64 is supported in hipBLASLt. This PR enables FP64 on
hipBLASLt in TunableOp and re-enables the FP64 unit test on MI350.

## Technical Details

- Map `double` GEMM to `HIPBLAS_COMPUTE_64F` via a new
`HipBlasComputeTypeFor<CT>()` helper (defaults to `HIPBLAS_COMPUTE_32F`,
specialized to `HIPBLAS_COMPUTE_64F` for `double`).
- Use `at::opmath_type<T>`-typed `alpha` / `beta` in the hipBLASLt path
so FP64 tuning and execution use consistent compute semantics.
- Set the matmul descriptor scale type with
`HipDataTypeFor<opmath_t>()`.
- Guard the TF32 override with `if constexpr (std::is_same_v<CT,
float>)` so FP64 doesn't get downgraded.
- Removes the MI350 skip on
`test_matmul_small_brute_force_tunableop_cuda_float64`.

The cherry-pick applied cleanly (no conflicts).

## Test Plan

Build PyTorch on MI350 with ROCm, then run:

```
PYTORCH_TEST_WITH_ROCM=1 python test/test_linalg.py -v -k tunableop
```

## Test Result

```
Ran 69 tests in 156.726s

OK (skipped=42)
```

All tunableop tests pass. Skipped tests are CPU-only variants and
gfx942-only variants (FP8/TF32).

Upstream PR: pytorch#178195
Upstream commit: 0550897

Made with [Cursor](https://cursor.com)
#3148)

- This PR updates the Numba version constraints to correctly handle
Python 3.14 and aligns the platform conditions with Numba’s current
support matrix.
- Add a new rule selecting numba==0.64.0 for Python ≥ 3.14

---------

Co-authored-by: sohbodas <Soham.Bodas@gmail.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
…3181)

<h2>Fix MIOpen CTC loss access violation on Windows discrete GPUs</h2>

<h3>Problem</h3>

<p>A unit test started failing on Windows a couple of weeks ago, and
a missing <code>#include</code> was added in
pytorch#178284, but CI on TheRock
kept failing. The fix was tested on gfx1151 (APU), where the test
passed, but CI showed failures on gfx1100. </p>

<p><code>test_CTCLoss_no_batch_dim</code> (and any code path hitting
<code>miopen_ctc_loss</code>) crashes with a fatal access violation on
Windows systems with discrete AMD GPUs:</p>

<pre><code>Windows fatal exception: access violation Exception Code:
0xC0000005
#0 miopen::CTCLossDescriptor::GetCTCLossWorkspaceSize
(MIOpen.dll+0x14fde4) #1 miopenGetCTCLossWorkspaceSize
(MIOpen.dll+0x150912) #2 at::native::miopen_ctc_loss (torch_hip.dll)
</code></pre>

<h3>Root Cause</h3>

<p><code>miopenGetCTCLossWorkspaceSize</code> and
<code>miopenCTCLoss</code> read the <code>labels</code>,
<code>label_lengths</code>, and <code>input_lengths</code> arrays
<strong>on the host side</strong> to plan the computation and calculate
workspace requirements. The existing code copies these arrays to GPU
memory and passes device pointers:</p>

<pre><code>Tensor labels_gpu = targets_t.to(Device(at::kCUDA),
at::kInt); // ... hipMemcpy to GPU ...
MIOPEN_CHECK(miopenGetCTCLossWorkspaceSize(...,
    labels_gpu.data_ptr&lt;int&gt;(),          // device pointer
    label_lengths_gpu.data_ptr&lt;int&gt;(),   // device pointer
    input_lengths_gpu.data_ptr&lt;int&gt;()    // device pointer
));
</code></pre>

<p>This works on:</p>
<ul>
<li><strong>Linux</strong> — HSA (Heterogeneous System Architecture)
maps GPU allocations into the process virtual address space, making
device pointers host-readable</li> <li><strong>Windows APUs</strong> —
CPU and iGPU share system RAM, so device pointers point to
host-accessible memory</li> </ul>

<p>This crashes on:</p>
<ul>
<li><strong>Windows dGPUs</strong> — GPU has dedicated VRAM across PCIe;
device pointers are opaque handles that cannot be dereferenced from host
code</li> </ul>

<h3>Verification</h3>

<p>Tested on gfx1201:</p>

<table border="1" cellpadding="6" cellspacing="0">
<tr><th>Check</th><th>Result</th></tr>

<tr><td><code>hipDeviceAttributeIntegrated</code></td><td><code>0</code>
(discrete GPU)</td></tr>
<tr><td><code>hipDeviceAttributeCanUseHostPointerForRegisteredMem</code></td><td><code>0</code></td></tr>
<tr><td><code>hipDeviceAttributeManagedMemory</code></td><td><code>0x7FFFFFFF</code>
(unsupported)</td></tr>
<tr><td><code>hipDeviceAttributeUnifiedAddressing</code></td><td><code>0x7FFFFFFF</code>
(unsupported)</td></tr> <tr><td>Host read of <code>hipMalloc</code>
pointer via <code>ctypes</code></td><td>Access violation</td></tr>
<tr><td>CTC loss with CPU pointers</td><td>Pass (forward +
backward)</td></tr> </table>

<h3>Fix</h3>

<p>Use host pointers, since this is what MIOpen expects.</p>

<h3>Testing</h3>

<p>Run all existing CTCLoss unit tests.</p>

Pull Request resolved: pytorch#179264
Approved by: https://github.com/jeffdaily

Co-authored-by: Milica Stankovic <mstankov@amd.com>
…ch (#3161)

Cherry pick of pytorch#178284

Fixes ROCm/TheRock#3987

Co-authored-by: Milica Stankovic <milica.stankovic@amd.com>
Cherry pick of pytorch#176024

Co-authored-by: nkhasbag <nkhasbag@nvidia.com>
Co-authored-by: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
Co-authored-by: Nikita Shulga <nshulga@meta.com>
…l_count_tunableop to correctly extract kernel names for RDNA (#3185)

Cherry-pick of #2954

Co-authored-by: Uros Markovic <umarkovi@amd.com>
#3144)

## Motivation

- Enabling gfx103X-all wheels in TheRock is currently blocked due to
PyTorch CI failures caused by a lack of `gfx1033` support in CK.
ROCm/rocm-libraries#5141 resolves these issues.

## Technical Details

- The aforementioned fix has been cherrypicked into the
`pytorch/release/2.11/` branch of ROCm/composable_kernel - this PR bumps
the `third_party/composable_kernel` branch to pick up these changes.

## Test Plan

- Trigger a build and verify it passes

## Test Result

- Build succeeds for `cherrypick-gfx1033-CK-support-torch2.11` branch.
https://github.com/ROCm/TheRock/actions/runs/24195531659/job/70624339554

- Testing: pasting offline comments from @harkgill-amd:

> In https://github.com/ROCm/TheRock/actions/runs/24906345786/job/72942139688
> - Python 3.10 + release/2.11 -> Pass
> - Python 3.11 + release/2.11 -> `TestNN.test_Embedding_discontiguous_cuda` failed, but this seems to be a known flaky test and will be disabled with ROCm/TheRock#4775
> - Python 3.12 + release/2.11 -> Pass
> - Python 3.13 + release/2.11 -> Pass
>
> In https://github.com/ROCm/TheRock/actions/runs/25002732513/job/73225027260
> - Python 3.14 + release/2.11 -> The failing tests here all share the same `miopenStatusUnknownError` message. These are the same failures as seen in the main branch run at https://github.com/ROCm/TheRock/actions/runs/24985367049, so they aren't related to my PR.

## Submission Checklist

- [X] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
`my_lib` in `test_storage_preserve_nonhermetic_in_hermetic_context`
leaks into the global op space after the test ends and affects subsequent
tests in the same process that use dynamo.

Without the fix, running any checkpoint/compile or other dynamo-related
test after `test_storage_preserve_nonhermetic_in_hermetic_context` fails
with
```
torch._dynamo.exc.BackendCompilerFailed: backend='aot_eager' raised:
TypeError: 'CustomDecompTable' object is not a mapping
```
e.g. `python -m pytest -v
pytorch/test/test_torch.py::TestTorch::test_storage_preserve_nonhermetic_in_hermetic_context
pytorch/test/test_autograd.py::TestAutograd::test_checkpoint_compile_no_recompile`

Upstream PR: pytorch#180998

Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
…on mask optimization pytorch#176269 (#3156)

Cherry-pick of #3055

Co-authored-by: Strahinja Stamenkovic <sstamenk@amd.com>
#3191)

Fixes a bug where FlexibleLayout on a ReinterpretView incorrectly
returns underlying physical buffer strides (e.g., 4D) instead of logical
view strides (3D).

This patch skips speculative layout and constraint tracking for
ReinterpretView nodes, forcing the use of node.get_stride() to prevent
Illegal Memory Access (IMA) on ROCm.
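
The rank mismatch described above can be illustrated with a small, self-contained sketch. This is a toy model in plain Python, not Inductor code: the shapes, `contiguous_strides`, and `flat_offset` are hypothetical names chosen for illustration, and the layouts in the actual bug may differ. It only shows why indexing a 3D logical view with the 4D strides of the underlying buffer can produce an out-of-bounds offset.

```python
from math import prod


def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements, for a shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def flat_offset(index, strides):
    """Flat element offset of an index under the given strides.

    zip() stops at the shorter sequence, so a rank mismatch is not
    detected here; that is the point of the illustration.
    """
    return sum(i * s for i, s in zip(index, strides))


# Hypothetical shapes: a 4D physical buffer reinterpreted as a 3D view.
buffer_shape = (2, 3, 4, 5)
view_shape = (6, 4, 5)
numel = prod(buffer_shape)  # total elements in the buffer

buffer_strides = contiguous_strides(buffer_shape)  # rank 4
view_strides = contiguous_strides(view_shape)      # rank 3

# Last valid index of the 3D view.
last_index = tuple(size - 1 for size in view_shape)

good_offset = flat_offset(last_index, view_strides)   # uses logical strides
bad_offset = flat_offset(last_index, buffer_strides)  # uses physical strides

print(f"numel={numel} good={good_offset} bad={bad_offset}")
```

With the logical strides the last index maps to the last element of the buffer; with the physical strides the same index lands well past the end, which is the kind of access the patch is meant to prevent.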

Manual backport from PyTorch 2.12.
Ref commit:
pytorch@0e1f562

## PR Summary

Fixes pytorch#178455. `ignore_logger_methods` was renamed to
`ignore_logging_functions` in torch 2.11, but the new name wasn't added to
the blocklist in `_get_dynamo_config_for_logging()`.

## Repro
```
import torch
import torch._dynamo.config
import torch._dynamo.utils

torch._dynamo.config.ignore_logging_functions.add(print)
# Before the fix, this call crashes (pytorch#178455).
torch._dynamo.utils._get_dynamo_config_for_logging()
```
## Changes

* Exclude `ignore_logging_functions` from the output of
`_get_dynamo_config_for_logging()` (consistent with the existing
`ignore_logger_methods` handling)
* Add a regression test to ensure no crash when the logging config includes
builtin functions

Added a test that:

* Inserts `print` into `ignore_logging_functions`
* Verifies `_get_dynamo_config_for_logging()` returns valid JSON without
errors
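
The blocklist idea can be sketched with the standard library only. This is a minimal illustration, not the PyTorch implementation: `config`, `BLOCKLIST`, and `config_for_logging` are hypothetical stand-ins, and the sketch only relies on the fact that a function object such as `print` is not JSON-serializable.

```python
import json

# Hypothetical stand-in for a config mapping; not the real torch._dynamo.config.
config = {
    "verbose": False,
    "cache_size_limit": 8,
    "ignore_logger_methods": set(),
    "ignore_logging_functions": {print},  # holds a builtin function object
}

# Keys whose values may contain callables and must not be serialized.
BLOCKLIST = {"ignore_logger_methods", "ignore_logging_functions"}


def config_for_logging(cfg, blocklist=BLOCKLIST):
    """Return a JSON string of cfg with blocklisted keys dropped."""
    safe = {}
    for key, value in cfg.items():
        if key in blocklist:
            continue
        # Sets are not JSON-serializable either; emit them as sorted lists.
        safe[key] = sorted(value) if isinstance(value, set) else value
    return json.dumps(safe, sort_keys=True)


# With the renamed key missing from the blocklist, serialization fails.
try:
    config_for_logging(config, blocklist={"ignore_logger_methods"})
    failed = False
except TypeError:
    failed = True

# With both keys blocklisted, the result is valid JSON.
serialized = config_for_logging(config)
print(failed, serialized)
```

A regression test in this style would insert `print` into the set and assert that the returned string parses with `json.loads`.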

related issue: pytorch#178455

Pull Request resolved: pytorch#178506
Approved by: https://github.com/Lucaskabela


(cherry picked from commit 7eea8ea)

Co-authored-by: vvvdwbvvv <vvvdwbvvv@gmail.com>
Co-authored-by: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
…h#180998) (#3221)

Cherry pick to 2.11 release

`my_lib` in `test_storage_preserve_nonhermetic_in_hermetic_context`
leaks into the global op space after the test ends and affects subsequent
tests in the same process that use dynamo.

Without the fix, running any checkpoint/compile or other dynamo-related
test after `test_storage_preserve_nonhermetic_in_hermetic_context` fails
with
```
torch._dynamo.exc.BackendCompilerFailed: backend='aot_eager' raised:
TypeError: 'CustomDecompTable' object is not a mapping
```
e.g. `python -m pytest -v
pytorch/test/test_torch.py::TestTorch::test_storage_preserve_nonhermetic_in_hermetic_context
pytorch/test/test_autograd.py::TestAutograd::test_checkpoint_compile_no_recompile`

Pull Request resolved: pytorch#180998
Approved by: https://github.com/albanD, https://github.com/ezyang

---------

Co-authored-by: Claude Opus 4 <noreply@anthropic.com>
…aLauncher (#3238)

## Summary
- Backports upstream PyTorch PR pytorch#183926 to ROCm
release/2.11.
- Uses `hipModuleLoadData` for ROCm static launcher module loading to
avoid retaining open HSACO file descriptors.
- Leaves the CUDA/NVIDIA path unchanged.
- Resolves Jira https://amd-hub.atlassian.net/browse/ROCM-24659,
https://amd-hub.atlassian.net/browse/ROCM-24664

Made with [Cursor](https://cursor.com)

Co-authored-by: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
#3245)

…Class (pytorch#180736)

TestPrologueFusion and TestEpilogueFusionStaticAnalysis both use
ExitStack in setUpClass to apply config.patch(), but neither defined
tearDownClass to close the stack. When TestPrologueFusion runs before
TestEpilogueFusionStaticAnalysis in the same process, config values like
max_autotune_gemm_backends="TRITON" leak through, removing the aten
kernel choice from autotuning and causing test failures.
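
The setup/teardown pairing described above can be sketched with the standard library. This is an illustrative pattern under stated assumptions, not the actual test code: `CONFIG`, `PatchedConfigTest`, and `run_suite` are hypothetical names, and `unittest.mock.patch.dict` stands in for `config.patch()`.

```python
import unittest
from contextlib import ExitStack
from unittest import mock

# Hypothetical process-wide config standing in for a shared settings object.
CONFIG = {"max_autotune_gemm_backends": "ATEN,TRITON"}


class PatchedConfigTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        cls._stack = ExitStack()
        # Patch the shared config for every test in this class.
        cls._stack.enter_context(
            mock.patch.dict(CONFIG, {"max_autotune_gemm_backends": "TRITON"})
        )

    @classmethod
    def tearDownClass(cls):
        # Closing the stack undoes the patch. Without this, "TRITON" would
        # leak into any test class that runs later in the same process.
        cls._stack.close()
        super().tearDownClass()

    def test_sees_patched_value(self):
        self.assertEqual(CONFIG["max_autotune_gemm_backends"], "TRITON")


def run_suite():
    """Run the class once and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PatchedConfigTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_suite()
    # After the class finishes, the original value is restored.
    print(result.wasSuccessful(), CONFIG["max_autotune_gemm_backends"])
```

Closing the `ExitStack` in `tearDownClass` keeps the patch scoped to the class, so test ordering within a process no longer changes which config values later classes see.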

Fixes pytorch#179693

Pull Request resolved: pytorch#180736
Approved by: https://github.com/Skylion007

Co-authored-by: NikhilAPatel <nikhilap@meta.com>
pytorch#175817) (#3241)

## Motivation

This PR aims to fix the test
`TestMemPool.test_graph_capture_reclaim_shared_pool`, which fails in
TheRock wheels: ROCm/TheRock#4925

The test was brought into `release/2.11` by the cherry-pick of upstream
pytorch#176024 in #3182, but the allocator fix from upstream
pytorch#175817 was not.

Without this fix, `endAllocateToPool` (called from
`CUDAGraph::capture_end`) does not reclaim `record_stream`-deferred
blocks, so a second graph capture into the same shared pool cannot reuse
the block freed in the first capture.

## Technical Details

Cherry-pick of upstream pytorch#175817 (commit
`b55e5314fb72f1ea782f72a6c9728a40c12678ea`) on top of `release/2.11`.

## Test Plan

- Build PyTorch wheels from this branch and verify that the test
`TestMemPool.test_graph_capture_reclaim_shared_pool` is now passing.

## Test Result

- `TestMemPool.test_graph_capture_reclaim_shared_pool` passed for torch
2.11:
https://github.com/ROCm/TheRock/actions/runs/26116907093/job/76816330885

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Frank Lin <eee4017@gmail.com>
…2_IFU_20260521

# Conflicts:
#	.ci/docker/build.sh
#	.ci/docker/ci_commit_pins/huggingface-requirements.txt
#	.ci/docker/ci_commit_pins/triton.txt
#	.ci/docker/common/install_cuda.sh
#	.ci/docker/requirements-ci.txt
#	.ci/docker/requirements-docs.txt
#	.ci/lumen_cli/cli/lib/core/vllm/vllm_test_library.yaml
#	.ci/manywheel/build_cuda.sh
#	.ci/pytorch/common.sh
#	.ci/pytorch/common_utils.sh
#	.ci/pytorch/windows/internal/cuda_install.bat
#	.github/ci_commit_pins/vllm.txt
#	.github/ci_commit_pins/xla.txt
#	.github/scripts/build_triton_wheel.py
#	.github/scripts/filter_test_configs.py
#	.github/scripts/generate_binary_build_matrix.py
#	.github/templates/common.yml.j2
#	.github/templates/linux_binary_build_workflow.yml.j2
#	.github/templates/macos_binary_build_workflow.yml.j2
#	.github/templates/windows_binary_build_workflow.yml.j2
#	.github/workflows/_bazel-build-test.yml
#	.github/workflows/_binary-build-flash-attention-wheel-linux.yml
#	.github/workflows/_binary-build-flash-attention-wheel-windows.yml
#	.github/workflows/_binary-build-linux.yml
#	.github/workflows/_binary-test-linux.yml
#	.github/workflows/_binary-upload.yml
#	.github/workflows/_docs.yml
#	.github/workflows/_link_check.yml
#	.github/workflows/_linux-build.yml
#	.github/workflows/_linux-test-stable-fa3.yml
#	.github/workflows/_linux-test.yml
#	.github/workflows/_mac-build.yml
#	.github/workflows/_mac-test.yml
#	.github/workflows/_rocm-test.yml
#	.github/workflows/_runner-determinator.yml
#	.github/workflows/_vllm-benchmark.yml
#	.github/workflows/_win-build.yml
#	.github/workflows/_win-test.yml
#	.github/workflows/_xpu-test.yml
#	.github/workflows/b200-distributed.yml
#	.github/workflows/b200-symm-mem.yml
#	.github/workflows/build-almalinux-images.yml
#	.github/workflows/build-libtorch-images.yml
#	.github/workflows/build-manywheel-images-s390x.yml
#	.github/workflows/build-manywheel-images.yml
#	.github/workflows/build-triton-wheel.yml
#	.github/workflows/build-vllm-wheel.yml
#	.github/workflows/claude-code.yml
#	.github/workflows/claude-issue-triage-run.yml
#	.github/workflows/close-nonexistent-disable-issues.yml
#	.github/workflows/create_release.yml
#	.github/workflows/docker-builds.yml
#	.github/workflows/docker-cache-rocm.yml
#	.github/workflows/docker-release.yml
#	.github/workflows/dynamo-unittest.yml
#	.github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml
#	.github/workflows/generated-linux-binary-libtorch-nightly.yml
#	.github/workflows/generated-linux-binary-manywheel-nightly.yml
#	.github/workflows/generated-linux-s390x-binary-manywheel-nightly.yml
#	.github/workflows/generated-macos-arm64-binary-libtorch-release-nightly.yml
#	.github/workflows/generated-windows-arm64-binary-libtorch-debug-nightly.yml
#	.github/workflows/generated-windows-arm64-binary-libtorch-release-nightly.yml
#	.github/workflows/generated-windows-arm64-binary-wheel-nightly.yml
#	.github/workflows/generated-windows-binary-libtorch-debug-nightly.yml
#	.github/workflows/generated-windows-binary-libtorch-release-nightly.yml
#	.github/workflows/generated-windows-binary-wheel-nightly.yml
#	.github/workflows/h100-cutlass-backend.yml
#	.github/workflows/h100-distributed.yml
#	.github/workflows/h100-symm-mem.yml
#	.github/workflows/inductor-micro-benchmark.yml
#	.github/workflows/inductor-nightly.yml
#	.github/workflows/inductor-pallas.yml
#	.github/workflows/inductor-perf-compare.yml
#	.github/workflows/inductor-perf-test-b200.yml
#	.github/workflows/inductor-perf-test-nightly-aarch64.yml
#	.github/workflows/inductor-perf-test-nightly-h100.yml
#	.github/workflows/inductor-perf-test-nightly-rocm-mi300.yml
#	.github/workflows/inductor-perf-test-nightly-rocm-mi355.yml
#	.github/workflows/inductor-perf-test-nightly-x86-zen.yml
#	.github/workflows/inductor-perf-test-nightly-x86.yml
#	.github/workflows/inductor-perf-test-nightly-xpu.yml
#	.github/workflows/inductor-perf-test-nightly.yml
#	.github/workflows/inductor-periodic.yml
#	.github/workflows/inductor-rocm-mi200.yml
#	.github/workflows/inductor-rocm-mi300.yml
#	.github/workflows/inductor-rocm-mi355.yml
#	.github/workflows/inductor-unittest.yml
#	.github/workflows/inductor.yml
#	.github/workflows/lint-autoformat.yml
#	.github/workflows/lint-bc.yml
#	.github/workflows/lint.yml
#	.github/workflows/linux-aarch64.yml
#	.github/workflows/llm_td_retrieval.yml
#	.github/workflows/nightly-s3-uploads.yml
#	.github/workflows/nightly.yml
#	.github/workflows/nitpicker.yml
#	.github/workflows/operator_microbenchmark.yml
#	.github/workflows/periodic-rocm-mi200.yml
#	.github/workflows/periodic-rocm-mi300.yml
#	.github/workflows/periodic-rocm-mi355.yml
#	.github/workflows/periodic.yml
#	.github/workflows/pull.yml
#	.github/workflows/quantization-periodic.yml
#	.github/workflows/rocm-mi200.yml
#	.github/workflows/rocm-mi300.yml
#	.github/workflows/rocm-mi355.yml
#	.github/workflows/rocm-navi31.yml
#	.github/workflows/rocm-nightly.yml
#	.github/workflows/slow-rocm-mi200.yml
#	.github/workflows/slow.yml
#	.github/workflows/target-determination-indexer.yml
#	.github/workflows/target_determination.yml
#	.github/workflows/test-b200.yml
#	.github/workflows/test-check-binary.yml
#	.github/workflows/test-h100.yml
#	.github/workflows/tools-unit-tests.yml
#	.github/workflows/torchbench.yml
#	.github/workflows/trunk-rocm-sandbox.yml
#	.github/workflows/trunk.yml
#	.github/workflows/unstable.yml
#	.github/workflows/update-viablestrict.yml
#	.github/workflows/update_pytorch_labels.yml
#	.github/workflows/upload-test-stats-while-running.yml
#	.github/workflows/upload-test-stats.yml
#	.github/workflows/upload-torch-dynamo-perf-stats.yml
#	.github/workflows/upload_test_stats_intermediate.yml
#	.github/workflows/vllm-benchmark.yml
#	.github/workflows/weekly.yml
#	.github/workflows/xpu.yml
#	aten/src/ATen/native/cuda/Sorting.cu
#	aten/src/ATen/native/cuda/SortingRadixSelect.cuh
#	aten/src/ATen/native/cuda/TensorTopK.cu
#	benchmarks/dynamo/timm_models.py
#	c10/cuda/CUDAAllocatorConfig.h
#	c10/cuda/CUDACachingAllocator.cpp
#	related_commits
#	requirements-build.txt
#	requirements.txt
#	test/cpp_extensions/test_libtorch_agnostic.py
#	test/distributed/tensor/test_tensor_ops.py
#	test/distributed/test_dynamo_distributed.py
#	test/inductor/test_ck_backend.py
#	test/inductor/test_mix_order_reduction.py
#	test/inductor/test_mps_basic.py
#	test/inductor/test_pattern_matcher.py
#	test/inductor/test_torchinductor_dynamic_shapes.py
#	test/test_cuda.py
#	test/test_mps.py
#	test/test_transformers.py
#	third_party/composable_kernel
#	tools/stats/import_test_stats.py
#	torch/_inductor/config.py
#	torch/_inductor/fx_passes/post_grad.py
#	torch/_inductor/runtime/triton_heuristics.py
#	torch/_inductor/select_algorithm.py
#	torch/testing/_internal/common_methods_invocations.py
#	torch/testing/_internal/opinfo/definitions/linalg.py
#	version.txt
@rocm-repo-management-api

rocm-repo-management-api Bot commented May 21, 2026

Jenkins build for f28bb501cbe78f75c6d82f300813a254b63dfec1 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results
