[AUTOGENERATED] release/2.12_IFU_20260521#3248
Open
rocm-repo-management-api-6[bot] wants to merge 88 commits into
* [RELEASE 2.11] Release only changes * remove_file * Trigger rebuild
Update inductor expected accuracy files (pytorch#175041) ## Summary This PR updates the expected accuracy CSV files for inductor benchmarks based on CI results from PyTorch commit 93dd774. These files serve as reference points for dynamo/inductor CI to track: - Graph breaks - Model accuracy ## Changes - Updated CUDA expected accuracy files in `benchmarks/dynamo/ci_expected_accuracy/` - Updated ROCm expected accuracy files in `benchmarks/dynamo/ci_expected_accuracy/rocm/` ## Test Plan - [ ] Verify that the CI jobs pass with the updated expected accuracy files - [ ] Review the diff to ensure changes are reasonable and expected - [ ] Check that no unexpected regressions are being marked as "expected" Pull Request resolved: pytorch#175041 Approved by: https://github.com/atalman (cherry picked from commit f90c091)
…ch#172373)" (pytorch#175094) This reverts commit 7072636. Reverted pytorch#172373 on behalf of https://github.com/jeffdaily due to PR claims to fix ROCm DISABLED issue but it did not ([comment](pytorch#172373 (comment))) Co-authored-by: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
…ytorch#174596)" (pytorch#175095) This reverts commit 781b5d1. Reverted pytorch#174596 on behalf of https://github.com/jeffdaily due to This broke ROCm dynamo benchmarks. Lots of permission denied errors. ([comment](pytorch#174596 (comment))) Co-authored-by: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
Fix macOS arm64 libtorch release upload failure (pytorch#175100) **Summary** Failures introduced by following PR: pytorch#173541 The change from RENAME_WHEEL=true to RENAME_WHEEL=false as the default in build_wheel.sh (landed in the 2026-01-31 nightly) broke libtorch builds on macOS arm64. The elif branch at line 220 was missing a BUILD_PYTHONLESS guard, so libtorch builds (BUILD_PYTHONLESS=1) entered the wheel-copy path instead of the libtorch zip-packaging path. This caused the build to produce a .whl artifact instead of the expected .zip files, and the upload script then failed because it looks for *.zip files. The fix adds -z "$BUILD_PYTHONLESS" to the elif condition, matching the guard already present on the if branch. Failures can be seen here: https://hud.pytorch.org/hud/pytorch/pytorch/nightly/1?per_page=50&name_filter=macos-arm64-binary-libtorch-release%20%2F%20libtorch-cpu Failing run: https://github.com/pytorch/pytorch/actions/runs/21541142799/job/62076418921 Successful run (previous nightly): https://github.com/pytorch/pytorch/actions/runs/21508411052/job/61971405484 **Test plan** In CI run ciflow/binaries. Make sure the Rename/Copy log is same as successful run above Pull Request resolved: pytorch#175100 Approved by: https://github.com/huydhn, https://github.com/isuruf (cherry picked from commit bad1df7) Co-authored-by: atalman <atalman@fb.com>
pytorch#175299) [benchmark] Skip pytorch_CycleGAN_and_pix2pix from inductor benchmarks (pytorch#175066) ## Summary Skip the `pytorch_CycleGAN_and_pix2pix` benchmark model from the inductor benchmark suite. This legacy 2017 model has been failing with `eager_fail_to_run` on 100% of commits since mid-2025, providing zero CI signal while consuming ~5.3M GPU-seconds/week across 7+ benchmark jobs on CUDA, CPU, and ROCm. **Estimated savings: ~310 GPU-hours/week (~1,240 GPU-hours/month)** Skip it in `torchbench.yaml` and remove its entries from all 31 expected accuracy CSV files. Also remove it from the `higher_fp16` tolerance list. See P2188981399 for the full CI workflow analysis. ## Test Plan - CI should pass with CycleGAN skipped (it was already failing 100% of the time) - No other benchmark models affected Pull Request resolved: pytorch#175066 Approved by: https://github.com/huydhn, https://github.com/malfet (cherry picked from commit 688c943) Co-authored-by: Eli Uriegas <eliuriegas@meta.com>
…rch#175300) [CI] Move CUDA 12.8 GPU tests from per-commit trunk to periodic (pytorch#175067) ## Summary Move CUDA 12.8 GPU tests from per-commit trunk CI to periodic (~3x/day on weekdays). Both CUDA 12.8 and 13.0 are shipping wheel targets (nightly ships cu126, cu128, cu129, cu130), but their trunk CI test suites have **85-90% failure correlation** -- they almost always fail together. Over a 30-day analysis window covering 97 reverts and 38 significant regression events, **CUDA 12.8 never uniquely caught a regression that 13.0 missed**. CUDA 13.0 is kept per-commit because: - It is the **newest** shipping CUDA version - Most likely to surface **novel breakage** from new CUDA runtime behavior - Forward-looking CI should protect what's coming, not what's already stable CUDA 12.8 is moved to periodic because: - It is **mature and well-understood** -- breakage is less likely and less urgent - The rare 12.8-only regression can tolerate the ~8-hour periodic detection window - The 12.8 build job **remains in trunk** because `cross-compile-linux-test` depends on its artifacts **Estimated savings: ~1,270 GPU-hours/week (~5,080 GPU-hours/month)** This is the #2 savings opportunity from a broader CI workflow analysis (P2188981399) covering 128 PR+trunk jobs over 30 days. Combined with pytorch#175066 (CycleGAN skip, ~310 GPU-hours/week), total savings from this stack: **~1,580 GPU-hours/week (~6,320 GPU-hours/month)**. 
### Changes - `trunk.yml`: remove CUDA 12.8 test job (5 default + 3 distributed + 1 pr_time_benchmarks + 1 libtorch shards) and no-ops build - `periodic.yml`: add default (5 GPU shards on g6.4xlarge) and distributed (3 multi-GPU shards on g4dn.12xlarge) to existing CUDA 12.8 periodic entry ## Test Plan - CUDA 12.8 GPU tests continue to run in periodic (3x/day weekdays) - CUDA 13.0 per-commit coverage is unchanged - Cross-compile-linux-test continues to work (12.8 build job kept) Pull Request resolved: pytorch#175067 Approved by: https://github.com/malfet ghstack dependencies: pytorch#175066 (cherry picked from commit ef0353f) Co-authored-by: Eli Uriegas <eliuriegas@meta.com>
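The savings figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sanity check of the numbers in the message; the factor of 4 weeks per month is an assumption inferred from the quoted weekly and monthly values, not something the message states.

```python
# Weekly GPU-hour savings as quoted in the two commit messages of this stack.
cyclegan_weekly = 310    # pytorch#175066 (CycleGAN skip)
cuda128_weekly = 1270    # pytorch#175067 (CUDA 12.8 tests moved to periodic)

# Assumption: the monthly figures in the message are 4x the weekly ones.
WEEKS_PER_MONTH = 4

total_weekly = cyclegan_weekly + cuda128_weekly
total_monthly = total_weekly * WEEKS_PER_MONTH
cuda128_monthly = cuda128_weekly * WEEKS_PER_MONTH

print(total_weekly, total_monthly, cuda128_monthly)
```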
[BE] Remove cuda 12.4 periodic tests (pytorch#175170) These tests have been either timing out or failing for a couple of months now. No reason to keep them around: https://hud.pytorch.org/hud/pytorch/pytorch/main/2?per_page=50&name_filter=12.4 Failures go back as far as 9.29.2025 : https://hud.pytorch.org/pytorch/pytorch/commit/efd7fd5ed5ac7ec03201a546a09fb19ec59de431 Pull Request resolved: pytorch#175170 Approved by: https://github.com/malfet (cherry picked from commit 174157a) Co-authored-by: atalman <atalman@fb.com>
[CI] Add CUDA 13 periodic tests (pytorch#174850) pytorch#173950 To prepare moving CUDA 13 wheels to stable wheels, need to add CUDA 13 periodic cuda tests. Pull Request resolved: pytorch#174850 Approved by: https://github.com/atalman (cherry picked from commit 7cdd4b1) Co-authored-by: Ting Lu <tingl@nvidia.com> Co-authored-by: Andrey Talman <atalman@fb.com>
[ROCm] forward fix pytorch#174087, take 4 (pytorch#175098) vllm build broke due to missing getCurrentHIPStreamMasqueradingAsCUDA. Though it existed in the header aten/src/ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h, this header was not included directly or indirectly by vllm. PR pytorch#174087 subtly broke this even when trying to be backward compatible. Moving the declarations of these Masquerading functions into c10/cuda/CUDAStream.h (c10/hip/HIPStream.h when hipified) fixes the vllm build. Any external projects that had included the HIPStreamMasqueradingAsCUDA.h forward to c10/hip/HIPStream.h anyway. Pull Request resolved: pytorch#175098 Approved by: https://github.com/atalman (cherry picked from commit e6d6f04) Co-authored-by: Jeff Daily <jeff.daily@amd.com>
…pytorch#175580) [MPS] Fix 2-pass SDPA memory corruption by forcing float accumulators (pytorch#174945) Ensure `sums` and `maxs` buffers in `sdpa_vector_2pass_mps` are allocated as `kFloat` instead of inheriting the input dtype. This fixes out-of-bounds memory access and nondeterministic/corrupt results, as reported in pytorch#174861 (reproducible with bf16/fp16 and GQA, seq_len > 1023). Adds a regression test covering bf16/fp16/fp32 and relaxes tolerance for bf16 to validate numerical correctness and determinism on MPS. Fixes pytorch#174861 Pull Request resolved: pytorch#174945 Approved by: https://github.com/malfet (cherry picked from commit c68a1d2) Co-authored-by: Roy Hvaara <roy@lightyear.no>
Disable einops 0.8.2 check on PyTorch (pytorch#175351) Partially revert pytorch#173611 and fallback to the previous behavior on einops, which uses `allow_in_graph`. **Context** * Dynamo does not trace into `@lru_cache` and warns on any usage. * einops uses `@lru_cache` as part of `_prepare_transformation_recipe`. * Every einops op goes through this function. * Dynamo warns on every einops op trace and this creates a logspam problem. Pull Request resolved: pytorch#175351 Approved by: https://github.com/Lucaskabela (cherry picked from commit 1fe0f51) Co-authored-by: Guilherme Leobas <gleobas@quansight.com>
…erator[] access (pytorch#175579) [CPUBLAS] Fix UB: use vector::resize() instead of reserve() before operator[] access (pytorch#175315) Fixes pytorch#175302 ## Summary `reserve(1)` → `resize(1)`. See issue for details. Pull Request resolved: pytorch#175315 Approved by: https://github.com/zou3519, https://github.com/malfet (cherry picked from commit f08aafa) Co-authored-by: mulatta <67085791+mulatta@users.noreply.github.com>
Remove python constraint on setuptools (pytorch#175577) Fixes pytorch#173823 Dependency on setuptools was added 8 years ago here: pytorch#5207 This issue remained hidden since we run smoke test in conda env. Conda create env installs setuptools by default. This became apparent when testing using uv Pull Request resolved: pytorch#175577 Approved by: https://github.com/malfet, https://github.com/seemethere (cherry picked from commit eaa0221) Co-authored-by: atalman <atalman@fb.com>
Supports custom empty tensor in InputObserver (pytorch#174964) When running an LLM that handles images and text (Gemma3), the first call to the forward method has input_ids and pixel_values but no past_key_values. Subsequent calls do not have pixel_values but do have past_key_values. The InputObserver knows the whole list of inputs, but since there is only one example of input_pixel (and the batch dimension is usually constant across all calls), we need a way to tell the InputObserver what an empty tensor for pixel_values looks like when it is missing. Pull Request resolved: pytorch#174964 Approved by: https://github.com/titaiwangms, https://github.com/justinchuby (cherry picked from commit bc9adaa) Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Bump transformers version to 5.2.0 (pytorch#175274) Take over the Dependabot PR from pytorch#175147 to fix the failures there Pull Request resolved: pytorch#175274 Approved by: https://github.com/xmfan, https://github.com/malfet (cherry picked from commit 268cfa7) Co-authored-by: Huy Do <huydhn@gmail.com>
…75781) [CI] Switch vLLM test and benchmark workflows to CUDA 13.0 (pytorch#175393) We should run vLLM test and benchmark on CUDA 13.0 now Pull Request resolved: pytorch#175393 Approved by: https://github.com/zou3519 (cherry picked from commit 72d0e64) Co-authored-by: Huy Do <huydhn@gmail.com>
Two tweaks: * Move some tests around to match what they are in vLLM. I'll work on a proper fix for this later to avoid the need to do this manually * Fix 12.8 build. See vllm-project/vllm#34791 Pull Request resolved: pytorch#175238 Approved by: https://github.com/angelayi, https://github.com/zou3519 Co-authored-by: PyTorch UpdateBot <pytorchupdatebot@users.noreply.github.com>
[ROCm][CI] Upgrade ROCm CI to 7.2 - 4/N (pytorch#173188) In parallel with pytorch#173187 Pull Request resolved: pytorch#173188 Approved by: https://github.com/jeffdaily (cherry picked from commit 8301e14) Co-authored-by: Jithun Nair <jithun.nair@amd.com> Co-authored-by: Jeff Daily <jeff.daily@amd.com> Co-authored-by: Jack Taylor <jack.taylor@amd.com>
[ROCm] Added CUDA check to test_pattern_matcher (pytorch#175092) Forward fix to pytorch#173856. Pull Request resolved: pytorch#175092 Approved by: https://github.com/jeffdaily, https://github.com/Skylion007 (cherry picked from commit f6dcaa3) Co-authored-by: Arash Pakbin <arash.pakbin@amd.com>
…ytorch#175672) * [WINDOWS][cuDNN] Fix cuDNN version mismatch in Windows (pytorch#175547) Authored with claude code Previous PRs such as pytorch#174310 updated cuDNN versions for Linux builds but neglected to do so for Windows. Claude wrote all of the lintrunner additions for consistency checking Pull Request resolved: pytorch#175547 Approved by: https://github.com/Skylion007, https://github.com/atalman, https://github.com/malfet * [cuDNN] Upgrade cuDNN to 9.19 for 12.8 and 13.0 wheels (pytorch#174310) Currently being tested internally, currently looks OK also needed for pytorch#172108 Pull Request resolved: pytorch#174310 Approved by: https://github.com/Skylion007, https://github.com/ngimel, https://github.com/malfet
Fix pep517 release handling (pytorch#175635) Fix pep517 release handling Fix sdist upload: correct PEP 440 version and file path PYTORCH_BUILD_VERSION was being set unconditionally to the raw tag/branch name (including 'v' prefix for tags), which fails PEP 440 validation in get_torch_version(), and was not exported so Python subprocesses couldn't see it anyway. Fix both issues: set and export PYTORCH_BUILD_VERSION only for release/RC tags, stripping the 'v' prefix and converting '-rc' to 'rc' for PEP 440 compliance. For branch pushes and PRs, leave it unset so get_torch_version falls back to version.txt. Also fix the sdist upload path: python -m build places the sdist in dist/, so move it to the workspace root for consistency with all upload steps (release, GHA artifact, and S3). These fixes are tested/verified in the second PR in this stack. This commit was created with the help of Claude Sonnet 4.6. Pull Request resolved: pytorch#175635 Approved by: https://github.com/atalman, https://github.com/malfet (cherry picked from commit 11eba5b) Co-authored-by: Klaus Zimmermann <klaus.zimmermann@quansight.com>
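The tag handling described above (strip the `v` prefix, turn `-rc` into `rc`, and leave the version unset for branches and PRs) can be sketched in Python. This is a minimal illustration, assuming tags of the form `v2.11.0` or `v2.11.0-rc3`; `pep440_from_tag` is a hypothetical helper, not the code used by the workflow.

```python
import re
from typing import Optional

# Hypothetical pattern: release tags like "v2.11.0" and RC tags like "v2.11.0-rc3".
_RELEASE_TAG = re.compile(r"^v(\d+\.\d+\.\d+)(?:-rc(\d+))?$")


def pep440_from_tag(ref_name: str) -> Optional[str]:
    """Return a PEP 440 version for release/RC tags, or None otherwise.

    None models "leave PYTORCH_BUILD_VERSION unset" so that the build
    falls back to version.txt for branch pushes and PRs.
    """
    match = _RELEASE_TAG.match(ref_name)
    if match is None:
        return None
    base, rc = match.groups()
    return f"{base}rc{rc}" if rc is not None else base


for ref in ("v2.11.0-rc3", "v2.11.0", "release/2.11"):
    print(ref, "->", pep440_from_tag(ref))
```

Returning `None` instead of a fallback string keeps the "unset" case explicit, which mirrors the message's point that non-tag builds should not export the variable at all.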
…75955) 1. Docker image switch — All workflows that used pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-inductor-benchmarks now use the cuda13.0 variant. The unused CUDA 12.8 image definition was removed from .ci/docker/build.sh and its duplicate entry dropped from docker-builds.yml. 3. Duplicate cleanup — Five workflows previously had both a CUDA 12.8 build and a separate -cuda13 build. After migrating the main build to CUDA 13.0, the -cuda13 duplicates were removed: - inductor-periodic.yml — removed periodic-dynamo-benchmarks-build-cuda13 + test - inductor-micro-benchmark.yml — removed build-cuda13 + test-cuda13 - inductor-perf-compare.yml — removed build-cuda13 + test-cuda13 - inductor-perf-test-nightly.yml — removed build-cuda13 + 3 test jobs - trunk.yml — removed inductor-build-cuda13 Pull Request resolved: pytorch#175826 Approved by: https://github.com/atalman Signed-off-by: Huy Do <huydhn@gmail.com>
…76408) update previous version 2.10 installation in get start xpu (pytorch#176141) update previous version 2.10 installation in get start xpu for release 2.11 Pull Request resolved: pytorch#176141 Approved by: https://github.com/EikanWang (cherry picked from commit 14f828c) Co-authored-by: ZhaoqiongZ <106125927+ZhaoqiongZ@users.noreply.github.com>
…6783) [inductor] Fix Identity comparability and evalf recursion (pytorch#175975) Fixes pytorch#175856 ## Summary This PR adds a narrow `Identity._eval_evalf(self, prec)` override in `torch/utils/_sympy/functions.py` to fix the SymPy recursion/comparison failure seen in Inductor simplification (e.g. `Max(0, Identity(-6))`). The implementation only unwraps comparable integer constants: ```python def _eval_evalf(self, prec): arg = self.args[0] if arg.is_Integer and arg.is_comparable: return arg return None ``` This keeps the fix minimal for the index-math path involved in the bug. Tests Added targeted tests in test/inductor/test_utils.py: `testIdentityComparisonNoRecursion` `testIdentityComparableNumbersInMinMax` `testIdentityEvalfIntegerOnly` Validation Repro fails on unpatched builds in the same SymPy/Inductor path. Repro passes with this fix applied. Pull Request resolved: pytorch#175975 Approved by: https://github.com/azahed98, https://github.com/laithsakka (cherry picked from commit cea64de) Co-authored-by: bhack <bhack@users.noreply.github.com>
…nge (pytorch#175333) [XPU] Fix SyclExtension Windows build for oneAPI 2025.3+ breaking change (pytorch#170701) ## Summary Fixes SyclExtension compilation on Windows when using oneAPI 2025.3 or higher. ## Problem oneAPI 2025.3 introduced a breaking change in how include paths are ordered to align with MSVC behavior. This causes build failures when compiling SyclExtension on Windows. The issue occurs because MSVC include directories are explicitly passed on the compiler command line. With the new include path ordering in oneAPI 2025.3, this causes the wrong std headers to be included. These MSVC directories are already added as correctly-ordered implicit include paths by the compiler, so they should not need to be passed explicitly on the command line. Passing them explicitly disrupts the intended include order. ## Solution When building SYCL extensions on Windows with oneAPI version >= 2025.3, filter out Microsoft Visual Studio paths from the compiler's include directories. The fix is version-gated to only apply for oneAPI 2025.3+ to avoid affecting users on older oneAPI versions. Fixes: intel/torch-xpu-ops#2574 Pull Request resolved: pytorch#170701 Approved by: https://github.com/dvrogozh, https://github.com/EikanWang, https://github.com/atalman (cherry picked from commit a09b29e) Co-authored-by: astachowiczhabana <adam.stachowicz@intel.com>
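The version-gated filtering described in the Solution section might look roughly like the sketch below. The function name, the `(major, minor)` version tuple, and the substring match on "Microsoft Visual Studio" are all assumptions made for illustration; the actual change may detect MSVC paths and the oneAPI version differently.

```python
from typing import List, Tuple


def filter_include_dirs(
    include_dirs: List[str],
    oneapi_version: Tuple[int, int],
    is_windows: bool,
) -> List[str]:
    """Drop MSVC include directories on Windows with oneAPI >= 2025.3.

    Hypothetical helper: on older oneAPI versions or other platforms the
    list is returned unchanged, so existing users are not affected.
    """
    if not is_windows or oneapi_version < (2025, 3):
        return list(include_dirs)
    # Assumption: MSVC directories can be recognized by this path fragment.
    return [d for d in include_dirs if "microsoft visual studio" not in d.lower()]


dirs = [
    r"C:\Program Files\Microsoft Visual Studio\2022\VC\include",
    r"C:\oneAPI\compiler\include",
]
print(filter_include_dirs(dirs, (2025, 3), is_windows=True))
```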
…n. (pytorch#176410) [Inductor] Reject non-contiguous subnode fusion in mix-order reduction. (pytorch#176131) We observed assert error after PR pytorch#174947 on XPU in intel/torch-xpu-ops#2932: The assert error in line L2125: https://github.com/pytorch/pytorch/blob/f99ab991dcd3719ee25dd3377a53ea12e518308e/torch/_inductor/scheduler.py#L2122-L2125 which is caused by: https://github.com/pytorch/pytorch/blob/f99ab991dcd3719ee25dd3377a53ea12e518308e/torch/_inductor/scheduler.py#L2200-L2203 Root cause: - MixOrderReduction.can_fuse is a pre-fusion heuristic; it only checks static conditions (both reductions, reversed orders, common reads, one contiguous pre-fusion, size/heuristics). It cannot see access-pattern changes introduced by backend.fuse. - In the failing case, self.node1=op1115 (reduction, contiguous=True) is fused with other=op1123 (pointwise, contiguous=False), producing fused_node=op1115_op1123 (non-contiguous). self.node2=op1117_op1119 is already non-contiguous. The mix-order reduction invariant (at least one side contiguous) is violated, so FusedMixOrderReductions would assert. ``` self.node1 = op1115 (SchedulerNode, reduction, contiguous=True) other = op1123 (SchedulerNode, pointwise, contiguous=False) backend.fuse(self.node1, other) | v fused_node = op1115_op1123 (FusedSchedulerNode, reduction+pointwise, contiguous=False) self.node2 = op1117_op1119 (FusedSchedulerNode, reduction+reduction, contiguous=False) mix-order reduction attempt: fused_node + self.node2 -> FusedMixOrderReductions (assert fails) ``` Fix: - Add a general post-fusion validation in FusedMixOrderReductions.fuse_with: after backend.fuse, re-check the contiguity invariant and reject the fusion if both sides are non-contiguous. - Implement a FusionRejected signal and catch it in Scheduler.fuse_two_nodes to keep nodes unfused. Test: - Added a regression test which reproduced the assert error on **cuda/xpu** and pass with this PR. 
Pull Request resolved: pytorch#176131 Approved by: https://github.com/shunting314 (cherry picked from commit 5a6d6b3) Co-authored-by: xinan.lin <xinan.lin@intel.com>
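The fix described in this commit (re-check the contiguity invariant after `backend.fuse`, raise a rejection signal, and catch it so the nodes stay unfused) can be modeled with a small toy. Everything here is a simplified stand-in: `Node`, `backend_fuse`, and the rule that a fused node is contiguous only if both inputs are contiguous are assumptions for illustration, not the Inductor scheduler's real classes or logic.

```python
from dataclasses import dataclass
from typing import Optional


class FusionRejected(Exception):
    """Signal that a fusion attempt must be abandoned."""


@dataclass(frozen=True)
class Node:
    name: str
    contiguous: bool


def backend_fuse(a: Node, b: Node) -> Node:
    # Toy model (assumption): fusing with a non-contiguous node yields a
    # non-contiguous fused node.
    return Node(f"{a.name}_{b.name}", a.contiguous and b.contiguous)


def fuse_with(node1: Node, node2: Node, other: Node) -> Node:
    fused = backend_fuse(node1, other)
    # Post-fusion validation: at least one side must still be contiguous.
    if not fused.contiguous and not node2.contiguous:
        raise FusionRejected(f"{fused.name} and {node2.name} are both non-contiguous")
    return fused


def fuse_two_nodes(node1: Node, node2: Node, other: Node) -> Optional[Node]:
    try:
        return fuse_with(node1, node2, other)
    except FusionRejected:
        return None  # keep the nodes unfused


op1115 = Node("op1115", contiguous=True)
op1123 = Node("op1123", contiguous=False)
op1117_op1119 = Node("op1117_op1119", contiguous=False)
print(fuse_two_nodes(op1115, op1117_op1119, op1123))
```

Using an exception rather than a boolean return lets the rejection surface from deep inside the fusion call without changing every intermediate signature, which matches the "signal and catch" shape the message describes.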
…6228) (pytorch#176495) The default num_stages > 1 has been causing multiple out-of-shared-memory issues. Make it 1 by default. We could explore other alternatives: 1. Always add a config with num_stages=1 while keeping the current heuristics. This could increase compilation time. 2. Dynamically scale down num_stages if all configs fail to compile due to running out of shared memory. 3. Mimic Triton logic to estimate the amount of shared memory needed per stage and set num_stages accordingly based on smem capacity. Pull Request resolved: pytorch#176228 Approved by: https://github.com/eellison, https://github.com/drisspg, https://github.com/jansel (cherry picked from commit ab17a38)
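The third alternative listed above (estimate shared memory per stage and cap the stage count by capacity) could be sketched as follows. The cost model is an assumption made purely for illustration: it charges each stage one A tile and one B tile, which is not necessarily how Triton estimates shared-memory use, and `pick_num_stages` is a hypothetical name.

```python
def pick_num_stages(
    block_m: int,
    block_n: int,
    block_k: int,
    elem_bytes: int,
    smem_capacity_bytes: int,
    max_stages: int = 3,
) -> int:
    """Return the largest stage count (>= 1) whose buffers fit in shared memory."""
    # Assumed cost model: each pipeline stage buffers an (M x K) tile of A
    # and a (K x N) tile of B.
    per_stage = (block_m * block_k + block_k * block_n) * elem_bytes
    stages = max_stages
    while stages > 1 and stages * per_stage > smem_capacity_bytes:
        stages -= 1
    return stages


# 128x128x64 fp16 tiles need 32 KiB per stage under this model.
print(pick_num_stages(128, 128, 64, 2, smem_capacity_bytes=64 * 1024))
```

The function never returns less than 1, so a configuration that does not fit even with a single stage still has to be rejected by the compile step itself.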
Fix the torch.Stream context manager reentrance (pytorch#176568)
# Motivation
This PR aims to fix the nested/reentrant use of `torch.Stream` as a context manager. `torch.cuda.stream` and `torch.xpu.stream` could support these usages. The following scenarios would be fixed with this PR:
```python
import torch

s0 = torch.Stream()
with s0, s0:
    pass
```
```python
import torch

s0 = torch.Stream()
s1 = torch.Stream()
with s0, s1:
    with s0, s1:
        pass
```
# Additional Context
Fix pytorch#176560 Pull Request resolved: pytorch#176568 Approved by: https://github.com/albanD (cherry picked from commit d43570c) Co-authored-by: Yu, Guangye <guangye.yu@intel.com>
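One common way to make a context manager safe for nested or repeated entry is to keep a stack of saved state per object rather than a single slot. The toy below illustrates that idea without importing torch; `ToyStream` and its class-level `current` attribute are hypothetical, and this is not a description of how `torch.Stream` is actually implemented.

```python
class ToyStream:
    """Toy stand-in for a stream object; not torch.Stream."""

    current = "default"  # hypothetical process-wide "current stream"

    def __init__(self, name: str) -> None:
        self.name = name
        # One saved entry per __enter__, so re-entering does not overwrite
        # the state recorded by an earlier, still-active entry.
        self._saved = []

    def __enter__(self) -> "ToyStream":
        self._saved.append(ToyStream.current)
        ToyStream.current = self.name
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        ToyStream.current = self._saved.pop()
        return False  # do not swallow exceptions


s0, s1 = ToyStream("s0"), ToyStream("s1")
with s0, s1:
    with s0, s1:
        inner = ToyStream.current
    middle = ToyStream.current
outer = ToyStream.current
print(inner, middle, outer)
```

With a single saved slot instead of a stack, the inner `with s0` would overwrite what the outer `with s0` recorded, and the outermost exit would restore the wrong stream.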
…3124) Fixes internal CI build failures on release/2.11 due to triton build. Build was able to pass the point where triton failed previously. e.g. https://ml-ci-internal.amd.com/blue/organizations/jenkins/pytorch%2Fpytorch-ci-pipeline/detail/release%2F2.10/31/pipeline With our change to triton pin: https://ml-ci-internal.amd.com/job/pytorch/job/pytorch-ci-pipeline/job/PR-3124/2/pipeline-overview/
…eCUDA::test_flash_attn_backward_mixed_strides_cuda#179086 (#3127) `dv` tensor should be created with `empty_like(v)` rather than `empty_like(k)`. This fixes pytorch#168540, pytorch#168541, and supersedes pytorch#178499 This is cherry-picked from upstream PR pytorch#179086
Build validation: http://rocm-ci.amd.com/job/pytorch2.11-manylinux-wheels_rel-7.2/7/ : Connection issues https://github.com/ROCm/TheRock/actions/runs/23953043418/job/69864879059 : Build succeeded --------- Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Fixes pytorch#158725 This is essentially @AngryLoki patch: https://github.com/gentoo/gentoo/blob/8cdbe88fa388ce264d1d70047222fcad190fec3d/sci-ml/caffe2/files/caffe2-2.9.0-rocm-distributed-link.patch Pull Request resolved: pytorch#175648 Approved by: https://github.com/jeffdaily, https://github.com/mlazos (cherry picked from commit 9bff6e1)
## Motivation Fix numpy compatibility for Python 3.14 for release/2.11 ## Technical Details - `numpy==2.1.2` has no cp314 wheels on PyPI, causing Python 3.14 builds in TheRock CI to fail with a meson/sccache error when pip falls back to building numpy from source - Add `python_version` markers to use `numpy==2.4.3` for Python 3.14+, while keeping the existing `numpy==2.1.2` pin for older Python versions ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. Co-authored-by: Subodh Dubey <Subodh.Dubey@amd.com>
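The effect of the `python_version` markers described above can be mirrored in a few lines of Python. This is only a model of the selection logic: `numpy_pin` is a hypothetical helper, and the real change expresses the same rule with environment markers in a requirements file rather than in code.

```python
import sys
from typing import Tuple


def numpy_pin(python_version: Tuple[int, int]) -> str:
    """Return the numpy pin the markers would select for a Python version."""
    # Python 3.14+ gets the newer pin because numpy==2.1.2 has no cp314 wheels.
    if python_version >= (3, 14):
        return "numpy==2.4.3"
    return "numpy==2.1.2"


print(numpy_pin(sys.version_info[:2]))
```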
….sh before sourcing (#3163) ## Summary Fixes the `pytorch_ut` failure introduced in PyTorch 2.11 where `test.sh` exits immediately with code 1 before any tests run. **Root cause:** PR pytorch#168377 added `source /etc/rocm_env.sh` to `.ci/pytorch/common.sh` targeting AMD's internal Jenkins CI, which provisions this file. When cherry-picked into `release/2.11`, this line breaks all TheRock Docker-based CI environments that do **not** provision `/etc/rocm_env.sh`. Since `set -e` is active in `test.sh`, the script exits before a single test runs — causing 0-pass, 1-fail on every host. **The fix:** Add a `[[ -f /etc/rocm_env.sh ]]` existence check so environments without the file skip sourcing it gracefully, while Jenkins CI (which does provision the file) continues working as before. This matches the fix already present on `pytorch/pytorch main`. ```bash # Before (broken): if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]]; then source /etc/rocm_env.sh fi # After (fixed): if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]] && [[ -f /etc/rocm_env.sh ]]; then source /etc/rocm_env.sh fi ``` **Impact without this fix:** - 86/97 `pytorch_ut` runs failed on TheRock build 7.13.0-1208 - Affects all GFX variants and Python versions (3.11, 3.12, 3.13) - PyTorch 2.10 is unaffected (does not have `source /etc/rocm_env.sh`) **References:** - Jira: ROCM-21809 - Upstream issue: pytorch#170983 - Regression introduced by: pytorch#168377
) (#3164) On Windows with HIP/ROCm, std::memcpy is a __host__ function and cannot be called from __device__ code. Use raw memcpy (which the HIP compiler provides as a device builtin) when building on Windows. This will allow builds of PyTorch for gfx942 on Windows. gfx950 is yet to be tested, but it should likely build as well. Pull Request resolved: pytorch#175410 Approved by: https://github.com/jeffdaily Co-authored-by: Aaryaman Vasishta <aaryaman.vasishta@amd.com>
…ch#178195) (#3169) Cherry-pick of upstream pytorch#178195 into `release/2.11`. Related PR: - #3168 ## Motivation For MI350, FP64 is supported in hipBLASLt. This PR enables FP64 on hipBLASLt in TunableOp and re-enables the FP64 unit test on MI350. ## Technical Details - Map `double` GEMM to `HIPBLAS_COMPUTE_64F` via a new `HipBlasComputeTypeFor<CT>()` helper (defaults to `HIPBLAS_COMPUTE_32F`, specialized to `HIPBLAS_COMPUTE_64F` for `double`). - Use `at::opmath_type<T>`-typed `alpha` / `beta` in the hipBLASLt path so FP64 tuning and execution use consistent compute semantics. - Set the matmul descriptor scale type with `HipDataTypeFor<opmath_t>()`. - Guard the TF32 override with `if constexpr (std::is_same_v<CT, float>)` so FP64 doesn't get downgraded. - Removes the MI350 skip on `test_matmul_small_brute_force_tunableop_cuda_float64`. The cherry-pick applied cleanly (no conflicts). ## Test Plan Build PyTorch on MI350 with ROCm, then run:
```
PYTORCH_TEST_WITH_ROCM=1 python test/test_linalg.py -v -k tunableop
```
## Test Result
```
Ran 69 tests in 156.726s
OK (skipped=42)
```
All tunableop tests pass. Skipped tests are CPU-only variants and gfx942-only variants (FP8/TF32). Upstream PR: pytorch#178195 Upstream commit: 0550897 Made with [Cursor](https://cursor.com)
…on configs. (#3145) New Inductor configs in support of a customer request. See https://amd-hub.atlassian.net/browse/AIPYTORCH-373
#3148) - This PR updates the Numba version constraints to correctly handle Python 3.14 and aligns the platform conditions with Numba’s current support matrix. - Add a new rule selecting numba==0.64.0 for Python ≥ 3.14 --------- Co-authored-by: sohbodas <Soham.Bodas@gmail.com> Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
…3181) <h2>Fix MIOpen CTC loss access violation on Windows discrete GPUs</h2> <h3>Problem</h3> <p>A failing unit test on Windows started showing a couple weeks ago and a missing <code>#include</code> was added in [](pytorch#178284), but CI on TheRock kept failing. The fix was tested on gfx1151 (APU), where the test passed, but CI showed failures on gfx1100. </p> <p><code>test_CTCLoss_no_batch_dim</code> (and any code path hitting <code>miopen_ctc_loss</code>) crashes with a fatal access violation on Windows systems with discrete AMD GPUs:</p> <pre><code>Windows fatal exception: access violation Exception Code: 0xC0000005 #0 miopen::CTCLossDescriptor::GetCTCLossWorkspaceSize (MIOpen.dll+0x14fde4) #1 miopenGetCTCLossWorkspaceSize (MIOpen.dll+0x150912) #2 at::native::miopen_ctc_loss (torch_hip.dll) </code></pre> <h3>Root Cause</h3> <p><code>miopenGetCTCLossWorkspaceSize</code> and <code>miopenCTCLoss</code> read the <code>labels</code>, <code>label_lengths</code>, and <code>input_lengths</code> arrays <strong>on the host side</strong> to plan the computation and calculate workspace requirements. The existing code copies these arrays to GPU memory and passes device pointers:</p> <pre><code>Tensor labels_gpu = targets_t.to(Device(at::kCUDA), at::kInt); // ... hipMemcpy to GPU ... 
MIOPEN_CHECK(miopenGetCTCLossWorkspaceSize(..., labels_gpu.data_ptr<int>(), // device pointer label_lengths_gpu.data_ptr<int>(), // device pointer input_lengths_gpu.data_ptr<int>() // device pointer )); </code></pre> <p>This works on:</p> <ul> <li><strong>Linux</strong> — HSA (Heterogeneous System Architecture) maps GPU allocations into the process virtual address space, making device pointers host-readable</li> <li><strong>Windows APUs</strong> — CPU and iGPU share system RAM, so device pointers point to host-accessible memory</li> </ul> <p>This crashes on:</p> <ul> <li><strong>Windows dGPUs</strong> — GPU has dedicated VRAM across PCIe; device pointers are opaque handles that cannot be dereferenced from host code</li> </ul> <h3>Verification</h3> <p>Tested on gfx1201:</p> <table border="1" cellpadding="6" cellspacing="0"> <tr><th>Check</th><th>Result</th></tr> <tr><td><code>hipDeviceAttributeIntegrated</code></td><td><code>0</code> (discrete GPU)</td></tr> <tr><td><code>hipDeviceAttributeCanUseHostPointerForRegisteredMem</code></td><td><code>0</code></td></tr> <tr><td><code>hipDeviceAttributeManagedMemory</code></td><td><code>0x7FFFFFFF</code> (unsupported)</td></tr> <tr><td><code>hipDeviceAttributeUnifiedAddressing</code></td><td><code>0x7FFFFFFF</code> (unsupported)</td></tr> <tr><td>Host read of <code>hipMalloc</code> pointer via <code>ctypes</code></td><td>Access violation</td></tr> <tr><td>CTC loss with CPU pointers</td><td>Pass (forward + backward)</td></tr> </table> <h3>Fix</h3> <p>Use host pointers since this is what MIOpen expects should be used.</p> <h3>Testing</h3> <p>Run all existing CTCLoss unit tests.</p> Pull Request resolved: pytorch#179264 Approved by: https://github.com/jeffdaily Co-authored-by: Milica Stankovic <mstankov@amd.com>
…ch (#3161) Cherry pick of pytorch#178284 Fixes ROCm/TheRock#3987 Co-authored-by: Milica Stankovic <milica.stankovic@amd.com>
…3160) Cherry pick of pytorch#179138 Fixes: ROCm/TheRock#4086 ROCm/rocm-libraries#5205 ROCm/TheRock#4079 Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
Cherry pick of pytorch#176024 Co-authored-by: nkhasbag <nkhasbag@nvidia.com> Co-authored-by: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com> Co-authored-by: Nikita Shulga <nshulga@meta.com>
#3144) ## Motivation - Enabling gfx103X-all wheels in TheRock is currently blocked due to PyTorch CI failures caused by a lack of `gfx1033` support in CK. ROCm/rocm-libraries#5141 resolves these issues. ## Technical Details - The aforementioned fix has been cherrypicked into the `pytorch/release/2.11/` branch of ROCm/composable_kernel - this PR bumps the `third_party/composable_kernel` branch to pick up these changes. ## Test Plan - Trigger a build and verify it passes ## Test Result - Build succeeds for `cherrypick-gfx1033-CK-support-torch2.11` branch. https://github.com/ROCm/TheRock/actions/runs/24195531659/job/70624339554 - Testing Pasting offline comments from @harkgill-amd > In https://github.com/ROCm/TheRock/actions/runs/24906345786/job/72942139688 Pytorch 3.10 + release/2.11 -> Pass Pytorch 3.11 + release/2.11 -> TestNN.test_Embedding_discontiguous_cuda failed but this seems to be a known flaky test and will be disabled with ROCm/TheRock#4775 Pytorch 3.12 + release/2.11 -> Pass Pytorch 3.13 + release/2.11 -> Pass In https://github.com/ROCm/TheRock/actions/runs/25002732513/job/73225027260 Pytorch 3.14 + release/2.11 -> The failing tests here all share the same miopenStatusUnknownError message. These are the same failures as seen in the main branch run here https://github.com/ROCm/TheRock/actions/runs/24985367049 so they aren't related to my PR ## Submission Checklist - [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
`my_lib` in `test_storage_preserve_nonhermetic_in_hermetic_context` leaks into the global op space after the test ends and affects subsequent tests in the same process that use dynamo. Without the fix, running any checkpoint/compile or dynamo-related test after `test_storage_preserve_nonhermetic_in_hermetic_context` fails with

```
torch._dynamo.exc.BackendCompilerFailed: backend='aot_eager' raised:
TypeError: 'CustomDecompTable' object is not a mapping
```

e.g. `python -m pytest -v pytorch/test/test_torch.py::TestTorch::test_storage_preserve_nonhermetic_in_hermetic_context pytorch/test/test_autograd.py::TestAutograd::test_checkpoint_compile_no_recompile`

Upstream PR: pytorch#180998

Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
…on mask optimization pytorch#176269 (#3156) Cherry-pick of #3055 Co-authored-by: Strahinja Stamenkovic <sstamenk@amd.com>
#3191)

Fixes a bug where FlexibleLayout on a ReinterpretView incorrectly returns the underlying physical buffer strides (e.g., 4D) instead of the logical view strides (3D). This patch skips speculative layout and constraint tracking for ReinterpretView nodes, forcing the use of node.get_stride() to prevent Illegal Memory Access (IMA) on ROCm.

Manual backport from PyTorch 2.12. Ref commit: pytorch@0e1f562
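
To make the stride mismatch concrete, here is a toy, pure-Python sketch of how indexing a 3D logical view with strides taken from a 4D physical buffer can produce an out-of-range offset. The shapes and helper names (`contiguous_strides`, `offset`) are hypothetical and chosen only for illustration; this is not the Inductor code path, just the arithmetic behind this class of bug.

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) for a contiguous tensor of this shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def offset(index, strides):
    """Linear element offset of a multi-dimensional index; ranks must match."""
    if len(index) != len(strides):
        raise ValueError("index rank does not match stride rank")
    return sum(i * s for i, s in zip(index, strides))


# Hypothetical 4D physical buffer and a 3D reinterpretation of the same storage.
buffer_shape = (2, 3, 4, 5)
view_shape = (6, 4, 5)
numel = 2 * 3 * 4 * 5  # 120 elements in both cases

buffer_strides = contiguous_strides(buffer_shape)
view_strides = contiguous_strides(view_shape)

last_view_index = (5, 3, 4)  # last valid element of the 3D view

# Correct: logical view strides keep the offset inside the buffer.
good = offset(last_view_index, view_strides)

# Wrong: reusing the leading physical strides for the 3D index overshoots.
bad = offset(last_view_index, buffer_strides[:3])

print(good, good < numel)
print(bad, bad < numel)
```

In this toy case the logical strides address the last element of the storage, while the physical strides point well past its end, which is the kind of access that shows up as an IMA on a GPU.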
## PR Summary

Fixes pytorch#178455

`ignore_logger_methods` was renamed to `ignore_logging_functions` in torch 2.11, but the new name wasn't added to the blocklist in `_get_dynamo_config_for_logging()`.

## Repro

```
import torch
import torch._dynamo.config
import torch._dynamo.utils

torch._dynamo.config.ignore_logging_functions.add(print)
torch._dynamo.utils._get_dynamo_config_for_logging()
```

## Changes

* Exclude `ignore_logging_functions` from `_get_dynamo_config_for_logging()` (consistent with existing `ignore_logger_methods`)
* Add a regression test to ensure no crash when the logging config includes builtin functions
* Added a test that:
  * Inserts `print` into `ignore_logging_functions`
  * Verifies `_get_dynamo_config_for_logging()` returns valid JSON without errors

related issue: pytorch#178455

Pull Request resolved: pytorch#178506
Approved by: https://github.com/Lucaskabela

(cherry picked from commit 7eea8ea)

Co-authored-by: vvvdwbvvv <vvvdwbvvv@gmail.com>
Co-authored-by: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
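
The general failure mode can be sketched without torch: a config snapshot that holds a set of builtin functions cannot be turned into JSON unless that key is filtered out first. The `config` dict, `BLOCKLIST`, and `config_for_logging` below are hypothetical stand-ins that only borrow the key names from this PR; they are not the actual `_get_dynamo_config_for_logging()` implementation.

```python
import json

# Hypothetical config snapshot; key names mirror the PR text, values are made up.
config = {
    "verbose": False,
    "ignore_logger_methods": set(),
    "ignore_logging_functions": {print},  # builtin function, not JSON-serializable
}

# Assumed blocklist of keys to drop before serialization (illustration only).
BLOCKLIST = {"ignore_logger_methods", "ignore_logging_functions"}


def config_for_logging(cfg, blocklist):
    """Return cfg as a JSON string, skipping blocklisted keys and sorting sets."""
    filtered = {k: v for k, v in cfg.items() if k not in blocklist}
    normalized = {
        k: sorted(v) if isinstance(v, set) else v for k, v in filtered.items()
    }
    return json.dumps(normalized, sort_keys=True)


# Without the blocklist entry, the builtin reaches json.dumps and raises TypeError.
try:
    config_for_logging(config, set())
except TypeError as exc:
    print("unfiltered config failed:", type(exc).__name__)

# With the key blocklisted, serialization succeeds.
print(config_for_logging(config, BLOCKLIST))
```

A regression test in this style only needs to insert a builtin into the set and check that the returned string parses as JSON.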
…h#180998) (#3221)

Cherry pick to 2.11 release

`my_lib` in `test_storage_preserve_nonhermetic_in_hermetic_context` leaks into the global op space after the test ends and affects subsequent tests in the same process that use dynamo. Without the fix, running any checkpoint/compile or dynamo-related test after `test_storage_preserve_nonhermetic_in_hermetic_context` fails with

```
torch._dynamo.exc.BackendCompilerFailed: backend='aot_eager' raised:
TypeError: 'CustomDecompTable' object is not a mapping
```

e.g. `python -m pytest -v pytorch/test/test_torch.py::TestTorch::test_storage_preserve_nonhermetic_in_hermetic_context pytorch/test/test_autograd.py::TestAutograd::test_checkpoint_compile_no_recompile`

Pull Request resolved: pytorch#180998
Approved by: https://github.com/albanD, https://github.com/ezyang

---------

Co-authored-by: Claude Opus 4 <noreply@anthropic.com>
…aLauncher (#3238)

## Summary

- Backports upstream PyTorch PR pytorch#183926 to ROCm release/2.11.
- Uses `hipModuleLoadData` for ROCm static launcher module loading to avoid retaining open HSACO file descriptors.
- Leaves the CUDA/NVIDIA path unchanged.
- Resolves Jira https://amd-hub.atlassian.net/browse/ROCM-24659, https://amd-hub.atlassian.net/browse/ROCM-24664

Made with [Cursor](https://cursor.com)

Co-authored-by: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
#3245)

…Class (pytorch#180736)

TestPrologueFusion and TestEpilogueFusionStaticAnalysis both use ExitStack in setUpClass to apply config.patch(), but neither defined tearDownClass to close the stack. When TestPrologueFusion runs before TestEpilogueFusionStaticAnalysis in the same process, config values like max_autotune_gemm_backends="TRITON" leak through, removing the aten kernel choice from autotuning and causing test failures.

Fixes pytorch#179693

Pull Request resolved: pytorch#180736
Approved by: https://github.com/Skylion007

Co-authored-by: NikhilAPatel <nikhilap@meta.com>
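
The setUpClass/tearDownClass pairing described above can be sketched with the standard library alone. `CONFIG` is a hypothetical stand-in for a process-wide config object, and `mock.patch.dict` is used here in place of inductor's `config.patch()`; the class and function names are illustrative, not taken from the PyTorch test suite.

```python
import unittest
from contextlib import ExitStack
from unittest import mock

# Hypothetical process-wide config; the real tests patch torch._inductor.config.
CONFIG = {"max_autotune_gemm_backends": "ATEN,TRITON"}


class PatchedSuite(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        cls._stack = ExitStack()
        # Keep the patch active for every test in this class.
        cls._stack.enter_context(
            mock.patch.dict(CONFIG, {"max_autotune_gemm_backends": "TRITON"})
        )

    @classmethod
    def tearDownClass(cls):
        # Closing the stack undoes every patch entered in setUpClass,
        # so later test classes in the same process see the original values.
        cls._stack.close()
        super().tearDownClass()

    def test_sees_patched_value(self):
        self.assertEqual(CONFIG["max_autotune_gemm_backends"], "TRITON")


def run_suite():
    """Run PatchedSuite and report whether all tests passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PatchedSuite)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

After `run_suite()` returns, `CONFIG` should hold its original value again; dropping `tearDownClass` from the sketch leaves the `"TRITON"` override in place for whatever runs next, which is the leak this change addresses.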
pytorch#175817) (#3241)

## Motivation

Aimed as a fix for test `TestMemPool.test_graph_capture_reclaim_shared_pool` failing in TheRock wheels: ROCm/TheRock#4925

The test was brought into `release/2.11` by the cherry-pick of upstream pytorch#176024 in #3182, but the allocator fix from upstream pytorch#175817 was not. Without this fix, `endAllocateToPool` (called from `CUDAGraph::capture_end`) does not reclaim `record_stream`-deferred blocks, so a second graph capture into the same shared pool cannot reuse the block freed in the first capture.

## Technical Details

Cherry-pick of upstream pytorch#175817 (commit `b55e5314fb72f1ea782f72a6c9728a40c12678ea`) on top of `release/2.11`.

## Test Plan

- Build PyTorch wheels from this branch and verify that the test `TestMemPool.test_graph_capture_reclaim_shared_pool` now passes.

## Test Result

- `TestMemPool.test_graph_capture_reclaim_shared_pool` passed for torch 2.11: https://github.com/ROCm/TheRock/actions/runs/26116907093/job/76816330885

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Frank Lin <eee4017@gmail.com>
…2_IFU_20260521 # Conflicts: # .ci/docker/build.sh # .ci/docker/ci_commit_pins/huggingface-requirements.txt # .ci/docker/ci_commit_pins/triton.txt # .ci/docker/common/install_cuda.sh # .ci/docker/requirements-ci.txt # .ci/docker/requirements-docs.txt # .ci/lumen_cli/cli/lib/core/vllm/vllm_test_library.yaml # .ci/manywheel/build_cuda.sh # .ci/pytorch/common.sh # .ci/pytorch/common_utils.sh # .ci/pytorch/windows/internal/cuda_install.bat # .github/ci_commit_pins/vllm.txt # .github/ci_commit_pins/xla.txt # .github/scripts/build_triton_wheel.py # .github/scripts/filter_test_configs.py # .github/scripts/generate_binary_build_matrix.py # .github/templates/common.yml.j2 # .github/templates/linux_binary_build_workflow.yml.j2 # .github/templates/macos_binary_build_workflow.yml.j2 # .github/templates/windows_binary_build_workflow.yml.j2 # .github/workflows/_bazel-build-test.yml # .github/workflows/_binary-build-flash-attention-wheel-linux.yml # .github/workflows/_binary-build-flash-attention-wheel-windows.yml # .github/workflows/_binary-build-linux.yml # .github/workflows/_binary-test-linux.yml # .github/workflows/_binary-upload.yml # .github/workflows/_docs.yml # .github/workflows/_link_check.yml # .github/workflows/_linux-build.yml # .github/workflows/_linux-test-stable-fa3.yml # .github/workflows/_linux-test.yml # .github/workflows/_mac-build.yml # .github/workflows/_mac-test.yml # .github/workflows/_rocm-test.yml # .github/workflows/_runner-determinator.yml # .github/workflows/_vllm-benchmark.yml # .github/workflows/_win-build.yml # .github/workflows/_win-test.yml # .github/workflows/_xpu-test.yml # .github/workflows/b200-distributed.yml # .github/workflows/b200-symm-mem.yml # .github/workflows/build-almalinux-images.yml # .github/workflows/build-libtorch-images.yml # .github/workflows/build-manywheel-images-s390x.yml # .github/workflows/build-manywheel-images.yml # .github/workflows/build-triton-wheel.yml # .github/workflows/build-vllm-wheel.yml # 
.github/workflows/claude-code.yml # .github/workflows/claude-issue-triage-run.yml # .github/workflows/close-nonexistent-disable-issues.yml # .github/workflows/create_release.yml # .github/workflows/docker-builds.yml # .github/workflows/docker-cache-rocm.yml # .github/workflows/docker-release.yml # .github/workflows/dynamo-unittest.yml # .github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml # .github/workflows/generated-linux-binary-libtorch-nightly.yml # .github/workflows/generated-linux-binary-manywheel-nightly.yml # .github/workflows/generated-linux-s390x-binary-manywheel-nightly.yml # .github/workflows/generated-macos-arm64-binary-libtorch-release-nightly.yml # .github/workflows/generated-windows-arm64-binary-libtorch-debug-nightly.yml # .github/workflows/generated-windows-arm64-binary-libtorch-release-nightly.yml # .github/workflows/generated-windows-arm64-binary-wheel-nightly.yml # .github/workflows/generated-windows-binary-libtorch-debug-nightly.yml # .github/workflows/generated-windows-binary-libtorch-release-nightly.yml # .github/workflows/generated-windows-binary-wheel-nightly.yml # .github/workflows/h100-cutlass-backend.yml # .github/workflows/h100-distributed.yml # .github/workflows/h100-symm-mem.yml # .github/workflows/inductor-micro-benchmark.yml # .github/workflows/inductor-nightly.yml # .github/workflows/inductor-pallas.yml # .github/workflows/inductor-perf-compare.yml # .github/workflows/inductor-perf-test-b200.yml # .github/workflows/inductor-perf-test-nightly-aarch64.yml # .github/workflows/inductor-perf-test-nightly-h100.yml # .github/workflows/inductor-perf-test-nightly-rocm-mi300.yml # .github/workflows/inductor-perf-test-nightly-rocm-mi355.yml # .github/workflows/inductor-perf-test-nightly-x86-zen.yml # .github/workflows/inductor-perf-test-nightly-x86.yml # .github/workflows/inductor-perf-test-nightly-xpu.yml # .github/workflows/inductor-perf-test-nightly.yml # .github/workflows/inductor-periodic.yml # 
.github/workflows/inductor-rocm-mi200.yml # .github/workflows/inductor-rocm-mi300.yml # .github/workflows/inductor-rocm-mi355.yml # .github/workflows/inductor-unittest.yml # .github/workflows/inductor.yml # .github/workflows/lint-autoformat.yml # .github/workflows/lint-bc.yml # .github/workflows/lint.yml # .github/workflows/linux-aarch64.yml # .github/workflows/llm_td_retrieval.yml # .github/workflows/nightly-s3-uploads.yml # .github/workflows/nightly.yml # .github/workflows/nitpicker.yml # .github/workflows/operator_microbenchmark.yml # .github/workflows/periodic-rocm-mi200.yml # .github/workflows/periodic-rocm-mi300.yml # .github/workflows/periodic-rocm-mi355.yml # .github/workflows/periodic.yml # .github/workflows/pull.yml # .github/workflows/quantization-periodic.yml # .github/workflows/rocm-mi200.yml # .github/workflows/rocm-mi300.yml # .github/workflows/rocm-mi355.yml # .github/workflows/rocm-navi31.yml # .github/workflows/rocm-nightly.yml # .github/workflows/slow-rocm-mi200.yml # .github/workflows/slow.yml # .github/workflows/target-determination-indexer.yml # .github/workflows/target_determination.yml # .github/workflows/test-b200.yml # .github/workflows/test-check-binary.yml # .github/workflows/test-h100.yml # .github/workflows/tools-unit-tests.yml # .github/workflows/torchbench.yml # .github/workflows/trunk-rocm-sandbox.yml # .github/workflows/trunk.yml # .github/workflows/unstable.yml # .github/workflows/update-viablestrict.yml # .github/workflows/update_pytorch_labels.yml # .github/workflows/upload-test-stats-while-running.yml # .github/workflows/upload-test-stats.yml # .github/workflows/upload-torch-dynamo-perf-stats.yml # .github/workflows/upload_test_stats_intermediate.yml # .github/workflows/vllm-benchmark.yml # .github/workflows/weekly.yml # .github/workflows/xpu.yml # aten/src/ATen/native/cuda/Sorting.cu # aten/src/ATen/native/cuda/SortingRadixSelect.cuh # aten/src/ATen/native/cuda/TensorTopK.cu # benchmarks/dynamo/timm_models.py # 
c10/cuda/CUDAAllocatorConfig.h # c10/cuda/CUDACachingAllocator.cpp # related_commits # requirements-build.txt # requirements.txt # test/cpp_extensions/test_libtorch_agnostic.py # test/distributed/tensor/test_tensor_ops.py # test/distributed/test_dynamo_distributed.py # test/inductor/test_ck_backend.py # test/inductor/test_mix_order_reduction.py # test/inductor/test_mps_basic.py # test/inductor/test_pattern_matcher.py # test/inductor/test_torchinductor_dynamic_shapes.py # test/test_cuda.py # test/test_mps.py # test/test_transformers.py # third_party/composable_kernel # tools/stats/import_test_stats.py # torch/_inductor/config.py # torch/_inductor/fx_passes/post_grad.py # torch/_inductor/runtime/triton_heuristics.py # torch/_inductor/select_algorithm.py # torch/testing/_internal/common_methods_invocations.py # torch/testing/_internal/opinfo/definitions/linalg.py # version.txt
Jenkins build for f28bb501cbe78f75c6d82f300813a254b63dfec1 commit finished as FAILURE
rocm_base: 26872de