[AMDGPU] Implement "non-av" semantics using metadata by ssahasra · Pull Request #199489 · llvm/llvm-project

ssahasra · 2026-05-25T06:54:58Z

When the MMRA tag !{!"amdgcn-av", !"none"} is present on a synchronization operation (fence, atomic load/store/rmw/cmpxchg), suppress cache writeback (MakeAvailable) and cache invalidation (MakeVisible) while preserving memory ordering (waits). This implements the metadata proposed in #191246.

Part of a stack:

Fixes: LCOMPILER-2214

Assisted-By: Claude Opus 4.6

A release consists of two actions: writing back the current cache, and waiting for "relevant" outstanding operations to complete. With the new memory model, the cache write-back can be disabled using "av none" semantics. This patch cleanly separates the existing implementation so that the write-backs can be selectively applied when such metadata is present.

Assisted-By: Claude Opus 4.6
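
As a rough illustration of that split, the following self-contained C++ sketch models a release as an optional write-back followed by an unconditional wait. `Action` and `releaseActions` are hypothetical names used only for illustration; the actual pass emits machine instructions through `SICacheControl` rather than returning values like these.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the two halves of a release operation.
enum class Action { Writeback, Wait };

// Models the shape of the patched insertRelease: the write-back is skipped
// when the operation carries "av none" semantics, the wait is always kept.
std::vector<Action> releaseActions(bool IsAVNone) {
  std::vector<Action> Actions;
  if (!IsAVNone)
    Actions.push_back(Action::Writeback);
  Actions.push_back(Action::Wait);
  return Actions;
}
```

In this model, `releaseActions(false)` yields the write-back followed by the wait, while `releaseActions(true)` yields only the wait, so ordering is preserved in both cases.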

llvmorg-github-actions · 2026-05-25T06:55:37Z

@llvm/pr-subscribers-backend-amdgpu

Author: Sameer Sahasrabuddhe (ssahasra)

Changes

When the MMRA tag !{!"amdgcn-av", !"none"} is present on a synchronization operation (fence, atomic load/store/rmw/cmpxchg), suppress cache writeback (MakeAvailable) and cache invalidation (MakeVisible) while preserving memory ordering (waits).

This implements the metadata proposed in #191246.

Fixes: LCOMPILER-2214

Assisted-By: Claude Opus 4.6

Patch is 39.91 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/199489.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp (+47-29)
(added) llvm/test/CodeGen/AMDGPU/memory-legalizer-av-none.ll (+722)

diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index b721dcaf49d0f..f16192d343531 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -148,6 +148,7 @@ class SIMemOpInfo final {
   bool IsNonTemporal = false;
   bool IsLastUse = false;
   bool IsCooperative = false;
+  bool IsAVNone = false;
 
   // TODO: Should we assume Cooperative=true if no MMO is present?
   SIMemOpInfo(
@@ -160,12 +161,12 @@ class SIMemOpInfo final {
       AtomicOrdering FailureOrdering = AtomicOrdering::SequentiallyConsistent,
       bool IsVolatile = false, bool IsNonTemporal = false,
       bool IsLastUse = false, bool IsCooperative = false,
-      bool CanDemoteWorkgroupToWavefront = false)
+      bool CanDemoteWorkgroupToWavefront = false, bool IsAVNone = false)
       : Ordering(Ordering), FailureOrdering(FailureOrdering), Scope(Scope),
         OrderingAddrSpace(OrderingAddrSpace), InstrAddrSpace(InstrAddrSpace),
         IsCrossAddressSpaceOrdering(IsCrossAddressSpaceOrdering),
         IsVolatile(IsVolatile), IsNonTemporal(IsNonTemporal),
-        IsLastUse(IsLastUse), IsCooperative(IsCooperative) {
+        IsLastUse(IsLastUse), IsCooperative(IsCooperative), IsAVNone(IsAVNone) {
 
     if (Ordering == AtomicOrdering::NotAtomic) {
       assert(!IsCooperative && "Cannot be cooperative & non-atomic!");
@@ -277,6 +278,9 @@ class SIMemOpInfo final {
   /// \returns True if this is a cooperative load or store atomic.
   bool isCooperative() const { return IsCooperative; }
 
+  /// \returns True if MakeAvailable/MakeVisible should be suppressed.
+  bool isAVNone() const { return IsAVNone; }
+
   /// \returns True if ordering constraint of the machine instruction used to
   /// create this SIMemOpInfo is unordered or higher, false otherwise.
   bool isAtomic() const {
@@ -451,13 +455,13 @@ class SICacheControl {
                                SIAtomicScope Scope, SIAtomicAddrSpace AddrSpace,
                                Position Pos) const = 0;
 
-  /// Inserts writeback followed by an unconditional wait to implement a
-  /// release operation.
+  /// Inserts writeback (unless \p IsAVNone) followed by an unconditional wait.
   bool insertRelease(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
                      SIAtomicAddrSpace AddrSpace, bool IsCrossAddrSpaceOrdering,
-                     Position Pos) const {
+                     Position Pos, bool IsAVNone) const {
     bool Changed = false;
-    Changed |= insertWriteback(MI, Scope, AddrSpace, Pos);
+    if (!IsAVNone)
+      Changed |= insertWriteback(MI, Scope, AddrSpace, Pos);
     Changed |= insertWait(MI, Scope, AddrSpace, SIMemOp::LOAD | SIMemOp::STORE,
                           IsCrossAddrSpaceOrdering, Pos,
                           AtomicOrdering::Release, /*AtomicsOnly=*/false);
@@ -733,6 +737,13 @@ getSynchronizeAddrSpaceMD(const MachineInstr &MI) {
   return Result;
 }
 
+static bool hasAVNoneMMRA(const MachineInstr &MI) {
+  auto MMRA = MMRAMetadata(MI.getMMRAMetadata());
+  if (!MMRA)
+    return false;
+  return MMRA.hasTag("amdgcn-av", "none");
+}
+
 } // end anonymous namespace
 
 void SIMemOpAccess::reportUnsupported(const MachineBasicBlock::iterator &MI,
@@ -876,7 +887,7 @@ std::optional<SIMemOpInfo> SIMemOpAccess::constructFromMIWithMMO(
   return SIMemOpInfo(ST, Ordering, Scope, OrderingAddrSpace, InstrAddrSpace,
                      IsCrossAddressSpaceOrdering, FailureOrdering, IsVolatile,
                      IsNonTemporal, IsLastUse, IsCooperative,
-                     CanDemoteWorkgroupToWavefront);
+                     CanDemoteWorkgroupToWavefront, hasAVNoneMMRA(*MI));
 }
 
 std::optional<SIMemOpInfo>
@@ -946,7 +957,7 @@ SIMemOpAccess::getAtomicFenceInfo(const MachineBasicBlock::iterator &MI) const {
   return SIMemOpInfo(ST, Ordering, Scope, OrderingAddrSpace,
                      SIAtomicAddrSpace::ATOMIC, IsCrossAddressSpaceOrdering,
                      AtomicOrdering::NotAtomic, false, false, false, false,
-                     CanDemoteWorkgroupToWavefront);
+                     CanDemoteWorkgroupToWavefront, hasAVNoneMMRA(*MI));
 }
 
 std::optional<SIMemOpInfo> SIMemOpAccess::getAtomicCmpxchgOrRmwInfo(
@@ -2317,9 +2328,10 @@ bool SIMemoryLegalizer::expandLoad(const SIMemOpInfo &MOI,
           CC->insertWait(MI, MOI.getScope(), MOI.getInstrAddrSpace(),
                          SIMemOp::LOAD, MOI.getIsCrossAddressSpaceOrdering(),
                          Position::AFTER, Order, /*AtomicsOnly=*/true);
-      Changed |= CC->insertAcquire(MI, MOI.getScope(),
-                                   MOI.getOrderingAddrSpace(),
-                                   Position::AFTER);
+      if (!MOI.isAVNone()) {
+        Changed |= CC->insertAcquire(
+            MI, MOI.getScope(), MOI.getOrderingAddrSpace(), Position::AFTER);
+      }
     }
 
     return Changed;
@@ -2363,11 +2375,12 @@ bool SIMemoryLegalizer::expandStore(const SIMemOpInfo &MOI,
       Changed |= CC->handleCooperativeAtomic(*MI);
 
     if (MOI.getOrdering() == AtomicOrdering::Release ||
-        MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent)
-      Changed |= CC->insertRelease(MI, MOI.getScope(),
-                                   MOI.getOrderingAddrSpace(),
-                                   MOI.getIsCrossAddressSpaceOrdering(),
-                                   Position::BEFORE);
+        MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent) {
+      Changed |=
+          CC->insertRelease(MI, MOI.getScope(), MOI.getOrderingAddrSpace(),
+                            MOI.getIsCrossAddressSpaceOrdering(),
+                            Position::BEFORE, MOI.isAVNone());
+    }
 
     Changed |= CC->finalizeStore(StoreMI, /*Atomic=*/true);
     return Changed;
@@ -2412,7 +2425,7 @@ bool SIMemoryLegalizer::expandAtomicFence(const SIMemOpInfo &MOI,
 
     if (Order == AtomicOrdering::Release ||
         Order == AtomicOrdering::AcquireRelease ||
-        Order == AtomicOrdering::SequentiallyConsistent)
+        Order == AtomicOrdering::SequentiallyConsistent) {
       /// TODO: This relies on a barrier always generating a waitcnt
       /// for LDS to ensure it is not reordered with the completion of
       /// the proceeding LDS operations. If barrier had a memory
@@ -2422,18 +2435,21 @@ bool SIMemoryLegalizer::expandAtomicFence(const SIMemOpInfo &MOI,
       /// adding S_WAITCNT before a S_BARRIER.
       Changed |= CC->insertRelease(MI, MOI.getScope(), OrderingAddrSpace,
                                    MOI.getIsCrossAddressSpaceOrdering(),
-                                   Position::BEFORE);
+                                   Position::BEFORE, MOI.isAVNone());
+    }
 
     // TODO: If both release and invalidate are happening they could be combined
     // to use the single "BUFFER_WBINV*" instruction. This could be done by
     // reorganizing this code or as part of optimizing SIInsertWaitcnt pass to
     // track cache invalidate and write back instructions.
 
-    if (Order == AtomicOrdering::Acquire ||
-        Order == AtomicOrdering::AcquireRelease ||
-        Order == AtomicOrdering::SequentiallyConsistent)
+    if ((Order == AtomicOrdering::Acquire ||
+         Order == AtomicOrdering::AcquireRelease ||
+         Order == AtomicOrdering::SequentiallyConsistent) &&
+        !MOI.isAVNone()) {
       Changed |= CC->insertAcquire(MI, MOI.getScope(), OrderingAddrSpace,
                                    Position::BEFORE);
+    }
 
     return Changed;
   }
@@ -2469,11 +2485,12 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI,
     if (Order == AtomicOrdering::Release ||
         Order == AtomicOrdering::AcquireRelease ||
         Order == AtomicOrdering::SequentiallyConsistent ||
-        MOI.getFailureOrdering() == AtomicOrdering::SequentiallyConsistent)
-      Changed |= CC->insertRelease(MI, MOI.getScope(),
-                                   MOI.getOrderingAddrSpace(),
-                                   MOI.getIsCrossAddressSpaceOrdering(),
-                                   Position::BEFORE);
+        MOI.getFailureOrdering() == AtomicOrdering::SequentiallyConsistent) {
+      Changed |=
+          CC->insertRelease(MI, MOI.getScope(), MOI.getOrderingAddrSpace(),
+                            MOI.getIsCrossAddressSpaceOrdering(),
+                            Position::BEFORE, MOI.isAVNone());
+    }
 
     if (Order == AtomicOrdering::Acquire ||
         Order == AtomicOrdering::AcquireRelease ||
@@ -2486,9 +2503,10 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI,
                          isAtomicRet(*MI) ? SIMemOp::LOAD : SIMemOp::STORE,
                          MOI.getIsCrossAddressSpaceOrdering(), Position::AFTER,
                          Order, /*AtomicsOnly=*/true);
-      Changed |= CC->insertAcquire(MI, MOI.getScope(),
-                                   MOI.getOrderingAddrSpace(),
-                                   Position::AFTER);
+      if (!MOI.isAVNone()) {
+        Changed |= CC->insertAcquire(
+            MI, MOI.getScope(), MOI.getOrderingAddrSpace(), Position::AFTER);
+      }
     }
 
     Changed |= CC->finalizeStore(RMWMI, /*Atomic=*/true);
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-av-none.ll b/llvm/test/CodeGen/AMDGPU/memory-legalizer-av-none.ll
new file mode 100644
index 0000000000000..89230fe1b7cdd
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-av-none.ll
@@ -0,0 +1,722 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx90a < %s | FileCheck -check-prefixes=GFX90A %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx90a -mattr=+tgsplit < %s | FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 < %s | FileCheck --check-prefixes=GFX12-WGP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 -mattr=+cumode < %s | FileCheck --check-prefixes=GFX12-CU %s
+
+; Test that !amdgcn-av-none suppresses MakeAvailable/MakeVisible (cache
+; writeback/invalidation) while preserving ordering (waits).
+
+; Fences: one per scope, varying orderings.
+
+define amdgpu_kernel void @workgroup_acq_rel_fence_av_none() {
+; GFX90A-LABEL: workgroup_acq_rel_fence_av_none:
+; GFX90A:       ; %bb.0: ; %entry
+; GFX90A-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX90A-NEXT:    s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_acq_rel_fence_av_none:
+; GFX90A-TGSPLIT:       ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-TGSPLIT-NEXT:    s_endpgm
+;
+; GFX12-WGP-LABEL: workgroup_acq_rel_fence_av_none:
+; GFX12-WGP:       ; %bb.0: ; %entry
+; GFX12-WGP-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-WGP-NEXT:    s_wait_samplecnt 0x0
+; GFX12-WGP-NEXT:    s_wait_storecnt 0x0
+; GFX12-WGP-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-WGP-NEXT:    s_endpgm
+;
+; GFX12-CU-LABEL: workgroup_acq_rel_fence_av_none:
+; GFX12-CU:       ; %bb.0: ; %entry
+; GFX12-CU-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-CU-NEXT:    s_wait_samplecnt 0x0
+; GFX12-CU-NEXT:    s_wait_storecnt 0x0
+; GFX12-CU-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-CU-NEXT:    s_endpgm
+entry:
+  fence syncscope("workgroup") acq_rel, !mmra !0
+  ret void
+}
+
+define amdgpu_kernel void @cluster_seq_cst_fence_av_none() {
+; GFX90A-LABEL: cluster_seq_cst_fence_av_none:
+; GFX90A:       ; %bb.0: ; %entry
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: cluster_seq_cst_fence_av_none:
+; GFX90A-TGSPLIT:       ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-TGSPLIT-NEXT:    s_endpgm
+;
+; GFX12-WGP-LABEL: cluster_seq_cst_fence_av_none:
+; GFX12-WGP:       ; %bb.0: ; %entry
+; GFX12-WGP-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-WGP-NEXT:    s_wait_samplecnt 0x0
+; GFX12-WGP-NEXT:    s_wait_storecnt 0x0
+; GFX12-WGP-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-WGP-NEXT:    s_endpgm
+;
+; GFX12-CU-LABEL: cluster_seq_cst_fence_av_none:
+; GFX12-CU:       ; %bb.0: ; %entry
+; GFX12-CU-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-CU-NEXT:    s_wait_samplecnt 0x0
+; GFX12-CU-NEXT:    s_wait_storecnt 0x0
+; GFX12-CU-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-CU-NEXT:    s_endpgm
+entry:
+  fence syncscope("cluster") seq_cst, !mmra !0
+  ret void
+}
+
+define amdgpu_kernel void @agent_acquire_fence_av_none() {
+; GFX90A-LABEL: agent_acquire_fence_av_none:
+; GFX90A:       ; %bb.0: ; %entry
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_acquire_fence_av_none:
+; GFX90A-TGSPLIT:       ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-TGSPLIT-NEXT:    s_endpgm
+;
+; GFX12-WGP-LABEL: agent_acquire_fence_av_none:
+; GFX12-WGP:       ; %bb.0: ; %entry
+; GFX12-WGP-NEXT:    s_wait_storecnt 0x0
+; GFX12-WGP-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-WGP-NEXT:    s_endpgm
+;
+; GFX12-CU-LABEL: agent_acquire_fence_av_none:
+; GFX12-CU:       ; %bb.0: ; %entry
+; GFX12-CU-NEXT:    s_wait_storecnt 0x0
+; GFX12-CU-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-CU-NEXT:    s_endpgm
+entry:
+  fence syncscope("agent") acquire, !mmra !0
+  ret void
+}
+
+define amdgpu_kernel void @agent_release_fence_av_none() {
+; GFX90A-LABEL: agent_release_fence_av_none:
+; GFX90A:       ; %bb.0: ; %entry
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: agent_release_fence_av_none:
+; GFX90A-TGSPLIT:       ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-TGSPLIT-NEXT:    s_endpgm
+;
+; GFX12-WGP-LABEL: agent_release_fence_av_none:
+; GFX12-WGP:       ; %bb.0: ; %entry
+; GFX12-WGP-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-WGP-NEXT:    s_wait_samplecnt 0x0
+; GFX12-WGP-NEXT:    s_wait_storecnt 0x0
+; GFX12-WGP-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-WGP-NEXT:    s_endpgm
+;
+; GFX12-CU-LABEL: agent_release_fence_av_none:
+; GFX12-CU:       ; %bb.0: ; %entry
+; GFX12-CU-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-CU-NEXT:    s_wait_samplecnt 0x0
+; GFX12-CU-NEXT:    s_wait_storecnt 0x0
+; GFX12-CU-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-CU-NEXT:    s_endpgm
+entry:
+  fence syncscope("agent") release, !mmra !0
+  ret void
+}
+
+define amdgpu_kernel void @system_seq_cst_fence_av_none() {
+; GFX90A-LABEL: system_seq_cst_fence_av_none:
+; GFX90A:       ; %bb.0: ; %entry
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    s_endpgm
+;
+; GFX90A-TGSPLIT-LABEL: system_seq_cst_fence_av_none:
+; GFX90A-TGSPLIT:       ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT:    s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX90A-TGSPLIT-NEXT:    s_endpgm
+;
+; GFX12-WGP-LABEL: system_seq_cst_fence_av_none:
+; GFX12-WGP:       ; %bb.0: ; %entry
+; GFX12-WGP-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-WGP-NEXT:    s_wait_samplecnt 0x0
+; GFX12-WGP-NEXT:    s_wait_storecnt 0x0
+; GFX12-WGP-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-WGP-NEXT:    s_endpgm
+;
+; GFX12-CU-LABEL: system_seq_cst_fence_av_none:
+; GFX12-CU:       ; %bb.0: ; %entry
+; GFX12-CU-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-CU-NEXT:    s_wait_samplecnt 0x0
+; GFX12-CU-NEXT:    s_wait_storecnt 0x0
+; GFX12-CU-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-CU-NEXT:    s_endpgm
+entry:
+  fence seq_cst, !mmra !0
+  ret void
+}
+
+; Atomic loads: acquire across scopes.
+
+define i32 @workgroup_acquire_load_av_none(ptr addrspace(1) %ptr) {
+; GFX90A-LABEL: workgroup_acquire_load_av_none:
+; GFX90A:       ; %bb.0: ; %entry
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    v_mov_b32_e32 v2, v1
+; GFX90A-NEXT:    ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX90A-NEXT:    v_mov_b32_e32 v1, v2
+; GFX90A-NEXT:    global_load_dword v0, v[0:1], off
+; GFX90A-NEXT:    s_waitcnt vmcnt(0)
+; GFX90A-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX90A-TGSPLIT-LABEL: workgroup_acquire_load_av_none:
+; GFX90A-TGSPLIT:       ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX90A-TGSPLIT-NEXT:    v_mov_b32_e32 v2, v1
+; GFX90A-TGSPLIT-NEXT:    ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX90A-TGSPLIT-NEXT:    v_mov_b32_e32 v1, v2
+; GFX90A-TGSPLIT-NEXT:    global_load_dword v0, v[0:1], off glc
+; GFX90A-TGSPLIT-NEXT:    s_waitcnt vmcnt(0)
+; GFX90A-TGSPLIT-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX12-WGP-LABEL: workgroup_acquire_load_av_none:
+; GFX12-WGP:       ; %bb.0: ; %entry
+; GFX12-WGP-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-WGP-NEXT:    s_wait_expcnt 0x0
+; GFX12-WGP-NEXT:    s_wait_samplecnt 0x0
+; GFX12-WGP-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-WGP-NEXT:    s_wait_kmcnt 0x0
+; GFX12-WGP-NEXT:    v_mov_b32_e32 v2, v1
+; GFX12-WGP-NEXT:    ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX12-WGP-NEXT:    v_mov_b32_e32 v1, v2
+; GFX12-WGP-NEXT:    global_load_b32 v0, v[0:1], off scope:SCOPE_SE
+; GFX12-WGP-NEXT:    s_wait_loadcnt 0x0
+; GFX12-WGP-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX12-CU-LABEL: workgroup_acquire_load_av_none:
+; GFX12-CU:       ; %bb.0: ; %entry
+; GFX12-CU-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-CU-NEXT:    s_wait_expcnt 0x0
+; GFX12-CU-NEXT:    s_wait_samplecnt 0x0
+; GFX12-CU-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-CU-NEXT:    s_wait_kmcnt 0x0
+; GFX12-CU-NEXT:    v_mov_b32_e32 v2, v1
+; GFX12-CU-NEXT:    ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX12-CU-NEXT:    v_mov_b32_e32 v1, v2
+; GFX12-CU-NEXT:    global_load_b32 v0, v[0:1], off
+; GFX12-CU-NEXT:    s_wait_loadcnt 0x0
+; GFX12-CU-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %val = load atomic i32, ptr addrspace(1) %ptr syncscope("workgroup") acquire, align 4, !mmra !0
+  ret i32 %val
+}
+
+define i32 @agent_acquire_load_av_none(ptr addrspace(1) %ptr) {
+; GFX90A-LABEL: agent_acquire_load_av_none:
+; GFX90A:       ; %bb.0: ; %entry
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    v_mov_b32_e32 v2, v1
+; GFX90A-NEXT:    ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX90A-NEXT:    v_mov_b32_e32 v1, v2
+; GFX90A-NEXT:    global_load_dword v0, v[0:1], off glc
+; GFX90A-NEXT:    s_waitcnt vmcnt(0)
+; GFX90A-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX90A-TGSPLIT-LABEL: agent_acquire_load_av_none:
+; GFX90A-TGSPLIT:       ; %bb.0: ; %entry
+; GFX90A-TGSPLIT-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX90A-TGSPLIT-NEXT:    v_mov_b32_e32 v2, v1
+; GFX90A-TGSPLIT-NEXT:    ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX90A-TGSPLIT-NEXT:    v_mov_b32_e32 v1, v2
+; GFX90A-TGSPLIT-NEXT:    global_load_dword v0, v[0:1], off glc
+; GFX90A-TGSPLIT-NEXT:    s_waitcnt vmcnt(0)
+; GFX90A-TGSPLIT-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX12-WGP-LABEL: agent_acquire_load_av_none:
+; GFX12-WGP:       ; %bb.0: ; %entry
+; GFX12-WGP-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-WGP-NEXT:    s_wait_expcnt 0x0
+; GFX12-WGP-NEXT:    s_wait_samplecnt 0x0
+; GFX12-WGP-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-WGP-NEXT:    s_wait_kmcnt 0x0
+; GFX12-WGP-NEXT:    v_mov_b32_e32 v2, v1
+; GFX12-WGP-NEXT:    ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX12-WGP-NEXT:    v_mov_b32_e32 v1, v2
+; GFX12-WGP-NEXT:    global_load_b32 v0, v[0:1], off scope:SCOPE_DEV
+; GFX12-WGP-NEXT:    s_wait_loadcnt 0x0
+; GFX12-WGP-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX12-CU-LABEL: agent_acquire_load_av_none:
+; GFX12-CU:       ; %bb.0: ; %entry
+; GFX12-CU-NEXT:    s_wait_loadcnt_dscnt 0x0
+; GFX12-CU-NEXT:    s_wait_expcnt 0x0
+; GFX12-CU-NEXT:    s_wait_samplecnt 0x0
+; GFX12-CU-NEXT:    s_wait_bvhcnt 0x0
+; GFX12-CU-NEXT:    s_wait_kmcnt 0x0
+; GFX12-CU-NEXT:    v_mov_b32_e32 v2, v1
+; GFX12-CU-NEXT:    ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; GFX12-CU-NEXT:    v_mov_b32_e32 v1, v2
+; GFX12-CU-NEXT:    global_load_b32 v0, v[0:1], off scope:SCOPE_DEV
+; GFX12-CU-NEXT:    s_wait_loadcnt 0x0
+; GFX12-CU-NEXT:    s_setpc_b64 s[30:31]
+entry:
+  %val = load atomic i32, ptr addrspace(1) %ptr syncscope("agent") acquire, align 4, !mmra !0
+  ret i32 %val
+}
+
+define i32 @system_acquire_load_av_none(ptr addrspace(1) %ptr) {
+; GFX90A-LABEL: system_acquire_load_av_none:
+; GFX90A:       ; %bb.0: ; %entry
+; GFX90A-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX90A-NEXT:    v_mov_b32_e32 v2, v1
+; GFX90A-NEXT:    ; ...
[truncated]
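
The fence-related hunks in SIMemoryLegalizer.cpp above can be summarized with a small stand-alone model. This is only a sketch under assumptions: `Ordering`, `fenceSteps`, and the step names are hypothetical, and the model covers just the release and acquire branches visible in the diff. Waits inserted by code outside the shown hunks are not modelled (the acquire-only fence test above still shows a wait that this sketch does not account for).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified orderings; the real pass uses llvm::AtomicOrdering.
enum class Ordering { Acquire, Release, AcquireRelease, SequentiallyConsistent };

// Returns the conceptual steps requested for a fence, in the order in which
// the two branches shown in the diff request them.
std::vector<std::string> fenceSteps(Ordering Order, bool IsAVNone) {
  const bool Releases = Order == Ordering::Release ||
                        Order == Ordering::AcquireRelease ||
                        Order == Ordering::SequentiallyConsistent;
  const bool Acquires = Order == Ordering::Acquire ||
                        Order == Ordering::AcquireRelease ||
                        Order == Ordering::SequentiallyConsistent;

  std::vector<std::string> Steps;
  if (Releases) {
    // Release half: the write-back is optional, the wait is unconditional.
    if (!IsAVNone)
      Steps.push_back("writeback");
    Steps.push_back("wait");
  }
  // Acquire half: the invalidation is dropped entirely under "av none".
  if (Acquires && !IsAVNone)
    Steps.push_back("invalidate");
  return Steps;
}
```

Under this model a seq_cst fence without the tag requests a write-back, a wait, and an invalidation, while the same fence with the tag requests only the wait.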

ritter-x2a · 2026-05-26T12:13:36Z

+  auto MMRA = MMRAMetadata(MI.getMMRAMetadata());
+  if (!MMRA)
+    return false;
+  return MMRA.hasTag("amdgcn-av", "none");


Should the tag name be amdgpu-av, to be more consistent with the existing amdgpu-synchronize-as?

Also: should we diagnose values other than "none"? There is no case where we want to make use of the happens-before-breaking semantics of incompatible MMRAs, right?

> Should the tag name be amdgpu-av, to be more consistent with the existing amdgpu-synchronize-as?

It's something I had explored. amdgpu-synchronize-as is one of the rare places where "amdgpu" is used, while for almost all builtins, intrinsics and metadata, "amdgcn" is the convention. @Pierre-vh, were you trying to start a newer convention with amdgpu-synchronize-as?

> Also: should we diagnose values other than "none"? There is no case where we want to make use of the happens-before-breaking semantics of incompatible MMRAs, right?

Yes, I will add the check.

> Also: should we diagnose values other than "none"? There is no case where we want to make use of the happens-before-breaking semantics of incompatible MMRAs, right?

I remember asking a similar question somewhere else, but I can't find it. What is the correct way to handle metadata verification: in the IR verifier, or at the point where the metadata is used?

Yeah, I was struggling with that when validating "!amdgcn-av !none". For now I just copied what is done for "!amdgpu-synchronize-as", which is to validate it at the time of consumption. It would be nice to separately work on an AMDGPU metadata verifier plugged into the IR verifier.
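
The "validate at the time of consumption" approach discussed here could look roughly like the sketch below. It is not the code from this PR: `Tag`, `AVQuery`, `queryAV`, and the error text are hypothetical, and MMRA tags are modelled as plain (prefix, suffix) string pairs rather than LLVM metadata.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one MMRA tag: (prefix, suffix).
using Tag = std::pair<std::string, std::string>;

// Hypothetical result of inspecting the tags attached to one operation.
struct AVQuery {
  bool IsAVNone = false;            // true if ("amdgcn-av", "none") was seen
  std::optional<std::string> Error; // set if an unknown amdgcn-av value was seen
};

// Checks the tags where they are consumed: "none" enables the relaxed
// behaviour, any other amdgcn-av value is reported instead of being ignored.
AVQuery queryAV(const std::vector<Tag> &Tags) {
  AVQuery Result;
  for (const auto &[Prefix, Suffix] : Tags) {
    if (Prefix != "amdgcn-av")
      continue; // Unrelated tags are not this check's concern.
    if (Suffix == "none")
      Result.IsAVNone = true;
    else
      Result.Error = "unsupported amdgcn-av value: " + Suffix;
  }
  return Result;
}
```

Keeping the check next to the consumer avoids a separate verifier pass, at the cost of only reporting bad values on code paths that actually reach the memory legalizer.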

github-actions · 2026-05-27T09:49:05Z

🐧 Linux x64 Test Results

195898 tests passed
5274 tests skipped

✅ The build succeeded and all tests passed.

github-actions · 2026-05-27T09:49:05Z

🪟 Windows x64 Test Results

135191 tests passed
3333 tests skipped

✅ The build succeeded and all tests passed.

…C) (#199486)

A release consists of two actions: write-back the current cache, and wait for "relevant" outstanding operations to complete. With the new memory model, it is possible to disable the cache write-back using "non-av". This patch cleanly separates the existing implementation so that the write-backs can be selectively applied after checking for non-av semantics.

Part of a stack:
- #199486
- #199621
- #199489
- #199622

Assisted-By: Claude Opus 4.6

---------

Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve@amd.com>

RyanRio

I don't see any issues with this. You could also add a test that combines the synchronize-as metadata with the av metadata, where applicable.

ssahasra added 2 commits May 25, 2026 12:04

ssahasra requested review from RyanRio, jayfoad and t-tye May 25, 2026 06:54

ssahasra requested review from Pierre-vh and ritter-x2a as code owners May 25, 2026 06:54

llvmorg-github-actions Bot added the backend:AMDGPU label May 25, 2026

This was referenced May 25, 2026

[AMDGPU] Refactor insertRelease into insertWriteback + insertWait (NFC) #199486

Merged

[AMDGPU] A Vulkan-style memory model weaker than the LLVM model #191246

Open

ritter-x2a reviewed May 26, 2026

declare and assign immediately if possible

a3deda2

Co-authored-by: Pierre van Houtryve <pierre.vanhoutryve@amd.com>

This was referenced May 27, 2026

[IR] Introduce an appendTags() idiom to set MMRA metadata [NFC] #199621

Merged

[Clang][AMDGPU] Add amdgcn_av("none") attribute for atomic expressions #199622

Open

ssahasra added 6 commits May 27, 2026 10:35

Merge remote-tracking branch 'upstream/main' into users/ssahasra/refactor-acq-rel

a8b11f7

remove unreachable "default:" case in switch

9d71870

actually cover all values in the enum scope

685c27f

Merge remote-tracking branch 'upstream/users/ssahasra/refactor-acq-rel' into users/ssahasra/av-metadata

ed944cd

diagnose unknown metadata

3496778

always diagnose unknown metadata

3d9cc99

Base automatically changed from users/ssahasra/refactor-acq-rel to main May 27, 2026 14:38

RyanRio reviewed May 27, 2026

Merge branch 'main' of https://github.com/llvm/llvm-project into users/ssahasra/av-metadata

7b5ba23

ssahasra wants to merge 10 commits into main from users/ssahasra/av-metadata

ritter-x2a May 26, 2026

ritter-x2a May 26, 2026

ssahasra May 27, 2026

shiltian May 27, 2026

ssahasra May 28, 2026

4 participants
