
Commit ccba9f8

alsepkow, Copilot, and github-actions[bot] authored
Fix GVN and SROA miscompilation of min precision vector element access (#8269)
## Summary

Fixes three related optimizer bugs that cause miscompilation of min precision vector element access (`[]` operator) on `min16float`, `min16int`, and `min16uint` types at optimization levels O1+.

Resolves #8268

## Root Cause

DXC's data layout pads min precision types (`i16:32`, `f16:32`). The HLSL change in `DataLayout::getTypeSizeInBits` makes vector sizes use alloc size per element (e.g., `<3 x i16>` = 96 bits), but scalar `getTypeSizeInBits(i16)` returns the primitive width (16 bits). This inconsistency causes three bugs:

1. **GVN ICE**: `CanCoerceMustAliasedValueToLoad` creates a padded-width integer (i96) then attempts a bitcast from the 48-bit LLVM vector type — assert fires.
2. **GVN incorrect store forwarding**: `processLoad` forwards a zeroinitializer store past partial element stores because MemoryDependence uses padded sizes for aliasing.
3. **SROA element misindexing** (primary bug): `getNaturalGEPRecursively` uses `getTypeSizeInBits(i16)/8 = 2` for element offsets, while GEP uses `getTypeAllocSize(i16) = 4`. Byte offset 4 (element 1) maps to index `4/2 = 2` instead of `4/4 = 1`, causing SROA to misplace or eliminate element stores.

## Changes

**`lib/Transforms/Scalar/GVN.cpp`**
- `CanCoerceMustAliasedValueToLoad`: Reject coercion when type sizes include padding (`getTypeSizeInBits != getPrimitiveSizeInBits`)
- `processLoad` StoreInst handler: Skip store-to-load forwarding for padded types

**`lib/Transforms/Scalar/SROA.cpp`**
- `getNaturalGEPRecursively`: Use `getTypeAllocSizeInBits` for vector element size to match GEP offset stride
- `isVectorPromotionViable`: Same fix for element size calculation
- `AllocaSliceRewriter` constructor: Same fix for `ElementSize`

## Testing

- All 6 min precision ArrayOperator tests (StaticAccess + DynamicAccess × 3 types) pass with the fix
- Verified optimized DXIL output retains all 3 element stores with correct indices

## Co-authored

This fix was investigated and implemented with the assistance of GitHub Copilot (AI pair programming). The root cause analysis — tracing the bug through `-print-after-all` pass dumps, identifying SROA as the culprit, and understanding the `getTypeSizeInBits` vs `getTypeAllocSize` mismatch — was a collaborative effort.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
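The size mismatch described in the root cause can be seen directly with a couple of DataLayout queries. The following is a minimal standalone sketch (not part of this commit) that exercises LLVM's `DataLayout` API against the same layout string used by the tests below; the program itself is illustrative only.

```cpp
// Minimal sketch (not part of this commit): under DXC's layout string the
// primitive width and the alloc size of i16 disagree, which is the mismatch
// the GVN and SROA changes below guard against.
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Type.h"
#include <cstdio>

int main() {
  llvm::LLVMContext Ctx;
  llvm::DataLayout DL("e-m:e-p:32:32-i1:32-i8:32-i16:32-i32:32-i64:64"
                      "-f16:32-f32:32-f64:64-n8:16:32:64");
  llvm::Type *I16 = llvm::Type::getInt16Ty(Ctx);
  // The primitive width of i16 stays 16 bits, but the i16:32 ABI alignment
  // pads its alloc size to 32 bits (4 bytes) -- the stride GEP actually uses.
  uint64_t SizeBits = DL.getTypeSizeInBits(I16);       // 16
  uint64_t AllocBits = DL.getTypeAllocSizeInBits(I16); // 32
  std::printf("getTypeSizeInBits(i16)      = %llu\n",
              (unsigned long long)SizeBits);
  std::printf("getTypeAllocSizeInBits(i16) = %llu\n",
              (unsigned long long)AllocBits);
  return 0;
}
```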
1 parent 71aa195 commit ccba9f8

4 files changed

Lines changed: 334 additions & 4 deletions


lib/Transforms/Scalar/GVN.cpp

Lines changed: 21 additions & 0 deletions
@@ -853,6 +853,17 @@ static bool CanCoerceMustAliasedValueToLoad(Value *StoredVal,
       StoredVal->getType()->isArrayTy())
     return false;
 
+  // HLSL Change Begin - Reject types where padded and primitive sizes differ.
+  // Coercion would create bitcasts between mismatched sizes.
+  Type *StoredValTy = StoredVal->getType();
+  uint64_t StoredPrimBits = StoredValTy->getPrimitiveSizeInBits();
+  uint64_t LoadPrimBits = LoadTy->getPrimitiveSizeInBits();
+  if (StoredPrimBits && DL.getTypeSizeInBits(StoredValTy) != StoredPrimBits)
+    return false;
+  if (LoadPrimBits && DL.getTypeSizeInBits(LoadTy) != LoadPrimBits)
+    return false;
+  // HLSL Change End
+
   // The store has to be at least as big as the load.
   if (DL.getTypeSizeInBits(StoredVal->getType()) <
       DL.getTypeSizeInBits(LoadTy))

@@ -1942,6 +1953,16 @@ bool GVN::processLoad(LoadInst *L) {
   if (StoreInst *DepSI = dyn_cast<StoreInst>(DepInst)) {
     Value *StoredVal = DepSI->getValueOperand();
 
+    // HLSL Change Begin - Defense-in-depth: skip cross-type forwarding for
+    // padded types (e.g., min precision vectors).
+    if (StoredVal->getType() != L->getType()) {
+      Type *StoredTy = StoredVal->getType();
+      uint64_t StoredPrimBits = StoredTy->getPrimitiveSizeInBits();
+      if (StoredPrimBits && DL.getTypeSizeInBits(StoredTy) != StoredPrimBits)
+        return false;
+    }
+    // HLSL Change End
+
     // The store and load are to a must-aliased pointer, but they may not
     // actually have the same type. See if we know how to reuse the stored
     // value (depending on its type).
lib/Transforms/Scalar/SROA.cpp

Lines changed: 10 additions & 4 deletions
@@ -1671,7 +1671,10 @@ static Value *getNaturalGEPRecursively(IRBuilderTy &IRB, const DataLayout &DL,
   // extremely poorly defined currently. The long-term goal is to remove GEPing
   // over a vector from the IR completely.
   if (VectorType *VecTy = dyn_cast<VectorType>(Ty)) {
-    unsigned ElementSizeInBits = DL.getTypeSizeInBits(VecTy->getScalarType());
+    // HLSL Change: Use alloc size for element stride to account for padded
+    // types.
+    unsigned ElementSizeInBits =
+        DL.getTypeAllocSizeInBits(VecTy->getScalarType());
     if (ElementSizeInBits % 8 != 0) {
       // GEPs over non-multiple of 8 size vector elements are invalid.
       return nullptr;

@@ -2134,7 +2137,8 @@ static VectorType *isVectorPromotionViable(AllocaSlices::Partition &P,
 
   // Try each vector type, and return the one which works.
   auto CheckVectorTypeForPromotion = [&](VectorType *VTy) {
-    uint64_t ElementSize = DL.getTypeSizeInBits(VTy->getElementType());
+    // HLSL Change: Use alloc size to match GEP offset stride for padded types.
+    uint64_t ElementSize = DL.getTypeAllocSizeInBits(VTy->getElementType());
 
     // While the definition of LLVM vectors is bitpacked, we don't support sizes
     // that aren't byte sized.

@@ -2492,12 +2496,14 @@ class AllocaSliceRewriter : public InstVisitor<AllocaSliceRewriter, bool> {
               : nullptr),
         VecTy(PromotableVecTy),
         ElementTy(VecTy ? VecTy->getElementType() : nullptr),
-        ElementSize(VecTy ? DL.getTypeSizeInBits(ElementTy) / 8 : 0),
+        // HLSL Change: Use alloc size to match GEP offset stride for padded
+        // types.
+        ElementSize(VecTy ? DL.getTypeAllocSizeInBits(ElementTy) / 8 : 0),
         BeginOffset(), EndOffset(), IsSplittable(), IsSplit(), OldUse(),
         OldPtr(), PHIUsers(PHIUsers), SelectUsers(SelectUsers),
         IRB(NewAI.getContext(), ConstantFolder()) {
     if (VecTy) {
-      assert((DL.getTypeSizeInBits(ElementTy) % 8) == 0 &&
+      assert((DL.getTypeAllocSizeInBits(ElementTy) % 8) == 0 && // HLSL Change
             "Only multiple-of-8 sized vector elements are viable");
       ++NumVectorized;
     }
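The stride change above is easiest to see with the numbers from the commit message. A minimal sketch (not DXC code) of the offset-to-index mapping for element 1 of a <3 x i16> slot under i16:32:

```cpp
// Minimal sketch (not DXC code): mapping a GEP byte offset to a vector
// element index must divide by the same stride GEP uses (the alloc size).
#include <cassert>
#include <cstdint>

int main() {
  const uint64_t PrimBytes = 16 / 8;  // getTypeSizeInBits(i16) / 8, old SROA stride
  const uint64_t AllocBytes = 32 / 8; // getTypeAllocSizeInBits(i16) / 8, GEP stride
  const uint64_t Offset = 4;          // byte offset of element 1 under i16:32
  assert(Offset / PrimBytes == 2);    // pre-fix: element 1 misindexed as element 2
  assert(Offset / AllocBytes == 1);   // post-fix: index matches the GEP stride
  return 0;
}
```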
Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
; RUN: opt < %s -basicaa -gvn -S | FileCheck %s

; Regression test for min precision vector GVN miscompilation.
; DXC's data layout pads i16 to 32 bits (i16:32). GVN must not:
; 1. Coerce padded vector types via bitcast (CanCoerceMustAliasedValueToLoad)
; 2. Forward a zeroinitializer store past partial element stores (processLoad)
;
; Without the fix, GVN would forward the zeroinitializer vector load, producing
; incorrect all-zero results for elements that were individually written.

target datalayout = "e-m:e-p:32:32-i1:32-i8:32-i16:32-i32:32-i64:64-f16:32-f32:32-f64:64-n8:16:32:64"
target triple = "dxil-ms-dx"

; Test 1: GVN must not forward zeroinitializer past element store for <3 x i16>.
; The store of zeroinitializer to %dst is followed by an element store to
; %dst[0], then a vector load of %dst. GVN must not replace the vector load
; with the zeroinitializer.

; CHECK-LABEL: @test_no_forward_i16_vec3
; CHECK: store <3 x i16> zeroinitializer
; CHECK: store i16 %val
; The vector load must survive — GVN must not replace it with zeroinitializer.
; CHECK: %result = load <3 x i16>
; CHECK: ret <3 x i16> %result
define <3 x i16> @test_no_forward_i16_vec3(i16 %val) {
entry:
  %dst = alloca <3 x i16>, align 4
  store <3 x i16> zeroinitializer, <3 x i16>* %dst, align 4
  %elem0 = getelementptr inbounds <3 x i16>, <3 x i16>* %dst, i32 0, i32 0
  store i16 %val, i16* %elem0, align 4
  %result = load <3 x i16>, <3 x i16>* %dst, align 4
  ret <3 x i16> %result
}

; Test 2: Same pattern with <3 x half> (f16:32 padding).

; CHECK-LABEL: @test_no_forward_f16_vec3
; CHECK: store <3 x half> zeroinitializer
; CHECK: store half %val
; CHECK: %result = load <3 x half>
; CHECK: ret <3 x half> %result
define <3 x half> @test_no_forward_f16_vec3(half %val) {
entry:
  %dst = alloca <3 x half>, align 4
  store <3 x half> zeroinitializer, <3 x half>* %dst, align 4
  %elem0 = getelementptr inbounds <3 x half>, <3 x half>* %dst, i32 0, i32 0
  store half %val, half* %elem0, align 4
  %result = load <3 x half>, <3 x half>* %dst, align 4
  ret <3 x half> %result
}

; Test 3: Multiple element stores — all must survive.
; Stores to elements 0, 1, 2 of a <3 x i16> vector after zeroinitializer.

; CHECK-LABEL: @test_no_forward_i16_vec3_all_elems
; CHECK: store <3 x i16> zeroinitializer
; CHECK: store i16 %v0
; CHECK: store i16 %v1
; CHECK: store i16 %v2
; CHECK: %result = load <3 x i16>
; CHECK: ret <3 x i16> %result
define <3 x i16> @test_no_forward_i16_vec3_all_elems(i16 %v0, i16 %v1, i16 %v2) {
entry:
  %dst = alloca <3 x i16>, align 4
  store <3 x i16> zeroinitializer, <3 x i16>* %dst, align 4
  %e0 = getelementptr inbounds <3 x i16>, <3 x i16>* %dst, i32 0, i32 0
  store i16 %v0, i16* %e0, align 4
  %e1 = getelementptr inbounds <3 x i16>, <3 x i16>* %dst, i32 0, i32 1
  store i16 %v1, i16* %e1, align 4
  %e2 = getelementptr inbounds <3 x i16>, <3 x i16>* %dst, i32 0, i32 2
  store i16 %v2, i16* %e2, align 4
  %result = load <3 x i16>, <3 x i16>* %dst, align 4
  ret <3 x i16> %result
}

; Test 4: Coercion rejection — store a <3 x i16> vector, load as different type.
; GVN must not attempt bitcast coercion on padded types.
; If coercion happened, the load would be eliminated and replaced with a bitcast.

; CHECK-LABEL: @test_no_coerce_i16_vec3
; CHECK: store <3 x i16>
; CHECK: load i96
; CHECK-NOT: bitcast
; CHECK: ret
define i96 @test_no_coerce_i16_vec3(<3 x i16> %v) {
entry:
  %ptr = alloca <3 x i16>, align 4
  store <3 x i16> %v, <3 x i16>* %ptr, align 4
  %iptr = bitcast <3 x i16>* %ptr to i96*
  %result = load i96, i96* %iptr, align 4
  ret i96 %result
}

; Test 5: Long vector variant — <5 x i16> (exceeds 4-element native size).

; CHECK-LABEL: @test_no_forward_i16_vec5
; CHECK: store <5 x i16> zeroinitializer
; CHECK: store i16 %val
; CHECK: %result = load <5 x i16>
; CHECK: ret <5 x i16> %result
define <5 x i16> @test_no_forward_i16_vec5(i16 %val) {
entry:
  %dst = alloca <5 x i16>, align 4
  store <5 x i16> zeroinitializer, <5 x i16>* %dst, align 4
  %elem0 = getelementptr inbounds <5 x i16>, <5 x i16>* %dst, i32 0, i32 0
  store i16 %val, i16* %elem0, align 4
  %result = load <5 x i16>, <5 x i16>* %dst, align 4
  ret <5 x i16> %result
}

; Test 6: Long vector variant — <8 x half>.

; CHECK-LABEL: @test_no_forward_f16_vec8
; CHECK: store <8 x half> zeroinitializer
; CHECK: store half %val
; CHECK: %result = load <8 x half>
; CHECK: ret <8 x half> %result
define <8 x half> @test_no_forward_f16_vec8(half %val) {
entry:
  %dst = alloca <8 x half>, align 4
  store <8 x half> zeroinitializer, <8 x half>* %dst, align 4
  %elem0 = getelementptr inbounds <8 x half>, <8 x half>* %dst, i32 0, i32 0
  store half %val, half* %elem0, align 4
  %result = load <8 x half>, <8 x half>* %dst, align 4
  ret <8 x half> %result
}

; Test 7: Same-type store-to-load forwarding must still work for padded types.
; GVN should forward %v directly — no intervening writes, same type.

; CHECK-LABEL: @test_same_type_forward_i16_vec3
; The load should be eliminated and %v returned directly.
; CHECK-NOT: load
; CHECK: ret <3 x i16> %v
define <3 x i16> @test_same_type_forward_i16_vec3(<3 x i16> %v) {
entry:
  %ptr = alloca <3 x i16>, align 4
  store <3 x i16> %v, <3 x i16>* %ptr, align 4
  %result = load <3 x i16>, <3 x i16>* %ptr, align 4
  ret <3 x i16> %result
}

; Test 8: Same-type forwarding for <3 x half>.

; CHECK-LABEL: @test_same_type_forward_f16_vec3
; CHECK-NOT: load
; CHECK: ret <3 x half> %v
define <3 x half> @test_same_type_forward_f16_vec3(<3 x half> %v) {
entry:
  %ptr = alloca <3 x half>, align 4
  store <3 x half> %v, <3 x half>* %ptr, align 4
  %result = load <3 x half>, <3 x half>* %ptr, align 4
  ret <3 x half> %result
}
Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
; RUN: opt < %s -sroa -S | FileCheck %s

; Regression test for SROA miscompilation of min precision vector element access.
; DXC's data layout pads i16/f16 to 32 bits (i16:32, f16:32), so GEP offsets
; between vector elements are 4 bytes apart. SROA must use alloc size (not
; primitive size) for element stride, otherwise element stores get misplaced.

target datalayout = "e-m:e-p:32:32-i1:32-i8:32-i16:32-i32:32-i64:64-f16:32-f32:32-f64:64-n8:16:32:64"
target triple = "dxil-ms-dx"

; Test 1: Element-wise write to <3 x i16> vector.
; SROA must map GEP byte offsets to correct element indices using alloc size
; (4 bytes per i16), not primitive size (2 bytes). All stores must survive
; with correct indices, and the final vector load must be preserved.

; CHECK-LABEL: @test_sroa_i16_vec3
; CHECK: getelementptr inbounds <3 x i16>, <3 x i16>* %{{.*}}, i32 0, i32 0
; CHECK: store i16 %v0
; CHECK: getelementptr inbounds <3 x i16>, <3 x i16>* %{{.*}}, i32 0, i32 1
; CHECK: store i16 %v1
; CHECK: getelementptr inbounds <3 x i16>, <3 x i16>* %{{.*}}, i32 0, i32 2
; CHECK: store i16 %v2
; CHECK: load <3 x i16>
; CHECK: ret <3 x i16>
define <3 x i16> @test_sroa_i16_vec3(i16 %v0, i16 %v1, i16 %v2) {
entry:
  %dst = alloca <3 x i16>, align 4
  store <3 x i16> zeroinitializer, <3 x i16>* %dst, align 4
  %e0 = getelementptr inbounds <3 x i16>, <3 x i16>* %dst, i32 0, i32 0
  store i16 %v0, i16* %e0, align 4
  %e1 = getelementptr inbounds <3 x i16>, <3 x i16>* %dst, i32 0, i32 1
  store i16 %v1, i16* %e1, align 4
  %e2 = getelementptr inbounds <3 x i16>, <3 x i16>* %dst, i32 0, i32 2
  store i16 %v2, i16* %e2, align 4
  %result = load <3 x i16>, <3 x i16>* %dst, align 4
  ret <3 x i16> %result
}

; Test 2: Same pattern with <3 x half> (f16:32 padding).

; CHECK-LABEL: @test_sroa_f16_vec3
; CHECK: getelementptr inbounds <3 x half>, <3 x half>* %{{.*}}, i32 0, i32 0
; CHECK: store half %v0
; CHECK: getelementptr inbounds <3 x half>, <3 x half>* %{{.*}}, i32 0, i32 1
; CHECK: store half %v1
; CHECK: getelementptr inbounds <3 x half>, <3 x half>* %{{.*}}, i32 0, i32 2
; CHECK: store half %v2
; CHECK: load <3 x half>
; CHECK: ret <3 x half>
define <3 x half> @test_sroa_f16_vec3(half %v0, half %v1, half %v2) {
entry:
  %dst = alloca <3 x half>, align 4
  store <3 x half> zeroinitializer, <3 x half>* %dst, align 4
  %e0 = getelementptr inbounds <3 x half>, <3 x half>* %dst, i32 0, i32 0
  store half %v0, half* %e0, align 4
  %e1 = getelementptr inbounds <3 x half>, <3 x half>* %dst, i32 0, i32 1
  store half %v1, half* %e1, align 4
  %e2 = getelementptr inbounds <3 x half>, <3 x half>* %dst, i32 0, i32 2
  store half %v2, half* %e2, align 4
  %result = load <3 x half>, <3 x half>* %dst, align 4
  ret <3 x half> %result
}

; Test 3: Partial write — only element 1 is stored. SROA must index it correctly.

; CHECK-LABEL: @test_sroa_i16_vec3_elem1
; Element 1 store must be correctly placed at GEP index 1, not index 2.
; Without the fix, byte offset 4 / prim_size 2 = index 2 (wrong).
; With the fix, byte offset 4 / alloc_size 4 = index 1 (correct).
; CHECK: getelementptr inbounds <3 x i16>, <3 x i16>* %{{.*}}, i32 0, i32 1
; CHECK: store i16 %val
; CHECK: load <3 x i16>
; CHECK: ret <3 x i16>
define <3 x i16> @test_sroa_i16_vec3_elem1(i16 %val) {
entry:
  %dst = alloca <3 x i16>, align 4
  store <3 x i16> zeroinitializer, <3 x i16>* %dst, align 4
  %e1 = getelementptr inbounds <3 x i16>, <3 x i16>* %dst, i32 0, i32 1
  store i16 %val, i16* %e1, align 4
  %result = load <3 x i16>, <3 x i16>* %dst, align 4
  ret <3 x i16> %result
}

; Test 4: Element 2 store — verifies highest index is correct.

; CHECK-LABEL: @test_sroa_i16_vec3_elem2
; CHECK: getelementptr inbounds <3 x i16>, <3 x i16>* %{{.*}}, i32 0, i32 2
; CHECK: store i16 %val
; CHECK: load <3 x i16>
; CHECK: ret <3 x i16>
define <3 x i16> @test_sroa_i16_vec3_elem2(i16 %val) {
entry:
  %dst = alloca <3 x i16>, align 4
  store <3 x i16> zeroinitializer, <3 x i16>* %dst, align 4
  %e2 = getelementptr inbounds <3 x i16>, <3 x i16>* %dst, i32 0, i32 2
  store i16 %val, i16* %e2, align 4
  %result = load <3 x i16>, <3 x i16>* %dst, align 4
  ret <3 x i16> %result
}

; Test 5: Long vector — <5 x i16> (exceeds 4-element native size).

; CHECK-LABEL: @test_sroa_i16_vec5
; CHECK: getelementptr inbounds <5 x i16>, <5 x i16>* %{{.*}}, i32 0, i32 0
; CHECK: store i16 %v0
; CHECK: getelementptr inbounds <5 x i16>, <5 x i16>* %{{.*}}, i32 0, i32 1
; CHECK: store i16 %v1
; CHECK: getelementptr inbounds <5 x i16>, <5 x i16>* %{{.*}}, i32 0, i32 4
; CHECK: store i16 %v4
; CHECK: load <5 x i16>
; CHECK: ret <5 x i16>
define <5 x i16> @test_sroa_i16_vec5(i16 %v0, i16 %v1, i16 %v2, i16 %v3, i16 %v4) {
entry:
  %dst = alloca <5 x i16>, align 4
  store <5 x i16> zeroinitializer, <5 x i16>* %dst, align 4
  %e0 = getelementptr inbounds <5 x i16>, <5 x i16>* %dst, i32 0, i32 0
  store i16 %v0, i16* %e0, align 4
  %e1 = getelementptr inbounds <5 x i16>, <5 x i16>* %dst, i32 0, i32 1
  store i16 %v1, i16* %e1, align 4
  %e2 = getelementptr inbounds <5 x i16>, <5 x i16>* %dst, i32 0, i32 2
  store i16 %v2, i16* %e2, align 4
  %e3 = getelementptr inbounds <5 x i16>, <5 x i16>* %dst, i32 0, i32 3
  store i16 %v3, i16* %e3, align 4
  %e4 = getelementptr inbounds <5 x i16>, <5 x i16>* %dst, i32 0, i32 4
  store i16 %v4, i16* %e4, align 4
  %result = load <5 x i16>, <5 x i16>* %dst, align 4
  ret <5 x i16> %result
}

; Test 6: Long vector — <8 x half>.

; CHECK-LABEL: @test_sroa_f16_vec8_partial
; CHECK: getelementptr inbounds <8 x half>, <8 x half>* %{{.*}}, i32 0, i32 0
; CHECK: store half %v0
; CHECK: getelementptr inbounds <8 x half>, <8 x half>* %{{.*}}, i32 0, i32 7
; CHECK: store half %v7
; CHECK: load <8 x half>
; CHECK: ret <8 x half>
define <8 x half> @test_sroa_f16_vec8_partial(half %v0, half %v7) {
entry:
  %dst = alloca <8 x half>, align 4
  store <8 x half> zeroinitializer, <8 x half>* %dst, align 4
  %e0 = getelementptr inbounds <8 x half>, <8 x half>* %dst, i32 0, i32 0
  store half %v0, half* %e0, align 4
  %e7 = getelementptr inbounds <8 x half>, <8 x half>* %dst, i32 0, i32 7
  store half %v7, half* %e7, align 4
  %result = load <8 x half>, <8 x half>* %dst, align 4
  ret <8 x half> %result
}
