Signal when SPIR-V legalization needs targeted cleanup#8283
Signal when SPIR-V legalization needs targeted cleanup#8283AnastaZIuk wants to merge 6 commits intomicrosoft:mainfrom
Conversation
You can test this locally with the following command:git-clang-format --diff bb168c9647beb3d80c25926dd8ac3aab864dd134 899afef2a8665fa9c861b6b7be8562bf7efb850e -- include/dxc/Support/SPIRVOptions.h lib/DxcSupport/HLSLOptions.cpp tools/clang/lib/Frontend/Rewrite/RewriteObjC.cpp tools/clang/lib/SPIRV/SpirvEmitter.cpp tools/clang/lib/SPIRV/SpirvEmitter.hView the diff from clang-format here.diff --git a/lib/DxcSupport/HLSLOptions.cpp b/lib/DxcSupport/HLSLOptions.cpp
index 0c43b9ff..bda90be7 100644
--- a/lib/DxcSupport/HLSLOptions.cpp
+++ b/lib/DxcSupport/HLSLOptions.cpp
@@ -792,9 +792,8 @@ int ReadDxcOpts(const OptTable *optionTable, unsigned flagsToInclude,
}
opts.DisableOptimizations = false;
- if (Arg *A =
- Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2, OPT_O3,
- OPT_Od)) {
+ if (Arg *A = Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2,
+ OPT_O3, OPT_Od)) {
if (A->getOption().matches(OPT_O0))
opts.OptLevel = 0;
if (A->getOption().matches(OPT_O1))
@@ -1267,8 +1266,7 @@ int ReadDxcOpts(const OptTable *optionTable, unsigned flagsToInclude,
errors << "-Oconfig should not be specified more than once";
return 1;
}
- if (Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2,
- OPT_O3)) {
+ if (Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2, OPT_O3)) {
errors << "-Oconfig should not be used together with -O";
return 1;
}
diff --git a/tools/clang/lib/SPIRV/SpirvEmitter.cpp b/tools/clang/lib/SPIRV/SpirvEmitter.cpp
index 82b53abb..f569d42a 100644
--- a/tools/clang/lib/SPIRV/SpirvEmitter.cpp
+++ b/tools/clang/lib/SPIRV/SpirvEmitter.cpp
@@ -593,10 +593,9 @@ SpirvEmitter::SpirvEmitter(CompilerInstance &ci)
constEvaluator(astContext, spvBuilder), entryFunction(nullptr),
curFunction(nullptr), curThis(nullptr), seenPushConstantAt(),
isSpecConstantMode(false), needsLegalization(false),
- needsLegalizationLoopUnroll(false),
- needsLegalizationSsaRewrite(false),
- sawExplicitUnrollHint(false),
- beforeHlslLegalization(false), mainSourceFile(nullptr) {
+ needsLegalizationLoopUnroll(false), needsLegalizationSsaRewrite(false),
+ sawExplicitUnrollHint(false), beforeHlslLegalization(false),
+ mainSourceFile(nullptr) {
// Get ShaderModel from command line hlsl profile option.
const hlsl::ShaderModel *shaderModel =
@@ -972,8 +971,7 @@ void SpirvEmitter::HandleTranslationUnit(ASTContext &context) {
!dsetbindingsToCombineImageSampler.empty() ||
spirvOptions.signaturePacking;
needsLegalizationSsaRewrite =
- needsLegalizationSsaRewrite ||
- !dsetbindingsToCombineImageSampler.empty();
+ needsLegalizationSsaRewrite || !dsetbindingsToCombineImageSampler.empty();
// Run legalization passes
if (spirvOptions.codeGenHighLevel) {
|
s-perron
left a comment
There was a problem hiding this comment.
I don't think we will be able to accept these changes. I understand the place this is coming from. Optimization take time, and if we do lots of unrolling it creates lots of extra code that needs to be optimized.
The problem is that we have to generate legal code, and there are cases where these optimizations have proven necessary.
As a simple example, this PR changes the expected behaviour of on of our example from the SPIR-V cookbook.
We can try to work on alternatives, but they have to be done in a way that we can still legalize as many cases as we currently do. If not, we will break someone.
Things that cannot happen:
-
The front-end is not the place to determine if an optimization can be skipped. It has too little information and cannot account for how other optimizations will change things. For example, can we correctly legalize the code if we limit ssa rewrite to opaque objects only? Not very likely. That pass is needed to enable all sort of things like scalar replacement. It might work in your code, but I can easily come up with cases where it will not.
-
Unrolling can significantly impact runtime performance, and the impact is often unpredictable. There are some drivers that I have see that do not do much unrolling in their drivers, and having spir-v do the unrolling is important for runtime performance. We cannot change the unrolling pass in the performance passes.
Potential ideas:
- We could modify the unrolling in the legalization passes to look for illegal code in the loop. This could leave many of the loops in your code.
- You can disable the performance passs, or replace them with your own set of passes to run for the optimization phase. See here.
- If you want even more control, you can dump the code before legalization (-fcgl), and call the optimizer yourself with the passes you want.
These are ideas for limiting the unrolling. I don't think there is a way to limit the ssa rewrite pass the way you want. However, I don't think that is a good idea. It could impact too many other passes.
On a side note, do you see any changes in the time it takes drivers to compile the shaders/pipelines if you skip all of the unrolling in sirv-opt?
| // CHECK-NEXT: [[ptr:%[0-9]+]] = OpAccessChain %_ptr_Uniform_S %gRWSBuffer | ||
| // CHECK-NEXT: OpStore [[ptr]] [[val]] | ||
| // CHECK: OpLoopMerge {{%[0-9]+}} {{%[0-9]+}} Unroll | ||
| // CHECK: [[which:%[0-9]+]] = OpSelect %_ptr_Uniform_type_StructuredBuffer_S {{%[0-9]+}} %gSBuffer1 %gSBuffer2 |
There was a problem hiding this comment.
This is not legal. We have to unroll in this case.
There was a problem hiding this comment.
yep, it would only be legal with SPV_KHR_variable_pointers if the gSBuffer pointers were of Shader Storage Class.
However I could go out on a limb and claim that it was never supposed to be legal without [[unroll]] on the loop ( I know the example has unroll so in this case SPIR-V codegen shouldn't change)
Yes my first suggestion was probably do it as a flag like the #7996 PR which also changes previous behaviour, but takes care to avoid changing the old default behaviour.
We know that not every driver is Mesa and doesn't optimize fully and are currently running benchmarks for performance regressions. However I am of the opinion that:
TL;DR the performance passes should only be allowed to unroll loops without There's obviously the problem of if-statements, switches, constrant propagation, etc. which can allow turning illegal code legal and they have no attributes to control that. There's a few examples which in my option are a bit messed up and wouldn't make people cry if a flag disabled them
Love the idea, I guess that moves the problem and our impl into SPIR-V Tools ?
Actually if we step back and look at your idea above, an attribute like If this could percolate to the SPIR-V Spec as an At the end of the day analyzing whether to unroll or legalize might be more expensive than just unrolling and legalizing, since you can't exaclty and in leess time, know if all the transformations will end up making the code legal before you try them. My intuition tells me Halting Problem will rear its ugly head at some point if we try this without annotations and some help from the frontend.
@AnastaZIuk is correct here, unrolling for legalization should not kick in without Stuff like [unroll]
for( int j = 0; j < 2; j++ ) {
if (constant > j) {
lSBuffer = gSBuffer1;
} else {
lSBuffer = gSBuffer2;
}
gRWSBuffer[j] = lSBuffer[j];
}should legalize fine, but things like [loop]
for( int j = 0; j < 2; j++ ) {
if (constant > j) {
lSBuffer = gSBuffer1;
} else {
lSBuffer = gSBuffer2;
}
gRWSBuffer[j] = lSBuffer[j];
}really shouldn't IMHO
We'll take our worst shader and let you know. The main problem for us is hot-reload/iteration time, and majority of the time is spent in SPIR-V Tools. Whereas with a 58k LoC shader with template hell like ours we'd expect to spend majority of the time in Clang's Sema. Mind you the difference between SPIR-V ResultID upper bound is only 10k vs 16k for the We're open for ideas#s-perron how would you have us solve the problem of "too much SPIR-V spam". Sidequests
@s-perron was that done correctly, or is that something you'd want fixed? |
The templates will generally generate more code than you realize, creating more work for the optimizer. I would always expect the optimizer to take the most time.
I agree, and we are even more restrictive then that. SPIR-V tools unrolls only when the user marks the loop as unroll, and the loop can be fully unrolled. This is true for performance and legalization. We are suppose to be doing what you want. If you have examples of loop being unrolled when they are not marked by the user as |
Thanks, I wrote that before I dove into the SPIR-V Tools code, also I was thinking that
I do reailze there's also a lot of SPIR-V spam from unused functions and such, one pass of dead code elimination does wonders. Also with Arkadiusz' legalization changes the same code compiles 10x faster (including our Boost.Wave preprocessor doing crazy thigns with BOOST_PP) while only costing 8% runtime performance, so very much a thing I'd like to have for Development builds. |
|
Thanks for the detailed review. TL;DR
I want to be clear that I am not arguing legalization should disappear, or that the current default should be changed casually in a mature backend (and I realize my current PR branch still reads more like a default-behavior change than an opt-in experiment, which is exactly what I am trying to discuss here). I also understand that changes in a legacy backend are often evaluated under real release and maintenance pressure. That is exactly why I am more interested in reshaping this into an explicit opt-in DXC / SPIR-V rollout path than in trying to force an immediate default change. Proposed RolloutThat narrower rollout path now exists concretely as the explicit That remains consistent with the current DXC upstream model, and the current implementation now makes that rollout explicit through
Legendflowchart LR
A["master_source_off\nNabla e11b118d\n2026-03-26"] --> B["devshfixes_upstream\nNabla c13c3366\n2026-03-28"] --> C["unroll_artifact\nNabla 262a8b72\n2026-03-26"] --> D["unroll_v2\nO1experimental\n2026-03-29"]
This compares four checkpoints:
Runtime ProbeThe current up-to-date follow-up is Against
Full tables, captures, runnable bundles, and the detailed report are here: Cookbook Example
I agree with the narrow legality point on the cookbook example. That blocker is now addressed on the PR line: the explicit The causal chain there also makes sense to me. If that loop stays as a loop and the body still carries a dynamic choice between resource pointers, the module can retain a dynamic pointer-selection path across control flow. In that exact situation the SPIR-V variable-pointer rules become the relevant legality constraint. One known way to break that chain is to fully unroll the loop, let constant propagation resolve the branch per iteration, and then let the cleanup passes remove the dynamic pointer-selection path entirely. So I agree with the narrow point that this exact example still needs a legality-preserving transform. I also agree that this is probably not just about one isolated test. A blocker like this may well point to a broader class of legality-sensitive SSA and control-flow patterns. Related to that, the current cookbook and visible legalization test coverage still look incomplete to me, which is one reason I would find explicit rationale or concrete examples for the passes in the legalization recipe more useful than trying to infer each pass only from git history. Producer-Side Gating
Current DXC upstream already uses producer-side information to decide whether legalization should run at all. The docs explicitly list the source patterns that require legalization and say that legalization transformations will not run unless those patterns are encountered:
and the emitter computes
So I agree that the frontend cannot safely guess every downstream optimization effect. But current upstream already does coarse producer-side legality gating. The real question seems narrower: how much of the current blanket recipe is actually required to preserve the same legality coverage. Longer term, if a narrower rollout does prove useful, I could also imagine a cleaner design where DXC uses explicit Performance Recipe
That seems stronger than what the current SPIRV-Tools API says. The public optimizer API describes these recipes as sequences that are
So I understand That distinction matters to me because the same public API also describes the legalization recipe as something specially designed for legalizing literal HLSL-to-SPIR-V output, and explicitly says it should not be used by general workloads for performance or size improvement. So from the upstream API surface itself, legality and general performance policy are related here, but they are not the same question. HistoryOne other thing I checked more carefully is the upstream history. On the DXC side, SPIRV-Tools legalization was introduced as a conditional stage in 2017, and the current doc wording that says legalization is only run for specific source patterns was added in 2018: On the SPIRV-Tools side, the public optimizer API has described these recipes as mutable policy since 2017, while blanket unrolling in the legalization and performance recipes was added later in separate commits in 2018 and 2020: That history does not prove my branch is correct by itself. But it does make me hesitant to treat the current blanket breadth as if it were an original fixed contract rather than a later widening of the pipeline. It also makes me think it would be very helpful to have a more explicit rationale or example coverage for the passes that currently sit in the legalization recipe, because from the outside it is hard to tell which ones are there for one known blocker and which ones are there because the recipe simply accumulated over time. Runtime Impact
I also think this discussion should stay narrow for one more reason. Even on current upstream the unroller is not actually blind. The pass is registered in the recipe, but it only applies when the loop has
So even on current upstream there is already a difference between
That matters because the public SPIR-V spec also describes So from the current upstream code and spec, what I think is established is simply this: legalization is genuinely needed in some cases, DXC upstream already uses producer-side knowledge to decide when it runs, and SPIRV-Tools currently uses a broad unroll/cleanup recipe that should be treated as policy rather than as an immutable rule. Because of that, I think the remaining disagreement is not I also agree that runtime driver impact is the right next question. That part needs data. I am currently checking this on one path tracer workload on one GPU on my side. I also agree that, for a default rollout, even one real regression case may be enough to reject it, which is another reason I am framing this as an opt-in or separate-profile discussion first. |
Summary
-O1experimentalSPIR-V profile-O1experimentalthrough the companion fast compile APIs inSPIRV-Toolsexternal/SPIRV-Toolsto the companion branch stateOpSpecConstantCompositeformCodeGenSPIRVchecks to the accepted legal SPIR-V shapesWhy these DXC changes are needed
The recipe split lives in the companion
SPIRV-Toolsbranch.DXC is still the producer that knows when its own lowering forms require targeted legalization work for correctness. This branch therefore keeps two narrow producer-side signals:
ConstOffsetWithout those signals the optimizer would have to guess globally, which is exactly the blanket work the companion branch removes from the default path.
This branch also fixes one separate producer-side correctness issue. A vector splat of specialization constants must form
OpSpecConstantComposite, notOpConstantComposite.-O1experimentalis explicit opt-in behavior. The default SPIR-V option paths continue to call the existing optimizer entry points. Only-O1experimentaluses the new fast compile entry points.Spec basis
The image-offset case follows the SPIR-V image operand rules:
Spec:
Opaque image and sampled-image legality is also explicit:
Spec:
The involved types are opaque by definition:
Spec:
The retained pointer path in the explicit-unroll case is also legal SPIR-V:
Spec:
The specialization-constant composite fix follows the constant rules directly:
Spec:
The companion branch also keeps the
VariablePointerscontract explicit where the legalizing unroll path can retain pointer-typed control-flow values.Validation
CodeGenSPIRVvalidation on the paired DXC /SPIRV-Toolsbranch state passed with1438expected passes,2expected failures, and0unexpected failuresPhysicalStorageBufferpointer wrappersUpdated test notes
vk.spec-constant.reuse.hlsl: the old check expectedOpConstantCompositefrom a bool spec-constant splat. The legal form isOpSpecConstantComposite, and the check now matches that.15-loop-var-unroll-ok.hlsl: the optimized path now keeps a legal pointerOpSelectshape instead of depending on blanket eager full unroll in the generic optimizer path.cast.flat-conversion.matrix.hlsl: the optimized path rebuilds the target value through legal composite extracts and constructs instead of preserving one older constant-composite shape.type.rwstructured-buffer.array.counter.indirect.hlslandtype.rwstructured-buffer.array.counter.indirect2.hlsl: the checks now match the accepted legal access-chain shape without overconstraining intermediate ids.vk.binding.global-struct-of-resources.optimized.hlsl: the check now matches the stable semantic parts of the emitted resource layout instead of one transient id pattern.max_id.hlsl: the check no longer hardcodes one exactBoundvalue after the accepted optimizer recipe changes.Companion
SPIRV-ToolsPR: