Signal when SPIR-V legalization needs targeted cleanup by AnastaZIuk · Pull Request #8283 · microsoft/DirectXShaderCompiler

AnastaZIuk · 2026-03-20T18:12:56Z

Summary

add the opt-in -O1experimental SPIR-V profile
keep the default SPIR-V optimization and legalization entry points unchanged
route only -O1experimental through the companion fast compile APIs in SPIRV-Tools
keep producer-side signals for the narrow lowering cases that still need legalization-time loop unroll or SSA rewrite cleanup for correctness
update external/SPIRV-Tools to the companion branch state, including the isolated VariablePointers cleanup for the fast path
fix specialization-constant vector splats so they emit the legal OpSpecConstantComposite form
update CodeGenSPIRV checks to the accepted legal SPIR-V shapes

Why these DXC changes are needed

The recipe split lives in the companion SPIRV-Tools branch.

DXC is still the producer that knows when its own lowering forms require targeted legalization work for correctness. This branch therefore keeps two narrow producer-side signals:

legalization-time loop unroll for lowering paths that must end in constant image operands such as ConstOffset
legalization-time SSA rewrite cleanup for lowering paths that materialize opaque or resource-like intermediates during lowering

Without those signals the optimizer would have to guess globally, which is exactly the blanket work the companion branch removes from the default path.

The companion SPIRV-Tools branch now also keeps one isolated -O1experimental follow-up: trim stale VariablePointers / VariablePointersStorageBuffer declarations only at the end of the fast path when the final module no longer needs them. That fixes the EX37 sampler regression without changing the shared default trim path.

This branch also fixes one separate producer-side correctness issue. A vector splat of specialization constants must form OpSpecConstantComposite, not OpConstantComposite.

-O1experimental is explicit opt-in behavior. The default SPIR-V option paths continue to call the existing optimizer entry points. Only -O1experimental uses the new fast compile entry points.

Spec basis

The image-offset case follows the SPIR-V image operand rules:

ConstOffset ... It must be an <id> of a constant instruction

Spec:

https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#Image_Operands

Opaque image and sampled-image legality is also explicit:

Image, sampler, and sampled image objects must not appear as operands to OpPhi instructions, or OpSelect instructions, or any instructions other than the image or sampler instructions specified to operate on them.

All OpSampledImage instructions must be in the same block in which their Result <id> are consumed.

Spec:

https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_universal_validation_rules

The involved types are opaque by definition:

This type is opaque: values of this type have no defined physical size or bit pattern.

Spec:

The retained pointer path in the explicit-unroll case is also legal SPIR-V:

Before version 1.4, Result Type must be a pointer, scalar, or vector.

The types of Object 1 and Object 2 must be the same as Result Type.

Spec:

https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpSelect

The specialization-constant composite fix follows the constant rules directly:

The Constituents must all be <id>s of non-specialization constant-instruction declarations or an OpUndef.

The Constituents must be the <id> of other specialization constants, constant declarations, or an OpUndef.

Spec:

The companion branch also keeps the VariablePointers contract explicit where the legalizing unroll path can retain pointer-typed control-flow values.

Validation

fresh local CodeGenSPIRV validation on the paired DXC / SPIRV-Tools branch state passed with 1438 expected passes, 2 expected failures, and 0 unexpected failures
additional local end-to-end validation covered HLSL shaders with storage image arrays, image atomics, and PhysicalStorageBuffer pointer wrappers

Updated test notes

vk.spec-constant.reuse.hlsl: the old check expected OpConstantComposite from a bool spec-constant splat. The legal form is OpSpecConstantComposite, and the check now matches that.
15-loop-var-unroll-ok.hlsl: the optimized path now keeps a legal pointer OpSelect shape instead of depending on blanket eager full unroll in the generic optimizer path.
cast.flat-conversion.matrix.hlsl: the optimized path rebuilds the target value through legal composite extracts and constructs instead of preserving one older constant-composite shape.
type.rwstructured-buffer.array.counter.indirect.hlsl and type.rwstructured-buffer.array.counter.indirect2.hlsl: the checks now match the accepted legal access-chain shape without overconstraining intermediate ids.
vk.binding.global-struct-of-resources.optimized.hlsl: the check now matches the stable semantic parts of the emitted resource layout instead of one transient id pattern.
max_id.hlsl: the check no longer hardcodes one exact Bound value after the accepted optimizer recipe changes.

Companion SPIRV-Tools PR:

Narrow blanket SPIR-V legalization work in optimizer recipes KhronosGroup/SPIRV-Tools#6612

github-actions · 2026-03-20T18:18:01Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff 0f17d5a2dc46c8c6bacf32f6fca67178ee5314df 36527954ae5ee4df11e02deeb4e914267f402c86 -- include/dxc/Support/SPIRVOptions.h lib/DxcSupport/HLSLOptions.cpp tools/clang/lib/Frontend/Rewrite/RewriteObjC.cpp tools/clang/lib/SPIRV/SpirvEmitter.cpp tools/clang/lib/SPIRV/SpirvEmitter.h

View the diff from clang-format here.

diff --git a/lib/DxcSupport/HLSLOptions.cpp b/lib/DxcSupport/HLSLOptions.cpp
index 0c43b9ff..bda90be7 100644
--- a/lib/DxcSupport/HLSLOptions.cpp
+++ b/lib/DxcSupport/HLSLOptions.cpp
@@ -792,9 +792,8 @@ int ReadDxcOpts(const OptTable *optionTable, unsigned flagsToInclude,
   }
 
   opts.DisableOptimizations = false;
-  if (Arg *A =
-          Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2, OPT_O3,
-                          OPT_Od)) {
+  if (Arg *A = Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2,
+                               OPT_O3, OPT_Od)) {
     if (A->getOption().matches(OPT_O0))
       opts.OptLevel = 0;
     if (A->getOption().matches(OPT_O1))
@@ -1267,8 +1266,7 @@ int ReadDxcOpts(const OptTable *optionTable, unsigned flagsToInclude,
       errors << "-Oconfig should not be specified more than once";
       return 1;
     }
-    if (Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2,
-                        OPT_O3)) {
+    if (Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2, OPT_O3)) {
       errors << "-Oconfig should not be used together with -O";
       return 1;
     }
diff --git a/tools/clang/lib/SPIRV/SpirvEmitter.cpp b/tools/clang/lib/SPIRV/SpirvEmitter.cpp
index 82b53abb..f569d42a 100644
--- a/tools/clang/lib/SPIRV/SpirvEmitter.cpp
+++ b/tools/clang/lib/SPIRV/SpirvEmitter.cpp
@@ -593,10 +593,9 @@ SpirvEmitter::SpirvEmitter(CompilerInstance &ci)
       constEvaluator(astContext, spvBuilder), entryFunction(nullptr),
       curFunction(nullptr), curThis(nullptr), seenPushConstantAt(),
       isSpecConstantMode(false), needsLegalization(false),
-      needsLegalizationLoopUnroll(false),
-      needsLegalizationSsaRewrite(false),
-      sawExplicitUnrollHint(false),
-      beforeHlslLegalization(false), mainSourceFile(nullptr) {
+      needsLegalizationLoopUnroll(false), needsLegalizationSsaRewrite(false),
+      sawExplicitUnrollHint(false), beforeHlslLegalization(false),
+      mainSourceFile(nullptr) {
 
   // Get ShaderModel from command line hlsl profile option.
   const hlsl::ShaderModel *shaderModel =
@@ -972,8 +971,7 @@ void SpirvEmitter::HandleTranslationUnit(ASTContext &context) {
       !dsetbindingsToCombineImageSampler.empty() ||
       spirvOptions.signaturePacking;
   needsLegalizationSsaRewrite =
-      needsLegalizationSsaRewrite ||
-      !dsetbindingsToCombineImageSampler.empty();
+      needsLegalizationSsaRewrite || !dsetbindingsToCombineImageSampler.empty();
 
   // Run legalization passes
   if (spirvOptions.codeGenHighLevel) {

Check this box to apply formatting changes to this branch.

s-perron

I don't think we will be able to accept these changes. I understand the place this is coming from. Optimization take time, and if we do lots of unrolling it creates lots of extra code that needs to be optimized.

The problem is that we have to generate legal code, and there are cases where these optimizations have proven necessary.

As a simple example, this PR changes the expected behaviour of on of our example from the SPIR-V cookbook.

We can try to work on alternatives, but they have to be done in a way that we can still legalize as many cases as we currently do. If not, we will break someone.

Things that cannot happen:

The front-end is not the place to determine if an optimization can be skipped. It has too little information and cannot account for how other optimizations will change things. For example, can we correctly legalize the code if we limit ssa rewrite to opaque objects only? Not very likely. That pass is needed to enable all sort of things like scalar replacement. It might work in your code, but I can easily come up with cases where it will not.
Unrolling can significantly impact runtime performance, and the impact is often unpredictable. There are some drivers that I have see that do not do much unrolling in their drivers, and having spir-v do the unrolling is important for runtime performance. We cannot change the unrolling pass in the performance passes.

Potential ideas:

We could modify the unrolling in the legalization passes to look for illegal code in the loop. This could leave many of the loops in your code.
You can disable the performance passs, or replace them with your own set of passes to run for the optimization phase. See here.
If you want even more control, you can dump the code before legalization (-fcgl), and call the optimizer yourself with the passes you want.

These are ideas for limiting the unrolling. I don't think there is a way to limit the ssa rewrite pass the way you want. However, I don't think that is a good idea. It could impact too many other passes.

On a side note, do you see any changes in the time it takes drivers to compile the shaders/pipelines if you skip all of the unrolling in sirv-opt?

s-perron · 2026-03-26T15:55:08Z

-// CHECK-NEXT: [[ptr:%[0-9]+]] = OpAccessChain %_ptr_Uniform_S %gRWSBuffer
-// CHECK-NEXT:                OpStore [[ptr]] [[val]]
+// CHECK: OpLoopMerge {{%[0-9]+}} {{%[0-9]+}} Unroll
+// CHECK: [[which:%[0-9]+]] = OpSelect %_ptr_Uniform_type_StructuredBuffer_S {{%[0-9]+}} %gSBuffer1 %gSBuffer2


This is not legal. We have to unroll in this case.

yep, it would only be legal with SPV_KHR_variable_pointers if the gSBuffer pointers were of Shader Storage Class.

However I could go out on a limb and claim that it was never supposed to be legal without [[unroll]] on the loop ( I know the example has unroll so in this case SPIR-V codegen shouldn't change)

devshgraphicsprogramming · 2026-03-26T18:43:39Z

We can try to work on alternatives, but they have to be done in a way that we can still legalize as many cases as we currently do. If not, we will break someone.

Yes my first suggestion was probably do it as a flag like the #7996 PR which also changes previous behaviour, but takes care to avoid changing the old default behaviour.

Unrolling can significantly impact runtime performance, and the impact is often unpredictable. There are some drivers that I have see that do not do much unrolling in their drivers, and having spir-v do the unrolling is important for runtime performance. We cannot change the unrolling pass in the performance passes.

We know that not every driver is Mesa and doesn't optimize fully and are currently running benchmarks for performance regressions.

However I am of the opinion that:

if something is marked up explicitly as [loop] or the equivalent SPIR-V hint it should never unroll even with -O3 or if that breaks legalization
things without [unroll] shouldn't unroll unless: their body codegen is illegal or that SPIR-V optimizer pass has been explicitly called for

TL;DR the performance passes should only be allowed to unroll loops without [loop], but thats a thing to address in SPIR-V Tools.

There's obviously the problem of if-statements, switches, constrant propagation, etc. which can allow turning illegal code legal and they have no attributes to control that.

There's a few examples which in my option are a bit messed up and wouldn't make people cry if a flag disabled them
https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/SPIRV-Cookbook.rst#multiple-calls-to-a-function

We could modify the unrolling in the legalization passes to look for illegal code in the loop. This could leave many of the loops in your code.

Love the idea, I guess that moves the problem and our impl into SPIR-V Tools ?

The front-end is not the place to determine if an optimization can be skipped. It has too little information and cannot account for how other optimizations will change things. For example, can we correctly legalize the code if we limit ssa rewrite to opaque objects only? Not very likely. That pass is needed to enable all sort of things like scalar replacement. It might work in your code, but I can easily come up with cases where it will not.

Actually if we step back and look at your idea above, an attribute like [[vk::allow_legalization]] and [[vk::disallow_legalization]] plus a flag to control the default for anything unannotated could be a neat design.

If this could percolate to the SPIR-V Spec as an OpDecorate or similar it could be a really powerful hint what to skip analyzing or speculatively unrolling, constant propagate, etc. for legalization within SPIR-V Tools

At the end of the day analyzing whether to unroll or legalize might be more expensive than just unrolling and legalizing, since you can't exaclty and in leess time, know if all the transformations will end up making the code legal before you try them.

My intuition tells me Halting Problem will rear its ugly head at some point if we try this without annotations and some help from the frontend.

DXC is still the producer that knows when a specific lowering path genuinely needs those expensive legalization transforms for correctness, so the optimizer should not have to guess that globally.

@AnastaZIuk is correct here, unrolling for legalization should not kick in without [unroll]

Stuff like

  [unroll]
  for( int j = 0; j < 2; j++ ) {
    if (constant > j) {
      lSBuffer = gSBuffer1;
    } else {
      lSBuffer = gSBuffer2;
    }
    gRWSBuffer[j] = lSBuffer[j];
  }

should legalize fine, but things like

  [loop]
for( int j = 0; j < 2; j++ ) {
  if (constant > j) {
    lSBuffer = gSBuffer1;
  } else {
    lSBuffer = gSBuffer2;
  }
  gRWSBuffer[j] = lSBuffer[j];
}

really shouldn't IMHO

On a side note, do you see any changes in the time it takes drivers to compile the shaders/pipelines if you skip all of the unrolling in spirv-opt?

We'll take our worst shader and let you know.

The main problem for us is hot-reload/iteration time, and majority of the time is spent in SPIR-V Tools. Whereas with a 58k LoC shader with template hell like ours we'd expect to spend majority of the time in Clang's Sema.

Mind you the difference between SPIR-V ResultID upper bound is only 10k vs 16k for the -fcgl - Vd case and -O3, and the SPIR-V is 12k vs 20k opcodes but the compile time difference is an order of magnitude.

We're open for ideas

#s-perron how would you have us solve the problem of "too much SPIR-V spam".

Sidequests

This branch also fixes one separate producer-side correctness bug: vector splat of bool spec constants was being emitted as OpConstantComposite instead of the legal OpSpecConstantComposite.

@s-perron was that done correctly, or is that something you'd want fixed?

s-perron · 2026-03-27T14:53:23Z

The main problem for us is hot-reload/iteration time, and majority of the time is spent in SPIR-V Tools. Whereas with a 58k LoC shader with template hell like ours we'd expect to spend majority of the time in Clang's Sema.

The templates will generally generate more code than you realize, creating more work for the optimizer. I would always expect the optimizer to take the most time.

However I am of the opinion that:

if something is marked up explicitly as [loop] or the equivalent SPIR-V hint it should never unroll even with -O3 or if that breaks legalization
things without [unroll] shouldn't unroll unless: their body codegen is illegal or that SPIR-V optimizer pass has been explicitly called for
TL;DR the performance passes should only be allowed to unroll loops without [loop], but thats a thing to address in SPIR-V Tools.

I agree, and we are even more restrictive then that. SPIR-V tools unrolls only when the user marks the loop as unroll, and the loop can be fully unrolled. This is true for performance and legalization. We are suppose to be doing what you want. If you have examples of loop being unrolled when they are not marked by the user as [unroll], then that is a bug. Something might be copying the loop merge instruction adding or dropping the loop control parameters.

See https://godbolt.org/z/eeMss78MY.

devshgraphicsprogramming · 2026-03-27T21:56:01Z

I agree, and we are even more restrictive then that. SPIR-V tools unrolls only when the user marks the loop as unroll, and the loop can be fully unrolled. This is true for performance and legalization. We are suppose to be doing what you want. If you have examples of loop being unrolled when they are not marked by the user as [unroll], then that is a bug. Something might be copying the loop merge instruction adding or dropping the loop control parameters.

See https://godbolt.org/z/eeMss78MY.

Thanks, I wrote that before I dove into the SPIR-V Tools code, also I was thinking that CanUnroll checked the Loop Control, but in another place in the code there was ...==Unroll && ...->CanUnroll()

The templates will generally generate more code than you realize, creating more work for the optimizer. I would always expect the optimizer to take the most time.

I do reailze there's also a lot of SPIR-V spam from unused functions and such, one pass of dead code elimination does wonders.

Also with Arkadiusz' legalization changes the same code compiles 10x faster (including our Boost.Wave preprocessor doing crazy thigns with BOOST_PP) while only costing 8% runtime performance, so very much a thing I'd like to have for Development builds.

AnastaZIuk · 2026-03-28T23:20:57Z

Thanks for the detailed review.

TL;DR

I agree the cookbook example points to one real legality blocker in my current branch.
My narrower point is that one blocker still does not tell me which parts of the current broad recipe are essential, and which parts are historical overreach.
Current upstream already uses producer-side information to decide when legalization runs.
The concrete solution here is the new explicit -O1experimental flag, which has now been added on the PR line as the narrower opt-in rollout.

I want to be clear that I am not arguing legalization should disappear, or that the current default should be changed casually in a mature backend (and I realize my current PR branch still reads more like a default-behavior change than an opt-in experiment, which is exactly what I am trying to discuss here). I also understand that changes in a legacy backend are often evaluated under real release and maintenance pressure. That is exactly why I am more interested in reshaping this into an explicit opt-in DXC / SPIR-V rollout path than in trying to force an immediate default change.

Proposed Rollout

That narrower rollout path now exists concretely as the explicit -O1experimental flag on the SPIR-V path. The current default behavior stays untouched, while the fast path is enabled only explicitly when that flag is requested.

That remains consistent with the current DXC upstream model, and the current implementation now makes that rollout explicit through -O1experimental rather than through a default-behavior change:

DirectXShaderCompiler/docs/SPIR-V.rst

Lines 521 to 531 in b87d15d

    
           Optimization is also delegated to SPIRV-Tools. Right now there are no difference 
        
           between optimization levels greater than zero; they will all invoke the same 
        
           optimization recipe. That is, the recipe behind ``spirv-opt -O``.  If you want to 
        
           run a custom optimization recipe, you can do so using the command line option 
        
           ``-Oconfig=`` and specifying a comma-separated list of your desired passes. 
        
           The passes are invoked in the specified order. 
        
           For example, you can specify ``-Oconfig=--loop-unroll,--scalar-replacement=300,--eliminate-dead-code-aggressive`` 
        
           to firstly invoke loop unrolling, then invoke scalar replacement of aggregates, 
        
           lastly invoke aggressive dead code elimination. All valid options to 
        
           ``spirv-opt`` are accepted as components to the comma-separated list.

Legend

flowchart LR
  A["master_source_off\nNabla e11b118d\n2026-03-26"] --> B["devshfixes_upstream\nNabla c13c3366\n2026-03-28"] --> C["unroll_artifact\nNabla 262a8b72\n2026-03-26"] --> D["unroll_v2\nO1experimental\n2026-03-29"]

This compares four checkpoints:

master_source_off: current master-side baseline
devshfixes_upstream: the same line after refreshing devshFixes with newer DXC upstream state
unroll_artifact: that refreshed line plus the unroll PR work packaged in the published CI artifact
unroll_v2: the current local follow-up measured with the new -O1experimental flag

Runtime Probe

The current up-to-date follow-up is unroll_v2, measured with the new -O1experimental flag.

Against master_source_off on the same paired Nsight Graphics GPU Trace probe:

steady-state GPU frame ms: 21.4304 -> 19.2360, so unroll_v2 is 10.24% faster

Full tables, captures, runnable bundles, and the detailed report are here:

https://github.com/Devsh-Graphics-Programming/DirectXShaderCompiler/releases/tag/nabla-path-tracer-runtime-compare-20260328

Cookbook Example

This is not legal. We have to unroll in this case.

I agree with the narrow legality point on the cookbook example. That blocker is now addressed on the PR line: the explicit -O1experimental path restores legality for that case, the test has been updated accordingly, and the default upstream paths remain untouched.

The causal chain there also makes sense to me. If that loop stays as a loop and the body still carries a dynamic choice between resource pointers, the module can retain a dynamic pointer-selection path across control flow. In that exact situation the SPIR-V variable-pointer rules become the relevant legality constraint. One known way to break that chain is to fully unroll the loop, let constant propagation resolve the branch per iteration, and then let the cleanup passes remove the dynamic pointer-selection path entirely. So I agree with the narrow point that this exact example still needs a legality-preserving transform.

I also agree that this is probably not just about one isolated test. A blocker like this may well point to a broader class of legality-sensitive SSA and control-flow patterns. Related to that, the current cookbook and visible legalization test coverage still look incomplete to me, which is one reason I would find explicit rationale or concrete examples for the passes in the legalization recipe more useful than trying to infer each pass only from git history.

Producer-Side Gating

The front-end is not the place to determine if an optimization can be skipped.

It has too little information and cannot account for how other optimizations will change things.

Current DXC upstream already uses producer-side information to decide whether legalization should run at all. The docs explicitly list the source patterns that require legalization and say that legalization transformations will not run unless those patterns are encountered:

DirectXShaderCompiler/docs/SPIR-V.rst

Lines 498 to 503 in b87d15d

    
           Resulting from the above property of HLSL, if we translate into SPIR-V for 
        
           Vulkan literally from the input HLSL source code, we will sometimes generate 
        
           illegal SPIR-V. Certain transformations are needed to legalize the literally 
        
           translated SPIR-V. Performing such transformations at the frontend AST level 
        
           is cumbersome or impossible (e.g., function inlining). They are better to be 
        
           conducted at SPIR-V level. Therefore, legalization is delegated to SPIRV-Tools.

DirectXShaderCompiler/docs/SPIR-V.rst

Lines 505 to 512 in b87d15d

    
           Specifically, we need to legalize the following HLSL source code patterns: 
        
           * Using resource types in struct types 
        
           * Creating aliases of global resource objects 
        
           * Control flows invovling the above cases 
        
           Legalization transformations will not run unless the above patterns are 
        
           encountered in the source code.

and the emitter computes needsLegalization before calling into SPIRV-Tools:

DirectXShaderCompiler/tools/clang/lib/SPIRV/SpirvEmitter.cpp

Lines 951 to 956 in b87d15d

    
           needsLegalization = 
        
               needsLegalization || declIdMapper.requiresLegalization() || 
        
               spirvOptions.flattenResourceArrays || spirvOptions.reduceLoadSize || 
        
               declIdMapper.requiresFlatteningCompositeResources() || 
        
               !dsetbindingsToCombineImageSampler.empty() || 
        
               spirvOptions.signaturePacking;

DirectXShaderCompiler/tools/clang/lib/SPIRV/SpirvEmitter.cpp

Lines 962 to 965 in b87d15d

    
           if (needsLegalization) { 
        
             std::string messages; 
        
             if (!spirvToolsLegalize(&m, &messages, 
        
                                     &dsetbindingsToCombineImageSampler)) {

So I agree that the frontend cannot safely guess every downstream optimization effect. But current upstream already does coarse producer-side legality gating. The real question seems narrower: how much of the current blanket recipe is actually required to preserve the same legality coverage.

Longer term, if a narrower rollout does prove useful, I could also imagine a cleaner design where DXC uses explicit OpDecorate-style legality markup for the suspicious regions or expressions, potentially even by kind. That could also make the validator's before_hlsl_legalization mode tighter. SPIRV-Tools could then first run a legalization-markup propagation pass and scope the expensive legalization work to the affected regions instead of the whole module. That would also address the concern that today the recipe is enabled at module granularity even when only one entry point or one localized region may need the expensive part of the cleanup. I am not proposing that as a prerequisite for this PR, only as a possible direction if the current coarse gating still feels too blunt.

Performance Recipe

We cannot change the unrolling pass in the performance passes.

That seems stronger than what the current SPIRV-Tools API says. The public optimizer API describes these recipes as sequences that are subject to constant review and will change from time to time:

So I understand we do not want to change this lightly. I do not think current upstream supports the stronger cannot.

That distinction matters to me because the same public API also describes the legalization recipe as something specially designed for legalizing literal HLSL-to-SPIR-V output, and explicitly says it should not be used by general workloads for performance or size improvement. So from the upstream API surface itself, legality and general performance policy are related here, but they are not the same question.

History

One other thing I checked more carefully is the upstream history. On the DXC side, SPIRV-Tools legalization was introduced as a conditional stage in 2017, and the current doc wording that says legalization is only run for specific source patterns was added in 2018:

On the SPIRV-Tools side, the public optimizer API has described these recipes as mutable policy since 2017, while blanket unrolling in the legalization and performance recipes was added later in separate commits in 2018 and 2020:

That history does not prove my branch is correct by itself. But it does make me hesitant to treat the current blanket breadth as if it were an original fixed contract rather than a later widening of the pipeline. It also makes me think it would be very helpful to have a more explicit rationale or example coverage for the passes that currently sit in the legalization recipe, because from the outside it is hard to tell which ones are there for one known blocker and which ones are there because the recipe simply accumulated over time.

Runtime Impact

Unrolling can significantly impact runtime performance, and the impact is often unpredictable.

There are some drivers that I have see that do not do much unrolling in their drivers, and having spir-v do the unrolling is important for runtime performance.

I also think this discussion should stay narrow for one more reason. Even on current upstream the unroller is not actually blind. The pass is registered in the recipe, but it only applies when the loop has Unroll loop control and when CanPerformUnroll() succeeds:

So even on current upstream there is already a difference between the pass is part of the recipe and every loop is materialized unconditionally. The current implementation still checks loop intent and structural preconditions before it actually performs the transform. Current DXC upstream also already preserves the source-side distinction between [loop] / [fastopt] and [unroll]:

DirectXShaderCompiler/tools/clang/lib/SPIRV/SpirvEmitter.cpp

Lines 2283 to 2287 in b87d15d

    
           case attr::HLSLLoop: 
        
           case attr::HLSLFastOpt: 
        
             return spv::LoopControlMask::DontUnroll; 
        
           case attr::HLSLUnroll: 
        
             return spv::LoopControlMask::Unroll;

DirectXShaderCompiler/tools/clang/test/CodeGenSPIRV/cf.for.nested.hlsl

Lines 14 to 39 in b87d15d

    
           // CHECK-NEXT: OpLoopMerge %for_merge_1 %for_continue_1 Unroll 
        
           // CHECK-NEXT: OpBranchConditional [[lt0]] %for_body %for_merge_1 
        
               [unroll] for (int i = 0; i < 10; ++i) { 
        
           // CHECK-LABEL: %for_body = OpLabel 
        
           // CHECK-NEXT: [[val0:%[0-9]+]] = OpLoad %int %val 
        
           // CHECK-NEXT: [[i1:%[0-9]+]] = OpLoad %int %i 
        
           // CHECK-NEXT: [[add0:%[0-9]+]] = OpIAdd %int [[val0]] [[i1]] 
        
           // CHECK-NEXT: OpStore %val [[add0]] 
        
                   val = val + i; 
        
           // CHECK-NEXT: OpStore %j %int_0 
        
           // CHECK-NEXT: OpBranch %for_check_0 
        
           // CHECK-LABEL: %for_check_0 = OpLabel 
        
           // CHECK-NEXT: [[j0:%[0-9]+]] = OpLoad %int %j 
        
           // CHECK-NEXT: [[lt1:%[0-9]+]] = OpSLessThan %bool [[j0]] %int_10 
        
           // CHECK-NEXT: OpLoopMerge %for_merge_0 %for_continue_0 DontUnroll 
        
           // CHECK-NEXT: OpBranchConditional [[lt1]] %for_body_0 %for_merge_0 
        
                   [loop] for (int j = 0; j < 10; ++j) { 
        
           // CHECK-LABEL: %for_body_0 = OpLabel 
        
           // CHECK-NEXT: OpStore %k %int_0 
        
           // CHECK-NEXT: OpBranch %for_check_1 
        
           // CHECK-LABEL: %for_check_1 = OpLabel 
        
           // CHECK-NEXT: [[k0:%[0-9]+]] = OpLoad %int %k 
        
           // CHECK-NEXT: [[lt2:%[0-9]+]] = OpSLessThan %bool [[k0]] %int_10 
        
           // CHECK-NEXT: OpLoopMerge %for_merge %for_continue DontUnroll

That matters because the public SPIR-V spec also describes Unroll and DontUnroll as loop-control performance hints:

https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#Loop_Control

So from the current upstream code and spec, what I think is established is simply this: legalization is genuinely needed in some cases, DXC upstream already uses producer-side knowledge to decide when it runs, and SPIRV-Tools currently uses a broad unroll/cleanup recipe that should be treated as policy rather than as an immutable rule. Because of that, I think the remaining disagreement is not whether legalization matters. It is whether the current blanket breadth of the recipe is really required to preserve the same legality coverage for the broader class of blockers behind examples like the cookbook case.

I also agree that runtime driver impact is the right next question. That part needs data. I am currently checking this on one path tracer workload on one GPU on my side. I also agree that, for a default rollout, even one real regression case may be enough to reject it, which is another reason I am framing this as an opt-in or separate-profile discussion first.

damyanp · 2026-03-31T15:31:57Z

        with:
-          repository: microsoft/DirectXShaderCompiler
-          ref: ${{ github.event.repository.default_branch }}
+          repository: ${{ github.repository }}
+          ref: ${{ github.base_ref }}
          sparse-checkout: |
            utils/git/requirements_formatting.txt
            utils/git/code-format-helper.py
            utils/git/code-format-save-diff.py
          sparse-checkout-cone-mode: false
          path: code-format-tools

      - name: "Listed files"
        env:
          LISTED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
        run: |
          echo "Formatting files:"
          echo "$LISTED_FILES"

      - name: Install clang-format
        uses: aminya/setup-cpp@v1
        with:
          clangformat: 17.0.1

      - name: Setup Python env
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
          cache: 'pip'
          cache-dependency-path: 'code-format-tools/utils/git/requirements_formatting.txt'

      - name: Install python dependencies
        run: pip install -r code-format-tools/utils/git/requirements_formatting.txt

      - name: Run code formatter
        id: formatter
        env:
          GITHUB_PR_NUMBER: ${{ github.event.pull_request.number }}
          START_REV: ${{ github.event.pull_request.base.sha }}
          END_REV: ${{ github.event.pull_request.head.sha }}
          CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
        run: |
           python code-format-tools/utils/git/code-format-helper.py \
            --token ${{ secrets.GITHUB_TOKEN }} \
            --issue-number $GITHUB_PR_NUMBER \
            --start-rev $START_REV \
            --end-rev $END_REV \
            --changed-files "$CHANGED_FILES"
  apply_diff:
-    if: ${{ github.event_name == 'issue_comment' && github.event.comment.user.login == 'github-actions[bot]' && endsWith(github.event.comment.body, '- [x] Check this box to apply formatting changes to this branch.') }}
+    if: ${{ github.event_name == 'issue_comment' && endsWith(github.event.comment.body, '- [x] Check this box to apply formatting changes to this branch.') }}
    runs-on: ubuntu-latest
    env:
      TMP_DIFF_FILE: /tmp/diff.patch
    permissions:
      pull-requests: write
      contents: write
    steps:
      - uses: actions/github-script@v3
        id: get-pr
        with:
          script: |
            const request = {
              owner: context.repo.owner,
              repo: context.repo.repo,
              pull_number: context.issue.number
            }
            core.info(`Getting PR #${request.pull_number} from ${request.owner}/${request.repo}`)
            try {
              const result = await github.pulls.get(request)
              return result.data
            } catch (err) {
              core.setFailed(`Request failed with error ${err}`)
            } 

      # We need to pull the script from the main branch, so that we ensure
      # we get the latest version of this script.
      - name: Fetch code formatting utils
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
-          repository: microsoft/DirectXShaderCompiler
-          ref: ${{ github.event.repository.default_branch }}
+          repository: ${{ github.repository }}
+          ref: ${{ github.base_ref }}
          sparse-checkout: |


Please revert these changes

damyanp · 2026-03-31T15:32:04Z

@@ -0,0 +1,186 @@
+name: Devsh - Build & Deploy DXCompiler


Please revert these changes

AnastaZIuk · 2026-03-31T15:56:38Z

ahh sry guys by accident I took our fork branch wrong wrong diff is like 120 LoC, bringing good diff back

Pull in the companion SPIRV-Tools change that adds a dedicated TrimVariablePointersCapabilitiesPass at the end of the O1experimental fast performance recipe. The follow-up fixes a real sampler regression where the final optimized module still carried stale VariablePointers / VariablePointersStorageBuffer capability declarations even though the final IR no longer contained the pointer forms that required them. The failing EX37 shader remained validator-legal and had only scalar OpSelect %float, but removing only those stale capability lines fixed the downstream runtime corruption. Keep the default trim path untouched by keeping the cleanup isolated to the experimental fast path in SPIRV-Tools.

AnastaZIuk · 2026-03-31T16:15:04Z

ok we are back

s-perron · 2026-03-31T19:27:26Z

I agree the cookbook example points to one real legality blocker in my current branch.

My narrower point is that one blocker still does not tell me which parts of the current broad recipe are essential, and which parts are historical overreach.

My point was much larger than just the single example. The legalization passes were curated over time by looking at shaders from graphic engines. If it was added, there was a reason for it. If you want to know what we consider "essential", it is the legalization passes in SPIR-V.

If you reduce the optimizations in any significant way, I can probably come up with an example where something will break. The passes are not just there to fix one illegal code construct. Many of them are there to put the code in a state where it can be optimized by other passes. For example, if you limit the ssa rewriter pass to work only on resource types or "special types", you will miss a lot of the constant folding opportunities that lead to folding entire branches. See this example:

RWBuffer<float> b, c;

static const int i = 15;

int foo() { return i; }

[numthreads(1,1,1)]
void main() {
  RWBuffer<float> t;
  int l = foo();
  if (l > 10)
    l = 10;

  if (l > 5)
    t = b;
  else
    t = c;
  t[0] = 0;
}

I'm not going to keep giving you examples. Suffice it to say, you need all of the legalization passes as is. Except for what I mentioned earlier. We might be able to reduce the amount of unrolling in the legalization passes, but not the performance passes. I could easily see that affecting the performance of the shaders on mobile platforms.

Note that the performance passes that are run after it, were curated by looking at how the code could be made cleaner. Their usefulness will be dependent on the driver you use. If you are using a driver that optimizes heavily or you don't care about the performance of the shader, you can turn those off (-O0).

Current upstream already uses producer-side information to decide when legalization runs.

DXC can sometimes know that code will be illegal, but it does not usually know what needs to be done to fix it. Shaders can be arbitrary complex. Even our current gating of the legalization passes has lead to bugs because we do not run the legalizer when we should. I do not want to double down on that.

The concrete solution here is the new explicit -O1experimental flag, which has now been added on the PR line as the narrower opt-in rollout.

We are not adding these types of new options to DXC at this time in the life cycle of DXC. If we add it there will be some expectation that we will do something similar in LLVM.

VariablePointers tracks pointer values that remain first-class in the final module. A plain OpStore stores through a pointer. It does not by itself prove that the module still needs VariablePointers. The SPIR-V OpStore definition distinguishes Pointer as the location being stored through and Object as the value being stored. The VariablePointers rules separately constrain the case where a pointer is the Object operand of OpStore or the result of OpLoad. Ordinary stores of non-pointer objects through StorageBuffer pointers do not require keeping VariablePointers alive. This updates the DXC-side SPIRV-Tools pointer to the SPIRV-Tools change that stops treating every OpStore as a VariablePointers requirement. Spec references: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpStore https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#VariablePointers

github-project-automation Bot added this to HLSL Roadmap Mar 20, 2026

github-project-automation Bot moved this to New in HLSL Roadmap Mar 20, 2026

AnastaZIuk mentioned this pull request Mar 20, 2026

Narrow blanket SPIR-V legalization work in optimizer recipes KhronosGroup/SPIRV-Tools#6612

Open

AnastaZIuk marked this pull request as ready for review March 20, 2026 18:18

AnastaZIuk changed the title ~~Signal when SPIR-V legalization needs loop unroll~~ Signal when SPIR-V legalization needs targeted cleanup Mar 20, 2026

AnastaZIuk mentioned this pull request Mar 21, 2026

Narrow blanket SPIR-V loop unroll in optimizer recipes Devsh-Graphics-Programming/SPIRV-Tools#2

Closed

Signal when SPIR-V legalization needs targeted cleanup

66e975a

AnastaZIuk force-pushed the unroll branch from a3904f6 to 66e975a Compare March 22, 2026 07:12

Update SPIR-V submodule pointers

9ac5798

s-perron requested changes Mar 26, 2026

View reviewed changes

github-project-automation Bot moved this from New to In progress in HLSL Roadmap Mar 26, 2026

Add O1experimental SPIR-V fast compile mode

5f9c364

AnastaZIuk added 3 commits March 29, 2026 00:41

Update SPIRV-Tools

bd2ec9c

Route O1experimental through fast compile APIs

4dd86f9

Update SPIRV-Tools for O1experimental cleanup

899afef

AnastaZIuk force-pushed the unroll branch 2 times, most recently from 899afef to 73a31a4 Compare March 31, 2026 09:58

damyanp reviewed Mar 31, 2026

View reviewed changes

AnastaZIuk force-pushed the unroll branch from 73a31a4 to ea6bd94 Compare March 31, 2026 16:12

Conversation

AnastaZIuk commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why these DXC changes are needed

Spec basis

Validation

Updated test notes

Uh oh!

github-actions Bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

s-perron left a comment

Choose a reason for hiding this comment

Uh oh!

s-perron Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

devshgraphicsprogramming Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devshgraphicsprogramming commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

We're open for ideas

Sidequests

Uh oh!

s-perron commented Mar 27, 2026

Uh oh!

devshgraphicsprogramming commented Mar 27, 2026

Uh oh!

AnastaZIuk commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TL;DR

Proposed Rollout

Legend

Runtime Probe

Cookbook Example

Producer-Side Gating

Performance Recipe

History

Runtime Impact

Uh oh!

damyanp Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

AnastaZIuk Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

damyanp Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

AnastaZIuk Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

AnastaZIuk commented Mar 31, 2026

Uh oh!

AnastaZIuk commented Mar 31, 2026

Uh oh!

s-perron commented Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

AnastaZIuk commented Mar 20, 2026 •

edited

Loading

github-actions Bot commented Mar 20, 2026 •

edited

Loading

devshgraphicsprogramming Mar 26, 2026 •

edited

Loading

devshgraphicsprogramming commented Mar 26, 2026 •

edited

Loading

AnastaZIuk commented Mar 28, 2026 •

edited

Loading