Skip to content

Signal when SPIR-V legalization needs targeted cleanup#8283

Open
AnastaZIuk wants to merge 6 commits intomicrosoft:mainfrom
Devsh-Graphics-Programming:unroll
Open

Signal when SPIR-V legalization needs targeted cleanup#8283
AnastaZIuk wants to merge 6 commits intomicrosoft:mainfrom
Devsh-Graphics-Programming:unroll

Conversation

@AnastaZIuk
Copy link
Copy Markdown
Contributor

@AnastaZIuk AnastaZIuk commented Mar 20, 2026

Summary

  • add the opt-in -O1experimental SPIR-V profile
  • keep the default SPIR-V optimization and legalization entry points unchanged
  • route only -O1experimental through the companion fast compile APIs in SPIRV-Tools
  • keep producer-side signals for the narrow lowering cases that still need legalization-time loop unroll or SSA rewrite cleanup for correctness
  • update external/SPIRV-Tools to the companion branch state
  • fix specialization-constant vector splats so they emit the legal OpSpecConstantComposite form
  • update CodeGenSPIRV checks to the accepted legal SPIR-V shapes

Why these DXC changes are needed

The recipe split lives in the companion SPIRV-Tools branch.

DXC is still the producer that knows when its own lowering forms require targeted legalization work for correctness. This branch therefore keeps two narrow producer-side signals:

  • legalization-time loop unroll for lowering paths that must end in constant image operands such as ConstOffset
  • legalization-time SSA rewrite cleanup for lowering paths that materialize opaque or resource-like intermediates during lowering

Without those signals the optimizer would have to guess globally, which is exactly the blanket work the companion branch removes from the default path.

This branch also fixes one separate producer-side correctness issue. A vector splat of specialization constants must form OpSpecConstantComposite, not OpConstantComposite.

-O1experimental is explicit opt-in behavior. The default SPIR-V option paths continue to call the existing optimizer entry points. Only -O1experimental uses the new fast compile entry points.

Spec basis

The image-offset case follows the SPIR-V image operand rules:

ConstOffset ... It must be an <id> of a constant instruction

Spec:

Opaque image and sampled-image legality is also explicit:

Image, sampler, and sampled image objects must not appear as operands to OpPhi instructions, or OpSelect instructions, or any instructions other than the image or sampler instructions specified to operate on them.

All OpSampledImage instructions must be in the same block in which their Result <id> are consumed.

Spec:

The involved types are opaque by definition:

This type is opaque: values of this type have no defined physical size or bit pattern.

Spec:

The retained pointer path in the explicit-unroll case is also legal SPIR-V:

Before version 1.4, Result Type must be a pointer, scalar, or vector.

The types of Object 1 and Object 2 must be the same as Result Type.

Spec:

The specialization-constant composite fix follows the constant rules directly:

The Constituents must all be <id>s of non-specialization constant-instruction declarations or an OpUndef.

The Constituents must be the <id> of other specialization constants, constant declarations, or an OpUndef.

Spec:

The companion branch also keeps the VariablePointers contract explicit where the legalizing unroll path can retain pointer-typed control-flow values.

Validation

  • fresh local CodeGenSPIRV validation on the paired DXC / SPIRV-Tools branch state passed with 1438 expected passes, 2 expected failures, and 0 unexpected failures
  • additional local end-to-end validation covered HLSL shaders with storage image arrays, image atomics, and PhysicalStorageBuffer pointer wrappers

Updated test notes

  • vk.spec-constant.reuse.hlsl: the old check expected OpConstantComposite from a bool spec-constant splat. The legal form is OpSpecConstantComposite, and the check now matches that.
  • 15-loop-var-unroll-ok.hlsl: the optimized path now keeps a legal pointer OpSelect shape instead of depending on blanket eager full unroll in the generic optimizer path.
  • cast.flat-conversion.matrix.hlsl: the optimized path rebuilds the target value through legal composite extracts and constructs instead of preserving one older constant-composite shape.
  • type.rwstructured-buffer.array.counter.indirect.hlsl and type.rwstructured-buffer.array.counter.indirect2.hlsl: the checks now match the accepted legal access-chain shape without overconstraining intermediate ids.
  • vk.binding.global-struct-of-resources.optimized.hlsl: the check now matches the stable semantic parts of the emitted resource layout instead of one transient id pattern.
  • max_id.hlsl: the check no longer hardcodes one exact Bound value after the accepted optimizer recipe changes.

Companion SPIRV-Tools PR:

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 20, 2026

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff bb168c9647beb3d80c25926dd8ac3aab864dd134 899afef2a8665fa9c861b6b7be8562bf7efb850e -- include/dxc/Support/SPIRVOptions.h lib/DxcSupport/HLSLOptions.cpp tools/clang/lib/Frontend/Rewrite/RewriteObjC.cpp tools/clang/lib/SPIRV/SpirvEmitter.cpp tools/clang/lib/SPIRV/SpirvEmitter.h
View the diff from clang-format here.
diff --git a/lib/DxcSupport/HLSLOptions.cpp b/lib/DxcSupport/HLSLOptions.cpp
index 0c43b9ff..bda90be7 100644
--- a/lib/DxcSupport/HLSLOptions.cpp
+++ b/lib/DxcSupport/HLSLOptions.cpp
@@ -792,9 +792,8 @@ int ReadDxcOpts(const OptTable *optionTable, unsigned flagsToInclude,
   }
 
   opts.DisableOptimizations = false;
-  if (Arg *A =
-          Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2, OPT_O3,
-                          OPT_Od)) {
+  if (Arg *A = Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2,
+                               OPT_O3, OPT_Od)) {
     if (A->getOption().matches(OPT_O0))
       opts.OptLevel = 0;
     if (A->getOption().matches(OPT_O1))
@@ -1267,8 +1266,7 @@ int ReadDxcOpts(const OptTable *optionTable, unsigned flagsToInclude,
       errors << "-Oconfig should not be specified more than once";
       return 1;
     }
-    if (Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2,
-                        OPT_O3)) {
+    if (Args.getLastArg(OPT_O0, OPT_O1, OPT_O1experimental, OPT_O2, OPT_O3)) {
       errors << "-Oconfig should not be used together with -O";
       return 1;
     }
diff --git a/tools/clang/lib/SPIRV/SpirvEmitter.cpp b/tools/clang/lib/SPIRV/SpirvEmitter.cpp
index 82b53abb..f569d42a 100644
--- a/tools/clang/lib/SPIRV/SpirvEmitter.cpp
+++ b/tools/clang/lib/SPIRV/SpirvEmitter.cpp
@@ -593,10 +593,9 @@ SpirvEmitter::SpirvEmitter(CompilerInstance &ci)
       constEvaluator(astContext, spvBuilder), entryFunction(nullptr),
       curFunction(nullptr), curThis(nullptr), seenPushConstantAt(),
       isSpecConstantMode(false), needsLegalization(false),
-      needsLegalizationLoopUnroll(false),
-      needsLegalizationSsaRewrite(false),
-      sawExplicitUnrollHint(false),
-      beforeHlslLegalization(false), mainSourceFile(nullptr) {
+      needsLegalizationLoopUnroll(false), needsLegalizationSsaRewrite(false),
+      sawExplicitUnrollHint(false), beforeHlslLegalization(false),
+      mainSourceFile(nullptr) {
 
   // Get ShaderModel from command line hlsl profile option.
   const hlsl::ShaderModel *shaderModel =
@@ -972,8 +971,7 @@ void SpirvEmitter::HandleTranslationUnit(ASTContext &context) {
       !dsetbindingsToCombineImageSampler.empty() ||
       spirvOptions.signaturePacking;
   needsLegalizationSsaRewrite =
-      needsLegalizationSsaRewrite ||
-      !dsetbindingsToCombineImageSampler.empty();
+      needsLegalizationSsaRewrite || !dsetbindingsToCombineImageSampler.empty();
 
   // Run legalization passes
   if (spirvOptions.codeGenHighLevel) {
  • Check this box to apply formatting changes to this branch.

@AnastaZIuk AnastaZIuk marked this pull request as ready for review March 20, 2026 18:18
@AnastaZIuk AnastaZIuk changed the title Signal when SPIR-V legalization needs loop unroll Signal when SPIR-V legalization needs targeted cleanup Mar 20, 2026
Copy link
Copy Markdown
Collaborator

@s-perron s-perron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we will be able to accept these changes. I understand the place this is coming from. Optimization take time, and if we do lots of unrolling it creates lots of extra code that needs to be optimized.

The problem is that we have to generate legal code, and there are cases where these optimizations have proven necessary.

As a simple example, this PR changes the expected behaviour of on of our example from the SPIR-V cookbook.

We can try to work on alternatives, but they have to be done in a way that we can still legalize as many cases as we currently do. If not, we will break someone.

Things that cannot happen:

  1. The front-end is not the place to determine if an optimization can be skipped. It has too little information and cannot account for how other optimizations will change things. For example, can we correctly legalize the code if we limit ssa rewrite to opaque objects only? Not very likely. That pass is needed to enable all sort of things like scalar replacement. It might work in your code, but I can easily come up with cases where it will not.

  2. Unrolling can significantly impact runtime performance, and the impact is often unpredictable. There are some drivers that I have see that do not do much unrolling in their drivers, and having spir-v do the unrolling is important for runtime performance. We cannot change the unrolling pass in the performance passes.

Potential ideas:

  1. We could modify the unrolling in the legalization passes to look for illegal code in the loop. This could leave many of the loops in your code.
  2. You can disable the performance passs, or replace them with your own set of passes to run for the optimization phase. See here.
  3. If you want even more control, you can dump the code before legalization (-fcgl), and call the optimizer yourself with the passes you want.

These are ideas for limiting the unrolling. I don't think there is a way to limit the ssa rewrite pass the way you want. However, I don't think that is a good idea. It could impact too many other passes.

On a side note, do you see any changes in the time it takes drivers to compile the shaders/pipelines if you skip all of the unrolling in sirv-opt?

// CHECK-NEXT: [[ptr:%[0-9]+]] = OpAccessChain %_ptr_Uniform_S %gRWSBuffer
// CHECK-NEXT: OpStore [[ptr]] [[val]]
// CHECK: OpLoopMerge {{%[0-9]+}} {{%[0-9]+}} Unroll
// CHECK: [[which:%[0-9]+]] = OpSelect %_ptr_Uniform_type_StructuredBuffer_S {{%[0-9]+}} %gSBuffer1 %gSBuffer2
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not legal. We have to unroll in this case.

Copy link
Copy Markdown

@devshgraphicsprogramming devshgraphicsprogramming Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, it would only be legal with SPV_KHR_variable_pointers if the gSBuffer pointers were of Shader Storage Class.

However I could go out on a limb and claim that it was never supposed to be legal without [[unroll]] on the loop ( I know the example has unroll so in this case SPIR-V codegen shouldn't change)

@github-project-automation github-project-automation bot moved this from New to In progress in HLSL Roadmap Mar 26, 2026
@devshgraphicsprogramming
Copy link
Copy Markdown

devshgraphicsprogramming commented Mar 26, 2026

We can try to work on alternatives, but they have to be done in a way that we can still legalize as many cases as we currently do. If not, we will break someone.

Yes my first suggestion was probably do it as a flag like the #7996 PR which also changes previous behaviour, but takes care to avoid changing the old default behaviour.

Unrolling can significantly impact runtime performance, and the impact is often unpredictable. There are some drivers that I have see that do not do much unrolling in their drivers, and having spir-v do the unrolling is important for runtime performance. We cannot change the unrolling pass in the performance passes.

We know that not every driver is Mesa and doesn't optimize fully and are currently running benchmarks for performance regressions.

However I am of the opinion that:

  1. if something is marked up explicitly as [loop] or the equivalent SPIR-V hint it should never unroll even with -O3 or if that breaks legalization
  2. things without [unroll] shouldn't unroll unless: their body codegen is illegal or that SPIR-V optimizer pass has been explicitly called for

TL;DR the performance passes should only be allowed to unroll loops without [loop], but thats a thing to address in SPIR-V Tools.

There's obviously the problem of if-statements, switches, constrant propagation, etc. which can allow turning illegal code legal and they have no attributes to control that.

There's a few examples which in my option are a bit messed up and wouldn't make people cry if a flag disabled them
https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/SPIRV-Cookbook.rst#multiple-calls-to-a-function

We could modify the unrolling in the legalization passes to look for illegal code in the loop. This could leave many of the loops in your code.

Love the idea, I guess that moves the problem and our impl into SPIR-V Tools ?

The front-end is not the place to determine if an optimization can be skipped. It has too little information and cannot account for how other optimizations will change things. For example, can we correctly legalize the code if we limit ssa rewrite to opaque objects only? Not very likely. That pass is needed to enable all sort of things like scalar replacement. It might work in your code, but I can easily come up with cases where it will not.

Actually if we step back and look at your idea above, an attribute like [[vk::allow_legalization]] and [[vk::disallow_legalization]] plus a flag to control the default for anything unannotated could be a neat design.

If this could percolate to the SPIR-V Spec as an OpDecorate or similar it could be a really powerful hint what to skip analyzing or speculatively unrolling, constant propagate, etc. for legalization within SPIR-V Tools

At the end of the day analyzing whether to unroll or legalize might be more expensive than just unrolling and legalizing, since you can't exaclty and in leess time, know if all the transformations will end up making the code legal before you try them.

My intuition tells me Halting Problem will rear its ugly head at some point if we try this without annotations and some help from the frontend.

DXC is still the producer that knows when a specific lowering path genuinely needs those expensive legalization transforms for correctness, so the optimizer should not have to guess that globally.

@AnastaZIuk is correct here, unrolling for legalization should not kick in without [unroll]

Stuff like

  [unroll]
  for( int j = 0; j < 2; j++ ) {
    if (constant > j) {
      lSBuffer = gSBuffer1;
    } else {
      lSBuffer = gSBuffer2;
    }
    gRWSBuffer[j] = lSBuffer[j];
  }

should legalize fine, but things like

  [loop]
for( int j = 0; j < 2; j++ ) {
  if (constant > j) {
    lSBuffer = gSBuffer1;
  } else {
    lSBuffer = gSBuffer2;
  }
  gRWSBuffer[j] = lSBuffer[j];
}

really shouldn't IMHO

On a side note, do you see any changes in the time it takes drivers to compile the shaders/pipelines if you skip all of the unrolling in spirv-opt?

We'll take our worst shader and let you know.

The main problem for us is hot-reload/iteration time, and majority of the time is spent in SPIR-V Tools. Whereas with a 58k LoC shader with template hell like ours we'd expect to spend majority of the time in Clang's Sema.

Mind you the difference between SPIR-V ResultID upper bound is only 10k vs 16k for the -fcgl - Vd case and -O3, and the SPIR-V is 12k vs 20k opcodes but the compile time difference is an order of magnitude.

We're open for ideas

#s-perron how would you have us solve the problem of "too much SPIR-V spam".

Sidequests

This branch also fixes one separate producer-side correctness bug: vector splat of bool spec constants was being emitted as OpConstantComposite instead of the legal OpSpecConstantComposite.

@s-perron was that done correctly, or is that something you'd want fixed?

@s-perron
Copy link
Copy Markdown
Collaborator

The main problem for us is hot-reload/iteration time, and majority of the time is spent in SPIR-V Tools. Whereas with a 58k LoC shader with template hell like ours we'd expect to spend majority of the time in Clang's Sema.

The templates will generally generate more code than you realize, creating more work for the optimizer. I would always expect the optimizer to take the most time.

However I am of the opinion that:

if something is marked up explicitly as [loop] or the equivalent SPIR-V hint it should never unroll even with -O3 or if that breaks legalization
things without [unroll] shouldn't unroll unless: their body codegen is illegal or that SPIR-V optimizer pass has been explicitly called for
TL;DR the performance passes should only be allowed to unroll loops without [loop], but thats a thing to address in SPIR-V Tools.

I agree, and we are even more restrictive then that. SPIR-V tools unrolls only when the user marks the loop as unroll, and the loop can be fully unrolled. This is true for performance and legalization. We are suppose to be doing what you want. If you have examples of loop being unrolled when they are not marked by the user as [unroll], then that is a bug. Something might be copying the loop merge instruction adding or dropping the loop control parameters.

See https://godbolt.org/z/eeMss78MY.

@devshgraphicsprogramming
Copy link
Copy Markdown

I agree, and we are even more restrictive then that. SPIR-V tools unrolls only when the user marks the loop as unroll, and the loop can be fully unrolled. This is true for performance and legalization. We are suppose to be doing what you want. If you have examples of loop being unrolled when they are not marked by the user as [unroll], then that is a bug. Something might be copying the loop merge instruction adding or dropping the loop control parameters.

See https://godbolt.org/z/eeMss78MY.

Thanks, I wrote that before I dove into the SPIR-V Tools code, also I was thinking that CanUnroll checked the Loop Control, but in another place in the code there was ...==Unroll && ...->CanUnroll()

The templates will generally generate more code than you realize, creating more work for the optimizer. I would always expect the optimizer to take the most time.

I do reailze there's also a lot of SPIR-V spam from unused functions and such, one pass of dead code elimination does wonders.

Also with Arkadiusz' legalization changes the same code compiles 10x faster (including our Boost.Wave preprocessor doing crazy thigns with BOOST_PP) while only costing 8% runtime performance, so very much a thing I'd like to have for Development builds.

@AnastaZIuk
Copy link
Copy Markdown
Contributor Author

AnastaZIuk commented Mar 28, 2026

Thanks for the detailed review.

TL;DR

  1. I agree the cookbook example points to one real legality blocker in my current branch.
  2. My narrower point is that one blocker still does not tell me which parts of the current broad recipe are essential, and which parts are historical overreach.
  3. Current upstream already uses producer-side information to decide when legalization runs.
  4. The concrete solution here is the new explicit -O1experimental flag, which has now been added on the PR line as the narrower opt-in rollout.

I want to be clear that I am not arguing legalization should disappear, or that the current default should be changed casually in a mature backend (and I realize my current PR branch still reads more like a default-behavior change than an opt-in experiment, which is exactly what I am trying to discuss here). I also understand that changes in a legacy backend are often evaluated under real release and maintenance pressure. That is exactly why I am more interested in reshaping this into an explicit opt-in DXC / SPIR-V rollout path than in trying to force an immediate default change.

Proposed Rollout

That narrower rollout path now exists concretely as the explicit -O1experimental flag on the SPIR-V path. The current default behavior stays untouched, while the fast path is enabled only explicitly when that flag is requested.

That remains consistent with the current DXC upstream model, and the current implementation now makes that rollout explicit through -O1experimental rather than through a default-behavior change:

  • Optimization is also delegated to SPIRV-Tools. Right now there are no difference
    between optimization levels greater than zero; they will all invoke the same
    optimization recipe. That is, the recipe behind ``spirv-opt -O``. If you want to
    run a custom optimization recipe, you can do so using the command line option
    ``-Oconfig=`` and specifying a comma-separated list of your desired passes.
    The passes are invoked in the specified order.
    For example, you can specify ``-Oconfig=--loop-unroll,--scalar-replacement=300,--eliminate-dead-code-aggressive``
    to firstly invoke loop unrolling, then invoke scalar replacement of aggregates,
    lastly invoke aggressive dead code elimination. All valid options to
    ``spirv-opt`` are accepted as components to the comma-separated list.

Legend

flowchart LR
  A["master_source_off\nNabla e11b118d\n2026-03-26"] --> B["devshfixes_upstream\nNabla c13c3366\n2026-03-28"] --> C["unroll_artifact\nNabla 262a8b72\n2026-03-26"] --> D["unroll_v2\nO1experimental\n2026-03-29"]
Loading

This compares four checkpoints:

  • master_source_off: current master-side baseline
  • devshfixes_upstream: the same line after refreshing devshFixes with newer DXC upstream state
  • unroll_artifact: that refreshed line plus the unroll PR work packaged in the published CI artifact
  • unroll_v2: the current local follow-up measured with the new -O1experimental flag

Runtime Probe

The current up-to-date follow-up is unroll_v2, measured with the new -O1experimental flag.

Against master_source_off on the same paired Nsight Graphics GPU Trace probe:

  • steady-state GPU frame ms: 21.4304 -> 19.2360, so unroll_v2 is 10.24% faster

Full tables, captures, runnable bundles, and the detailed report are here:

Cookbook Example

This is not legal. We have to unroll in this case.

I agree with the narrow legality point on the cookbook example. That blocker is now addressed on the PR line: the explicit -O1experimental path restores legality for that case, the test has been updated accordingly, and the default upstream paths remain untouched.

The causal chain there also makes sense to me. If that loop stays as a loop and the body still carries a dynamic choice between resource pointers, the module can retain a dynamic pointer-selection path across control flow. In that exact situation the SPIR-V variable-pointer rules become the relevant legality constraint. One known way to break that chain is to fully unroll the loop, let constant propagation resolve the branch per iteration, and then let the cleanup passes remove the dynamic pointer-selection path entirely. So I agree with the narrow point that this exact example still needs a legality-preserving transform.

I also agree that this is probably not just about one isolated test. A blocker like this may well point to a broader class of legality-sensitive SSA and control-flow patterns. Related to that, the current cookbook and visible legalization test coverage still look incomplete to me, which is one reason I would find explicit rationale or concrete examples for the passes in the legalization recipe more useful than trying to infer each pass only from git history.

Producer-Side Gating

The front-end is not the place to determine if an optimization can be skipped.

It has too little information and cannot account for how other optimizations will change things.

Current DXC upstream already uses producer-side information to decide whether legalization should run at all. The docs explicitly list the source patterns that require legalization and say that legalization transformations will not run unless those patterns are encountered:

  • Resulting from the above property of HLSL, if we translate into SPIR-V for
    Vulkan literally from the input HLSL source code, we will sometimes generate
    illegal SPIR-V. Certain transformations are needed to legalize the literally
    translated SPIR-V. Performing such transformations at the frontend AST level
    is cumbersome or impossible (e.g., function inlining). They are better to be
    conducted at SPIR-V level. Therefore, legalization is delegated to SPIRV-Tools.
  • Specifically, we need to legalize the following HLSL source code patterns:
    * Using resource types in struct types
    * Creating aliases of global resource objects
    * Control flows invovling the above cases
    Legalization transformations will not run unless the above patterns are
    encountered in the source code.

and the emitter computes needsLegalization before calling into SPIRV-Tools:

So I agree that the frontend cannot safely guess every downstream optimization effect. But current upstream already does coarse producer-side legality gating. The real question seems narrower: how much of the current blanket recipe is actually required to preserve the same legality coverage.

Longer term, if a narrower rollout does prove useful, I could also imagine a cleaner design where DXC uses explicit OpDecorate-style legality markup for the suspicious regions or expressions, potentially even by kind. That could also make the validator's before_hlsl_legalization mode tighter. SPIRV-Tools could then first run a legalization-markup propagation pass and scope the expensive legalization work to the affected regions instead of the whole module. That would also address the concern that today the recipe is enabled at module granularity even when only one entry point or one localized region may need the expensive part of the cleanup. I am not proposing that as a prerequisite for this PR, only as a possible direction if the current coarse gating still feels too blunt.

Performance Recipe

We cannot change the unrolling pass in the performance passes.

That seems stronger than what the current SPIRV-Tools API says. The public optimizer API describes these recipes as sequences that are subject to constant review and will change from time to time:

So I understand we do not want to change this lightly. I do not think current upstream supports the stronger cannot.

That distinction matters to me because the same public API also describes the legalization recipe as something specially designed for legalizing literal HLSL-to-SPIR-V output, and explicitly says it should not be used by general workloads for performance or size improvement. So from the upstream API surface itself, legality and general performance policy are related here, but they are not the same question.

History

One other thing I checked more carefully is the upstream history. On the DXC side, SPIRV-Tools legalization was introduced as a conditional stage in 2017, and the current doc wording that says legalization is only run for specific source patterns was added in 2018:

On the SPIRV-Tools side, the public optimizer API has described these recipes as mutable policy since 2017, while blanket unrolling in the legalization and performance recipes was added later in separate commits in 2018 and 2020:

That history does not prove my branch is correct by itself. But it does make me hesitant to treat the current blanket breadth as if it were an original fixed contract rather than a later widening of the pipeline. It also makes me think it would be very helpful to have a more explicit rationale or example coverage for the passes that currently sit in the legalization recipe, because from the outside it is hard to tell which ones are there for one known blocker and which ones are there because the recipe simply accumulated over time.

Runtime Impact

Unrolling can significantly impact runtime performance, and the impact is often unpredictable.

There are some drivers that I have see that do not do much unrolling in their drivers, and having spir-v do the unrolling is important for runtime performance.

I also think this discussion should stay narrow for one more reason. Even on current upstream the unroller is not actually blind. The pass is registered in the recipe, but it only applies when the loop has Unroll loop control and when CanPerformUnroll() succeeds:

So even on current upstream there is already a difference between the pass is part of the recipe and every loop is materialized unconditionally. The current implementation still checks loop intent and structural preconditions before it actually performs the transform. Current DXC upstream also already preserves the source-side distinction between [loop] / [fastopt] and [unroll]:

  • case attr::HLSLLoop:
    case attr::HLSLFastOpt:
    return spv::LoopControlMask::DontUnroll;
    case attr::HLSLUnroll:
    return spv::LoopControlMask::Unroll;
  • // CHECK-NEXT: OpLoopMerge %for_merge_1 %for_continue_1 Unroll
    // CHECK-NEXT: OpBranchConditional [[lt0]] %for_body %for_merge_1
    [unroll] for (int i = 0; i < 10; ++i) {
    // CHECK-LABEL: %for_body = OpLabel
    // CHECK-NEXT: [[val0:%[0-9]+]] = OpLoad %int %val
    // CHECK-NEXT: [[i1:%[0-9]+]] = OpLoad %int %i
    // CHECK-NEXT: [[add0:%[0-9]+]] = OpIAdd %int [[val0]] [[i1]]
    // CHECK-NEXT: OpStore %val [[add0]]
    val = val + i;
    // CHECK-NEXT: OpStore %j %int_0
    // CHECK-NEXT: OpBranch %for_check_0
    // CHECK-LABEL: %for_check_0 = OpLabel
    // CHECK-NEXT: [[j0:%[0-9]+]] = OpLoad %int %j
    // CHECK-NEXT: [[lt1:%[0-9]+]] = OpSLessThan %bool [[j0]] %int_10
    // CHECK-NEXT: OpLoopMerge %for_merge_0 %for_continue_0 DontUnroll
    // CHECK-NEXT: OpBranchConditional [[lt1]] %for_body_0 %for_merge_0
    [loop] for (int j = 0; j < 10; ++j) {
    // CHECK-LABEL: %for_body_0 = OpLabel
    // CHECK-NEXT: OpStore %k %int_0
    // CHECK-NEXT: OpBranch %for_check_1
    // CHECK-LABEL: %for_check_1 = OpLabel
    // CHECK-NEXT: [[k0:%[0-9]+]] = OpLoad %int %k
    // CHECK-NEXT: [[lt2:%[0-9]+]] = OpSLessThan %bool [[k0]] %int_10
    // CHECK-NEXT: OpLoopMerge %for_merge %for_continue DontUnroll

That matters because the public SPIR-V spec also describes Unroll and DontUnroll as loop-control performance hints:

So from the current upstream code and spec, what I think is established is simply this: legalization is genuinely needed in some cases, DXC upstream already uses producer-side knowledge to decide when it runs, and SPIRV-Tools currently uses a broad unroll/cleanup recipe that should be treated as policy rather than as an immutable rule. Because of that, I think the remaining disagreement is not whether legalization matters. It is whether the current blanket breadth of the recipe is really required to preserve the same legality coverage for the broader class of blockers behind examples like the cookbook case.

I also agree that runtime driver impact is the right next question. That part needs data. I am currently checking this on one path tracer workload on one GPU on my side. I also agree that, for a default rollout, even one real regression case may be enough to reject it, which is another reason I am framing this as an opt-in or separate-profile discussion first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

3 participants