Upgrade halide-llvm to 23.0.0.dev86417+gf014202d #9073

Merged: alexreinking merged 2 commits into main from automated/upgrade-halide-llvm on Mar 23, 2026

halide-ci bot (Contributor) commented Mar 22, 2026

Automated upgrade via uv lock -P halide-llvm.

alexreinking (Member) commented:

I had Codex (GPT 5.4) try to troubleshoot this. Here's the summary it produced:


Summary

The failing correctness_simd_op_check_sve2 case is udot_uint64_x1, specifically the SVE2 64-bit dot-product reduction path.

The root cause appears to be an LLVM bug in the ExpandReductions pass, not a Halide IR-generation bug.

What Fails

The failing path is the f = 8 case in Halide's SVE2 dot-product vector-reduce lowering in CodeGen_ARM.cpp.

That path generates LLVM IR of this shape:

%d = call <vscale x 2 x i64> @llvm.aarch64.sve.udot.nxv2i64(...)
%r = call i64 @llvm.vector.reduce.add.nxv2i64(<vscale x 2 x i64> %d)
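
For readers less familiar with these two intrinsics, the following is a scalar sketch of what the IR shape above computes. It assumes the usual SVE UDOT semantics for 64-bit accumulators (each i64 lane accumulates four adjacent 16-bit products), and it uses a fixed-length `std::vector` as a stand-in for a scalable vector. The function names are hypothetical and are not taken from Halide or LLVM.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of UDOT with 64-bit accumulators (illustrative only).
// Precondition: a.size() == b.size() == 4 * acc.size().
// Each 64-bit lane adds the products of four adjacent 16-bit pairs.
std::vector<uint64_t> udot_u16_to_u64(const std::vector<uint64_t> &acc,
                                      const std::vector<uint16_t> &a,
                                      const std::vector<uint16_t> &b) {
    std::vector<uint64_t> out(acc);
    for (std::size_t lane = 0; lane < out.size(); lane++) {
        for (std::size_t k = 0; k < 4; k++) {
            const std::size_t i = 4 * lane + k;
            out[lane] += uint64_t(a[i]) * uint64_t(b[i]);
        }
    }
    return out;
}

// Scalar model of llvm.vector.reduce.add: a wrapping horizontal sum
// over all lanes of the vector.
uint64_t reduce_add(const std::vector<uint64_t> &v) {
    uint64_t sum = 0;
    for (uint64_t x : v) {
        sum += x;
    }
    return sum;
}
```

Calling `reduce_add(udot_u16_to_u64(zeros, a, b))` with a zero accumulator therefore yields the full dot product of `a` and `b`, which is the value `%r` holds in the IR. In the real IR the lane count is a runtime multiple of 2, which is what makes the operand type scalable.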

Why It Crashes

A local crash report shows LLVM aborting in expandReductions(). Relevant stack frames:

__assert_rtn
(anonymous namespace)::expandReductions(llvm::Function&, llvm::TargetTransformInfo const*)
llvm::FPPassManager::runOnFunction
llvm::FPPassManager::runOnModule
llvm::legacy::PassManagerImpl::run
Halide::(anonymous namespace)::emit_file(...)

In LLVM's ExpandReductions.cpp, the pass assumes vector_reduce_add always has a fixed-width vector operand and does cast<FixedVectorType>(Vec->getType()). That is invalid here because the operand is scalable: <vscale x 2 x i64>.

So this is an LLVM assertion failure on scalable reductions.
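
The failure mode described above can be illustrated with a toy model. This is not LLVM's code: `ToyVectorType`, `expand_unguarded`, and `expand_guarded` are hypothetical names, and an exception stands in for the assertion that a checked cast triggers in an assertions-on build. The sketch only contrasts a pass that assumes a fixed-width operand with one that checks first.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a vector type that is either fixed-width
// (e.g. <4 x i64>) or scalable (e.g. <vscale x 2 x i64>).
struct ToyVectorType {
    unsigned min_elements;
    bool scalable;
};

enum class Action { Expanded, LeftForBackend };

// Mimics an unconditional checked cast to a fixed-width vector type:
// a scalable operand is treated as a programming error.
unsigned fixed_element_count(const ToyVectorType &t) {
    if (t.scalable) {
        throw std::logic_error("checked cast to fixed-width vector failed");
    }
    return t.min_elements;
}

// Assumes every reduction operand is fixed-width, like the behaviour
// the summary attributes to the pass.
Action expand_unguarded(const ToyVectorType &t) {
    (void)fixed_element_count(t);  // fails for scalable operands
    return Action::Expanded;
}

// Checks the type first and leaves scalable reductions untouched.
Action expand_guarded(const ToyVectorType &t) {
    if (t.scalable) {
        return Action::LeftForBackend;
    }
    (void)fixed_element_count(t);
    return Action::Expanded;
}
```

With a fixed-width type both functions expand. With a scalable type `expand_unguarded` fails, while `expand_guarded` leaves the reduction for the backend.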

What I Verified

  • The issue reproduces reliably in Halide with:
./build/macOS/test/correctness/correctness_simd_op_check_sve2 'udot_uint64_x1'
  • The workaround below makes the test pass:
HL_LLVM_ARGS='-disable-expand-reductions' \
./build/macOS/test/correctness/correctness_simd_op_check_sve2 'udot_uint64_x1'
  • A minimal standalone .ll reproduces the same crash when run through a tiny standalone LLVM API driver using the same assertions-on LLVM.
  • A plain llc invocation on that minimal .ll did not reproduce locally, even with assertions on.

Likely Regressor

Not proven, but the most likely exposing change in the reported window is:

  • 221d2f57eccd [AArch64] Add partial reduce patterns for new sve dot variants (#184649)

ExpandReductions.cpp itself did not change in the relevant range, so the most likely scenario is that this AArch64/SVE change started feeding a scalable vector.reduce.add into a pass that was already assuming fixed-width vectors.

Interpretation

  • Root bug: LLVM
  • Possible Halide mitigation: avoid generating this shape, or avoid ExpandReductions for this path
  • But that would be a workaround, not the real fix

Standalone Reproducer Status

We now have:

  • minimal IR
  • standalone LLVM API reproducer source

This is enough for a useful LLVM issue even though we do not yet have a pure llc file.ll repro.

Minimal IR: /tmp/reduce_udot_min.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "aarch64--linux-gnueabihf"

declare <vscale x 2 x i64> @llvm.aarch64.sve.udot.nxv2i64(<vscale x 2 x i64>, <vscale x 8 x i16>, <vscale x 8 x i16>)
declare i64 @llvm.vector.reduce.add.nxv2i64(<vscale x 2 x i64>)

define i64 @f(ptr %p) {
entry:
  %a = load <vscale x 8 x i16>, ptr %p, align 16
  %p2 = getelementptr i8, ptr %p, i64 64
  %b = load <vscale x 8 x i16>, ptr %p2, align 16
  %d = call <vscale x 2 x i64> @llvm.aarch64.sve.udot.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
  %r = call i64 @llvm.vector.reduce.add.nxv2i64(<vscale x 2 x i64> %d)
  ret i64 %r
}
Standalone LLVM API reproducer: /tmp/llvm_emit_repro.cpp
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/CodeGen/CommandFlags.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/MC/TargetRegistry.h"
#include "llvm/Pass.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/InitLLVM.h"
#include "llvm/Support/Path.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/ToolOutputFile.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"
#include "llvm/TargetParser/Triple.h"
#include "llvm/Transforms/IPO/AlwaysInliner.h"

#include <memory>
#include <optional>  // std::optional is used in makeTM()
#include <string>
#include <vector>

using namespace llvm;

static std::string getModuleFlagString(Module &M, StringRef Name) {
  if (auto *MD = M.getModuleFlag(Name)) {
    if (auto *MDS = dyn_cast<MDString>(MD)) {
      return MDS->getString().str();
    }
  }
  return "";
}

static std::unique_ptr<TargetMachine> makeTM(Module &M) {
  Triple TT = M.getTargetTriple();
  std::string TripleStr = TT.getTriple();
  std::string Error;
  const Target *T = TargetRegistry::lookupTarget(TT, Error);
  if (!T) {
    errs() << "lookupTarget failed for " << TripleStr << ": " << Error << "\n";
    return nullptr;
  }

  std::string CPU = getModuleFlagString(M, "halide_mcpu_target");
  std::string Features = getModuleFlagString(M, "halide_mattrs");
  TargetOptions Options;
  auto RM = std::optional<Reloc::Model>(Reloc::PIC_);
  return std::unique_ptr<TargetMachine>(
      T->createTargetMachine(TT, CPU, Features, Options, RM));
}

static bool emitOne(StringRef InPath, StringRef OutPath,
                    CodeGenFileType FileType) {
  LLVMContext Ctx;
  SMDiagnostic Err;
  auto M = parseIRFile(InPath, Err, Ctx);
  if (!M) {
    Err.print("llvm_emit_repro", errs());
    return false;
  }

  auto TM = makeTM(*M);
  if (!TM) {
    return false;
  }
  M->setDataLayout(TM->createDataLayout());

  std::error_code EC;
  auto Out = std::make_unique<ToolOutputFile>(OutPath, EC, sys::fs::OF_None);
  if (EC) {
    errs() << "open failed for " << OutPath << ": " << EC.message() << "\n";
    return false;
  }

  legacy::PassManager PM;
  PM.add(new TargetLibraryInfoWrapperPass(Triple(M->getTargetTriple())));
  PM.add(createAlwaysInlinerLegacyPass());
  TM->Options.MCOptions.AsmVerbose = true;
  TM->addPassesToEmitFile(PM, Out->os(), nullptr, FileType);
  PM.run(*M);
  Out->keep();
  return true;
}

int main(int argc, char **argv) {
  InitLLVM X(argc, argv);
  if (argc < 3) {
    errs() << "usage: llvm_emit_repro outdir file1.ll [file2.ll ...]\n";
    return 2;
  }

  LLVMInitializeAArch64TargetInfo();
  LLVMInitializeAArch64Target();
  LLVMInitializeAArch64TargetMC();
  LLVMInitializeAArch64AsmPrinter();
  LLVMInitializeAArch64AsmParser();

  std::string OutDir = argv[1];
  for (int i = 2; i < argc; i++) {
    std::string InPath = argv[i];
    std::string Base = sys::path::filename(InPath).str();
    std::string OutPath = OutDir + "/" + Base + ".s";
    errs() << "emitting " << InPath << " -> " << OutPath << "\n";
    if (!emitOne(InPath, OutPath, CodeGenFileType::AssemblyFile)) {
      return 1;
    }
  }
  return 0;
}

alexreinking (Member) commented:

Opened llvm/llvm-project#188024

alexreinking (Member) commented Mar 23, 2026

Codex noticed that TargetTransformInfo was set up differently between CodeGen_LLVM.cpp and llc, and I noticed that CodeGen_PTX_Dev.cpp had the following line:

module_pass_manager.add(createTargetTransformInfoWrapperPass(target_machine->getTargetIRAnalysis()));

Copying it over to CodeGen_LLVM.cpp fixed the test locally, so now I'm seeing what CI says about it.
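
The thread does not spell out why registering the target's TargetTransformInfo makes the crash go away. One plausible reading, stated here as an assumption and not confirmed in this conversation, is that the reduction-expansion pass asks a target hook whether a reduction should be expanded at all. Under that reading, a pass manager with no target-specific information falls back to a conservative default that answers "expand". The toy model below sketches that mechanism; every type and function name in it is hypothetical and none of it is LLVM's API.

```cpp
#include <memory>

// Hypothetical target-information interface. The default answer is
// "expand", modelling a target-agnostic fallback (an assumption).
struct ToyTargetInfo {
    virtual ~ToyTargetInfo() = default;
    virtual bool should_expand_reduction() const { return true; }
};

// Hypothetical target that handles reductions natively and therefore
// asks the pass to leave them alone.
struct ToySveLikeTargetInfo : ToyTargetInfo {
    bool should_expand_reduction() const override { return false; }
};

// Hypothetical pass manager: target info is optional, and a generic
// fallback is used when none has been registered.
struct ToyPassManager {
    std::unique_ptr<ToyTargetInfo> registered;

    const ToyTargetInfo &target_info() const {
        static const ToyTargetInfo fallback;
        return registered ? *registered : fallback;
    }
};

// True when the toy expansion pass would rewrite a reduction.
bool would_expand_reduction(const ToyPassManager &pm) {
    return pm.target_info().should_expand_reduction();
}
```

In this model, a freshly constructed `ToyPassManager` expands reductions, and assigning a `ToySveLikeTargetInfo` to `registered` stops it from doing so. That matches the observation that the added line changed the outcome, and it would also be consistent with a plain llc run not reproducing the crash, but it does not replace a fix for the underlying assertion in LLVM.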

alexreinking merged commit 455b34b into main on Mar 23, 2026
24 checks passed
halide-ci bot deleted the automated/upgrade-halide-llvm branch on March 24, 2026 06:28