
Speed improvements to resize convolution (no vpermps w/ FMA) #2793

Merged
JimBobSquarePants merged 18 commits into main from js/resize-map-optimizations on Feb 4, 2026

Conversation

@JimBobSquarePants (Member) commented on Aug 15, 2024:

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

Fixes #1515

This is a replacement for #1518 by @Sergio0694 with most of the work based upon his implementation. I've modernized some of the code and fixed the precision issues.

Follow-up to #1513. This PR does a few things:

  • Switch the resize kernel processing to float
  • Add an AVX2 vectorized method to normalize the kernel
  • Vectorize the kernel copy when not using FMA, using Span<T>.CopyTo instead
  • Remove the permute8x32 when using FMA by creating a convolution kernel 4x the size (see the sketch below)
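
To make the last point concrete, here is a minimal sketch of the pre-duplication step (hypothetical names, not the actual ImageSharp code): each scalar weight is written out four times, once per RGBA channel, so the convolution loop can load weights with a plain unaligned load instead of broadcasting each one with vpermps.

```csharp
using System;

// Hypothetical sketch: expand a kernel of N scalar weights into 4 * N floats,
// repeating each weight once per RGBA channel. With this layout the weights
// line up with the pixel data, so the FMA loop needs no per-weight permute.
static float[] PreDuplicateKernel(ReadOnlySpan<float> weights)
{
    float[] expanded = new float[weights.Length * 4];
    for (int i = 0; i < weights.Length; i++)
    {
        float w = weights[i];
        int j = i * 4;
        expanded[j + 0] = w; // R
        expanded[j + 1] = w; // G
        expanded[j + 2] = w; // B
        expanded[j + 3] = w; // A
    }

    return expanded;
}
```

The trade-off is 4x the kernel storage in exchange for removing a shuffle from the hot loop.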

Resize convolution codegen diff

Before:

```asm
vmovsd xmm2, [rax]
vpermps ymm2, ymm1, ymm2
vfmadd231ps ymm0, ymm2, [r8]
```

After:

```asm
vmovupd ymm2, [r8]
vfmadd231ps ymm0, ymm2, [rax]
```
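
For readers following along, a rough sketch of the kind of inner loop that produces the "after" codegen (hypothetical names and shapes, not the actual ImageSharp implementation): because the weights are pre-duplicated, each iteration is a straight load plus a fused multiply-add.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical sketch of FMA accumulation over a pre-duplicated kernel.
// 'weights' holds the 4x-expanded kernel values and 'pixels' the RGBA
// source floats; both spans cover the same number of elements.
static Vector256<float> ConvolveFma(ReadOnlySpan<float> weights, ReadOnlySpan<float> pixels)
{
    Vector256<float> acc = Vector256<float>.Zero;
    for (int i = 0; i <= weights.Length - Vector256<float>.Count; i += Vector256<float>.Count)
    {
        // Plain unaligned loads: the pre-duplicated layout already matches
        // the RGBA pixel layout, so no vpermps broadcast is required.
        Vector256<float> w = Vector256.Create(weights.Slice(i, Vector256<float>.Count));
        Vector256<float> p = Vector256.Create(pixels.Slice(i, Vector256<float>.Count));
        acc = Fma.MultiplyAdd(w, p, acc); // vfmadd231ps
    }

    // Each 128-bit half of 'acc' holds a partial RGBA sum; the caller adds
    // the two halves to produce the final pixel.
    return acc;
}
```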

Benchmarks

Main

```
BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.7628/25H2/2025Update/HudsonValley2)
AMD RYZEN AI MAX+ 395 w/ Radeon 8060S 3.00GHz, 1 CPU, 32 logical and 16 physical cores
.NET SDK 10.0.102
  [Host] : .NET 8.0.23 (8.0.23, 8.0.2325.60607), X64 RyuJIT x86-64-v4

Runtime=.NET 8.0  Arguments=/p:DebugType=portable  Toolchain=InProcessEmitToolchain
```

| Method | Mean | Error | StdDev | Ratio | Allocated | Alloc Ratio |
|--------|-----:|------:|-------:|------:|----------:|------------:|
| SystemDrawing | 4.432 ms | 0.0089 ms | 0.0083 ms | 1.00 | 96 B | 1.00 |
| 'ImageSharp, MaxDegreeOfParallelism = 1' | 2.032 ms | 0.0044 ms | 0.0037 ms | 0.46 | 54664 B | 569.42 |

PR

```
BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.7628/25H2/2025Update/HudsonValley2)
AMD RYZEN AI MAX+ 395 w/ Radeon 8060S 3.00GHz, 1 CPU, 32 logical and 16 physical cores
.NET SDK 10.0.102
  [Host] : .NET 8.0.23 (8.0.23, 8.0.2325.60607), X64 RyuJIT x86-64-v4

Runtime=.NET 8.0  Arguments=/p:DebugType=portable  Toolchain=InProcessEmitToolchain
```

| Method | Mean | Error | StdDev | Ratio | Gen0 | Allocated | Alloc Ratio |
|--------|-----:|------:|-------:|------:|-----:|----------:|------------:|
| SystemDrawing | 4.429 ms | 0.0069 ms | 0.0061 ms | 1.00 | - | 96 B | 1.00 |
| 'ImageSharp, MaxDegreeOfParallelism = 1' | 1.892 ms | 0.0103 ms | 0.0086 ms | 0.43 | 1.9531 | 54617 B | 568.93 |

Performance in the Playground Benchmarks looks really, really good.

CC @antonfirsov @saucecontrol

```
BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.7628/25H2/2025Update/HudsonValley2)
AMD RYZEN AI MAX+ 395 w/ Radeon 8060S 3.00GHz, 1 CPU, 32 logical and 16 physical cores
.NET SDK 10.0.102
  [Host]   : .NET 10.0.2 (10.0.2, 10.0.225.61305), X64 RyuJIT x86-64-v4
  ShortRun : .NET 10.0.2 (10.0.2, 10.0.225.61305), X64 RyuJIT x86-64-v4

Job=ShortRun  IterationCount=5  LaunchCount=1
WarmupCount=5
```

| Method | Mean | Error | StdDev | Ratio |
|--------|-----:|------:|-------:|------:|
| 'MagicScaler Load, Resize, Save' | 31.95 ms | 0.377 ms | 0.098 ms | 0.17 |
| 'ImageSharp TD Load, Resize, Save' | 39.72 ms | 0.842 ms | 0.219 ms | 0.21 |
| 'NetVips Load, Resize, Save' | 58.67 ms | 1.063 ms | 0.276 ms | 0.32 |
| 'ImageSharp Load, Resize, Save' | 59.62 ms | 3.248 ms | 0.503 ms | 0.32 |
| 'SkiaSharp Load, Resize, Save' | 77.58 ms | 1.987 ms | 0.516 ms | 0.42 |
| 'ImageFree Load, Resize, Save' | 132.15 ms | 1.994 ms | 0.518 ms | 0.71 |
| 'System.Drawing Load, Resize, Save' | 185.61 ms | 4.171 ms | 1.083 ms | 1.00 |
| 'ImageFlow Load, Resize, Save' | 189.55 ms | 2.393 ms | 0.622 ms | 1.02 |
| 'ImageMagick Load, Resize, Save' | 200.84 ms | 2.369 ms | 0.615 ms | 1.08 |

@saucecontrol (Contributor) commented:

Brain's not fully awake yet today, but I'll give the maths a look soon.

@JimBobSquarePants (Member, Author) commented:

> Brain's not fully awake yet today, but I'll give the maths a look soon.

Thanks for the review so far. I still haven't figured out what is going on with PeriodicKernelMap. It all looks correct to me.

@saucecontrol (Contributor) commented:

It looks to me like the only differences are due to the change to single precision for kernel normalization and for calculation of the distances passed to the interpolation function. You'll definitely give up some accuracy there, and I'm not sure it's worth it since the kernels only have to be built once per resize. You can see here that @antonfirsov changed the precision to double from the initial implementation some years back.
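
To illustrate the precision point, here is a minimal sketch of normalization done at double precision (a hypothetical helper, not either implementation): the sum and division happen in double, and the result is rounded to float only once at the end.

```csharp
using System;

// Hypothetical sketch: sum and divide at double precision, rounding to float
// only once at the end. Accumulating in float instead lets rounding error
// creep into the sum, so nominally identical kernel windows can differ in
// their last bits — which matters if windows are compared for exact equality.
static void NormalizeToFloat(ReadOnlySpan<double> weights, Span<float> destination)
{
    double sum = 0;
    foreach (double w in weights)
    {
        sum += w;
    }

    for (int i = 0; i < weights.Length; i++)
    {
        destination[i] = (float)(weights[i] / sum);
    }
}
```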

Since the periodic kernel map relies on each repetition of the kernel weights being exact, I can see how precision loss might lead to some differences when compared with a separate calculation per interval. I've actually never looked at your implementation of the kernel map before, and now my curiosity is piqued, because I arrived at something similar myself. My implementation, though, calculates each kernel window separately and only replaces an interval with the periodic version if they match exactly. Part of that was down to a lack of confidence in the maths on my part, as I only discovered the periodicity of the kernel weights by observation and kind of intuited my way to a solution.
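
As a rough illustration of that exact-match strategy (entirely hypothetical code, not @saucecontrol's or ImageSharp's implementation): compute every kernel window from scratch, and treat an interval as periodic only when its window is bitwise identical to the window one period earlier.

```csharp
using System;

// Hypothetical sketch of the exact-match approach described above: an
// interval is only replaced with the periodic version when each window
// repeats the window one period back exactly.
static bool IsExactlyPeriodic(float[][] windows, int period)
{
    for (int i = period; i < windows.Length; i++)
    {
        // SequenceEqual compares the floats element by element, so any
        // precision drift between repetitions disables the reuse.
        if (!windows[i].AsSpan().SequenceEqual(windows[i - period]))
        {
            return false;
        }
    }

    return true;
}
```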

@antonfirsov would you mind filling in some gaps on the theory behind your periodic kernel map implementation? Did you use some paper or other implementation as inspiration, or did you arrive at it observationally like I did?

@JimBobSquarePants (Member, Author) commented:

I thought I'd update this to match the latest main. I don't quite understand what is happening with the sampling here, and I'm not sure it's worth me taking the time to figure it out. @antonfirsov, if you do have any insight I'd appreciate it; otherwise I think I'll scrap this.

(screenshot attached)

@antonfirsov (Member) commented:

> @antonfirsov if you do have any insight I'd appreciate

Not without going deep down the rabbit hole :(

@JimBobSquarePants (Member, Author) commented:

> > @antonfirsov if you do have any insight I'd appreciate
>
> Not without going deep down the rabbit hole :(

I thought that might be the case. I'll leave this hanging around for a bit longer, but I don't know if it's worth it. I can do a few smaller things instead (like vectorizing normalization).

@JimBobSquarePants JimBobSquarePants marked this pull request as ready for review February 3, 2026 08:07
@JimBobSquarePants JimBobSquarePants changed the title from "WIP - Speed improvements to resize convolution (no vpermps w/ FMA)" to "Speed improvements to resize convolution (no vpermps w/ FMA)" on Feb 3, 2026
@JimBobSquarePants (Member, Author) commented:

@Sergio0694 Only took me 5 years to figure out the precision issue!! 😛

@JimBobSquarePants JimBobSquarePants merged commit ad816ed into main Feb 4, 2026
12 checks passed
@JimBobSquarePants JimBobSquarePants deleted the js/resize-map-optimizations branch February 4, 2026 01:53
Linked issue: Pre-duplicate kernel values in ResizeKernelMap for faster FMA convolution