Speed improvements to resize convolution (no vpermps w/ FMA)#2793
Speed improvements to resize convolution (no vpermps w/ FMA)#2793JimBobSquarePants merged 18 commits intomainfrom
Conversation
src/ImageSharp/Processing/Processors/Transforms/Resize/ResizeKernel.cs
Outdated
Show resolved
Hide resolved
|
Brain's not fully awake yet today, but I'll give the maths a look soon. |
…ernel.cs Co-authored-by: Clinton Ingram <clinton.ingram@outlook.com>
Co-authored-by: Clinton Ingram <clinton.ingram@outlook.com>
Thanks for the review so far. I still haven't figured out what is going on with |
|
It looks to me like the only differences are due to the change to single precision for kernel normalization and for calculation of the distances passed to the interpolation function. You'll definitely give up some accuracy there, and I'm not sure it's worth it since the kernels only have to be built once per resize. You can see here that @antonfirsov changed the precision to double from the initial implementation some years back. Since the periodic kernel map relies on each repetition of the kernel weights being exact, I can see how precision loss might lead to some differences when compared with a separate calculation per interval. I've actually never looked at your implementation of the kernel map before, and now my curiosity is piqued because I arrived at something similar myself, but my implementation calculates each kernel window separately, and only replaces that interval with the periodic version if they match exactly. Part of this was due to a lack of confidence in the maths on my part, as I only discovered the periodicity of the kernel weights by observation and kind of intuited my way to a solution. @antonfirsov would you mind filling in some gaps on the theory behind your periodic kernel map implementation? Did you use some paper or other implementation as inspiration, or did you arrive at it observationally like I did? |
|
I thought, I'd update this to match latest main. I don't quite understand what is happening with the sampling here and I'm not sure it's worth me taking the time to figure it out. @antonfirsov if you do have any insight I'd appreciate it otherwise I think I'll scrap this. |
Not without going deep down the rabbit hole :( |
I thought that might be the case. I'll leave this hanging around for a bit longer, but I don't know if it's worth it. I can do a few smaller things instead (like vectorize normalize.) |
tests/ImageSharp.Tests/Processing/Processors/Transforms/ResizeKernelMapTests.cs
Fixed
Show fixed
Hide fixed
src/ImageSharp/Processing/Processors/Transforms/Resize/ResizeKernelMap.cs
Fixed
Show fixed
Hide fixed
|
@Sergio0694 Only took me 5 years to figure out the precision issue!! 😛 |

Prerequisites
Description
Fixes #1515
This is a replacement for #1518 by @Sergio0694 with most of the work based upon his implementation. I've modernized some of the code and fixed the precision issues.
Benchmarks
Main
PR
Performance in the Playground Benchmarks looks really, really good.
CC @antonfirsov @saucecontrol