Fixed #34 Customizing scipy's oaconvolve #35

NimaSarajpoor · 2026-01-08T15:35:08Z

This PR is to address #34.

gitnotebooks · 2026-01-08T15:35:11Z

Review these changes at https://app.gitnotebooks.com/stumpy-dev/sliding_dot_product/pull/35

NimaSarajpoor · 2026-01-08T15:56:52Z

./timing.py -timeout 1.0 -pmin 7 -pmax 24 pyfftw pocketfft_r2c_c2r scipy_oaconvolve challenger > timing.csv

# in timing.py, I change timeout to 5.0 when `len(T) >= 2^20`

The challenger is the customized version of scipy's oaconvolve.

Observations:

- The challenger outperforms scipy's oaconvolve.
- For 2^7 <= len(Q) <= 2^16, challenger outperforms the others for the most part.
- For len(Q) > 2^16, pocketfft outperforms the others for the most part.

For me, the important one is the first bullet point. Of the four optimization opportunities mentioned in this comment, I've addressed 1, 2, and 3 in this PR. The last item, which is about adjusting the number of multiplications for real-valued arrays, can be explored next.

sdp/challenger_sdp.py

seanlaw · 2026-01-09T01:32:35Z

As a gentle reminder, even if we can do things faster, we will never (??) remove the public scipy convolution functions from STUMPY because they should be our last resort fallback (in case the alternatives, that may use private functions, raise an error). Does that make sense?

NimaSarajpoor · 2026-01-09T02:19:50Z

Good reminder. It makes sense!!
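The fallback arrangement described above could look roughly like the sketch below. The names `sdp_with_fallback` and `fast_sdp` are hypothetical and are not taken from STUMPY or this PR; the only real API used is the public `scipy.signal.oaconvolve`. The idea is simply that any failure in a faster path (for example, one built on private SciPy functions) drops through to the public convolution.

```python
import numpy as np
from scipy.signal import oaconvolve


def sdp_with_fallback(Q, T, fast_sdp=None):
    """Sliding dot product of Q over T, with a public-SciPy last resort.

    `fast_sdp` is a hypothetical callable (e.g. a private-API implementation).
    """
    if fast_sdp is not None:
        try:
            return fast_sdp(Q, T)
        except Exception:
            # Assumption for this sketch: any failure in the fast path
            # (ImportError, changed private signature, ...) triggers the fallback.
            pass
    # Public fallback: "valid" convolution of T with the reversed query
    return oaconvolve(T, Q[::-1], mode="valid")


def _broken_fast_path(Q, T):
    # Stand-in for a private-API path that fails at runtime
    raise RuntimeError("private API unavailable")


Q = np.array([1.0, 2.0, 3.0])
T = np.arange(10.0)
result = sdp_with_fallback(Q, T, fast_sdp=_broken_fast_path)
```

Here `result` has length `len(T) - len(Q) + 1`, the same as a direct sliding dot product.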

NimaSarajpoor · 2026-01-09T18:29:56Z

test.py

+
+
+def test_oaconvolve_sdp_blocksize():
+    from sdp.challenger_sdp import sliding_dot_product


This line needs to be modified if, at a later time, we decide to move the proposal to a new file (module).

NimaSarajpoor · 2026-01-09T18:43:49Z

sdp/challenger_sdp.py

+from scipy.fft._pocketfft.basic import r2c, c2r
+
+
+def _rfft_irfft_r2c2r_block(Q, T, block_size):


I think we should choose a better name for the function. This function performs convolution between ~~each block~~ chunks of T and Q. So, how about _fftconvolve_block?

So, I see that this function has parts at the end that look VERY similar to https://github.com/stumpy-dev/sliding_dot_product/blob/main/sdp/pocketfft_r2c_c2r_sdp.py

It feels like we should consolidate and:

Create a function that determines the block/step sizes

Create a function that creates the actual steps/blocks of arrays (to iterate over) - Maybe even a Python generator that takes the original full length array but knows how to return the next appropriate chunk for processing

Create a function that can iterate over a single step/block (i.e., it has no knowledge of other blocks) and blindly call pocketfft_r2c_c2r_sdp.sliding_dot_product

Hmmm, would it be appropriate and more descriptive to call this something like _pocketfft_oaconvolve?

> Hmmm, would it be appropriate and more descriptive to call this something like _pocketfft_oaconvolve?

Sounds better 👍

@seanlaw

> Create a function that determines the block/step sizes

Done.
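For readers following along, a block-size helper could be sketched as below. This is not the function added in this PR, and it is not SciPy's own heuristic; `choose_block_size` is a hypothetical name, and the cost model (FFT work of about N*log2(N) per block, divided by the number of new samples of T each block consumes) is an assumption made only for illustration.

```python
def choose_block_size(m, n):
    """Pick a power-of-two FFT block size for overlap-add.

    m = len(Q), n = len(T). Illustrative cost model only.
    """
    overlap = m - 1
    full_len = n + m - 1  # length of the full linear convolution

    # Smallest exponent p with 2**p >= m + 1 (block must exceed the overlap)
    p_min = max(1, m.bit_length())
    # Smallest exponent p with 2**p >= full_len (one block covers everything)
    p_max = max(p_min, (full_len - 1).bit_length())

    best_size, best_cost = None, float("inf")
    for p in range(p_min, p_max + 1):
        size = 2**p
        chunk = size - overlap  # new samples of T consumed per block
        cost = size * p / chunk  # ~ N*log2(N) work per useful sample
        if cost < best_cost:
            best_size, best_cost = size, cost
    return best_size


block_size = choose_block_size(128, 2**20)
```

Restricting candidates to powers of two keeps the search tiny; a real implementation might instead consider other FFT-friendly lengths.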

> Create a function that creates the actual steps/blocks of arrays (to iterate over) - Maybe even a Python generator that takes the original full length array but knows how to return the next appropriate chunk for processing

Note: In oaconvolve, chunk_size = block_size - (len(Q) - 1). The array T is split into chunks of length chunk_size, and each chunk is padded with len(Q) - 1 zeros, so the length of each chunk after padding is block_size.

Something like the following?

    def chunk_generator(arr, chunk_size):
        n = len(arr)
        for i in range(0, n, chunk_size):
            yield arr[i : i + chunk_size]

I am curious to know how you are planning to use this. Are you thinking about handling very large arrays in the future?
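To make the chunk_size / block_size relationship from the note above concrete, here is a minimal overlap-add sketch that uses only the public `numpy.fft` API. The function name `oa_convolve` is hypothetical, and this is not the implementation in this PR (which relies on the private `r2c`/`c2r`); it only illustrates the chunk, pad, transform, and add steps.

```python
import numpy as np


def oa_convolve(Q, T, block_size):
    """Full linear convolution of T with Q via overlap-add (illustrative)."""
    m, n = len(Q), len(T)
    chunk_size = block_size - (m - 1)
    if chunk_size < 1:
        raise ValueError("block_size must be at least len(Q)")

    Q_fft = np.fft.rfft(Q, block_size)  # rfft zero-pads Q to block_size
    out = np.zeros(n + m - 1)
    for start in range(0, n, chunk_size):
        chunk = T[start : start + chunk_size]
        # rfft zero-pads the chunk to block_size, i.e. adds >= m - 1 zeros
        block = np.fft.irfft(np.fft.rfft(chunk, block_size) * Q_fft, block_size)
        stop = min(start + block_size, len(out))
        # Consecutive blocks overlap by m - 1 samples; the overlaps are summed
        out[start:stop] += block[: stop - start]
    return out


rng = np.random.default_rng(0)
Q = rng.standard_normal(5)
T = rng.standard_normal(37)
full = oa_convolve(Q, T, block_size=16)
```

With `len(Q) = 5` and `block_size = 16`, each chunk of T holds 12 samples, and the result should agree with `np.convolve(T, Q)` up to floating-point error.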

> Create a function that can iterate over a single step/block (i.e., it has no knowledge of other blocks) and blindly call pocketfft_r2c_c2r_sdp.sliding_dot_product

> It feels like we should consolidate

Should we return convolution instead? I've revised the module pocketfft_r2c_c2r_sdp, and it now has the function _pocketfft_convolve

I think the convolution is the common component.

> Something like the following?

Instead of passing around arrays, I am thinking that the generator would yield the start_idx and stop_idx for the array
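A generator along those lines might look like the following sketch. The name `chunk_ranges` is hypothetical; it yields index pairs rather than array slices, so the caller decides how to slice (and pad) the original array.

```python
def chunk_ranges(n, chunk_size):
    """Yield (start_idx, stop_idx) pairs that cover range(n) without overlap."""
    for start_idx in range(0, n, chunk_size):
        yield start_idx, min(start_idx + chunk_size, n)


ranges = list(chunk_ranges(10, 4))
# ranges == [(0, 4), (4, 8), (8, 10)]
```

The last range is shorter whenever `n` is not a multiple of `chunk_size`, so the consumer still has to pad the final chunk.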

> I am curious to know how you are planning to use this.

I am still not clear on how oaconvolve works but I am thinking that the _pocketfft_convolve is generic for ANY Q and T and then your generator gives you the appropriate index ranges to process. Without understanding everything, I could be giving, admittedly, bad advice here.

> Should we return convolution instead? I've revised the module pocketfft_r2c_c2r_sdp, and it now has the function _pocketfft_convolve

Remind me, what is the difference between "convolution" and "sliding dot product"? Is the sliding dot product simply a slice of the convolution? If so, then, yes, perhaps we should maybe add a docstring to _pocketfft_convolve to explain how one gets SDP from this convolution (and also how this convolution compares to scipy.signal.convolve).

> I am still not clear on how oaconvolve works

Created PR #36 to help us with that.

> and then your generator gives you the appropriate index ranges to process. Without understanding everything, I could be giving, admittedly, bad advice here.

Overlap-add breaks T into NON-overlapping chunks and calls RFFT on all chunks at once. So, I think we should check whether one r2c call on N same-size arrays is faster than N separate r2c calls, one per array. I guess the former should be faster because it can reuse the same transform setup (twiddle factors). If that is true (we should check), then maybe we should pass a bunch of ranges to be processed together. (And, therefore, maybe we do batch processing when T is VERY large, where each batch contains a set of ranges?)
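One way to run that check is sketched below. It uses the public `scipy.fft.rfft` as a stand-in for the private `r2c`, so the timings are only indicative; the array shape (256 chunks of block size 1024) is an arbitrary assumption, and no result is claimed here.

```python
import timeit

import numpy as np
from scipy import fft

rng = np.random.default_rng(0)
# 256 zero-padded chunks, each of block size 1024 (arbitrary sizes)
blocks = rng.standard_normal((256, 1024))


def batched():
    # One call that transforms every row
    return fft.rfft(blocks, axis=-1)


def looped():
    # One call per row
    return np.stack([fft.rfft(row) for row in blocks])


# Both strategies must give the same spectra before timing means anything
same = np.allclose(batched(), looped())

t_batched = min(timeit.repeat(batched, number=20, repeat=3))
t_looped = min(timeit.repeat(looped, number=20, repeat=3))
print(f"batched: {t_batched:.4f}s  looped: {t_looped:.4f}s")
```

Repeating this across several chunk counts and block sizes would show whether batching helps in the regimes that matter for the timing table.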

> Remind me, what is the difference between "convolution" and "sliding dot product"? Is the sliding dot product simply a slice of the convolution?

It is simply that.
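As a small illustration of that relationship, the sketch below takes the full linear convolution of T with the reversed Q and slices out the entries where Q fully overlaps T. The helper names `sdp_from_convolution` and `naive_sdp` are hypothetical, and `np.convolve` is used only for clarity, not speed.

```python
import numpy as np


def sdp_from_convolution(Q, T):
    """Sliding dot product as a slice of the full convolution."""
    m, n = len(Q), len(T)
    full = np.convolve(T, Q[::-1])  # full linear convolution, length n + m - 1
    return full[m - 1 : n]  # the n - m + 1 entries with complete overlap


def naive_sdp(Q, T):
    """Direct definition: dot product of Q with every length-m window of T."""
    m = len(Q)
    return np.array([np.dot(Q, T[i : i + m]) for i in range(len(T) - m + 1)])


rng = np.random.default_rng(0)
Q = rng.standard_normal(4)
T = rng.standard_normal(20)
sdp = sdp_from_convolution(Q, T)
```

The slice `full[m - 1 : n]` is what `mode="valid"` returns in `np.convolve` and `scipy.signal.convolve`.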

> If so, then, yes, perhaps we should maybe add a docstring to _pocketfft_convolve to explain how one gets SDP from this convolution (and also how this convolution compares to scipy.signal.convolve).

Going to keep this comment open, and will get back to this after finalizing #36, which should help us get some clarity.

NimaSarajpoor · 2026-01-09T20:15:08Z

./timing.py -timeout 1.0 -pmin 7 -pmax 24 pyfftw pocketfft_r2c_c2r scipy_oaconvolve challenger > timing.csv

# in timing.py, I change timeout to 5.0 when `len(T) >= 2^20`

@seanlaw
"Challenger" seems to be the winner for most cases, and I think it is worth it to include it. What do you think? Also, can you please review the script? I've made major changes. The private objects are now only r2c and c2r. IMO, the script looks cleaner now.

sdp/challenger_sdp.py

seanlaw

@NimaSarajpoor I've left some comments but would still like another pass after you've cleaned things up further

I do agree that, for the most part, things look clean. I think it still lacks clarity as to what is happening or why the logic is coded in this way

seanlaw · 2026-01-09T20:52:39Z

sdp/challenger_sdp.py

+from scipy.fft._pocketfft.basic import r2c, c2r
+
+
+def _rfft_irfft_r2c2r_block(Q, T, block_size):


So, I see that this function has parts at the end that look VERY similar to https://github.com/stumpy-dev/sliding_dot_product/blob/main/sdp/pocketfft_r2c_c2r_sdp.py

It feels like we should consolidate and:

Create a function that determines the block/step sizes

Create a function that creates the actual steps/blocks of arrays (to iterate over) - Maybe even a Python generator that takes the original full length array but knows how to return the next appropriate chunk for processing

Create a function that can iterate over a single step/block (i.e., it has no knowledge of other blocks) and blindly call pocketfft_r2c_c2r_sdp.sliding_dot_product

sdp/challenger_sdp.py

NimaSarajpoor · 2026-01-11T19:13:11Z

sdp/challenger_sdp.py

+from scipy.fft._pocketfft.basic import r2c, c2r
+
+
+def _rfft_irfft_r2c2r_block(Q, T, block_size):


> Hmmm, would it be appropriate and more descriptive to call this something like _pocketfft_oaconvolve?

Sounds better 👍

NimaSarajpoor · 2026-01-12T02:49:32Z

sdp/challenger_sdp.py

+
+
+def _pocketfft_oaconvolve(Q, T, conv_block_size):
+    # Circular convolution between two 1D arrays X and Y


Is it circular or linear? If circular, then why does overlap-add give a different output for the first element in the example provided here?
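A tiny example may help frame the question. Multiplying FFTs without zero-padding gives a circular convolution, where the tail wraps around into the first elements; padding both inputs to at least `len(X) + len(Y) - 1` gives the linear convolution. The arrays below are made up for illustration and are not the example referred to above.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([1.0, 1.0])

# Circular: transform length equals len(X), so the result wraps around
n_circ = len(X)
circular = np.fft.irfft(np.fft.rfft(X, n_circ) * np.fft.rfft(Y, n_circ), n_circ)
# circular is [5, 3, 5, 7]: the first entry picks up X[3] * Y[1]

# Linear: transform length >= len(X) + len(Y) - 1, so nothing wraps around
n_lin = len(X) + len(Y) - 1
linear = np.fft.irfft(np.fft.rfft(X, n_lin) * np.fft.rfft(Y, n_lin), n_lin)
# linear is [1, 3, 5, 7, 4], matching np.convolve(X, Y)
```

Only the first `len(Y) - 1` entries differ between the two, which is why a mismatch in the first element points to wrap-around.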

NimaSarajpoor added 3 commits January 7, 2026 23:32

modified oaconvolve

72f0fb2

update code logic

be0035e

minor clean ups

907e2e2

NimaSarajpoor added 3 commits January 9, 2026 10:16

major changes to improve readability

969f187

Add param blocksize

b366e35

add temp test for challenger

a3d2e44

NimaSarajpoor requested a review from seanlaw January 9, 2026 20:39

seanlaw requested changes Jan 9, 2026

NimaSarajpoor added 2 commits January 9, 2026 23:53

add func for computing block size

22d233b

remove redundant code

d6eedfa

NimaSarajpoor added 7 commits January 11, 2026 14:50

added clearer functions

4e23694

minor change

01a0e7a

minor change to help with future refactoring

dddb708

minor change

41db845

Added reference for finding optimal block size

99e450b

fixed test

e8fa331

revise comment

f45f541

NimaSarajpoor added 2 commits January 13, 2026 13:33

removed overlap-add explanation. Created PR#36 instead

39e936c

renaming private functions to reflect valid convolution

f6fed15


