Skip to content

Conversation

@NimaSarajpoor
Copy link
Collaborator

This PR is to address #34.

@gitnotebooks
Copy link

gitnotebooks bot commented Jan 8, 2026

@NimaSarajpoor
Copy link
Collaborator Author

./timing.py -timeout 1.0 -pmin 7 -pmax 24 pyfftw pocketfft_r2c_c2r scipy_oaconvolve challenger > timing.csv

# in timing.py, I change timeout to 5.0 when `len(T) >= 2^20`

The challenger is the customized version of scipy's oaconvolve.

customized_oaconvolve

Observations:

  • The challenger outperforms scipy's oaconvolve
  • For len(Q) <= 2 ^16 (and len(Q)>= 2^7), challenger outperforms the others for the most part.
  • For len(Q) > 2^16, pocketfft outperforms the others for the most part.

For me, the important one is the first bullet point. Of the four optimization opportunities mentioned in this comment, I've addressed 1, 2, and 3 in this PR. The last item, which is about adjusting the number of multiplication for real-valued arrays, can be explored next.

@seanlaw
Copy link
Contributor

seanlaw commented Jan 9, 2026

As a gentle reminder, even if we can do things faster, we will never (??) remove the public scipy convolution functions from STUMPY because they should be our last resort fallback (in case the alternatives, that may use private functions, raise an error). Does that make sense?

@NimaSarajpoor
Copy link
Collaborator Author

Good reminder. It makes sense!!



def test_oaconvolve_sdp_blocksize():
from sdp.challenger_sdp import sliding_dot_product
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line needs to be modified if, at a later time, we decide to move the proposal to a new file (module).

from scipy.fft._pocketfft.basic import r2c, c2r


def _rfft_irfft_r2c2r_block(Q, T, block_size):
Copy link
Collaborator Author

@NimaSarajpoor NimaSarajpoor Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should choose a better name for the function. This function performs convolution between each block chunks of T and Q. So, how about _fftconvolve_block ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I see that this functions has parts in the end that look VERY similar to https://github.com/stumpy-dev/sliding_dot_product/blob/main/sdp/pocketfft_r2c_c2r_sdp.py

It feels like we should consolidate and:

  1. Create a function that determines the block/step sizes
  2. Create a function that creates the actual steps/blocks of arrays (to iterate over) - Maybe even a Python generator that takes the original full length array but knows how to return the next appropriate chunk for processing
  3. Create a function that can iterate over a single step/block (i.e., it has no knowledge of other blocks) and blindly call pocketfft_r2c_c2r_sdp.sliding_dot_product

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, would it be appropriate and more descriptive to call this something like _pocketfft_oaconvolve?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, would it be appropriate and more descriptive to call this something like _pocketfft_oaconvolve?

Sounds better 👍

Copy link
Collaborator Author

@NimaSarajpoor NimaSarajpoor Jan 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanlaw

  1. Create a function that determines the block/step sizes

Done.

  1. Create a function that creates the actual steps/blocks of arrays (to iterate over) - Maybe even a Python generator that takes the original full length array but knows how to return the next appropriate chunk for processing

Note: In oaconvolve, chunk_size is = block_size - (len(Q) - 1). The array T will be chunked by the chunk_size, and each chunk will be padded with len(Q)-1 zeros. So, the length of each chunk after padding is block_size.

Something like the following?

def chunk_generator(arr, chunk_size):
    n = len(arr)
    for i in range(0, n, chunk_size):
        yield arr[i: i+ chunk_size]

I am curious to know how you are planning to use this. Are you thinking about handling very large arrays in the future?

Create a function that can iterate over a single step/block (i.e., it has no knowledge of other blocks) and blindly call pocketfft_r2c_c2r_sdp.sliding_dot_product

It feels like we should consolidate

Should we return convolution instead? I've revised the module pocketfft_r2c_c2r_sdp, and it now has the function _pocketfft_convolve

I think the convolution is the common component.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like the following?

Instead of passing around arrays, I am thinking that the generator would yield the start_idx and stop_idx for the array

I am curious to know how you are planning to use this.

I am still not clear on how oaconvolve works but I am thinking that the _pocketfft_convolve is generic for ANY Q and T and then your generator gives you the appropriate index ranges to process. Without understanding everything, I could be giving, admittedly, bad advice here.

Should we return convolution instead? I've revised the module pocketfft_r2c_c2r_sdp, and it now has the function _pocketfft_convolve

Remind me, what is the difference between "convolution" and "sliding dot product"? Is the sliding dot product simply a slice of the convolution? If so, then, yes, perhaps we should maybe add a docstring to _pocketfft_convolve to explain how one gets SDP from this convolution (and also how this convolution compares to scipy.signal.convolve).

Copy link
Collaborator Author

@NimaSarajpoor NimaSarajpoor Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not clear on how oaconvolve works

Created PR #36 to help us with that.

and then your generator gives you the appropriate index ranges to process. Without understanding everything, I could be giving, admittedly, bad advice here.

overlap-add breaks down T into NON-overlapping chunks, and call RFFT on all chunks at once. So, I think we should check if r2c on N arrays with same size in one call is faster than N calls of r2c, one call per array. I guess the former should be faster because it should use the same transformation (twiddle factors). If that is true (we should check), then maybe we should pass a bunch of ranges to be processed together. (And, therefore, maybe we do batch processing when T is VERY large, where each batch contains a set of ranges ??)

Remind me, what is the difference between "convolution" and "sliding dot product"? Is the sliding dot product simply a slice of the convolution?

It is simply that.

If so, then, yes, perhaps we should maybe add a docstring to _pocketfft_convolve to explain how one gets SDP from this convolution (and also how this convolution compares to scipy.signal.convolve).

Going to keep this comment open, and will get back to this after finalizing #36, which should help us get some clarity.

@NimaSarajpoor
Copy link
Collaborator Author

./timing.py -timeout 1.0 -pmin 7 -pmax 24 pyfftw pocketfft_r2c_c2r scipy_oaconvolve challenger > timing.csv

# in timing.py, I change timeout to 5.0 when `len(T) >= 2^20`
customized_oaconvolve_performance

@seanlaw
"Challenger" seems to be the winner for most cases, and I think it is worth it to include it. What do you think? Also, can you please review the script? I've made major changes. The private objects are now only r2c and c2r. IMO, the script looks cleaner now.

@NimaSarajpoor NimaSarajpoor requested a review from seanlaw January 9, 2026 20:39
Copy link
Contributor

@seanlaw seanlaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NimaSarajpoor I've left some comments but would still like another pass after you've cleaned things up further

I do agree that, for the most part, things look clean. I think it still lacks clarity as to what is happening or why the logic is coded in this way

from scipy.fft._pocketfft.basic import r2c, c2r


def _rfft_irfft_r2c2r_block(Q, T, block_size):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I see that this functions has parts in the end that look VERY similar to https://github.com/stumpy-dev/sliding_dot_product/blob/main/sdp/pocketfft_r2c_c2r_sdp.py

It feels like we should consolidate and:

  1. Create a function that determines the block/step sizes
  2. Create a function that creates the actual steps/blocks of arrays (to iterate over) - Maybe even a Python generator that takes the original full length array but knows how to return the next appropriate chunk for processing
  3. Create a function that can iterate over a single step/block (i.e., it has no knowledge of other blocks) and blindly call pocketfft_r2c_c2r_sdp.sliding_dot_product

from scipy.fft._pocketfft.basic import r2c, c2r


def _rfft_irfft_r2c2r_block(Q, T, block_size):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, would it be appropriate and more descriptive to call this something like _pocketfft_oaconvolve?

Sounds better 👍



def _pocketfft_oaconvolve(Q, T, conv_block_size):
# Circular convolution between two 1D arrays X and Y
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it Circular or Linear? If Circular, then why overlap-add gives a different output in the example provided here for the first element?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants