Considering some potential options here to support convolutions (and related operations like pooling, fold/unfold) using einx notation.

## Option 1: Introduce expression for folded axis

Introduce `(a * b)` as a single axis that can be unfolded into axes `a` and `b`, similar to `(a b)` representing a single axis that can be decomposed into `a` and `b`, and `(a + b)` representing a single axis that can be split into `a` and `b`. This would allow expressing different operations using existing functions.

Pooling (i.e. unfolding + reduction, computed jointly using efficient backend ops):

```python
einx.mean("b (s [ds])... c", x, ds=4)    # Existing pooling with kernel_size=stride=4 (works only if evenly divisible)
einx.mean("b (s * [ds])... c", x, ds=4)  # New general pooling
```

Convolution (i.e. unfolding + dot product, computed jointly using efficient backend ops):

```python
einx.dot("b (s * [ds])... [c_in->c_out]", x, weight, ds=4, c_out=64)  # weight shape will be 'ds... c_in c_out'

# Depthwise convolution with same weights per channel
einx.dot("b (s * [ds])... c", x, weight, ds=4)  # weight shape will be 'ds...'

# Depthwise convolution with different weights per channel
einx.dot("b (s * [ds])... [c->c]", x, weight, ds=4)  # weight shape will be 'ds... c'

# Explicit kernel shape:
einx.dot("b (s * [ds])... c, [ds]... -> b s... c", x, weight)
```
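To make the semantics concrete, here is a plain-numpy sketch (hypothetical variable names and sizes, a single spatial axis, stride 1, no padding) of what the first `einx.dot` call above would compute; it is an illustration of unfold + dot product, not einx's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
b, s_in, c_in, c_out, ds = 2, 10, 3, 5, 4
x = rng.normal(size=(b, s_in, c_in))
weight = rng.normal(size=(ds, c_in, c_out))  # weight shape 'ds c_in c_out'

# Unfold: each output position sees a window of ds input positions,
# so the output spatial extent is s = s_in - ds + 1.
windows = np.lib.stride_tricks.sliding_window_view(x, ds, axis=1)  # (b, s, c_in, ds)

# Dot product over the window axis and the input-channel axis:
out = np.einsum("bscd,dco->bso", windows, weight)  # (b, s, c_out)
```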
Unfolding:

```python
# Extract local windows of size 'ds...' for each pixel
einx.rearrange("b (s * ds)... c -> b s... (ds...) c", x, ds=4)
```
**Good:**
- Retains the existing set of elementary operations. No new function names `einx.{conv|unfold|mean_pool|max_pool|*_pool}`.
- Pooling resembles the existing option of using axis compositions:

  ```python
  einx.mean("b (s [ds])... c", x, ds=4)    # Existing pooling with kernel_size=stride=4 (works only if evenly divisible)
  einx.mean("b (s * [ds])... c", x, ds=4)  # New general pooling
  ```
- 1x1 convolution resembles the existing option of using a regular dot-product:
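A 1x1 convolution is simply a dot product over the channel axis at every spatial position. A plain-numpy sketch of this equivalence (illustrative names; the einx form in the comment is an extrapolation from the examples above, not confirmed API):

```python
import numpy as np

rng = np.random.default_rng(0)
b, h, w, c_in, c_out = 2, 5, 5, 3, 8
x = rng.normal(size=(b, h, w, c_in))
weight = rng.normal(size=(c_in, c_out))

# Channel mixing at every pixel, identical to a convolution with kernel_size=1.
# Presumably something like: einx.dot("b s... [c_in->c_out]", x, weight)
out = np.einsum("bhwc,co->bhwo", x, weight)
```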
**Bad:**

- New type of notation which is hard to understand without checking the docs or tutorials. Using some kind of `einx.conv(...)` would be more expressive for anyone not familiar with einx notation.
- Maybe unintuitive: `(s * ds)` does not refer to the result of a convolution between `s` and `ds`. It rather means: convolving `(s * ds)` with `ds` will yield an axis `s`.
- The `*` star represents a convolution operator, but might look like it is a placeholder for other axes. It may or may not be better to use a different symbol, e.g. `#`, to avoid confusion.
## Option 2: Just use `[]`-brackets for spatial axes
Thinking through whether it would make sense to use `[]`-brackets to mark spatial axes, and to pass `kernel_size`, `stride` and `dilation` as additional arguments to the function. This would make unfolding part of the elementary operation.

The input channel axes should also be marked, since they represent an operation axis (one might potentially allow an implicit notation without these brackets, similar to `einx.dot`). This would give something like:
```python
einx.conv("b [h_in w_in] [c_in], [h_kernel w_kernel c_in] c_out -> b [h_out w_out] c_out", x, weight, stride=4)
# kernel_size is given by the weight shape.
# The order of axes must match across all expressions, since the axes do not share names (h_in vs h_kernel vs h_out).
# c_in can be identified as a channel axis in the input because it appears in the weight expression; h_in and w_in don't.
```
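The shape bookkeeping described in these comments can be sketched as follows (hypothetical helper, assuming "valid" padding and `dilation=1`; axis names follow the expression above):

```python
def conv_output_shape(input_shape, weight_shape, stride):
    """Infer the output shape of the einx.conv expression sketched above."""
    b, h_in, w_in, c_in = input_shape
    h_kernel, w_kernel, c_in_w, c_out = weight_shape
    # c_in is identified as a channel axis because it appears in both expressions:
    assert c_in == c_in_w
    # kernel_size comes from the weight shape; standard valid-convolution extent rule:
    h_out = (h_in - h_kernel) // stride + 1
    w_out = (w_in - w_kernel) // stride + 1
    return (b, h_out, w_out, c_out)
```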
**Good:**
- It is obvious to anyone not familiar with einx that `einx.conv` performs a convolution, and `einx.pool` performs pooling.
- No new expression types that have to be learnt.
**Bad:**
- Convolution is tedious and hard to read compared to something like `nn.Conv(features=64, kernel_size=4)(x)`.
- Many new function names `einx.{conv|unfold|mean_pool|max_pool|*_pool}` and a larger API.
- No obvious commonality between pooling, convolution and unfolding.
- Cannot implicitly determine the weight shape similar to `einx.dot`:

  ```python
  einx.dot("... [c1] -> ... [c2]", x, einn.param(), c2=64)
  # -> weight shape is "c1 c2"

  einx.conv("b [h w] [c_in] -> b [h_out w_out] [c_out]", x, einn.param(), ...)
  # -> Which axes are spatial axes, which axes are channel axes?
  ```
- New pooling operation looks different from the existing pooling operation:

  ```python
  einx.mean("b (s [ds])... c", x, ds=4)  # Existing pooling with kernel_size=stride=4 (works only if evenly divisible)
  einx.pool("b [...] c", x, ds=4)        # New general pooling
  ```
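The difference between the two forms can be illustrated in plain numpy (hypothetical names; `sliding_window_view` plays the role of the unfold). Both agree when the spatial extent is evenly divisible by `ds`, but only the general form handles the remainder case:

```python
import numpy as np

rng = np.random.default_rng(0)
b, s_total, c, ds = 2, 8, 3, 4
x = rng.normal(size=(b, s_total, c))

# Existing pooling: reshape 'b (s ds) c -> b s ds c', then reduce over ds.
# Requires s_total to be evenly divisible by ds.
pooled_reshape = x.reshape(b, s_total // ds, ds, c).mean(axis=2)

# General pooling: unfold windows of size ds with stride ds, then reduce.
windows = np.lib.stride_tricks.sliding_window_view(x, ds, axis=1)[:, ::ds]  # (b, s, c, ds)
pooled_general = windows.mean(axis=-1)

# The general form also works when the extent is not divisible by ds:
y = rng.normal(size=(b, 10, c))
windows_y = np.lib.stride_tricks.sliding_window_view(y, ds, axis=1)[:, ::ds]
pooled_y = windows_y.mean(axis=-1)  # windows at positions 0 and 4; the remainder is dropped
```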