236 changes: 173 additions & 63 deletions doc/tutorial/broadcasting.rst
@@ -10,73 +10,183 @@
Broadcasting
============

Broadcasting is a mechanism which allows tensors with
different numbers of dimensions to be added or multiplied
together by (virtually) replicating the smaller tensor along
the dimensions that it is lacking.
Broadcasting is a mechanism which allows tensors with different
numbers of dimensions, or with axes of length ``1``, to be combined
in elementwise operations by (virtually) replicating the smaller
tensor along the dimensions that it is lacking.

Broadcasting is the mechanism by which a scalar
may be added to a matrix, a vector to a matrix or a scalar to
a vector.
It is what lets you add a scalar to a matrix, a vector to a matrix,
or a column to a row, without having to manually tile either
operand.

.. figure:: bcast.png

Broadcasting a row matrix. T and F respectively stand for
True and False and indicate along which dimensions we allow
broadcasting.

If the second argument were a vector, its shape would be
``(2,)`` and its broadcastable pattern ``(False,)``. They would
be automatically expanded to the **left** to match the
dimensions of the matrix (adding ``1`` to the shape and ``True``
to the pattern), resulting in ``(1, 2)`` and ``(True, False)``.
It would then behave just like the example above.

Unlike numpy which does broadcasting dynamically, PyTensor needs
to know, for any operation which supports broadcasting, which
dimensions will need to be broadcasted. When applicable, this
information is given in the :ref:`type` of a *Variable*.

The following code illustrates how rows and columns are broadcasted in order to perform an addition operation with a matrix:

>>> r = pt.row()
>>> r.broadcastable
(True, False)
>>> mtr = pt.matrix()
>>> mtr.broadcastable
(False, False)
>>> f_row = pytensor.function([r, mtr], [r + mtr])
>>> R = np.arange(3).reshape(1, 3)
>>> R
array([[0, 1, 2]])
>>> M = np.arange(9).reshape(3, 3)
>>> M
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> f_row(R, M)
[array([[ 0., 2., 4.],
[ 3., 5., 7.],
[ 6., 8., 10.]])]
>>> c = pt.col()
>>> c.broadcastable
(False, True)
>>> f_col = pytensor.function([c, mtr], [c + mtr])
>>> C = np.arange(3).reshape(3, 1)
>>> C
array([[0],
[1],
[2]])
>>> M = np.arange(9).reshape(3, 3)
>>> f_col(C, M)
[array([[ 0., 1., 2.],
[ 4., 5., 6.],
[ 8., 9., 10.]])]

In these examples, we can see that both the row vector and the column vector are broadcasted in order to be added to the matrix.
Broadcasting a ``(1, 2)`` row against a ``(3, 2)`` matrix. The
row is virtually replicated along axis 0 to match the matrix.
The figure uses the legacy ``bcast: (T, F)`` notation: ``T``
marks an axis statically known to be length ``1`` (broadcastable)
and ``F`` an axis whose length is unconstrained.

See also:
If the second argument were a vector instead of a row, its shape
would be ``(2,)``. It would be automatically expanded to the
**left** to match the rank of the matrix (adding ``1`` to the
shape), resulting in ``(1, 2)``, and then broadcast just like the
row in the figure.
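
The left-expansion rule above can be sketched in plain Python. The helpers ``pad_left`` and ``broadcast_shapes`` are hypothetical names introduced only for illustration (they are not PyTensor or NumPy functions), and the sketch assumes every length is a known integer:

```python
def pad_left(shape, ndim):
    """Prepend length-1 axes until `shape` has `ndim` entries."""
    return (1,) * (ndim - len(shape)) + tuple(shape)


def broadcast_shapes(a, b):
    """Return the elementwise result shape of two fully known shapes."""
    ndim = max(len(a), len(b))
    a, b = pad_left(a, ndim), pad_left(b, ndim)
    out = []
    for axis, (da, db) in enumerate(zip(a, b)):
        if da == db or db == 1:
            out.append(da)  # equal lengths, or b is replicated along this axis
        elif da == 1:
            out.append(db)  # a is replicated along this axis
        else:
            raise ValueError(f"axis {axis}: lengths {da} and {db} do not broadcast")
    return tuple(out)


# A (2,) vector is treated as a (1, 2) row before being combined with a matrix.
vector_as_row = pad_left((2,), 2)
result = broadcast_shapes((3, 2), (2,))
print(vector_as_row)  # (1, 2)
print(result)         # (3, 2)
```

Padding happens on the left only, so a ``(3,)`` vector would not line up with the leading axis of a ``(3, 2)`` matrix; in this sketch that combination raises a ``ValueError``.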

Unlike NumPy, which does broadcasting dynamically, PyTensor needs
Member

is this accurate? We have basically the same broadcasting rules as numpy, but we're more flexible because we don't require you to tell us the shapes.

Member Author

This is accurate: we do strictly less broadcasting than NumPy, in the sense that NumPy will never fail with zeros(3, 1) + zeros(3, 3), whereas we will absolutely refuse unless we knew in advance that the 1 was going to be a 1.

Member

Is this the right framing? We're the same as numpy, just more flexible because sometimes we don't know the shape of things, so there's a requirement to be more like numpy to get broadcasting to work.

Member Author
ricardoV94, Apr 29, 2026

We are less flexible. Numpy is eager, it always knows everything it needs. No symbolic/deferred stuff

to know, at graph-build time, which dimensions of an input may be
broadcasted. A dimension can only be broadcasted against another if
PyTensor knows statically that its length is ``1``. This information
lives on the variable's :attr:`type.shape <pytensor.tensor.TensorType.shape>`:
a concrete integer (such as ``1``) means PyTensor knows the size,
and ``None`` means the size is only known at runtime.
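
As a rough mental model of the static rule described above, the following plain-Python sketch combines two static shapes whose entries are either integers or ``None``. The function ``static_elemwise_shape`` is a hypothetical helper written for this illustration; it is a simplification and not PyTensor's actual implementation:

```python
def static_elemwise_shape(a, b):
    """Combine two equal-rank static shapes whose entries are ints or None."""
    out = []
    for axis, (da, db) in enumerate(zip(a, b)):
        if da == 1:
            out.append(db)  # a is statically known to broadcast on this axis
        elif db == 1:
            out.append(da)  # b is statically known to broadcast on this axis
        elif da is None:
            out.append(db)  # a is unknown: it must equal b's length at runtime
        elif db is None or da == db:
            out.append(da)  # b is unknown, or both lengths are known and equal
        else:
            raise ValueError(f"axis {axis}: static lengths {da} and {db} differ")
    return tuple(out)


# Static shapes of a matrix and a column, as reported by `type.shape`.
matrix_plus_col = static_elemwise_shape((None, None), (None, 1))
known_plus_col = static_elemwise_shape((3, None), (None, 1))
print(matrix_plus_col)  # (None, None)
print(known_plus_col)   # (3, None)
```

Only a literal ``1`` in the static shape triggers broadcasting in this model. A ``None`` entry is never treated as "possibly ``1``"; it only defers an equality check to runtime.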

The following code illustrates how a column variable is broadcasted
in order to be added to a matrix (broadcasting on the trailing axis):

>>> import numpy as np
>>> import pytensor
>>> import pytensor.tensor as pt
>>> x_matrix = pt.matrix("x_matrix")
>>> x_matrix.type.shape
(None, None)
>>> y_col = pt.col("y_col")
Member

Consider pt.tensor('y_col', shape=(None, 1)) to make it super explicit? That's all col is; I wouldn't want the example to give the impression this is the way you have to do it.

Member Author

I show that below. Here I just wanted to show broadcasting working. pt.col is not bad per se, though I agree we never promote it. In this case it makes the document a bit easier to follow; otherwise the solution to the gotcha below is just the working case above. Let me think.

>>> y_col.type.shape
(None, 1)
>>> out = x_matrix + y_col
>>> fn = pytensor.function([x_matrix, y_col], out)
>>> X = np.arange(9).reshape(3, 3)
>>> Y = np.arange(3).reshape(3, 1)
>>> fn(X, Y)
array([[ 0.,  1.,  2.],
       [ 4.,  5.,  6.],
       [ 8.,  9., 10.]])

The column's trailing axis is statically known to be ``1``, which is
why PyTensor is willing to broadcast it against the matrix.


.. _runtime_broadcasting:

Runtime broadcasting limitations
================================

.. warning::

   PyTensor does **not** broadcast dimensions whose length is only
   known to be ``1`` at runtime. A graph that would broadcast fine
   in NumPy can raise a ``ValueError`` when executed. To get
   broadcasting, the length-``1`` axis must be visible in the
   variable's static :attr:`type.shape <pytensor.tensor.TensorType.shape>`.

For example, adding a matrix to another matrix whose trailing axis
happens to be ``1`` at runtime fails:

>>> x_matrix = pt.matrix("x_matrix")
>>> y_matrix = pt.matrix("y_matrix")
>>> out = x_matrix + y_matrix
>>> fn = pytensor.function([x_matrix, y_matrix], out)
>>> try:
...     fn(np.zeros((3, 3)), np.zeros((3, 1)))
... except ValueError as err:
...     print(str(err).split("\n")[0])  # doctest: +ELLIPSIS
Incompatible vectorized shapes for input 1 and axis 1. ...

Note that runtime length ``1`` is only a problem when paired with a
non-``1`` length on the other side, where broadcasting would be
required. Calling the same ``fn`` with matching shapes works fine,
because no broadcasting needs to happen:

>>> fn(np.zeros((3, 3)), np.zeros((3, 3))).shape
(3, 3)
>>> fn(np.zeros((3, 1)), np.zeros((3, 1))).shape
(3, 1)
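
The three calls to ``fn`` above can be mimicked with a small plain-Python model of the runtime check. ``runtime_result_shape`` is a hypothetical helper, not a PyTensor function, and it assumes a simplified rule: on each axis, every input that is not statically length ``1`` must report the same runtime length.

```python
def runtime_result_shape(static_shapes, runtime_shapes):
    """Model the runtime shape check of an elementwise op on equal-rank inputs."""
    ndim = len(static_shapes[0])
    out = []
    for axis in range(ndim):
        # Inputs with a static 1 on this axis are replicated, so they are
        # excluded from the comparison; all other inputs must agree.
        lengths = {
            runtime[axis]
            for static, runtime in zip(static_shapes, runtime_shapes)
            if static[axis] != 1
        }
        if len(lengths) > 1:
            raise ValueError(f"axis {axis}: runtime lengths {sorted(lengths)} differ")
        out.append(lengths.pop() if lengths else 1)
    return tuple(out)


both_matrices = [(None, None), (None, None)]
same = runtime_result_shape(both_matrices, [(3, 3), (3, 3)])
thin = runtime_result_shape(both_matrices, [(3, 1), (3, 1)])
print(same)  # (3, 3)
print(thin)  # (3, 1)

try:
    runtime_result_shape(both_matrices, [(3, 3), (3, 1)])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True

# With the trailing axis declared as 1 in the static shape, the same data passes.
declared = runtime_result_shape([(None, None), (None, 1)], [(3, 3), (3, 1)])
print(declared)  # (3, 3)
```

The model does not verify that runtime shapes agree with the declared static shapes; it only illustrates why a runtime length of ``1`` is harmless when every input has it and rejected when only one does.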

PyTensor assumes generality by default: a dimension declared with
``None`` is treated as "any length", not as "possibly ``1``". To
allow broadcasting you have to make the length-``1`` axis visible
to the graph. There are three idiomatic ways to do this.

* `SciPy documentation about numpy's broadcasting <http://www.scipy.org/EricsBroadcastingDoc>`_
#. **Declare the static shape on the input.** If you know an input
   will always have length ``1`` along an axis, say so when you
   create it:

   >>> x_matrix = pt.matrix("x_matrix")
   >>> y_col = pt.matrix("y_col", shape=(None, 1))
   >>> out = x_matrix + y_col
   >>> fn = pytensor.function([x_matrix, y_col], out)
   >>> fn(np.zeros((3, 3)), np.zeros((3, 1)))
   array([[0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.]])

#. **Use** :func:`specify_shape <pytensor.tensor.specify_shape>`
   **inside the graph.** When the variable comes from somewhere
   you do not control (for example an intermediate result), you can
   pin its shape:

   >>> x_matrix = pt.matrix("x_matrix")
   >>> y_matrix = pt.matrix("y_matrix")
   >>> y_col = pt.specify_shape(y_matrix, (None, 1))
Member

illustrate the intermediate result case you mentioned as motivation

Member Author

WDYM? I don't want to complicate the example with an arbitrary Op that happens to lose static shape just to make the point. If that's what you had in mind....

   >>> out = x_matrix + y_col
   >>> fn = pytensor.function([x_matrix, y_matrix], out)
   >>> fn(np.zeros((3, 3)), np.zeros((3, 1)))
   array([[0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.]])

#. **Drop the axis and add it back explicitly.** If the variable is
   conceptually a vector that you only happened to wrap in a
   length-``1`` trailing axis, model it as a vector and broadcast
   with :func:`expand_dims <pytensor.tensor.expand_dims>`:

   >>> x_matrix = pt.matrix("x_matrix")
   >>> y_vector = pt.vector("y_vector")
   >>> out = x_matrix + pt.expand_dims(y_vector, 1)  # or x_matrix + y_vector[:, None]
Comment on lines +144 to +145
Member

Where is the "drop the axis" part here?

Member Author

we went for a vector instead of a matrix

Member Author

It's a drop before you even start. I can try to get a better title

   >>> fn = pytensor.function([x_matrix, y_vector], out)
   >>> fn(np.zeros((3, 3)), np.zeros(3))
   array([[0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.]])

The same caveat applies to shape-producing operations that accept
symbolic shapes, such as :func:`full <pytensor.tensor.full>`,
:func:`zeros <pytensor.tensor.zeros>`, :func:`ones <pytensor.tensor.ones>`,
or :func:`alloc <pytensor.tensor.alloc>`. Passing a symbolic length
yields ``None`` in the static shape, even if at runtime the length
will always be ``1``:

>>> n = pt.scalar("n", dtype="int64")
>>> pt.full((n,), 0.0).type.shape
(None,)
>>> pt.full((1,), 0.0).type.shape
(1,)

If you need the result to broadcast, use a literal ``1`` (or wrap
the result in ``specify_shape``) so the static shape carries that
information.

Constants, on the other hand, always carry their full static shape
and are safe to broadcast — PyTensor reads the shape directly from
the underlying value:

>>> pt.constant(np.zeros((3, 1))).type.shape
(3, 1)

Shared variables, unlike constants, are assumed to be resizable.
By default their static shape is ``None`` along every axis, even if
the initial value happens to have a length-``1`` axis. Pass
``shape=`` to mark the axes you want to be broadcastable:

>>> value = np.zeros((3, 1))
>>> pytensor.shared(value).type.shape
(None, None)
>>> pytensor.shared(value, shape=(None, 1)).type.shape
(None, 1)

Pinning the shape also tells PyTensor that future ``set_value`` calls
must respect it, so only do so for axes that genuinely will not change.

See also:

* `OnLamp article about numpy's broadcasting <http://www.onlamp.com/pub/a/python/2000/09/27/numerically.html>`_
* `NumPy documentation about broadcasting <https://numpy.org/doc/stable/user/basics.broadcasting.html>`_