Fix cumprod gradient returning NaN when input contains zeros #1911

WHOIM1205 wants to merge 2 commits into pymc-devs:main
Conversation
hey @jessegrabowski and @ricardoV94, the implementation has been rewritten using a division-free formulation, and regression tests covering zero cases (including multi-dimensional inputs) have been added. All existing tests pass.
ricardoV94
left a comment
Yeah, I don't think we're going with a scan for the gradient. Cumprod is pretty rare, and I've never heard of people having issues with it or its gradient.
Sometimes we need convenience at the expense of edge cases.
Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
3561c36 to e13ffa0
Thanks for the earlier feedback about avoiding scan; that makes sense. I've reworked the implementation to keep it lightweight and removed scan entirely. The gradient now uses only cumprod, cumsum, and simple masking logic to handle zeros safely. It avoids division-by-zero while still matching the correct mathematical behavior for single and multiple zero cases. The graph stays simple and NUMBA-compatible, and the sparse / typed_list tests pass as well. Let me know if you'd like it simplified further or adjusted in any way.
```python
# Zero at the beginning
result = f(np.array([0.0, 2.0, 3.0]))
expected = np.array([9.0, 0.0, 0.0])
```
Shouldn't the gradient wrt x[0] be 1.0?
```diff
- expected = np.array([9.0, 0.0, 0.0])
+ expected = np.array([1.0, 0.0, 0.0])
```
Ah it affects the subsequent outputs too. Can you also include the case of all zeros?
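For the record, the disputed `[9.0, 0.0, 0.0]` expectation (and the all-zeros case requested here) can be checked independently of PyTensor with central finite differences on `sum(cumprod(x))`; `fd_grad` below is just an illustrative helper, not part of the PR:

```python
import numpy as np

def fd_grad(x, eps=1e-6):
    """Central-difference gradient of sum(cumprod(x))."""
    out = np.empty_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        out[i] = (np.cumprod(xp).sum() - np.cumprod(xm).sum()) / (2 * eps)
    return out

# x[0] multiplies every cumulative product, so its gradient is
# 1 + x[1] + x[1]*x[2] = 1 + 2 + 6 = 9, not 1.
print(fd_grad(np.array([0.0, 2.0, 3.0])))  # approx [9., 0., 0.]
print(fd_grad(np.zeros(3)))                # approx [1., 0., 0.]
```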
```python
f = pytensor.function([x], g)

# Single zero in the middle
result = f(np.array([1.0, 0.0, 2.0]))
```
Can you avoid 1's and 2's in the tests? I think it's more robust as it avoids the mul identity or the equality between * and +
```diff
- result = f(np.array([1.0, 0.0, 2.0]))
+ result = f(np.array([3.0, 0.0, 3.0]))
```
Problem

`CumOp.L_op` computed the gradient of `cumprod` using a division-based formula. When `x[i] == 0`, this produced `0 / 0 = NaN`, silently corrupting the gradient. The NaN propagates through the computation graph and can break optimization or MCMC without any clear indication that `cumprod` is the source.

This is a real-world issue, since zeros commonly appear in probability masks, indicator variables, ReLU outputs, and sparse data.
The existing tests did not catch this because they only used random inputs in `(0, 1)`, which never include zeros.

Root Cause
The gradient formula relied on dividing by `x`, which is only valid when all elements are nonzero. Unlike `Prod.L_op`, which implements explicit zero-handling logic, `CumOp.L_op` did not account for zero values.

Fix
Replaced the division-based implementation with a mathematically equivalent division-free formulation.

For each position `i`, the gradient is `grad[i] = L[i] * R[i]`, where:

- `L[i]` = exclusive prefix product (`prod(x[0:i])`)
- `R[i]` = reverse linear recurrence `R[i] = g[i] + x[i+1] * R[i+1]`

This approach:

- avoids any division by `x`, so zeros never produce NaN
- uses only `cumprod`, `cumsum`, and simple masking, keeping the graph simple, scan-free, and NUMBA-compatible
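The recurrence can be sketched as a NumPy reference implementation (an explicit loop for clarity; the PR itself stays vectorized with `cumprod`/`cumsum` and masking, and `cumprod_grad_safe` is a hypothetical name):

```python
import numpy as np

def cumprod_grad_safe(x, g):
    """Division-free gradient of cumprod via grad[i] = L[i] * R[i]."""
    # L[i]: exclusive prefix product prod(x[0:i]), with L[0] = 1
    L = np.concatenate(([1.0], np.cumprod(x)[:-1]))
    # R[i]: reverse recurrence R[i] = g[i] + x[i+1] * R[i+1]
    R = np.empty_like(x)
    R[-1] = g[-1]
    for i in range(len(x) - 2, -1, -1):
        R[i] = g[i] + x[i + 1] * R[i + 1]
    return L * R

x = np.array([1.0, 0.0, 2.0])
g = np.ones_like(x)  # gradient of sum(cumprod(x)) w.r.t. each output
print(cumprod_grad_safe(x, g))  # [1., 3., 0.] -- finite at the zero
```

The all-zeros case the reviewer asked about also comes out finite here, since no division ever occurs.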
Changes

- Updated `CumOp.L_op` in `pytensor/tensor/extra_ops.py`
- Added regression tests in `tests/tensor/test_extra_ops.py`
Tests Added

New tests cover zeros at the beginning and in the middle of the input, multiple zeros, and multi-dimensional inputs. All existing tests pass, and gradients are now correct for inputs containing zeros.
Impact

This PR fixes a clear correctness bug in `cumprod` gradient computation, restoring correct `cumprod` gradients for inputs containing zeros.