Fix cumprod gradient returning NaN when input contains zeros #1911

Open
WHOIM1205 wants to merge 2 commits into pymc-devs:main from WHOIM1205:fix-cumprod-grad-zeros

@WHOIM1205
Contributor

Problem

CumOp.L_op computed the gradient of cumprod using a division-based formula:

cumsum((cumprod(x, axis) * g_out)[reverse], axis)[reverse] / x

When x[i] == 0, this produced 0 / 0 = NaN, silently corrupting the gradient.
This NaN propagates through the computation graph and can break optimization or MCMC without any clear indication that cumprod is the source.

This is a real-world issue since zeros commonly appear in probability masks, indicator variables, ReLU outputs, and sparse data.
The existing tests did not catch this because they only used random inputs in (0, 1), which never include zeros.
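
As a minimal illustration of the failure mode, the division-based formula above can be transcribed into plain NumPy for a 1-D input. This is a sketch, not the PyTensor code: the function name `cumprod_grad_with_division` is hypothetical, and the upstream gradient is assumed to be all ones (the gradient of `sum(cumprod(x))`).

```python
import numpy as np

def cumprod_grad_with_division(x, g_out):
    """NumPy transcription of the division-based formula for a 1-D input."""
    weighted = np.cumprod(x) * g_out
    reverse_cumsum = np.cumsum(weighted[::-1])[::-1]
    # 0 / 0 evaluates to NaN under IEEE 754; silence NumPy's warning for the demo.
    with np.errstate(divide="ignore", invalid="ignore"):
        return reverse_cumsum / x

x = np.array([3.0, 0.0, 3.0])
g_out = np.ones_like(x)  # assumed upstream gradient of sum(cumprod(x))
# The middle entry comes out as NaN, although the true gradient there is 12.
print(cumprod_grad_with_division(x, g_out))
```

The entries away from the zero are still finite here, which is part of why the problem is easy to miss.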


Root Cause

The gradient formula relied on dividing by x, which is only valid when all elements are nonzero.

Unlike Prod.L_op, which implements explicit zero-handling logic, CumOp.L_op did not account for zero values.


Fix

Replaced the division-based implementation with a mathematically equivalent division-free formulation.

For each position i:

grad[i] = L[i] * R[i]

Where:

  • L[i] = exclusive prefix product (prod(x[0:i]))
  • R[i] = reverse linear recurrence
    R[i] = g[i] + x[i+1] * R[i+1]

This approach:

  • Avoids division entirely
  • Correctly handles zero values
  • Preserves the computation graph
  • Passes finite-difference gradient checks
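
The recurrence above can be sketched in plain Python for a 1-D sequence. This is only a reference illustration under stated assumptions: the function name `cumprod_grad_division_free` is hypothetical, explicit loops stand in for whatever graph operations the PR actually uses, and `g` is the upstream gradient with respect to each cumprod output.

```python
def cumprod_grad_division_free(x, g):
    """Gradient of sum(g[j] * cumprod(x)[j]) w.r.t. x, computed without division."""
    n = len(x)
    # L[i] = prod(x[0:i]), the exclusive prefix product (L[0] = 1).
    L = [1.0] * n
    for i in range(1, n):
        L[i] = L[i - 1] * x[i - 1]
    # R[i] = g[i] + x[i+1] * R[i+1], evaluated from the last index backwards.
    R = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = x[i + 1] * R[i + 1] if i + 1 < n else 0.0
        R[i] = g[i] + tail
    return [L[i] * R[i] for i in range(n)]

ones = [1.0, 1.0, 1.0]
print(cumprod_grad_division_free([3.0, 0.0, 3.0], ones))  # [1.0, 12.0, 0.0]
print(cumprod_grad_division_free([0.0, 2.0, 3.0], ones))  # [9.0, 0.0, 0.0]
```

Because every step is a multiplication or an addition, a zero in `x` simply zeroes the affected products instead of producing 0 / 0.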

Changes

  • Updated CumOp.L_op in:
    pytensor/tensor/extra_ops.py

  • Added regression tests in:
    tests/tensor/test_extra_ops.py


Tests Added

New tests cover:

  • Single zero in the middle
  • Zero at the beginning
  • Multiple zeros
  • 2D inputs with zeros along an axis

All existing tests pass, and gradients are now correct for inputs containing zeros.
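
For readers who want to sanity-check expected values for cases like these by hand, a central-difference estimate is one option. The sketch below assumes the scalar loss is `sum(cumprod(x))` on a 1-D input; the helper names `cumprod_sum` and `central_difference_grad` are hypothetical and are not taken from the PR's test file.

```python
def cumprod_sum(x):
    """Scalar test function: sum(cumprod(x)) for a 1-D sequence."""
    total, running = 0.0, 1.0
    for value in x:
        running *= value
        total += running
    return total

def central_difference_grad(f, x, eps=1e-6):
    """Estimate df/dx[i] with central differences, one coordinate at a time."""
    grad = []
    for i in range(len(x)):
        upper = list(x)
        lower = list(x)
        upper[i] += eps
        lower[i] -= eps
        grad.append((f(upper) - f(lower)) / (2 * eps))
    return grad

# Single zero in the middle, zero at the beginning, multiple zeros.
for x in ([3.0, 0.0, 3.0], [0.0, 2.0, 3.0], [0.0, 3.0, 0.0]):
    print(x, central_difference_grad(cumprod_sum, x))
```

Since `sum(cumprod(x))` is linear in each individual `x[i]`, the central difference agrees with the analytic gradient up to floating-point rounding, including at zeros.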


Impact

  • Eliminates silent NaN corruption in cumprod gradients
  • Prevents hard-to-debug failures in downstream models (e.g., PyMC)
  • Adds regression protection against future breakage
  • Maintains backward compatibility

This PR fixes a clear correctness bug in cumprod gradient computation.

@WHOIM1205
Contributor Author

pre-commit.ci autofix

@WHOIM1205
Contributor Author

hey @jessegrabowski and @ricardoV94
This PR fixes a correctness issue in CumOp.L_op where the gradient of cumprod returned NaN when the input contained zeros due to a 0/0 division.

The implementation has been rewritten using a division-free formulation, and regression tests covering zero cases (including multi-dimensional inputs) have been added. All existing tests pass.

Member

@ricardoV94 left a comment

Yeah, I don't think we're going with a scan for the gradient. Cumprod is pretty rare, and I've never heard of people having issues with it or its gradient.

Sometimes we need convenience at the expense of edge cases.

Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
@WHOIM1205 force-pushed the fix-cumprod-grad-zeros branch from 3561c36 to e13ffa0 on February 24, 2026 20:12
@WHOIM1205
Contributor Author

pre-commit.ci autofix

@WHOIM1205
Contributor Author

Thanks for the earlier feedback about avoiding scan, that makes sense.

I've reworked the implementation to keep it lightweight and removed scan entirely. The gradient now uses only cumprod, cumsum, and simple masking logic to handle zeros safely. It avoids division by zero while still matching the correct mathematical behavior for single and multiple zero cases.

The graph stays simple and Numba-compatible, and the sparse / typed_list tests pass as well.

Let me know if you'd like it simplified further or adjusted in any way.


# Zero at the beginning
result = f(np.array([0.0, 2.0, 3.0]))
expected = np.array([9.0, 0.0, 0.0])
Member

@ricardoV94 Feb 25, 2026

Shouldn't the gradient wrt x[0] be 1.0?

Suggested change
- expected = np.array([9.0, 0.0, 0.0])
+ expected = np.array([1.0, 0.0, 0.0])

Member

Ah, it affects the subsequent outputs too. Can you also include the case of all zeros?

f = pytensor.function([x], g)

# Single zero in the middle
result = f(np.array([1.0, 0.0, 2.0]))
Member

@ricardoV94 Feb 25, 2026

Can you avoid 1s and 2s in the tests? I think that's more robust, since it avoids the multiplicative identity and cases where * and + give the same result.

Suggested change
- result = f(np.array([1.0, 0.0, 2.0]))
+ result = f(np.array([3.0, 0.0, 3.0]))


2 participants