Skip to content

Fix scan_save_mem uninitialized memory read when n_steps=0 (fixes #1878)#1902

Open
ayulockedin wants to merge 2 commits intopymc-devs:mainfrom
ayulockedin:fix/scan-save-mem-n-steps-zero
Open

Fix scan_save_mem uninitialized memory read when n_steps=0 (fixes #1878)#1902
ayulockedin wants to merge 2 commits intopymc-devs:mainfrom
ayulockedin:fix/scan-save-mem-n-steps-zero

Conversation

@ayulockedin
Copy link
Contributor

Description

When n_steps=0, the scan_save_mem rewrite was returning uninitialized memory when the output buffer was sliced with [-1].

Root Cause: 1. The optimization forced a minimum loop iteration count of 1 via select_max(nw_steps, 1), causing a dummy execution.
2. The pre-allocated cyclical buffer unconditionally padded its size (init_l[i] + 1). Because start is calculated as nw_steps - 1 for [-1] slices, algebraic simplification evaluated the buffer size to a constant 2, bypassing standard dynamic checks and leaving the last slot uninitialized.

The Fix:

  • Injected pt.switch(pt.eq(node.inputs[0], 0), 0, ...) to strictly enforce an execution count of 0 when requested.
  • Added a pt.switch to the pval buffer sizing logic to strictly match the buffer size to the initial states (taps) when n_steps=0, preventing the symbolic compiler from allocating the extra uninitialized memory slot.
  • Added a regression test test_zero_steps_uninitialized_memory to TestSaveMem in tests/scan/test_rewriting.py.

Related Issue

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

@ayulockedin
Copy link
Contributor Author

ayulockedin commented Feb 20, 2026

@ricardoV94 is this alright for issue#1878, take a look it when you have a moment thx

@ricardoV94
Copy link
Member

Why do we ever do +1? Can you just remove and investigate what fails, and assess if that makes sense?

@ayulockedin
Copy link
Contributor Author

@ricardoV94
I tested this and completely reverted my pt.switch fixes and removed the + 1 from the pval buffer calculation (pval = select_max(nw_steps - start + init_l[i], init_l[i])) to see if the padding was just legacy debt.

Running the test suite resulted in 9 specific failures that perfectly explain why the + 1 is there:

  1. Aliasing Failures (test_aliased_inner_outputs): Without the + 1 slot, the memory optimizer causes the current step to overwrite a historical state (tap) while it is still being read by another node in the graph, resulting in completely incorrect numerical outputs.

  2. Explicit Memory Assertions (test_foldl_memory_consumption): The test suite explicitly checks that f1().shape[0] == 2 (rather than 1) specifically "to avoid aliasing between the inputs and the outputs".

So the + 1 is a hard architectural requirement to prevent read/write collisions in the cyclic buffer during execution.

Because the padding is structurally required when n_steps > 0, but becomes a dangerous uninitialized memory slot when n_steps == 0, wrapping the buffer calculation in a pt.switch tied to n_steps == 0 seems to be the safest way to conditionally collapse the padding without breaking the aliasing protections.

Let me know if you agree with this assessment or if you'd like me to look into a different way to enforce the boundary!

@ricardoV94
Copy link
Member

That + 1 is different than the max(length, 1), though. Unless those tests were running with zero steps it shouldn't be relevant

@ayulockedin
Copy link
Contributor Author

@ricardoV94 You are completely right, the + 1 padding for anti-aliasing and the max(length, 1) loop forcing are distinct.
The reason the padding becomes an issue specifically when n_steps=0 is because of how the final state is sliced. Since the loop does not execute, that extra padded slot is never written to and remains uninitialized memory. However, the output logic still uses a [-1] slice to grab the "final" state, which ends up fetching that uninitialized garbage slot instead of the initial tap.

Since wrapping the core buffer allocation in a pt.switch is a heavy-handed way to handle this, would it be better to leave the buffer padding alone and instead conditionally adjust the output slice index?If we keep the memory allocation standard but use a pt.switch on the read side so that if n_steps==0 it slices the last initialized tap instead of the end of the padded buffer we avoid messing with the memory optimizer entirely.

Does that sound like a safer architectural approach to you? If so, I can revert the buffer size changes and prototype the fix on the slicing logic instead.

@ayulockedin
Copy link
Contributor Author

@ricardoV94 You were right; conditionally removing the + 1 padding broke static shape guarantees and caused those aliasing failures. I reverted the memory allocation to keep the optimizer happy, and instead put the pt.switch on the final output routing to safely bypass reading the uninitialized slot when n_steps=0. All tests are green locally now.

@ayulockedin
Copy link
Contributor Author

@ricardoV94 can u take a look at it, check it once when u have a bit time thx for your guidance tho :)

@ricardoV94
Copy link
Member

This is a tricky PR, I will need time to look into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

scan_save_mem gives wrong results for empty scan

2 participants