Fix scan_save_mem uninitialized memory read when n_steps=0 (fixes #1878)#1902
Fix scan_save_mem uninitialized memory read when n_steps=0 (fixes #1878)#1902ayulockedin wants to merge 2 commits intopymc-devs:mainfrom
Conversation
|
@ricardoV94 is this alright for issue#1878, take a look it when you have a moment thx |
|
Why do we ever do +1? Can you just remove and investigate what fails, and assess if that makes sense? |
|
@ricardoV94 Running the test suite resulted in 9 specific failures that perfectly explain why the + 1 is there:
So the + 1 is a hard architectural requirement to prevent read/write collisions in the cyclic buffer during execution. Because the padding is structurally required when n_steps > 0, but becomes a dangerous uninitialized memory slot when n_steps == 0, wrapping the buffer calculation in a pt.switch tied to n_steps == 0 seems to be the safest way to conditionally collapse the padding without breaking the aliasing protections. Let me know if you agree with this assessment or if you'd like me to look into a different way to enforce the boundary! |
|
That + 1 is different than the max(length, 1), though. Unless those tests were running with zero steps it shouldn't be relevant |
|
@ricardoV94 You are completely right, the + 1 padding for anti-aliasing and the max(length, 1) loop forcing are distinct. Since wrapping the core buffer allocation in a pt.switch is a heavy-handed way to handle this, would it be better to leave the buffer padding alone and instead conditionally adjust the output slice index?If we keep the memory allocation standard but use a pt.switch on the read side so that if n_steps==0 it slices the last initialized tap instead of the end of the padded buffer we avoid messing with the memory optimizer entirely. Does that sound like a safer architectural approach to you? If so, I can revert the buffer size changes and prototype the fix on the slicing logic instead. |
|
@ricardoV94 You were right; conditionally removing the + 1 padding broke static shape guarantees and caused those aliasing failures. I reverted the memory allocation to keep the optimizer happy, and instead put the pt.switch on the final output routing to safely bypass reading the uninitialized slot when n_steps=0. All tests are green locally now. |
|
@ricardoV94 can u take a look at it, check it once when u have a bit time thx for your guidance tho :) |
|
This is a tricky PR, I will need time to look into it. |
Description
When
n_steps=0, thescan_save_memrewrite was returning uninitialized memory when the output buffer was sliced with[-1].Root Cause: 1. The optimization forced a minimum loop iteration count of 1 via
select_max(nw_steps, 1), causing a dummy execution.2. The pre-allocated cyclical buffer unconditionally padded its size (
init_l[i] + 1). Becausestartis calculated asnw_steps - 1for[-1]slices, algebraic simplification evaluated the buffer size to a constant2, bypassing standard dynamic checks and leaving the last slot uninitialized.The Fix:
pt.switch(pt.eq(node.inputs[0], 0), 0, ...)to strictly enforce an execution count of 0 when requested.pt.switchto thepvalbuffer sizing logic to strictly match the buffer size to the initial states (taps) whenn_steps=0, preventing the symbolic compiler from allocating the extra uninitialized memory slot.test_zero_steps_uninitialized_memorytoTestSaveMemintests/scan/test_rewriting.py.Related Issue
scan_save_memgives wrong results for empty scan #1878Checklist
Type of change