When a pto.tmov instruction has the same source and destination (identity move), ptoas still treats it as a data-movement op and inserts sync instructions around it. These spurious syncs cause synchronization errors that hang the kernel on A5. The same code passes on A3 due to different sync behavior between the two platforms.
Root Cause
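ptoas unconditionally inserts sync instructions around every pto.tmov. When the tmov is a no-op (dst == src), there is no data movement to order, so the syncs are unnecessary, and on A5 they are harmful because they trigger the hang described above.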
Suggested Fix
ptoas should detect identity pto.tmov instructions (where source and destination resolve to the same address/MemRef) and either:
- Skip sync insertion for identity moves, or
- Eliminate identity pto.tmov instructions entirely
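The two options above can be sketched as a small pass. This is only an illustration under stated assumptions: the `Op` record, the `"sync"` op name, the `"other_op"` placeholder, and the `insert_syncs` function are hypothetical and do not reflect the actual ptoas IR or API. It also assumes that source and destination have already been resolved to comparable buffer identifiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    """Hypothetical, simplified instruction record (not the real ptoas IR)."""
    name: str       # e.g. "pto.tmov" or "sync"
    dst: str = ""   # assumed: resolved destination buffer identifier
    src: str = ""   # assumed: resolved source buffer identifier


def is_identity_tmov(op: Op) -> bool:
    # An identity move reads and writes the same resolved buffer.
    return op.name == "pto.tmov" and op.dst == op.src


def insert_syncs(ops: List[Op], drop_identity: bool = True) -> List[Op]:
    """Wrap real data movement in syncs; never sync around identity moves."""
    out: List[Op] = []
    for op in ops:
        if is_identity_tmov(op):
            if not drop_identity:
                out.append(op)  # option 1: keep the op, skip sync insertion
            continue            # option 2: eliminate the op entirely
        if op.name == "pto.tmov":
            out.extend([Op("sync"), op, Op("sync")])
        else:
            out.append(op)
    return out


ops = [
    Op("pto.tmov", dst="a", src="a"),  # identity move
    Op("pto.tmov", dst="b", src="a"),  # real data movement
    Op("other_op", dst="c", src="b"),  # placeholder for any non-tmov op
]
print([o.name for o in insert_syncs(ops)])
print([o.name for o in insert_syncs(ops, drop_identity=False)])
```

Eliminating the op is the simpler of the two, but keeping it without syncs may be preferable if later passes rely on the instruction being present.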
Reproducer
fa5_softmax_rescale.py
Related
Note
A complementary fix has been applied on the pypto side (hw-native-sys/pypto#836) to avoid emitting identity pto.tmov in the first place. However, ptoas should still handle this case defensively.