-
Notifications
You must be signed in to change notification settings - Fork 349
ipc3: Fix spinlock violation in PM context save on fuzzer #10193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ipc3: Fix spinlock violation in PM context save on fuzzer #10193
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a spinlock validation assertion failure in native simulation builds with IPC3 by preventing hardware PM operations in POSIX environments. The recent Zephyr spinlock validation detects context switching while holding locks, which occurs when ipc_pm_context_save calls arch_irq_lock before the EDF work queue yields.
- Extends the existing guard condition to exclude hardware PM operations when
CONFIG_ZEPHYR_POSIXis defined - Preserves PM functionality while avoiding spinlock violations in native simulation builds
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
UT(https://github.com/thesofproject/sof/actions/runs/17238881393/job/48911240441?pr=10193) unfortunately already broken on the main branch (lucky for me). |
Fix spinlock validation assertion failure during fuzzing on native_sim builds with IPC3. Recent Zephyr commit added spinlock validation that detects context switching while holding spinlocks. This revealed that `ipc_pm_context_save` calls `arch_irq_lock` before the EDF work queue yields, causing: ASSERTION FAIL [arch_irq_unlocked(key) || ...] Context switching while holding lock! The hardware PM operations (arch_irq_lock, platform_timer_stop, etc.) are not needed for POSIX simulation environments. Extend the existing guard to exclude these operations when CONFIG_ZEPHYR_POSIX is defined. This preserves PM functionality while avoiding spinlock violations in native simulation builds. Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
lgirdwood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dbaluta fyi
kv2019i
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some opens on why exactly the lock is held, but not really a reason against this change.
| platform_context_save(sof_get()); | ||
|
|
||
| #if !defined(CONFIG_LIBRARY) | ||
| #if !defined(CONFIG_LIBRARY) && !defined(CONFIG_ZEPHYR_POSIX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for late review. I'm still a bit baffled how come spinlock is held in the POSIX simulation runs. As far as I can tell, this shouldn't be the case. Do we have a bug lurking here somewhere?
The patch itself is ok. There is no need for these actions in simulator runs, but curious if @tmleman you looked up the where the lock is taken?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In line 681:
/* mask all DSP interrupts */
arch_irq_lock();There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmleman Aa, right, sorry, I thought the assert is hit on L681. But ack, no, we take the lock there, and in real hw we go to D3 (and code won't return) but in simulation there is a context switch and this then hits an assert. Thanks for clarifying, then this is ok!
Fix spinlock validation assertion failure during fuzzing on native_sim builds with IPC3.
Recent Zephyr commit added spinlock validation that detects context switching while holding spinlocks. This revealed that
ipc_pm_context_savecallsarch_irq_lockbefore the EDF work queue yields, causing:ASSERTION FAIL [arch_irq_unlocked(key) || ...] Context switching while holding lock!
The hardware PM operations (arch_irq_lock, platform_timer_stop, etc.) are not needed for POSIX simulation environments. Extend the existing guard to exclude these operations when CONFIG_ZEPHYR_POSIX is defined.
This preserves PM functionality while avoiding spinlock violations in native simulation builds.