Skip to content

Conversation

@tmleman
Copy link
Contributor

@tmleman tmleman commented Aug 26, 2025

Fix spinlock validation assertion failure during fuzzing on native_sim builds with IPC3.

Recent Zephyr commit added spinlock validation that detects context switching while holding spinlocks. This revealed that ipc_pm_context_save calls arch_irq_lock before the EDF work queue yields, causing:

ASSERTION FAIL [arch_irq_unlocked(key) || ...] Context switching while holding lock!

The hardware PM operations (arch_irq_lock, platform_timer_stop, etc.) are not needed for POSIX simulation environments. Extend the existing guard to exclude these operations when CONFIG_ZEPHYR_POSIX is defined.

This preserves PM functionality while avoiding spinlock violations in native simulation builds.

Copilot AI review requested due to automatic review settings August 26, 2025 12:58
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a spinlock validation assertion failure in native simulation builds with IPC3 by preventing hardware PM operations in POSIX environments. The recent Zephyr spinlock validation detects context switching while holding locks, which occurs when ipc_pm_context_save calls arch_irq_lock before the EDF work queue yields.

  • Extends the existing guard condition to exclude hardware PM operations when CONFIG_ZEPHYR_POSIX is defined
  • Preserves PM functionality while avoiding spinlock violations in native simulation builds

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@tmleman
Copy link
Contributor Author

tmleman commented Aug 26, 2025

UT(https://github.com/thesofproject/sof/actions/runs/17238881393/job/48911240441?pr=10193) unfortunately already broken on the main branch (lucky for me).

Fix spinlock validation assertion failure during fuzzing on native_sim
builds with IPC3.

Recent Zephyr commit added spinlock validation that detects context
switching while holding spinlocks. This revealed that
`ipc_pm_context_save` calls `arch_irq_lock` before the EDF work queue
yields, causing:

ASSERTION FAIL [arch_irq_unlocked(key) || ...] Context switching while
holding lock!

The hardware PM operations (arch_irq_lock, platform_timer_stop, etc.)
are not needed for POSIX simulation environments. Extend the existing
guard to exclude these operations when CONFIG_ZEPHYR_POSIX is defined.

This preserves PM functionality while avoiding spinlock violations in
native simulation builds.

Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbaluta fyi

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some opens on why exactly the lock is held, but not really a reason against this change.

platform_context_save(sof_get());

#if !defined(CONFIG_LIBRARY)
#if !defined(CONFIG_LIBRARY) && !defined(CONFIG_ZEPHYR_POSIX)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late review. I'm still a bit baffled how come spinlock is held in the POSIX simulation runs. As far as I can tell, this shouldn't be the case. Do we have a bug lurking here somewhere?

The patch itself is ok. There is no need for these actions in simulator runs, but curious if @tmleman you looked up the where the lock is taken?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In line 681:

	/* mask all DSP interrupts */
	arch_irq_lock();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmleman Aa, right, sorry, I thought the assert is hit on L681. But ack, no, we take the lock there, and in real hw we go to D3 (and code won't return) but in simulation there is a context switch and this then hits an assert. Thanks for clarifying, then this is ok!

@kv2019i kv2019i merged commit cf8944e into thesofproject:main Aug 28, 2025
38 of 46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants