userspace: remove cache primitive usage from audio code #10232

kv2019i · 2025-09-11T10:11:10Z

Here's a series to ease moving audio code (modules or even the DAI code) to user-space by removing calls to cache primitives, which are not available in user-space (not supported in most hw architectures).

This is RFC as this is a no-op currently as we don't have userspace enabled in SOF yet, but sharing early to get feedback and align with other ongoing PRs.

First commit submitted separately as #10230 as that is useful even without user-space.

kv2019i · 2025-09-11T10:13:35Z

FYI @softwarecki @lyakh @jsarha @lgirdwood

kv2019i · 2025-09-11T10:16:11Z

src/include/sof/audio/audio_stream.h

+#if CONFIG_USERSPACE
+#include <zephyr/kernel.h>
+
+static inline bool _audio_stream_is_user_thread(void)


I couldn't find a ready helper for this in Zephyr, but this would surely be put to some common helper function. Using these local definitions for now in this RFC PR.

@lyakh @softwarecki have you found a suitable helper for this?

@kv2019i I suggest https://github.com/thesofproject/sof/blob/main/zephyr/include/rtos/userspace_helper.h

@kv2019i these functions could take a buffer pointer as an argument and check source / sink component(s) to be also usable outside of the audio streaming context. As long as we don't have any even prototype code for user-space LL components, I'd suggest to have this function always return false, and in fact its "ring" analog could just always return true because IIUC that's how those buffers are currently allocated because of multicore processing. In fact I'm not sure we need this ATM - as long as only DP goes to the user-space and it's already uncached.
As for the helper - you could use k_is_user_context() for the current thread, but I don't think that's what you need. You need to check the respective component run-time context. I think our first implementation will have all LL in kernel and all DP in user-space, so we could so far just hard-code that.

@lyakh ack RFC, this is not needed for simple DP modules.

jsarha

Should work AFAICT. And there hardly is any other way solve this without even touching the modules.

lyakh · 2025-09-12T06:15:40Z

src/include/sof/audio/buffer.h


+/* buffer usage */
+#define BUFFER_USAGE_SHARED	true	/* buffer used by multiple DSP core and/or HW blocks */
+#define BUFFER_USAGE_NOT_SHARED false	/* buffer only used by one HW block */


looks like it's always not_shared?

@lyakh I will use SHARED in the next patch.

lyakh · 2025-09-12T06:20:38Z

src/audio/module_adapter/module_adapter.c

 			struct comp_buffer *buffer = buffer_alloc(buff_size, memory_flags,
-								  PLATFORM_DCACHE_ALIGN, false);
+								  PLATFORM_DCACHE_ALIGN,
+								  BUFFER_USAGE_NOT_SHARED);


calls to buffer_new() (also always false) should be changed too then?

Done in #10230

lyakh · 2025-09-12T06:34:05Z

src/include/sof/audio/audio_stream.h

+#if CONFIG_USERSPACE
+#include <zephyr/kernel.h>
+
+static inline bool _audio_stream_is_user_thread(void)


@kv2019i these functions could take a buffer pointer as an argument and check source / sink component(s) to be also usable outside of the audio streaming context. As long as we don't have any even prototype code for user-space LL components, I'd suggest to have this function always return false, and in fact its "ring" analog could just always return true because IIUC that's how those buffers are currently allocated because of multicore processing. In fact I'm not sure we need this ATM - as long as only DP goes to the user-space and it's already uncached.
As for the helper - you could use k_is_user_context() for the current thread, but I don't think that's what you need. You need to check the respective component run-time context. I think our first implementation will have all LL in kernel and all DP in user-space, so we could so far just hard-code that.

lyakh · 2025-09-12T06:39:02Z

src/audio/buffers/comp_buffer.c

+	 * cache control is not possible in user-space, so cross-core
+	 * buffers must be allocated as coherent
+	 */
+	if (is_shared && _buffer_is_user_thread())


this assumes that buffers are allocated from the same context, where they'll be used. This likely will be the case with my upcoming DP-scheduling changes, but it probably isn't the case yet

@lyakh Ack, this is an assumption I make here and thus the high volume of asserts elsewhere to trap cases where assumption doesn't hold. To fully solve this, we need more complex actual code to be moved to user-space --- i..e. a design how buffers get allocated.

lgirdwood

@kv2019i I think we can use intel_adsp_hda_dma_link_reload() and friends to signal to DMA driver that cache WB/INV ops should be performed. These will already be in ring 0 so no extra syscall.

lgirdwood · 2025-09-15T11:56:56Z

src/audio/chain_dma.c

 	/* allocate not shared buffer */
 	cd->dma_buffer = buffer_alloc(fifo_size, SOF_MEM_FLAG_USER | SOF_MEM_FLAG_DMA,
-				      addr_align, BUFFER_USAGE_NOT_SHARED);
+				      addr_align, BUFFER_USAGE_SHARED);


This will impact performance on some platforms so these buffer should still remain cached. I think the only user for DMA buffer is the copier ? We really need to add the WB/INV operation to the Zephyr Intel ADSP HD DMA driver so that memory is made coherent on DMA operations in ring 0.

@lgirdwood In this specific case, it does not impact performance as 1) DSP does not read/write to the buffers that are dma_reloaded(), and also 2) this PR uses uncached only if chain dma is moved to user-space (no perf impact to current build where chain dma is in kernel space).

But it general point applies, in host-zephyr.c/dai-zephyr.c, if moved as-is to user-space, will have a performance hit. I'm not looking into options how to implement this with Zephyr DMA API. We currently lack means to relay information (like in Linux DMA-API) on whether the reloaded buffer was touched by DSP or not.

This PR would be an interim solution that allows to move the code to user-space, with a performance hit.

@kv2019i if we put host/dai modules in kernel then we remove this an integration surface and have to copy/synchronize 2 copies per LL pipeline and context switch for each host/dai in the pipeline.

Its preferable if we can keep the entire LL thread in user E2E. Yes we will have 2 system calls today for dma_status() and dma_reload() , but we can roll both into a single new system call based on host_copy_normal(). This would mean 1 context switch per host/dai and would align to cost of moving host/dai to kernel.

I'm also confident we have the R/W position context today in host/dai so that when we call dma_status() we can pass in our old R/W positions in stat and this would then allow

int intel_adsp_hda_dma_status(const struct device *dev, uint32_t channel, struct dma_status *stat) { /* if passed in stat read/write position is !NULL then WB/INV delta position between old and current*/ }

This should also mean minimal API change (just an addition of data by the caller).

To ease moving audio code to user-space, modify the logic for allocating shared buffers so that if a buffer is shared (accessed by multiple cores), and code is run in user-space, the buffer is allocated uncached. Similarly, in invalidate and writeback functions, check for matching conditions and skip the cache operations accordingly. This approach allows to reuse most of the audio module code in user-space and even have shared buffers between different user-space threads running on different cores. There is no impact to builds without user-space, or when running modules in user-space without shared buffers. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>

Copilot

Pull Request Overview

This PR removes cache primitive usage from audio code to enable future userspace compatibility. The changes ensure that cache operations are skipped in userspace contexts since cache control is not available there, with userspace shared buffers allocated as uncached/coherent instead.

Add userspace checks to skip cache operations when running in user context
Modify buffer allocation to use coherent memory flags for shared buffers in userspace
Include necessary Zephyr syscall headers for userspace detection

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
src/include/sof/audio/buffer.h	Add userspace checks to skip cache operations in buffer stream functions
src/include/sof/audio/audio_stream.h	Add userspace guards around cache invalidate/writeback operations
src/audio/buffers/ring_buffer.c	Skip cache operations in userspace and use coherent allocation for shared buffers
src/audio/buffers/comp_buffer.c	Use coherent memory allocation for shared buffers in userspace context

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-09-19T13:02:03Z

src/include/sof/audio/audio_stream.h

+#if CONFIG_USERSPACE
+	/* user-space shared buffers are allocated as uncached */
+	if (k_is_user_context()) {
+		__ASSERT_NO_MSG(sys_cache_is_ptr_cached(buffer->addr) == false);


[nitpick] Consider using !sys_cache_is_ptr_cached(buffer->addr) instead of == false for better readability and consistency with C conventions.

Suggested change

__ASSERT_NO_MSG(sys_cache_is_ptr_cached(buffer->addr) == false);

__ASSERT_NO_MSG(!sys_cache_is_ptr_cached(buffer->addr));

Copilot · 2025-09-19T13:02:03Z

src/include/sof/audio/audio_stream.h

+#if CONFIG_USERSPACE
+	/* user-space shared buffers are allocated as uncached */
+	if (k_is_user_context()) {
+		__ASSERT_NO_MSG(sys_cache_is_ptr_cached(buffer->addr) == false);


[nitpick] Consider using !sys_cache_is_ptr_cached(buffer->addr) instead of == false for better readability and consistency with C conventions.

Suggested change

__ASSERT_NO_MSG(sys_cache_is_ptr_cached(buffer->addr) == false);

__ASSERT_NO_MSG(!sys_cache_is_ptr_cached(buffer->addr));

Copilot · 2025-09-19T13:02:04Z

src/audio/buffers/ring_buffer.c

+#if CONFIG_USERSPACE
+	/* user-space shared buffers are allocated as uncached */
+	if (k_is_user_context()) {
+		__ASSERT_NO_MSG(sys_cache_is_ptr_cached(ptr) == false);


[nitpick] Consider using !sys_cache_is_ptr_cached(ptr) instead of == false for better readability and consistency with C conventions.

Suggested change

__ASSERT_NO_MSG(sys_cache_is_ptr_cached(ptr) == false);

__ASSERT_NO_MSG(!sys_cache_is_ptr_cached(ptr));

Copilot · 2025-09-19T13:02:04Z

src/audio/buffers/ring_buffer.c

+#if CONFIG_USERSPACE
+	/* user-space shared buffers are allocated as uncached */
+	if (k_is_user_context()) {
+		__ASSERT_NO_MSG(sys_cache_is_ptr_cached(ptr) == false);


[nitpick] Consider using !sys_cache_is_ptr_cached(ptr) instead of == false for better readability and consistency with C conventions.

Suggested change

__ASSERT_NO_MSG(sys_cache_is_ptr_cached(ptr) == false);

__ASSERT_NO_MSG(!sys_cache_is_ptr_cached(ptr));

kv2019i · 2025-09-19T13:02:15Z

V2:

drop dai/host DMA related patches from series (different design to be used, discussed with @lgirdwood )
use zephyr/syscall.h helpers directly, no need to add new

lgirdwood

We shouldn't be using uncache for buffers since performance impact can be quite large, better to make a sof_buffer_wb() and sof_buffer_inv() syscalls that can be called for cross core piipelines for this use case. Yes we have syscall impact, but my feeling is that this will be much cheaper than an uncache DP buffer with 10 or 20ms period.

lgirdwood · 2025-09-23T16:18:56Z

src/audio/buffers/comp_buffer.c

+#if CONFIG_USERSPACE
+	/*
+	 * cache control is not possible in user-space, so cross-core
+	 * buffers must be allocated as coherent
+	 */
+	if (is_shared && k_is_user_context())
+		flags |= SOF_MEM_FLAG_COHERENT;
+#endif
+


Lets not do this as performance impact is quite large, in this case it may be a lot faster to have a SOF syscall for buffer WB and INV.

kv2019i · 2025-09-24T19:37:58Z

Taking difference direction based on review feedback. Replaced with #10258 and #10259 .

kv2019i changed the title ~~[SKIP SOFCI-TEST][RFC] proposal to remove cache primitive usage from audio code~~ [SKIP SOFCI-TEST][RFC] userspace: proposal to remove cache primitive usage from audio code Sep 11, 2025

kv2019i commented Sep 11, 2025

View reviewed changes

jsarha approved these changes Sep 11, 2025

View reviewed changes

lyakh reviewed Sep 12, 2025

View reviewed changes

lgirdwood requested changes Sep 15, 2025

View reviewed changes

kv2019i force-pushed the 202509-cacheops-for-userspace branch from e68c92b to 3e19e01 Compare September 19, 2025 13:00

kv2019i changed the title ~~[SKIP SOFCI-TEST][RFC] userspace: proposal to remove cache primitive usage from audio code~~ userspace: remove cache primitive usage from audio code Sep 19, 2025

kv2019i marked this pull request as ready for review September 19, 2025 13:01

kv2019i requested review from mmaka1 and plbossart as code owners September 19, 2025 13:01

Copilot AI review requested due to automatic review settings September 19, 2025 13:01

kv2019i requested review from dbaluta and lbetlej as code owners September 19, 2025 13:01

Copilot AI reviewed Sep 19, 2025

View reviewed changes

lgirdwood requested changes Sep 23, 2025

View reviewed changes

kv2019i closed this Sep 24, 2025

	__ASSERT_NO_MSG(sys_cache_is_ptr_cached(buffer->addr) == false);
	__ASSERT_NO_MSG(!sys_cache_is_ptr_cached(buffer->addr));

userspace: remove cache primitive usage from audio code #10232

userspace: remove cache primitive usage from audio code #10232

Uh oh!

Conversation

kv2019i commented Sep 11, 2025

Uh oh!

kv2019i commented Sep 11, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jsarha left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lgirdwood left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Sep 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Sep 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Sep 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Sep 19, 2025

Choose a reason for hiding this comment

Uh oh!

kv2019i commented Sep 19, 2025

Uh oh!

lgirdwood left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kv2019i commented Sep 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants