Skip to content

Conversation

@kv2019i
Copy link
Collaborator

@kv2019i kv2019i commented Sep 11, 2025

Here's a series to ease moving audio code (modules or even the DAI code) to user-space by removing calls to cache primitives, which are not available in user-space (not supported in most hw architectures).

This is RFC as this is a no-op currently as we don't have userspace enabled in SOF yet, but sharing early to get feedback and align with other ongoing PRs.

First commit submitted separately as #10230 as that is useful even without user-space.

@kv2019i
Copy link
Collaborator Author

kv2019i commented Sep 11, 2025

@kv2019i kv2019i changed the title [SKIP SOFCI-TEST][RFC] proposal to remove cache primitive usage from audio code [SKIP SOFCI-TEST][RFC] userspace: proposal to remove cache primitive usage from audio code Sep 11, 2025
#if CONFIG_USERSPACE
#include <zephyr/kernel.h>

static inline bool _audio_stream_is_user_thread(void)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find a ready helper for this in Zephyr, but this would surely be put to some common helper function. Using these local definitions for now in this RFC PR.

@lyakh @softwarecki have you found a suitable helper for this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i these functions could take a buffer pointer as an argument and check source / sink component(s) to be also usable outside of the audio streaming context. As long as we don't have any even prototype code for user-space LL components, I'd suggest to have this function always return false, and in fact its "ring" analog could just always return true because IIUC that's how those buffers are currently allocated because of multicore processing. In fact I'm not sure we need this ATM - as long as only DP goes to the user-space and it's already uncached.
As for the helper - you could use k_is_user_context() for the current thread, but I don't think that's what you need. You need to check the respective component run-time context. I think our first implementation will have all LL in kernel and all DP in user-space, so we could so far just hard-code that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh ack RFC, this is not needed for simple DP modules.

Copy link
Contributor

@jsarha jsarha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should work AFAICT. And there hardly is any other way solve this without even touching the modules.


/* buffer usage */
#define BUFFER_USAGE_SHARED true /* buffer used by multiple DSP core and/or HW blocks */
#define BUFFER_USAGE_NOT_SHARED false /* buffer only used by one HW block */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it's always not_shared?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh I will use SHARED in the next patch.

struct comp_buffer *buffer = buffer_alloc(buff_size, memory_flags,
PLATFORM_DCACHE_ALIGN, false);
PLATFORM_DCACHE_ALIGN,
BUFFER_USAGE_NOT_SHARED);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calls to buffer_new() (also always false) should be changed too then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #10230

#if CONFIG_USERSPACE
#include <zephyr/kernel.h>

static inline bool _audio_stream_is_user_thread(void)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i these functions could take a buffer pointer as an argument and check source / sink component(s) to be also usable outside of the audio streaming context. As long as we don't have any even prototype code for user-space LL components, I'd suggest to have this function always return false, and in fact its "ring" analog could just always return true because IIUC that's how those buffers are currently allocated because of multicore processing. In fact I'm not sure we need this ATM - as long as only DP goes to the user-space and it's already uncached.
As for the helper - you could use k_is_user_context() for the current thread, but I don't think that's what you need. You need to check the respective component run-time context. I think our first implementation will have all LL in kernel and all DP in user-space, so we could so far just hard-code that.

* cache control is not possible in user-space, so cross-core
* buffers must be allocated as coherent
*/
if (is_shared && _buffer_is_user_thread())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assumes that buffers are allocated from the same context, where they'll be used. This likely will be the case with my upcoming DP-scheduling changes, but it probably isn't the case yet

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh Ack, this is an assumption I make here and thus the high volume of asserts elsewhere to trap cases where assumption doesn't hold. To fully solve this, we need more complex actual code to be moved to user-space --- i..e. a design how buffers get allocated.

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i I think we can use intel_adsp_hda_dma_link_reload() and friends to signal to DMA driver that cache WB/INV ops should be performed. These will already be in ring 0 so no extra syscall.

/* allocate not shared buffer */
cd->dma_buffer = buffer_alloc(fifo_size, SOF_MEM_FLAG_USER | SOF_MEM_FLAG_DMA,
addr_align, BUFFER_USAGE_NOT_SHARED);
addr_align, BUFFER_USAGE_SHARED);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will impact performance on some platforms so these buffer should still remain cached. I think the only user for DMA buffer is the copier ? We really need to add the WB/INV operation to the Zephyr Intel ADSP HD DMA driver so that memory is made coherent on DMA operations in ring 0.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood In this specific case, it does not impact performance as 1) DSP does not read/write to the buffers that are dma_reloaded(), and also 2) this PR uses uncached only if chain dma is moved to user-space (no perf impact to current build where chain dma is in kernel space).

But it general point applies, in host-zephyr.c/dai-zephyr.c, if moved as-is to user-space, will have a performance hit. I'm not looking into options how to implement this with Zephyr DMA API. We currently lack means to relay information (like in Linux DMA-API) on whether the reloaded buffer was touched by DSP or not.

This PR would be an interim solution that allows to move the code to user-space, with a performance hit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i if we put host/dai modules in kernel then we remove this an integration surface and have to copy/synchronize 2 copies per LL pipeline and context switch for each host/dai in the pipeline.

Its preferable if we can keep the entire LL thread in user E2E. Yes we will have 2 system calls today for dma_status() and dma_reload() , but we can roll both into a single new system call based on host_copy_normal(). This would mean 1 context switch per host/dai and would align to cost of moving host/dai to kernel.

I'm also confident we have the R/W position context today in host/dai so that when we call dma_status() we can pass in our old R/W positions in stat and this would then allow

int intel_adsp_hda_dma_status(const struct device *dev, uint32_t channel,
	struct dma_status *stat)
{
    /* if passed in stat read/write position is !NULL then WB/INV delta position between old and current*/
}

This should also mean minimal API change (just an addition of data by the caller).

To ease moving audio code to user-space, modify the logic for allocating
shared buffers so that if a buffer is shared (accessed by multiple
cores), and code is run in user-space, the buffer is allocated uncached.

Similarly, in invalidate and writeback functions, check for matching
conditions and skip the cache operations accordingly.

This approach allows to reuse most of the audio module code in
user-space and even have shared buffers between different user-space
threads running on different cores.

There is no impact to builds without user-space, or when running
modules in user-space without shared buffers.

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
@kv2019i kv2019i force-pushed the 202509-cacheops-for-userspace branch from e68c92b to 3e19e01 Compare September 19, 2025 13:00
@kv2019i kv2019i changed the title [SKIP SOFCI-TEST][RFC] userspace: proposal to remove cache primitive usage from audio code userspace: remove cache primitive usage from audio code Sep 19, 2025
@kv2019i kv2019i marked this pull request as ready for review September 19, 2025 13:01
Copilot AI review requested due to automatic review settings September 19, 2025 13:01
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes cache primitive usage from audio code to enable future userspace compatibility. The changes ensure that cache operations are skipped in userspace contexts since cache control is not available there, with userspace shared buffers allocated as uncached/coherent instead.

  • Add userspace checks to skip cache operations when running in user context
  • Modify buffer allocation to use coherent memory flags for shared buffers in userspace
  • Include necessary Zephyr syscall headers for userspace detection

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
src/include/sof/audio/buffer.h Add userspace checks to skip cache operations in buffer stream functions
src/include/sof/audio/audio_stream.h Add userspace guards around cache invalidate/writeback operations
src/audio/buffers/ring_buffer.c Skip cache operations in userspace and use coherent allocation for shared buffers
src/audio/buffers/comp_buffer.c Use coherent memory allocation for shared buffers in userspace context

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

#if CONFIG_USERSPACE
/* user-space shared buffers are allocated as uncached */
if (k_is_user_context()) {
__ASSERT_NO_MSG(sys_cache_is_ptr_cached(buffer->addr) == false);
Copy link

Copilot AI Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Consider using !sys_cache_is_ptr_cached(buffer->addr) instead of == false for better readability and consistency with C conventions.

Suggested change
__ASSERT_NO_MSG(sys_cache_is_ptr_cached(buffer->addr) == false);
__ASSERT_NO_MSG(!sys_cache_is_ptr_cached(buffer->addr));

Copilot uses AI. Check for mistakes.
#if CONFIG_USERSPACE
/* user-space shared buffers are allocated as uncached */
if (k_is_user_context()) {
__ASSERT_NO_MSG(sys_cache_is_ptr_cached(buffer->addr) == false);
Copy link

Copilot AI Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Consider using !sys_cache_is_ptr_cached(buffer->addr) instead of == false for better readability and consistency with C conventions.

Suggested change
__ASSERT_NO_MSG(sys_cache_is_ptr_cached(buffer->addr) == false);
__ASSERT_NO_MSG(!sys_cache_is_ptr_cached(buffer->addr));

Copilot uses AI. Check for mistakes.
#if CONFIG_USERSPACE
/* user-space shared buffers are allocated as uncached */
if (k_is_user_context()) {
__ASSERT_NO_MSG(sys_cache_is_ptr_cached(ptr) == false);
Copy link

Copilot AI Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Consider using !sys_cache_is_ptr_cached(ptr) instead of == false for better readability and consistency with C conventions.

Suggested change
__ASSERT_NO_MSG(sys_cache_is_ptr_cached(ptr) == false);
__ASSERT_NO_MSG(!sys_cache_is_ptr_cached(ptr));

Copilot uses AI. Check for mistakes.
#if CONFIG_USERSPACE
/* user-space shared buffers are allocated as uncached */
if (k_is_user_context()) {
__ASSERT_NO_MSG(sys_cache_is_ptr_cached(ptr) == false);
Copy link

Copilot AI Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Consider using !sys_cache_is_ptr_cached(ptr) instead of == false for better readability and consistency with C conventions.

Suggested change
__ASSERT_NO_MSG(sys_cache_is_ptr_cached(ptr) == false);
__ASSERT_NO_MSG(!sys_cache_is_ptr_cached(ptr));

Copilot uses AI. Check for mistakes.
@kv2019i
Copy link
Collaborator Author

kv2019i commented Sep 19, 2025

V2:

  • drop dai/host DMA related patches from series (different design to be used, discussed with @lgirdwood )
  • use zephyr/syscall.h helpers directly, no need to add new

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be using uncache for buffers since performance impact can be quite large, better to make a sof_buffer_wb() and sof_buffer_inv() syscalls that can be called for cross core piipelines for this use case. Yes we have syscall impact, but my feeling is that this will be much cheaper than an uncache DP buffer with 10 or 20ms period.

Comment on lines +239 to +247
#if CONFIG_USERSPACE
/*
* cache control is not possible in user-space, so cross-core
* buffers must be allocated as coherent
*/
if (is_shared && k_is_user_context())
flags |= SOF_MEM_FLAG_COHERENT;
#endif

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not do this as performance impact is quite large, in this case it may be a lot faster to have a SOF syscall for buffer WB and INV.

@kv2019i
Copy link
Collaborator Author

kv2019i commented Sep 24, 2025

Taking difference direction based on review feedback. Replaced with #10258 and #10259 .

@kv2019i kv2019i closed this Sep 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants