-
Notifications
You must be signed in to change notification settings - Fork 349
userspace: remove cache primitive usage from audio code #10232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/include/sof/audio/audio_stream.h
Outdated
| #if CONFIG_USERSPACE | ||
| #include <zephyr/kernel.h> | ||
|
|
||
| static inline bool _audio_stream_is_user_thread(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find a ready helper for this in Zephyr, but this would surely be put to some common helper function. Using these local definitions for now in this RFC PR.
@lyakh @softwarecki have you found a suitable helper for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kv2019i these functions could take a buffer pointer as an argument and check source / sink component(s) to be also usable outside of the audio streaming context. As long as we don't have any even prototype code for user-space LL components, I'd suggest to have this function always return false, and in fact its "ring" analog could just always return true because IIUC that's how those buffers are currently allocated because of multicore processing. In fact I'm not sure we need this ATM - as long as only DP goes to the user-space and it's already uncached.
As for the helper - you could use k_is_user_context() for the current thread, but I don't think that's what you need. You need to check the respective component run-time context. I think our first implementation will have all LL in kernel and all DP in user-space, so we could so far just hard-code that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lyakh ack RFC, this is not needed for simple DP modules.
jsarha
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should work AFAICT. And there hardly is any other way solve this without even touching the modules.
src/include/sof/audio/buffer.h
Outdated
|
|
||
| /* buffer usage */ | ||
| #define BUFFER_USAGE_SHARED true /* buffer used by multiple DSP core and/or HW blocks */ | ||
| #define BUFFER_USAGE_NOT_SHARED false /* buffer only used by one HW block */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like it's always not_shared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lyakh I will use SHARED in the next patch.
| struct comp_buffer *buffer = buffer_alloc(buff_size, memory_flags, | ||
| PLATFORM_DCACHE_ALIGN, false); | ||
| PLATFORM_DCACHE_ALIGN, | ||
| BUFFER_USAGE_NOT_SHARED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calls to buffer_new() (also always false) should be changed too then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in #10230
src/include/sof/audio/audio_stream.h
Outdated
| #if CONFIG_USERSPACE | ||
| #include <zephyr/kernel.h> | ||
|
|
||
| static inline bool _audio_stream_is_user_thread(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kv2019i these functions could take a buffer pointer as an argument and check source / sink component(s) to be also usable outside of the audio streaming context. As long as we don't have any even prototype code for user-space LL components, I'd suggest to have this function always return false, and in fact its "ring" analog could just always return true because IIUC that's how those buffers are currently allocated because of multicore processing. In fact I'm not sure we need this ATM - as long as only DP goes to the user-space and it's already uncached.
As for the helper - you could use k_is_user_context() for the current thread, but I don't think that's what you need. You need to check the respective component run-time context. I think our first implementation will have all LL in kernel and all DP in user-space, so we could so far just hard-code that.
src/audio/buffers/comp_buffer.c
Outdated
| * cache control is not possible in user-space, so cross-core | ||
| * buffers must be allocated as coherent | ||
| */ | ||
| if (is_shared && _buffer_is_user_thread()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this assumes that buffers are allocated from the same context, where they'll be used. This likely will be the case with my upcoming DP-scheduling changes, but it probably isn't the case yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lyakh Ack, this is an assumption I make here and thus the high volume of asserts elsewhere to trap cases where assumption doesn't hold. To fully solve this, we need more complex actual code to be moved to user-space --- i..e. a design how buffers get allocated.
lgirdwood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kv2019i I think we can use intel_adsp_hda_dma_link_reload() and friends to signal to DMA driver that cache WB/INV ops should be performed. These will already be in ring 0 so no extra syscall.
src/audio/chain_dma.c
Outdated
| /* allocate not shared buffer */ | ||
| cd->dma_buffer = buffer_alloc(fifo_size, SOF_MEM_FLAG_USER | SOF_MEM_FLAG_DMA, | ||
| addr_align, BUFFER_USAGE_NOT_SHARED); | ||
| addr_align, BUFFER_USAGE_SHARED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will impact performance on some platforms so these buffer should still remain cached. I think the only user for DMA buffer is the copier ? We really need to add the WB/INV operation to the Zephyr Intel ADSP HD DMA driver so that memory is made coherent on DMA operations in ring 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lgirdwood In this specific case, it does not impact performance as 1) DSP does not read/write to the buffers that are dma_reloaded(), and also 2) this PR uses uncached only if chain dma is moved to user-space (no perf impact to current build where chain dma is in kernel space).
But it general point applies, in host-zephyr.c/dai-zephyr.c, if moved as-is to user-space, will have a performance hit. I'm not looking into options how to implement this with Zephyr DMA API. We currently lack means to relay information (like in Linux DMA-API) on whether the reloaded buffer was touched by DSP or not.
This PR would be an interim solution that allows to move the code to user-space, with a performance hit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kv2019i if we put host/dai modules in kernel then we remove this an integration surface and have to copy/synchronize 2 copies per LL pipeline and context switch for each host/dai in the pipeline.
Its preferable if we can keep the entire LL thread in user E2E. Yes we will have 2 system calls today for dma_status() and dma_reload() , but we can roll both into a single new system call based on host_copy_normal(). This would mean 1 context switch per host/dai and would align to cost of moving host/dai to kernel.
I'm also confident we have the R/W position context today in host/dai so that when we call dma_status() we can pass in our old R/W positions in stat and this would then allow
int intel_adsp_hda_dma_status(const struct device *dev, uint32_t channel,
struct dma_status *stat)
{
/* if passed in stat read/write position is !NULL then WB/INV delta position between old and current*/
}This should also mean minimal API change (just an addition of data by the caller).
To ease moving audio code to user-space, modify the logic for allocating shared buffers so that if a buffer is shared (accessed by multiple cores), and code is run in user-space, the buffer is allocated uncached. Similarly, in invalidate and writeback functions, check for matching conditions and skip the cache operations accordingly. This approach allows to reuse most of the audio module code in user-space and even have shared buffers between different user-space threads running on different cores. There is no impact to builds without user-space, or when running modules in user-space without shared buffers. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
e68c92b to
3e19e01
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR removes cache primitive usage from audio code to enable future userspace compatibility. The changes ensure that cache operations are skipped in userspace contexts since cache control is not available there, with userspace shared buffers allocated as uncached/coherent instead.
- Add userspace checks to skip cache operations when running in user context
- Modify buffer allocation to use coherent memory flags for shared buffers in userspace
- Include necessary Zephyr syscall headers for userspace detection
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/include/sof/audio/buffer.h | Add userspace checks to skip cache operations in buffer stream functions |
| src/include/sof/audio/audio_stream.h | Add userspace guards around cache invalidate/writeback operations |
| src/audio/buffers/ring_buffer.c | Skip cache operations in userspace and use coherent allocation for shared buffers |
| src/audio/buffers/comp_buffer.c | Use coherent memory allocation for shared buffers in userspace context |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| #if CONFIG_USERSPACE | ||
| /* user-space shared buffers are allocated as uncached */ | ||
| if (k_is_user_context()) { | ||
| __ASSERT_NO_MSG(sys_cache_is_ptr_cached(buffer->addr) == false); |
Copilot
AI
Sep 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider using !sys_cache_is_ptr_cached(buffer->addr) instead of == false for better readability and consistency with C conventions.
| __ASSERT_NO_MSG(sys_cache_is_ptr_cached(buffer->addr) == false); | |
| __ASSERT_NO_MSG(!sys_cache_is_ptr_cached(buffer->addr)); |
| #if CONFIG_USERSPACE | ||
| /* user-space shared buffers are allocated as uncached */ | ||
| if (k_is_user_context()) { | ||
| __ASSERT_NO_MSG(sys_cache_is_ptr_cached(buffer->addr) == false); |
Copilot
AI
Sep 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider using !sys_cache_is_ptr_cached(buffer->addr) instead of == false for better readability and consistency with C conventions.
| __ASSERT_NO_MSG(sys_cache_is_ptr_cached(buffer->addr) == false); | |
| __ASSERT_NO_MSG(!sys_cache_is_ptr_cached(buffer->addr)); |
| #if CONFIG_USERSPACE | ||
| /* user-space shared buffers are allocated as uncached */ | ||
| if (k_is_user_context()) { | ||
| __ASSERT_NO_MSG(sys_cache_is_ptr_cached(ptr) == false); |
Copilot
AI
Sep 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider using !sys_cache_is_ptr_cached(ptr) instead of == false for better readability and consistency with C conventions.
| __ASSERT_NO_MSG(sys_cache_is_ptr_cached(ptr) == false); | |
| __ASSERT_NO_MSG(!sys_cache_is_ptr_cached(ptr)); |
| #if CONFIG_USERSPACE | ||
| /* user-space shared buffers are allocated as uncached */ | ||
| if (k_is_user_context()) { | ||
| __ASSERT_NO_MSG(sys_cache_is_ptr_cached(ptr) == false); |
Copilot
AI
Sep 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider using !sys_cache_is_ptr_cached(ptr) instead of == false for better readability and consistency with C conventions.
| __ASSERT_NO_MSG(sys_cache_is_ptr_cached(ptr) == false); | |
| __ASSERT_NO_MSG(!sys_cache_is_ptr_cached(ptr)); |
|
V2:
|
lgirdwood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't be using uncache for buffers since performance impact can be quite large, better to make a sof_buffer_wb() and sof_buffer_inv() syscalls that can be called for cross core piipelines for this use case. Yes we have syscall impact, but my feeling is that this will be much cheaper than an uncache DP buffer with 10 or 20ms period.
| #if CONFIG_USERSPACE | ||
| /* | ||
| * cache control is not possible in user-space, so cross-core | ||
| * buffers must be allocated as coherent | ||
| */ | ||
| if (is_shared && k_is_user_context()) | ||
| flags |= SOF_MEM_FLAG_COHERENT; | ||
| #endif | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not do this as performance impact is quite large, in this case it may be a lot faster to have a SOF syscall for buffer WB and INV.
Here's a series to ease moving audio code (modules or even the DAI code) to user-space by removing calls to cache primitives, which are not available in user-space (not supported in most hw architectures).
This is RFC as this is a no-op currently as we don't have userspace enabled in SOF yet, but sharing early to get feedback and align with other ongoing PRs.
First commit submitted separately as #10230 as that is useful even without user-space.