Skip to content

Conversation

@lyakh
Copy link
Collaborator

@lyakh lyakh commented Aug 4, 2025

This is the beginning of the user-space SOF work. To enable DP threads to run in user-space we want to start them early to also call all callbacks from their modules in the thread context. This PR does that - starts the thread early to prepare for the context conversion

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some questions.

pdata->mod = mod;
*task = &task_memory->task;

/* create a zephyr thread for the task */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth stating why we create it here in the comment, i.e. to do this and that etc. Next PR would then take stack size from topology i.e. stack_size = topology->stack_size ? topology->stack_size : DEFAULT_DP_STACK_SIZE;

mod->dev = dev;
dev->mod = mod;

#if CONFIG_ZEPHYR_DP_SCHEDULER
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw - I guess this will always be true now ? if so can be a later optimization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no legacy platforms with no DP? IPC3?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say no non Zephyr platforms now. Everything should be Zephyr.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this Kconfig already exists and it isn't about Zephyr, it is about DP. We could at some point remove "ZEPHYR" from SOF Kconfig names at toms point though, but that would be another task for our automation

comp_dbg(dev, "start");

if (dev->task)
schedule_task_cancel(dev->task);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a sync() version here ? i.e. will this return after thread is stopped

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, the thread should be stopped at this point by exit from dp_thread_fn.
This call is "just in case"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dp thread should be released after calling the free method in the module.

Copy link
Contributor

@marcinszkudlinski marcinszkudlinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good,
i would double check it with Google AEC pipelines,

@softwarecki
Copy link
Collaborator

We need to start the DP thread earlier, before creating module_adapter, to be able to call the module entry point in the user context.

ca_copy_from_module_to_sink(const struct audio_stream *sink,
void *buff, size_t bytes)
static void ca_copy_from_module_to_sink(const struct audio_stream *sink,
void *buff, size_t bytes)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

align to open bracket

struct comp_buffer **sinks)
static uint32_t module_single_sink_setup(struct comp_dev *dev,
struct comp_buffer **source,
struct comp_buffer **sinks)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

align to open bracket

struct comp_buffer **source,
struct comp_buffer **sinks)
static uint32_t module_single_source_setup(struct comp_dev *dev,
struct comp_buffer **source,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

align to open bracket

ca_copy_from_source_to_module(const struct audio_stream *source,
void *buff, uint32_t buff_size, size_t bytes)
static void ca_copy_from_source_to_module(const struct audio_stream *source,
void *buff, uint32_t buff_size, size_t bytes)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

align to open bracket


/* success, fill the structures */
task_memory->task.priv_data = &task_memory->pdata;
task_memory->pdata.p_stack = p_stack;
Copy link
Collaborator

@softwarecki softwarecki Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stack_size is no longer needed in pdata.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we can get it from topology now too. As discussed on #10103 we do probably want a CONFIG_DEFAULT_DP_STACK_SIZE if topology->stack_size == 0. These could come as a separate patch though for this and #10103

comp_dbg(dev, "start");

if (dev->task)
schedule_task_cancel(dev->task);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dp thread should be released after calling the free method in the module.

@lyakh
Copy link
Collaborator Author

lyakh commented Aug 11, 2025

We need to start the DP thread earlier, before creating module_adapter, to be able to call the module entry point in the user context.

the freeing is done later in a newer version of this patch, as for the entry - maybe we can make that optional, at least with LLEXT we should be able to easily enough get rid of it and just get ops directly from an ELF section

@lyakh lyakh force-pushed the dpthread branch 2 times, most recently from fa3e265 to 8b45418 Compare August 11, 2025 15:26
Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh can we split this PR into 2 PRs, one for thread move and teh other for RTIO. That way we can test the thread move in CI whilst RTIO is still WIP.


/* success, fill the structures */
task_memory->task.priv_data = &task_memory->pdata;
task_memory->pdata.p_stack = p_stack;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we can get it from topology now too. As discussed on #10103 we do probably want a CONFIG_DEFAULT_DP_STACK_SIZE if topology->stack_size == 0. These could come as a separate patch though for this and #10103

@marcinszkudlinski
Copy link
Contributor

marcinszkudlinski commented Aug 14, 2025

one general comment

Using DP thread to other tasks but data processing is a bit risky - may lead to starvation

  • DP priority is low, lower than LL
  • IPC thread should be "task with budget", preemptible when the primary task - audio processing - is at risk of underrun
  • DP may process long, very long - even 100ms. That is was designed for. IPC would have to wait
    (100ms may occur if i.e. there are couple of DP threads, one in 1ms period, second in 100ms. Even it "100ms period" processing can finish. lets say in 10ms, it will be preempted to let "1ms' DP thread finish in its deadline, "100ms" task may start at 0ms and finish at 99.9ms and it is OK)
  • increasing priority won't help if the processing as above is in progress, thread is busy and can't be interrupted
  • situation will become more complicated with DP to DP connection and deadline chain calculation finally introduced (PR is almost ready)
  • and the worst - all of the above will be invisible and not detectable when CPU is not heavily loaded. IT will, however, limit whole system performance and cause IPC timeouts or glitches at unexpected moments.

Theory of EDF scheduling says it may utilize CPU in 100%, lets not spoil it.

I would strongly recommend using a separate thread for IPC for DP - AFAIK a thread may be switched from one user space to another, so a single worker, properly switched from one user space to another - will be enough and won;t consume too much resources. It will save a lot of problems

@lyakh
Copy link
Collaborator Author

lyakh commented Aug 14, 2025

I would strongly recommend using a separate thread for IPC for DP - AFAIK a thread may be switched from one user space to another, so a single worker, properly switched from one user space to another - will be enough and won;t consume too much resources. It will save a lot of problems

@marcinszkudlinski we don't have 100ms long DP modules so far, looks like ATM we aren't expecting any that could run longer than 20ms. If we ever get such slow modules, then yes, then we can consider adding support for a separate thread.

@marcinszkudlinski
Copy link
Contributor

marcinszkudlinski commented Aug 14, 2025

why not now? Less complicated that future refactoring. Are there advantages to re-use of DP thread vs 1 common thread - except for one less stack in RAM?

@lyakh
Copy link
Collaborator Author

lyakh commented Aug 14, 2025

why not now? Less complicated that future refactoring. Are there advantages to re-use of DP thread vs 1 common thread - except for one less stack in RAM?

@marcinszkudlinski Are we sure that we'll ever need that? I'd prefer to first enable the simpler version if we don't know if we ever need the more complex one. Besides I personally don't find the concept of switching memory domains particularly elegant. And the alternative would be at least an additional thread per "trust circle" (vendor / module interface), which is then not one but potentially several such proxy threads

lyakh added 3 commits August 15, 2025 09:54
We need to move IPC processing for DP scheduled components into their
thread context. For that the thread has to be started early. Create
it immediately when creating DP task context.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
We need to run all module callbacks in DP thread context, for this
the thread has to be started early - before the first module callback
is called.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
DP processing threads should have as long as life time as possible to
process all the relevant IPCs in the thread context. Move thread
termination to be called immediately before freeing module data.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
@lyakh
Copy link
Collaborator Author

lyakh commented Aug 15, 2025

With this state SRC can be run in DP mode and all its hooks will run on the SRC DP thread, no SRC code will execute in "base firmware" context. In fact that isn't quite right when SRC is built as an LLEXT module - then a one-line entry function will be called by the base firmware directly. Fixing that in the next step.

@lyakh lyakh force-pushed the dpthread branch 4 times, most recently from a36778e to f33ab73 Compare August 15, 2025 11:54
lyakh added 3 commits August 15, 2025 14:03
We should be able to run DP threads in user-space mode, for this we
need to move audio and IPC signalling to RTIO.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
src_set_config() and src_get_config() aren't used, they would return
an error if ever called. It's easier to just remove them.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
The current library loading API prescribes that all modules should
have entry functions whose only role in fact (in case of LLEXT at
least) is returning an interface operations popinter. LLEXT modules
don't need that, they can store that pointer directly in module
manifest.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
@lgirdwood
Copy link
Member

why not now? Less complicated that future refactoring. Are there advantages to re-use of DP thread vs 1 common thread - except for one less stack in RAM?

@marcinszkudlinski Are we sure that we'll ever need that? I'd prefer to first enable the simpler version if we don't know if we ever need the more complex one. Besides I personally don't find the concept of switching memory domains particularly elegant. And the alternative would be at least an additional thread per "trust circle" (vendor / module interface), which is then not one but potentially several such proxy threads

I think we are fine today (as the 1st step) with the single thread (since we dont have long period DP configs today), but adding the IPC thread can come next as a topology flag i.e. DP module configured for long period, pls fork a second thread for module IPC.

I think we also need to look at IPC serialization too, it seems we can have improvements to support multiple messages per doorbell ring and more concurrency. Wont help us today though, but it is possible as future improvements.

@softwarecki
Copy link
Collaborator

@lyakh

I'd prefer to first enable the simpler version.

One separate worker will certainly be simpler.

the alternative would be at least an additional thread per "trust circle" (vendor / module interface)

Since we only have one IPC thread, one worker will suffice. Having multiple workers and the concept of "trust circle" is pointless.

@lyakh
Copy link
Collaborator Author

lyakh commented Aug 19, 2025

@lyakh

I'd prefer to first enable the simpler version.

One separate worker will certainly be simpler.

the alternative would be at least an additional thread per "trust circle" (vendor / module interface)

Since we only have one IPC thread, one worker will suffice. Having multiple workers and the concept of "trust circle" is pointless.

@softwarecki Modules shouldn't be allowed to access each other's memory, so you cannot run code of module A while being allowed to access memory of module B. So if you have a single user-space IPC helper thread, you'd need to switch its permissions / domain every time you need it to execute on behalf of a different module. This supposedly means reconfiguring the MMU context / reloading page tables.

@lyakh lyakh closed this Dec 23, 2025
@lyakh lyakh deleted the dpthread branch December 23, 2025 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants