ams: Add initial AMS implementation #6343
Conversation
Can you provide examples of what this would be used for?
@plbossart AMS is basically notifications on steroids. It allows broadcasting notifications from one producer to multiple consumers, as well as targeting single consumers. Producers and consumers can be registered at runtime. It should also allow unified notifications to be sent to the host through a host gateway (not yet implemented). AFAIK for ACE platforms it is desired to send all module events through asynchronous messages.
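(For context only, a minimal sketch of what a runtime-registered producer/consumer interface along those lines could look like; every name and signature below is hypothetical and not the actual API added by this PR.)

```c
/*
 * Hypothetical sketch of an AMS-style producer/consumer API.  The names
 * (ams_register_producer, ams_register_consumer, ams_send) and signatures
 * are illustrative assumptions, not the actual interface in this PR.
 */
#include <stddef.h>
#include <stdint.h>

typedef void (*ams_msg_cb)(uint32_t msg_id, const void *payload,
			   size_t size, void *ctx);

/* A producer declares which message ID it will emit. */
int ams_register_producer(uint32_t msg_id);

/* Consumers (possibly on different cores) subscribe to a message ID;
 * they can be registered and unregistered at runtime. */
int ams_register_consumer(uint32_t msg_id, ams_msg_cb cb, void *ctx);

/* One send either broadcasts to all registered consumers of msg_id or,
 * with an explicit module/instance target (not shown), reaches exactly one. */
int ams_send(uint32_t msg_id, const void *payload, size_t size);
```

The point of difference from the existing notifier, as described above, would be the runtime registration on both ends and the (planned) unified path to the host gateway.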
Which of these can the current notifier not do?
lgirdwood
left a comment
@kfrydryx I assume AMS will use the low-level Zephyr APIs for core-to-core or task-to-task messaging? i.e. we can send a message to any object with an instance ID on any core/thread, and even to the host SW?
I also assume this PR is the high-level AMS functionality for routing, synchronization, ID lookup and translation, etc.
@nashif are you able to comment on how AMS would work with the Zephyr low-level IPC, IDC, and task/thread message APIs?
@lyakh AMS is a required part of ACE feature parity. It needs to be compatible with loadable modules and whatever is happening behind the host gateway. @lgirdwood Async messages between different cores are sent through IDC/IPC. I don't think we have to care about what's behind the SOF abstraction; as long as we can send a message and add a case to handle the AMS IDC message, we should be OK. The IPC part is not yet implemented, though it should be pretty similar.
OK, as long as we are using the Zephyr interface for IDC/IPC, then I'm good.
IIUC, AMS can also message the host. Yeah, we can't have two methods of messaging between components or to the host upstream. We need to migrate both the notifier users and the IPC event users to AMS prior to v2.5, i.e. we merge those two existing APIs into AMS.
Sorry, I stopped reviewing at some point. This file isn't compiled by CI because its Kconfig option defaults to =n. And it looks like you didn't compile this latest version, or you left at least some compiler warnings unfixed. Please compile and test every version before submission. Also, are you sure all the functions that acquire your coherent object are only called from thread context? Please also check the output of the sparse CI check - even if it's green now - and fix any warnings in your code complaining about a wrong address space. But that can only be done once you get CI to compile your code.
kv2019i
left a comment
A few more comments on top of what @lyakh already raised. It would be good to have an in-tree user for this interface sooner rather than later. We should definitely build this by default in some configuration that is covered by CI, but unless we can also do functional testing (with an in-tree user), I fear this can get broken very easily by other changes upstream.
src/include/sof/lib/ams.h
Outdated
So, slots aren't per-core, they are per message, right? Every new message takes a new slot with various descriptors, e.g. masks, describing to which cores those messages have to be delivered. Then why is the slot number equal to the core number?
You have basically two scenarios for an inter-core am_send. You can send to a specific module, which reserves one slot - for the core that module runs on. Or you can send to "any", which again reserves one slot per core that has consumer modules on it. Since only one slot is reserved for that route even if multiple consumers reside on a core, only CORE_COUNT slots are required for a message to be sent. This of course assumes that all callbacks are handled before the next message is sent. In the unlikely event of multiple multicore messages being sent at the same time (meaning slots are not freed), an error is returned and is expected to be handled by the producer. There are expected flows described in the internal Intel docs. If a receiver is expected to do heavy processing, it should queue incoming messages for deferred handling so as not to block the AMS.
There are expected flows described in the internal Intel docs.
src/lib/ams.c is not located in an Intel-specific directory. Should it be?
This is a purely software feature, not hardware dependent - should it be? Every platform can use it; if it were platform-specific, then the code would be duplicated per platform. AFAIK it is agreed that SOF is moving towards async messages instead of notifications.
ah, so, let me check if I understood this correctly:
- messages aren't queued, so there cannot be more than one message being processed at any time
- messages can only be directed either to 1 specific instance of a module on a specific core, or to "any" which means - all registered consumers / modules on all cores
If that's correct, then
- you have to protect sending of messages well - two cores might decide to send messages at the same time
- in principle you only need an array of booleans, indicating whether a message is pending for the respective core or not. That array can either be a bitmask, in which case you'd have to protect it, or a real
bool msg_pending[CONFIG_CORE_COUNT];
Is that right?
Why do you want to make those assumptions? Async messages are asynchronous and don't care about the receiver's processing. We have to satisfy the transport-layer conditions: the receiver exists, and there is a free slot to pass the message. There's no need to consider what type of message it is, or whether the same message is being sent to a different core, or whether the previous one finished processing, or whether the processing is being handled properly on the receiving end. There's only the need to satisfy the transport conditions: do we have a receiver? Where is the receiver? Do we have a free slot to send this message to this core? If yes, no problem. If not, then return an error to the producer.
Having multiple messages in slots at the same time may happen and is not necessarily a bad thing, as again - the sending transport part does not care about the message type, only routing.
Let's look at an example.
Two producers send the following messages: A to cores 1 and 2 from core 0, and B to core 1 from core 0.
- Message A is stored in slot 0, routed to core 1
- Message A is stored in slot 1, routed to core 2
Something takes too long (the assumption is that no heavy processing is done in the context of an Asynchronous Message) and Message A is not processed on any of the cores yet.
- Message B is stored in slot 2, routed to core 1
No issues there, everything works as expected. The sending function does not care what the message is, only whether there is enough room for it to be sent.
If Messages A and B were not processed and a third Message C is sent, then there is a problem. ams_send returns an error that should be handled by the producer module, i.e. retry until success.
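(A rough sketch of that producer-side retry flow, reusing the hypothetical ams_send() signature from the earlier sketch and assuming it returns 0 on success and -EBUSY when no slot is free; the real return convention is not shown in this thread.)

```c
/*
 * Illustrative producer-side retry loop for when all message slots are
 * busy.  Assumes a hypothetical ams_send() that returns 0 on success and
 * -EBUSY when no free slot exists; the real API may differ.
 */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

int ams_send(uint32_t msg_id, const void *payload, size_t size); /* assumed */

static int send_with_retry(uint32_t msg_id, const void *payload,
			   size_t size, int max_attempts)
{
	int ret = -EBUSY;

	while (max_attempts-- > 0) {
		ret = ams_send(msg_id, payload, size);
		if (ret != -EBUSY)
			break; /* success, or an error that retrying won't fix */
		/* all slots occupied: back off (yield, wait a tick, ...) */
	}

	return ret;
}
```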
@kfrydryx I don't want to make those assumptions. I want to have a clearly documented protocol (thanks, @marc-hb). If there can be multiple messages in flight - good. But then slots are per message, and I see no reason why the number of slots should be equal to the number of cores. Furthermore, in your example above I don't understand why message A should occupy two slots - why doesn't it occupy just one slot with the bitmask equal to 6 == BIT(1) | BIT(2)? Documentation, please.
I'm sorry, I've fooled (and made a fool of) myself - that explanation was just wrong. Slots behave differently than I was trying to explain; it's been a long time since the logic was implemented.
Slots are strictly per message: they have a reference counter in ams_shared_context->slot_uses[CONFIG_CORE_COUNT] and mark which cores have already handled them in ams_shared_context->slot_done[CONFIG_CORE_COUNT]. Sorry for the confusion.
Now, slots are not strictly tied to cores, as ams_push_slot searches for the first free slot to accommodate the message being sent, but this size makes it so all cores can send a message without conflicts if they all decide to send at the same time.
And yes, the documentation is lacking; it is great that the part @kv2019i mentioned (thanks!) was made public. When/if we agree on the final shape of the async messages, I'd be more than happy to prepare an in-depth explanation of what's happening inside. If you have further questions about why we even need such a mechanism with all of the Zephyr goodies available, we'd need to find and contact the person responsible for that decision.
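(A condensed model of how that bookkeeping reads; only the slot_uses / slot_done / ams_push_slot names come from the comment above, while the types, the bitmask encoding of slot_done and the slot count are assumptions made for illustration.)

```c
/*
 * Condensed model of the slot bookkeeping described above.  Only the names
 * slot_uses, slot_done and ams_push_slot come from this discussion; the
 * types, sizes and "done" encoding are assumptions made for the example.
 */
#include <stdint.h>

#ifndef CONFIG_CORE_COUNT
#define CONFIG_CORE_COUNT 4	/* placeholder for the Kconfig-provided value */
#endif

struct ams_shared_context_sketch {
	/* per-slot reference count: how many cores still have to handle it */
	uint32_t slot_uses[CONFIG_CORE_COUNT];
	/* per-slot bitmask of cores that already processed the message */
	uint32_t slot_done[CONFIG_CORE_COUNT];
};

/* Return the first free slot for an outgoing message, or -1 if none. */
static int ams_push_slot_sketch(struct ams_shared_context_sketch *ctx)
{
	for (int i = 0; i < CONFIG_CORE_COUNT; i++) {
		if (ctx->slot_uses[i] == 0)
			return i;
	}

	return -1;	/* all slots busy: the send path reports an error */
}
```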
Thanks @kfrydryx @lyakh. I think this is the trickiest aspect of the implementation. Maybe have a few examples of how slots are used in typical 1:1 and 1:M cases. So, to @lyakh's original question, the number of slots is not related to the core count, but a slot count matching the number of cores has proven to be a good value in practice. Maybe make this a Kconfig option (as the slots do take resources) and have the default set to the core count?
this size makes it so all cores can send a message without conflicts if they all decide to send at the same time.
That means there's one slot per sender. So make the sending core use the slot with the index equal to the core ID and throw away all the free-slot searching.
Btw, @kfrydryx I didn't see this linked anywhere yet, but we do have this: https://thesofproject.github.io/latest/architectures/firmware/sof-zephyr/mpp_layer/async_messaging.html. It should probably be added as a link in the pull request description; I think it would help to understand the context. FYI @plbossart @lyakh @dbaluta who have asked about introduction details. (And no, I didn't know we had such documentation already up at sof-docs either.)
Thanks, @kv2019i, that helps. @kfrydryx how about just using semaphores and not messing around with IDC? We are running on top of a rather advanced multi-threading, multi-CPU-capable OS, so why not use it? Can we register consumers not with callbacks but with a semaphore, so that when producers signal consumers they just signal those semaphores, thus making it truly asynchronous? BTW, Zephyr even has sockets, maybe just use those? They even have socketpairs. Maybe we can really use an existing, tested and verified API instead of adding another one. They also have kernel/mailbox.c, and they have zbus for message passing... and possibly more IPC APIs.
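(To make that suggestion concrete, a rough illustration of the semaphore-based variant using Zephyr's k_sem API; the pending_msg_id storage and the function names are invented for the example, and a real design would still need a proper queue for payloads plus thread creation, which is omitted here.)

```c
/*
 * Rough illustration of the semaphore-based alternative: the consumer
 * blocks on a k_sem instead of registering a callback, and the producer
 * gives the semaphore after publishing the message somewhere the consumer
 * can read it.  Names and the single-word "payload" are invented.
 */
#include <zephyr/kernel.h>

K_SEM_DEFINE(ams_msg_sem, 0, 1);
static uint32_t pending_msg_id;	/* placeholder for a real message queue */

/* Producer side: publish, then signal. */
void producer_notify(uint32_t msg_id)
{
	pending_msg_id = msg_id;
	k_sem_give(&ams_msg_sem);
}

/* Consumer side: runs in its own thread, truly asynchronous to producers.
 * (Thread creation via K_THREAD_DEFINE is omitted from this sketch.) */
void consumer_thread(void *p1, void *p2, void *p3)
{
	while (1) {
		k_sem_take(&ams_msg_sem, K_FOREVER);
		/* handle pending_msg_id here */
	}
}
```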
@plbossart This will be coming later in the form of the Host Async Message Gateway. That's basically a module responsible for forwarding Asynchronous Messages to the host and the other way round.
kv2019i
left a comment
A few comments inline. Took a brief look at the low-level tools (k_sem, k_event) and it's true these only cover a small part of the puzzle. The new zbus (https://docs.zephyrproject.org/latest/services/zbus/index.html) is a closer match, but the public interface to consumers and providers is very different from AMS. Both are still worth a look. If we don't go that path, it would seem improving the docs on the slot semantics would be the key thing left (some bits already clarified).
src/lib/ams.c
Outdated
I think this is probably the iffiest bit. In general we would like to move to native Zephyr APIs for scheduling. For audio tasks the SOF scheduler does make sense (although there is some debate), since a generic scheduler is not aware of the audio pipeline, and for scheduling decisions influenced by the audio graph topology an audio-specific layer is needed on top of the RTOS layer.
But here, for generic routing of notifications, this is really a task the Zephyr scheduler can natively do. I don't think we need to route via schedule_task_init_ll() here.
FYI @marcinszkudlinski @aborisovich @lyakh
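(For illustration of that alternative: a minimal sketch of dispatching the notification routing through Zephyr's workqueue API instead of schedule_task_init_ll(); the handler and work-item names are invented for the example.)

```c
/*
 * Minimal sketch of routing AMS processing through Zephyr's workqueue API
 * instead of the SOF LL scheduler; the names here are invented.
 */
#include <zephyr/kernel.h>

static void ams_process_work_handler(struct k_work *work)
{
	/* walk the registered consumers and deliver pending messages here */
}

K_WORK_DEFINE(ams_process_work, ams_process_work_handler);

/* Called by the producer path after a message has been queued. */
static void ams_schedule_processing(void)
{
	k_work_submit(&ams_process_work);	/* runs on the system workqueue */
}
```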
src/include/sof/lib/ams.h
Outdated
Let's make sure our direction keeps the external API for compatibility (new APIs could be allowed) and integrates with backends in Zephyr (to save memory and reduce support effort).
src/include/sof/lib/ams.h
Outdated
It looks like you define a 0-element uint32_t array here?
This is undefined behavior:
ISO 9899:2011 6.7.6.2:
If the expression is a constant expression, it shall have a value greater than zero.
I know this is the C11 standard, but it was simply not mentioned in C99; the behavior is the same. And we are soon going to move to a newer C standard once #7027 is merged.
Great article from Google's security team about making C arrays safer: https://lwn.net/Articles/921799/
ISO 9899:2011 6.7.6.2:
Does this array declarator section apply to flexible array members too? They're not regular arrays.
Also: what in C11 made pad[0] OK again? That's a surprise to me.
AFAIK pad[0] is a pre-C99 backwards-compatibility hack, and pad[0] is incompatible with safety features like -fstrict-flex-arrays (https://people.kernel.org/kees/bounded-flexible-arrays-in-c). Simply replace it with pad[]. If all the compilers we use accept the C99-standard pad[], then everything is OK - and better.
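(To spell out the difference being discussed: pad[0] is the old GNU-style zero-length-array hack, while pad[] is the C99 flexible array member and plays well with -fstrict-flex-arrays. A minimal illustration, with struct names invented for the example:)

```c
/*
 * Illustration of the zero-length-array hack vs. the C99 flexible array
 * member; the struct names are invented for this example.
 */
#include <stdint.h>
#include <stdlib.h>

struct msg_old {
	uint32_t size;
	uint32_t pad[0];	/* GNU extension, UB per ISO C, breaks -fstrict-flex-arrays */
};

struct msg_new {
	uint32_t size;
	uint32_t pad[];		/* C99 flexible array member: the portable form */
};

/* The allocation pattern is the same either way: reserve the trailing
 * storage explicitly on top of the fixed part of the struct. */
struct msg_new *msg_alloc(uint32_t n_words)
{
	struct msg_new *m = malloc(sizeof(*m) + n_words * sizeof(uint32_t));

	if (m)
		m->size = n_words;
	return m;
}
```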
I'm not really aware of how that's handled by different standards/compiler versions. I'll go with your suggestion there @marc-hb.
Rebased due to merge conflicts. PR with usage is in draft: #7158
@plbossart, @lyakh please unblock the PR or list the technical issues that still need to be addressed before merge. As agreed, the improved AMS design can be implemented incrementally over time based on the proposed solution.
plbossart
left a comment
This PR still feels rushed and could use a number of review cycles to clearly explain some of its details.
I am not going to reject it, but I'm not going to approve it either. This is not the way a publicly-visible PR should be handled.
I would never accept a PR like this if it was at the kernel level.
src/idc/idc.c
Outdated
usually headers are not conditionally included?
ack, this was necessary in one of the previous iterations, but should not have any impact currently.
src/idc/idc.c
Outdated
Shouldn't this be a one-time log? This could swamp the logs with repeated 'AMS not enabled' messages.
Not really - this should not happen when AMS is used correctly. If only the AMS interface functions are used, then without CONFIG_AMS this will never be called. Nevertheless, such a case should be reported when looking at the IDC layer separately.
src/idc/idc.c
Outdated
use a #define and explain what this is about.
sure, is a masking macro ok?
src/include/sof/lib/ams.h
Outdated
2023 for consistency with other code in this PR?
ack
src/include/sof/lib/ams.h
Outdated
not really a wildcard but rather a reserved value.
ack
src/lib/ams.c
Outdated
There should be a static inline function for this; having compiler-dependent stuff in the middle of a long file is not good practice.
I've checked the output files and it seems like clz compiles to NSAU anyway, so this distinction might not actually be needed.
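(If the compiler-dependent bit were kept, the static inline wrapper suggested above might look roughly like this, assuming the operation in question is a count-leading-zeros as the clz/NSAU mention suggests; the helper name is invented.)

```c
/*
 * Sketch of keeping the compiler-dependent bit behind a static inline
 * helper; assumes the operation is count-leading-zeros (helper name
 * invented for the example).
 */
#include <stdint.h>

static inline uint32_t ams_clz32(uint32_t value)
{
	/* __builtin_clz() is undefined for 0, so handle that explicitly.
	 * On Xtensa this typically compiles down to the NSAU instruction. */
	if (!value)
		return 32;

	return (uint32_t)__builtin_clz(value);
}
```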
I can't believe there is no known size here. If this AMS thing has been tried and tested, surely there's a better answer than "this is what it is and will be modified later".
I really don't see how you can have an M:N mapping, since there is ONE producer_register. It's 1:N, or you are missing a lot of explanation in this PR.
Then if it's a value that was experimented with before, add a comment to that effect. Otherwise this seems completely arbitrary and, as you write, clearly not the best way to make progress with open-source reviews.
The values look exactly the same as for MTL, which raises the question of why those values need to be platform-specific.
Let's have them defined once. If there's a need to distinguish between platforms/scenarios, we may add a config option for those values.
@plbossart this PR has been in review since September 2022, and @kfrydryx has been answering questions for months now. If something is still unclear, I am open to having a call to clarify the details. There was also an architecture documentation chapter added to sof-docs.
That is why I asked for a list of the technical review comments that, according to you, have not yet been addressed. I want a clear line of sight on what we are missing to move forward.
@mwasko I've dismissed my request for changes. I still don't think this is a good addition to the SOF code base, both because of the implementation itself and because of the chosen approach, like the use of IDC. Since I think it should be rewritten on top of proper RTOS APIs, I'd rather not spend more time reviewing this version.
No time to review this initial draft further - I'll make time when the solution is based on Zephyr native APIs.
Add Asynchronous Messaging Service. This can be used to communicate between modules. Asynchronous Messages are one-way messages from one producer to one or multiple registered consumers. Messages between modules on different cores are sent through IDC. All inter-core communication must be proxied by the main core.
Signed-off-by: Krzysztof Frydryk <krzysztofx.frydryk@intel.com>

Set config to enable AMS for the ACE1x platform and add MTL-specific values.
Signed-off-by: Krzysztof Frydryk <krzysztofx.frydryk@intel.com>

Set config to enable AMS for the CAVS25 platform.
Signed-off-by: Krzysztof Frydryk <krzysztofx.frydryk@intel.com>
Add Asynchronous Messaging Service. This is used to communicate between modules in ACE.
Might still need some tinkering: the single-core scenario and the routing logic were verified, though multi-core behavior was not tested in depth.
Signed-off-by: Krzysztof Frydryk <krzysztofx.frydryk@intel.com>