Please pull MPAM 24.04 linux nvidia 6.17 next.mpam.extras #230

fyu1 · 2025-10-30T02:29:58Z

After PR #222 was closed, I found that Arm has released a new MPAM branch that contains extra patches: https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot%2bextras/v6.18-rc1

I think it's better to merge the new branch into 6.17. The new branch contains:

v3 MPAM base driver
Extra patches with features like cache min/max etc.

To make this PR work, I need to backport two resctrl patch sets (Babu's and Tony's) from 6.17 upstream before I can backport the new branch.

Ian and Matt suggested ignoring PR #222 and creating this new PR for a clean backport.

nvmochs · 2025-10-30T14:12:48Z

@fyu1 Can you pick these with -x -s and add the source after the SHA in the pick tag?

a0506716c52c535f677b0fdc498ddc8a0325473f - NVIDIA: SAUCE: arm_mpam: resctrl: Allow resctrl to allocate monitors
76bfc8a7f506a8ed662a83af20dbc8edd1baf802 - NVIDIA: SAUCE: cacheinfo: Add helper to find the cache size from cpu+level
3a01ed77dbd97420cd96556b1ef2b0e05b8d2503 - NVIDIA: SAUCE: arm_mpam: Add kunit tests for props_mismatch()
6f0e7638090ed22950359ccfff07a66a127d5c3f - NVIDIA: SAUCE: arm_mpam: Add kunit test for bitmap reset
7d9ed818050dd1066cf23d522861b034191afb3e - NVIDIA: SAUCE: arm_mpam: Add helper to reset saved mbwu state
691674810737622d1b4c05e00e0f0b4f31233ecb - NVIDIA: SAUCE: arm_mpam: Use long MBWU counters if supported
e1ae558ea58061ebcc6e20171016d5520d120dbf - NVIDIA: SAUCE: arm_mpam: Probe for long/lwd mbwu counters
e304d9636b51699816459ad2516714c8629486ca - NVIDIA: SAUCE: arm_mpam: Track bandwidth counter state for overflow and power management
f8cc695d7263c0562f1dc262a5986143dd5dfdde - NVIDIA: SAUCE: arm_mpam: Add mpam_msmon_read() to read monitor value
f9aac7fc1818af57ec68a346630d1e7b0a900adf - NVIDIA: SAUCE: arm_mpam: Add helpers to allocate monitors
b10bf6385e98cc9bf64fc33c2bd198b980df889f - NVIDIA: SAUCE: arm_mpam: Probe and reset the rest of the features
911fe53463c620ebe0cc614b95a257737822fd4f - NVIDIA: SAUCE: arm_mpam: Allow configuration to be applied and restored during cpu online
01d9ed41d7b2263c8e81b85bc86eaebb8efae276 - NVIDIA: SAUCE: arm_mpam: Use a static key to indicate when mpam is enabled
e1cbddd5415859214a85cc361f80c44dabc13091 - NVIDIA: SAUCE: arm_mpam: Register and enable IRQs
9205feb49fe51841cd338ba8cd0a5a4975fb9329 - NVIDIA: SAUCE: arm_mpam: Extend reset logic to allow devices to be reset any time
a4041c8237a2e11aa4f6a00e91476dba9ebc865e - NVIDIA: SAUCE: arm_mpam: Reset MSC controls from cpuhp callbacks
1e8b3416c599532ca3f9a36aa174eab149fd92e8 - NVIDIA: SAUCE: arm_mpam: Merge supported features during mpam_enable() into mpam_class
4015668ce84aa39ae517acd9a8a0e646e82371d7 - NVIDIA: SAUCE: arm_mpam: Probe the hardware features resctrl supports
aba5e9bf5b9b641acf6e93b94b7f89b40ba5c0e5 - NVIDIA: SAUCE: arm_mpam: Add helpers for managing the locking around the mon_sel registers
1aa12fe59e7e508141ad2075f5619c8818cfedc4 - NVIDIA: SAUCE: arm_mpam: Probe hardware to find the supported partid/pmg values
0f49d084fdf9c66d1dbd8251fd7a953e68f116af - NVIDIA: SAUCE: arm_mpam: Add cpuhp callbacks to probe MSC hardware
18d92ec00c98591d11ccd05a6a769ac71040d49c - NVIDIA: SAUCE: arm_mpam: Add MPAM MSC register layout definitions
3b0705188ae0b3d506c59972d85efc04da2f1363 - NVIDIA: SAUCE: arm_mpam: Add the class and component structures for firmware described ris
670efd06b09b869a763aec851623edd948bdf43e - NVIDIA: SAUCE: DT: arm_mpam: Add support for memory controller MSC on DT platforms
d42aa40fcdd1f4193e778baa065ca45e3cb5900a - NVIDIA: SAUCE: arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
ba0d0c83db123bad87c94673f89d2e5a0f11cd94 - NVIDIA: SAUCE: DT: dt-bindings: arm: Add MPAM MSC binding
2606dfcfa7662010e1ec18489992941cc0003dc8 - NVIDIA: SAUCE: arm64: kconfig: Add Kconfig entry for MPAM
bc8ffaca99358dbe9a7d508ce43fcc8454175519 - NVIDIA: SAUCE: ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
5747f3c1d3f9d32c7329c269126b87ac7819eb0a - NVIDIA: SAUCE: ACPI / PPTT: Find cache level by cache-id
cf7a7746b760d1062f52bec64e094930b22c1ef4 - NVIDIA: SAUCE: ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
41bfada281ed13bb8f70d3183448fc60172f8bce - NVIDIA: SAUCE: ACPI / PPTT: Add a helper to fill a cpumask from a processor container
eacf99caadc7b0cb06c9fa2e061777632e2c8858 - NVIDIA: SAUCE: DT: cacheinfo: Expose the code to generate a cache-id from a device_node
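
The request above (pick with `-x -s`, then add the source after the SHA in the pick tag) can be sketched as a small message rewriter. This is only an illustration, not existing kernel tooling: the helper name `add_pick_source` is hypothetical, the subject line is a placeholder, and the sketch assumes the tag has exactly the form that `git cherry-pick -x` writes.

```python
import re

# Matches the tag that `git cherry-pick -x` appends to a commit message.
PICK_TAG = re.compile(r"\(cherry picked from commit ([0-9a-f]{7,40})\)")


def add_pick_source(message: str, source: str) -> str:
    """Append `source` after the SHA in each pick tag (hypothetical helper).

    A tag that already carries a source no longer matches the pattern,
    so running the helper twice leaves the message unchanged.
    """
    return PICK_TAG.sub(
        lambda m: f"(cherry picked from commit {m.group(1)} {source})", message
    )


# Placeholder subject; the SHA and tree URL are taken from a pick tag in this PR.
message = (
    "example subject\n\n"
    "(cherry picked from commit 20f0c13f4ffd01cb6fc239248afa05d602f9e8d4)\n"
)
tree = "https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git"
tagged = add_pick_source(message, tree)
print(tagged)
```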

clsotog · 2025-10-30T16:59:37Z

I started looking, and the first 7 commits have the SAUCE tag, but I think they have been taken upstream. In that case we do not need the SAUCE tag.

nvmochs · 2025-10-30T17:20:50Z

> I started looking, and the first 7 commits have the SAUCE tag, but I think they have been taken upstream. In that case we do not need the SAUCE tag.

That's a good point and something I missed.

@fyu1 - Please remove the NVIDIA:SAUCE tags from any patches that are picked from upstream.

Also, as a nit, I noticed that on the patches that were picked from upstream and contain the pick tag, there is whitespace between the SHA and the closing parenthesis:

(cherry picked from commit d79bab8a48bfcf5495f72d10bf609478a4a3b916 )

For consistency it would be nice if this can be removed.

e.g.
(cherry picked from commit d79bab8a48bfcf5495f72d10bf609478a4a3b916)
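
Both review comments (dropping the SAUCE tag from upstream picks and removing the stray whitespace in the pick tag) are mechanical, so they could be checked with something like the sketch below. The function names are hypothetical, and the sketch assumes the subject prefix is spelled exactly `NVIDIA: SAUCE: `.

```python
import re

SAUCE_PREFIX = "NVIDIA: SAUCE: "


def normalize_pick_tag(message: str) -> str:
    """Remove spaces or tabs between the pick tag contents and the closing ')'."""
    return re.sub(
        r"(\(cherry picked from commit [^()\n]*?)[ \t]+\)", r"\1)", message
    )


def drop_sauce_prefix(subject: str) -> str:
    """Strip the SAUCE prefix from the subject of a patch picked from upstream."""
    if subject.startswith(SAUCE_PREFIX):
        return subject[len(SAUCE_PREFIX):]
    return subject


fixed_tag = normalize_pick_tag(
    "(cherry picked from commit d79bab8a48bfcf5495f72d10bf609478a4a3b916 )"
)
fixed_subject = drop_sauce_prefix(
    "NVIDIA: SAUCE: arm_mpam: Add helpers to allocate monitors"
)
print(fixed_tag)
print(fixed_subject)
```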

fyu1 · 2025-10-30T22:09:22Z

@clsotog @nvmochs Thank you very much for your review! Could you please review the branch again?

nvmochs

Thanks for addressing my prior comments. Nothing further from me.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

clsotog · 2025-10-31T04:04:24Z

This is a note to myself: the kernel tree from James Morse is mpam/snapshot+extras/v6.18-rc1.
Some observations, but I will ack anyway:
For the commits that have the HACK label, have you tried using them? The commits say they are not for upstream, but if they help with debugging then I'm OK with them.
There were 2 commits, a Makefile change and then its revert, and I was wondering why we are adding them. I guess it matches the flow of Morse's kernel tree.

Acked-by: Carol L Soto <csoto@nvidia.com>

fyu1 · 2025-10-31T16:22:35Z

> This is a note to myself: the kernel tree from James Morse is mpam/snapshot+extras/v6.18-rc1.
> Some observations, but I will ack anyway:
> For the commits that have the HACK label, have you tried using them? The commits say they are not for upstream, but if they help with debugging then I'm OK with them.
> There were 2 commits, a Makefile change and then its revert, and I was wondering why we are adding them. I guess it matches the flow of Morse's kernel tree.
>
> Acked-by: Carol L Soto <csoto@nvidia.com>

@clsotog Thank you for your review!

Only the first 29 of James's 100+ patches in the branch have been released to LKML. The rest need to be cleaned up before they can be released to LKML, so there is some messy code in them. I want to keep James's original patches intact as much as possible so that it's easier to trace back to the original code when debugging potential issues. I did test the backported patches on Grace machines.

ianm-nv · 2025-10-31T17:13:54Z

PR sent to CKT

BugLink: https://bugs.launchpad.net/bugs/2114230

Please refer to https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md for details regarding FFA device details for secure EC services communication.

The HID 'MSFT000C' is reserved for FFA devices. This HID is documented in https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md#hid-definition

This commit adds a platform driver which binds with the FFA device. In its probe routine, it executes the AVAL method to check if FFA can be used for secure EC services communication.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 555e41e noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2114230

Please refer to https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md for details regarding FFA device details for secure EC services communication.

Each secure EC service is identified by a separate UUID. When the generic FFA module (ffa_module) loads, it gets the list of partitions. Each EC service is an FFA partition, and ffa_module creates a device for each partition. These devices are added to the arm_ffa bus type and are named arm-ffa-<number>.

To bind with these devices, a driver needs to be registered on the arm_ffa bus type. This driver uses 'struct ffa_driver', with the UUID as its ID table. The driver is bound to a device on the basis of the UUID.

The secure EC services FFA driver depends on the main FFA device being created (which uses ACPI ID MSFT000C), so ffa_driver_register()/ffa_driver_unregister() is invoked from nvidia_ffa_probe()/nvidia_ffa_remove().

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 9613a5c noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2114230

Please refer to https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md for details regarding FFA device details for secure EC services communication.

When the ACPI interpreter runs code with FFH operation region offset 4, the data is meant for EC secure services. The FFH buffer holds data in FFA_REQ_PACKET format. The packet contains the UUID of the EC service followed by the service-specific raw data.

This commit adds a custom FFH offset handler. When a request comes in with the custom offset, it is handled by the NVIDIA FFA EC driver. Inside the custom FFH callback, the driver extracts the UUID and gets the ffa_device for it. It then fills the raw data into ffa_send_direct_data2 and invokes the sync_send_receive2() routine for that ffa_device. Once it gets the response back, it fills the data in FFA_RESP_PACKET format, and the ACPI interpreter passes that data to the upper layer.

NOTE: In the above document, FFA_REQ_PACKET and FFA_RESP_PACKET use different formats. But in the latest firmware code, the ACPI implementation uses the same format for both request and response (it follows the FFA_REQ_PACKET format). The status bit is updated in the response (0 for success and 1 for failure).

The mixed-endian UUID format is documented in https://cdrdv2-public.intel.com/772722/asl-tutorial-v20190625.pdf

    In addition to Concatenate, there are several useful macros that generate buffers from strings. For example, the ToUUID macro takes a string of the form aabbccdd-eeff-gghh-iijj-kkllmmnnoopp where aa through pp represent one byte values encoded with hexadecimal characters. This string gets converted to a 16-byte buffer that looks like the following:

        Buffer() { dd, cc, bb, aa, ff, ee, hh, gg, ii, jj, kk, ll, mm, nn, oo, pp }

    This mixture of little endian and big-endian encoding UUID is called a mixed-endian format. The use of strings and the ToUUID macro is a convenient way to avoid having to manually encode the mixed-endian format. There are many other macros that provide similar conveniences, such as EISAID.

In the kernel, it is represented with guid_t. Inside nvidia_ffh_handler(), we need to convert a buffer of 16 bytes from FFA UUID to AML UUID format. nvidia_get_uuid_from_aml_buf() converts the AML UUID buffer into FFA UUID format.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 40ca7bc noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
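
The mixed-endian layout described in the commit message above can be reproduced outside the kernel. The sketch below uses Python's standard `uuid` module, whose `bytes_le` form stores the first three UUID fields little-endian and the rest in order, which matches the ToUUID buffer layout quoted above. It only illustrates the byte order; it is not the driver's nvidia_get_uuid_from_aml_buf(), and the helper names are hypothetical. The UUID is one of the service UUIDs from the sample ACPI code elsewhere in this PR.

```python
import uuid


def uuid_to_aml_buffer(text: str) -> bytes:
    """Encode a UUID string in the ToUUID layout (first three fields little-endian)."""
    return uuid.UUID(text).bytes_le


def aml_buffer_to_uuid(buf: bytes) -> uuid.UUID:
    """Decode a 16-byte mixed-endian AML buffer back into a UUID."""
    return uuid.UUID(bytes_le=buf)


service = "330c1273-fde5-4757-9819-5b6539037502"
aml_buf = uuid_to_aml_buffer(service)
print(aml_buf.hex(" "))  # 73 12 0c 33 e5 fd 57 47 98 19 5b 65 39 03 75 02
print(aml_buffer_to_uuid(aml_buf))
```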

BugLink: https://bugs.launchpad.net/bugs/2114230

- During boot, the ACPI probe happens first. It calls the _STA method for each added device.
- Inside the _STA method for a device managed by the EC, FFH offset 4 is used.
- The request fails since there is no custom handler registered for offset 0x4, and the device is disabled.
- If a rescan happens on the ACPI bus, the device's _STA method is called again.

This commit adds support to get the ACPI ID from the UUID and invokes acpi_bus_scan().

NOTE: nvidia_get_acpi_id_from_uuid() returns an ACPI ID only for a few services. We don't have a corresponding driver available for all the services in the current code. Only a few services have a node that uses a generic ACPI ID and a driver available. For the rest of the services, either the driver is not yet available or the published spec has not been updated with full ACPI sample code. Once a driver is available for those services, we can add their ACPI IDs to this list.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 971a25e noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2114230

The commit 897e9e6 ("firmware: arm_ffa: Initial support for scheduler receiver interrupt") adds support for SGI interrupts in the FFA driver. However, the validation for SGIs in the GICv3 is too strict, causing the driver probe to fail.

This patch relaxes the SGI validation check, allowing callers to use SGIs if the requested SGI number is greater than or equal to MAX_IPI, which fixes the TFA driver probe failure.

This issue is observed on an NVIDIA server platform with FFA-v1.1.

    PTP clock support registered
    EDAC MC: Ver: 3.0.0
    ARM FF-A: Driver version 1.1
    ARM FF-A: Firmware version 1.1 found
    GICv3: [Firmware Bug]: Illegal GSI8 translation request
    ARM FF-A: Failed to create IRQ mapping!
    ARM FF-A: Notification setup failed -61, not enabled
    ARM FF-A: Failed to register driver sched callback -95
    scmi_core: SCMI protocol bus registered

This patch was sent to the Arm mailing list for upstream, but it was rejected.
https://patchwork.kernel.org/project/linux-arm-kernel/patch/20240813033925.925947-1-sdonthineni@nvidia.com/

The proper fix requires some kind of mechanism by which an SGI can be requested by a module, but that needs discussion with Arm and will take time.

This patch will break only if the MAX_IPI value changes. It adds a BUILD_BUG_ON() to catch that situation. Once a proper solution is agreed on, this patch will be reverted.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(backported from commit fd136cf)
[maskedarray: removed enum ipi_msg_type definition as it appears in upstream commit "irqchip/gic-v5: Add GICv5 LPI/IPI support"]
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2114230

Please refer to https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md for details regarding FFA device details for secure EC services communication.

1. We need to get the virtual IDs which an EC service supports. In the FFA node, the _DSD object contains this information. If we look at the sample from the above document,

    Name(_DSD, Package() {
        ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"), //Device Prop UUID
        Package() {
            Package(2) {
                "arm-arml0002-ffa-ntf-bind",
                Package() {
                    1, // Revision
                    2, // Count of following packages
                    Package () {
                        ToUUID("330c1273-fde5-4757-9819-5b6539037502"), // Service1 UUID
                        Package () {
                            0x01, //Cookie1 (UINT32)
                            0x07, //Cookie2
                        }
                    },
                    Package () {
                        ToUUID("b510b3a3-59f6-4054-ba7a-ff2eb1eac765"), // Service2 UUID
                        Package () {
                            0x01, //Cookie1
                            0x03, //Cookie2
                        }
                    }
                }
            }
        }
    }) // _DSD()

   it uses a nested package structure. nvidia_ffa_fill_notification_map(), added in this commit, parses the _DSD object and fills the notification ID map for that service.

2. Once the virtual ID is obtained, it needs to be mapped to a physical ID by invoking function 1 in the notify service.

3. The UUID for the notification service is B510B3A3-59F6-4054-BA7A-FF2EB1EAC765. An FFA device will be created for this notification service by ffa_module. This notify service needs to be probed first. To make that happen, a separate ffa_driver instance is created and registered first.

4. We can do a 1:1 mapping between virtual ID and hardware ID.

5. We need to invoke notify_request() with the hardware notification ID. It registers the callback function for the notification.

6. Once a notification arrives, we need to evaluate the _DSM method with the virtual ID (which is mapped to the same value as the hardware ID).

7. Function 2 in the notify service should destroy the mapping, but it is neither implemented in the firmware nor documented. A TODO comment is added in nvidia_ffa_notification_destroy(). Also, if we unload and reload the modules, the existing mapping still exists. In nvidia_ffa_notification_setup(), ignore the error for this case. When the firmware is updated, the error will be returned.

8. The notification service FFA device is needed by each EC secure services FFA device to get the virtual notification list. The following device dependency chain is now created:

    FFA device <- notification service FFA device <- EC secure services FFA device

   To satisfy this, call the driver registration in the probe routine of the driver it depends on. Similarly, do the driver unregistration in the remove routine of the driver it depends on.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 1287a1d noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
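
To make the nested package layout in step 1 easier to follow, the sketch below models the sample _DSD property as nested Python lists and builds the per-service notification map from it. It is a rough model of what nvidia_ffa_fill_notification_map() is described as doing, not the driver's code: the real function walks ACPI objects in C, and the function name, error handling, and list modelling here are all assumptions.

```python
def fill_notification_map(dsd_property):
    """Map each service UUID to its list of virtual notification IDs (cookies)."""
    name, body = dsd_property
    if name != "arm-arml0002-ffa-ntf-bind":
        raise ValueError(f"unexpected property {name!r}")
    # body layout: revision, count, then `count` (UUID, cookies) packages.
    revision, count, *entries = body
    if count != len(entries):
        raise ValueError("package count does not match the number of entries")
    return {service.lower(): list(cookies) for service, cookies in entries}


# The sample property from step 1, modelled as nested Python lists.
sample = ["arm-arml0002-ffa-ntf-bind", [
    1,  # Revision
    2,  # Count of following packages
    ["330c1273-fde5-4757-9819-5b6539037502", [0x01, 0x07]],  # Service1
    ["b510b3a3-59f6-4054-ba7a-ff2eb1eac765", [0x01, 0x03]],  # Service2
]]
notification_map = fill_notification_map(sample)
print(notification_map)
```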

… EC driver

BugLink: https://bugs.launchpad.net/bugs/2114230

The NVIDIA FFA and EC secure services driver enables the communication with EC (Embedded Controller). Make this driver built-in to enable EC communication at early boot.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 9ea0251)
(cherry picked from commit 9ea0251 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2114759

Add a quirk function to skip the PCIe secondary bus reset (SBR). The PCIe Gen4 link downgrades to Gen1 after an SBR, so we have to skip this operation.

Signed-off-by: Jerry.Guo <jerry.guo@mediatek.com>
Signed-off-by: Yenchia Chen <yenchia.chen@mediatek.com>
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 0185574 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

…pinctrl driver

BugLink: https://bugs.launchpad.net/bugs/2117784

The kernel GPIO subsystem maps hardware pin numbers to a different range of GPIO numbers. Add a gpio-range structure to hold the mapped GPIO range in the pinctrl driver. That enables the kernel to look up a mapped GPIO range against a pinctrl device.

Signed-off-by: Jonas Chen <yung-chi.chen@mediatek.com>
Signed-off-by: Yenchia Chen <yenchia.chen@mediatek.com>
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Acked-by: nvmochs
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 1049985 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2117784

Add ACPI support in the shared part of the pinctrl driver. Parse the hardware base addresses and IRQ number to initialize eint according to the ACPI table data.

Signed-off-by: Jonas Chen <yung-chi.chen@mediatek.com>
Signed-off-by: Yenchia Chen <yenchia.chen@mediatek.com>
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Acked-by: nvmochs
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(backported from commit cdce65d noble:linux-nvidia-6.14)
[maskedarray: context adjusted due to commit 86dee87: "pinctrl: mediatek: Fix the invalid conditions"]
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2117784

Add mt8901 pinctrl, gpio and eint driver implementation.

Signed-off-by: Jonas Chen <yung-chi.chen@mediatek.com>
Signed-off-by: Yenchia Chen <yenchia.chen@mediatek.com>
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Acked-by: nvmochs
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(backported from commit 1fc7a58 noble:linux-nvidia-6.14)
[maskedarray: context adjusted for missing commit a3fe132: "pinctrl: mediatek: Add pinctrl driver for mt8189"]
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

…CTRL_MT8901

BugLink: https://bugs.launchpad.net/bugs/2117784

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Acked-by: nvmochs
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 0bd85d0)
(cherry picked from commit 0bd85d0 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2118357

commit d0038ee ("NVIDIA: SAUCE: Add support for EC secure service communication") added the nvidia_ffh_handler() function. While copying the data back into the ACPI FFH packet, it uses the request length. The response data can be larger than the request length. The response length can't be fetched in the Linux FFH handler function. We can copy all the bytes from ffa_data.data. The ACPI AML code will only use the required number of bytes from this.

Normally we don't need the response length to be known. The ACPI tables do not use it; they parse the response data directly. In the latest revision of the spec, the length field itself has been removed:
https://github.com/OpenDevicePartnership/documentation/blob/b23acb09f7cf03a5c3167509533f396d547e6291/guide_book/src/specs/ec_interface/secure-ec-services-overview.md#operation-region-definition

DIGITS GB10 uses the older revision of the spec, and the launch is planned with the older revision. When we move to the latest revision, we need to copy all data bytes for both request and response.

info->length corresponds to the FFH buffer length in the ACPI table. The following is the code in the ACPI table:

    Name (_HID, "MSFT000C")  // _HID: Hardware ID
    OperationRegion (AFFH, FFixedHW, 0x04, 0x90)

info->length will be 0x90 (144) bytes. ffa_packet->length in the older revision is the number of valid data bytes (https://github.com/OpenDevicePartnership/documentation/blob/45ad9b30be0f40e229deed2fef7a60d0b0b591f5/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md)

    struct nvidia_ec_ffa_packet *ffa_packet = (struct nvidia_ec_ffa_packet *)value;

The length of this value buffer should be info->length. We are taking the minimum of

    sizeof(ffa_data.data) = 112

and

    (info->length = 144) - (offsetof(struct nvidia_ec_ffa_packet, rawdata) = 18) = 126,

so ffh_copy_len will be 112 for the current DIGITS ACPI implementation.

In the latest revision, this length mismatch is also fixed. Raw data will start at offset 32, so both values will come out as 112.

Fixes: d0038ee ("NVIDIA: SAUCE: Add support for EC secure service communication")
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 141bd56 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
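
The copy-length arithmetic in the commit message above can be checked with a few lines. This is only a sketch of the calculation as described, with the sizes and offsets taken from the message (112-byte ffa_data.data, 0x90-byte FFH buffer, raw data at offset 18 or 32); the function name and signature are hypothetical, not the driver's.

```python
def ffh_copy_len(ffa_data_size: int, info_length: int, rawdata_offset: int) -> int:
    """Bytes to copy back, bounded by the FF-A payload and the FFH buffer space."""
    return min(ffa_data_size, info_length - rawdata_offset)


# Older spec revision: raw data starts at offset 18 in a 0x90-byte FFH buffer,
# leaving 126 bytes of space, so the 112-byte payload is the limit.
older = ffh_copy_len(112, 0x90, 18)

# Latest spec revision: raw data starts at offset 32, leaving exactly 112 bytes.
latest = ffh_copy_len(112, 0x90, 32)

print(older, latest)  # 112 112
```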

BugLink: https://bugs.launchpad.net/bugs/2118663

Add cpu part and model macro definitions for NVIDIA Olympus core.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 9273361 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2118663

Set CONFIG_ARM64_BRBE=y for arm64 linux-nvidia-6.14.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off--by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 26a417a noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

Resctrl specifies the schema format for MB and SMBA in rdt_resources_all[]. Intel platforms take a percentage for MB, AMD platforms take an absolute value which isn't MB/s. Currently these are both treated as a 'range'.

Adding support for additional types of control shows that user-space needs to be told what the control formats are. Today users of resctrl must already know if their platform is Intel or AMD to know how the MB resource will behave.

The MPAM support exposes new control types that take a 'percentage'. The Intel MB resource is also configured by a percentage, so should be able to expose this to user-space.

Remove the static configuration for schema_fmt in rdt_resources_all[] and specify it with the other control properties in __get_mem_config_intel() or __get_mem_config_amd().

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 20f0c13f4ffd01cb6fc239248afa05d602f9e8d4 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

MPAM's bandwidth controls are both exposed to resctrl as if they take a percentage. Update the schema format so that user-space can be told this is a percentage, and files that describe this control format are exposed (e.g. min_percent).

Existing variation in this area is covered by requiring user-space to know if it is running on an Intel or AMD platform. Exposing the schema format directly will avoid modifying user-space to know it is running on an MPAM or RISCV platform.

MPAM can also expose bitmap controls for memory bandwidth, which may become important for use-cases in the future. These are currently converted to a percentage to fit the existing definition of the MB resource.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit ea03ef359eb04c8c0f557f589578bb4777b8e2b5 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Resctrl previously had a 'range' schema format that took some kind of number. This has since been split into percentage, MB/s and an AMD platform specific scheme.

As range is no longer used, remove it. The last user is mba_sc which should be described as taking MB/s.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 93fda1d6632174fefddfe5e712110dd1e2947c95 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…tmap controls

MPAM has cache capacity controls that effectively take a percentage. Resctrl supports percentages, but the collection of files that are exposed to describe this control belong to the MB resource.

To find the minimum granularity of the percentage cache capacity controls, user-space is expected to read the bandwidth_gran file, and know this has nothing to do with bandwidth.

The only problem here is the name of the file. Add duplicates of these properties with percentage and bitmap in the name. These will be exposed based on the schema format.

The existing files must remain tied to the specific resources so that they remain visible to user-space. Using the same helpers ensures the values will always be the same regardless of the file used.

These files are not exposed until the new RFTYPE schema flags are set on a resource 'fflags'.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 673bcb00d2371a2876e164da55d642fdf7657b8d https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…n schema format

MPAM has cache capacity controls that effectively take a percentage. Resctrl supports percentages, but the collection of files that are exposed to describe this control belong to the MB resource. New files have been added that are selected based on the schema format.

Apply the flags to enable these files based on the schema format. Add a new fflags_from_schema() that is used for controls.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit a837ccc258380d6aeef86df709cc0484b60a4acf https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

If more schemas are added to resctrl, user-space needs to know how to configure them. To allow user-space to configure a schema it doesn't know about, it would be helpful to tell user-space the format, e.g. percentage.

Add a file under info that describes the schema format.

Percentages and 'mbps' are implicitly decimal, bitmaps are expected to be in hex.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit b457019d995b2849e683aef0fd89066e64c679a4 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
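
The number-base rule stated in the commit message above (percentages and 'mbps' are decimal, bitmaps are hex) is what lets user-space handle a schema it has not seen before. A minimal sketch of a parser that follows that rule is shown below. The format strings `percentage`, `mbps` and `bitmap` and the function name are assumptions for illustration; the commit message does not spell out the exact strings the info file contains.

```python
def parse_control_value(schema_format: str, text: str) -> int:
    """Parse a control value using the base implied by the schema format."""
    if schema_format == "bitmap":
        return int(text, 16)  # bitmaps are expected to be in hex
    if schema_format in ("percentage", "mbps"):
        return int(text, 10)  # percentages and MB/s are implicitly decimal
    raise ValueError(f"unknown schema format {schema_format!r}")


bitmap_value = parse_control_value("bitmap", "ff0")
percent_value = parse_control_value("percentage", "50")
print(bitmap_value, percent_value)
```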

MPAM can have both cache portion and cache capacity controls on any cache that supports MPAM. Cache portion bitmaps can be exposed via resctrl if they are implemented on L2 or L3.

The cache capacity controls can not be used to isolate portions, which is implicit in the L2 or L3 bitmap provided by user-space. These controls need to be configured with something more like a percentage.

Add the resource enum entries for these two resources. No additional resctrl code is needed because the architecture code will specify this resource takes a 'percentage', re-using the support previously used only for the MB resource.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit b601bbf375b016c417db4ec0e8bd6ae58b9057aa https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…m cmax MPAM's maximum cache-capacity controls take a fixed point fraction format. Instead of dumping this on user-space, convert it to a percentage. User-space using resctrl already knows how to handle percentages. Signed-off-by: James Morse <james.morse@arm.com> (cherry picked from commit 183d4c43260089e6b51518e50427d0f04a6af875 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
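The conversion described above can be sketched in Python. This is an illustration only: it assumes an unsigned fixed-point fraction of `width` bits (16 by default) that represents `value / 2**width`, and the helper names are hypothetical. The driver's actual field width and rounding may differ.

```python
def fraction_to_percent(value: int, width: int = 16) -> int:
    """Convert a fixed-point fraction (value / 2**width) to a whole percentage."""
    if width <= 0:
        raise ValueError("width must be positive")
    scale = 1 << width
    if not 0 <= value < scale:
        raise ValueError("value does not fit in the given width")
    # Round to the nearest percent using integer arithmetic only.
    return (value * 100 + scale // 2) // scale


def percent_to_fraction(percent: int, width: int = 16) -> int:
    """Convert a whole percentage back to the fixed-point fraction."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    scale = 1 << width
    # 100% would need the value 2**width, which does not fit in the field,
    # so clamp to the largest representable fraction.
    return min((percent * scale + 50) // 100, scale - 1)
```

With these assumptions, a half-scale value such as `0x8000` maps to 50, and the all-ones value maps to 100 after rounding.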

The cpu hotplug lock has a helper lockdep_assert_cpus_held() that makes it easy to annotate functions that must be called with the cpu hotplug lock held. Do the same for memory. Signed-off-by: James Morse <james.morse@arm.com> (cherry picked from commit f40d4b8451b3d9e197166ff33104bd63f93709d0 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…PU hotplug lock resctrl takes the read side CPU hotplug lock whenever it is working with the list of domains. This prevents a CPU being brought online and the list being modified while resctrl is walking the list, or picking CPUs from the CPU masks. If resctrl domains for CPU-less NUMA nodes are to be supported, this would not be enough to prevent the domain list from being modified as a NUMA node can come online with only memory. Take the memory hotplug lock whenever the CPU hotplug lock is taken. Signed-off-by: James Morse <james.morse@arm.com> (cherry picked from commit f5a082989a5f40b9b95515d68b230f8125648fdb https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…arch stubs Resctrl expects the domain IDs for the 'MB' resource to be the corresponding L3 cache-ids. This is a problem for platforms where the memory bandwidth controls are implemented somewhere other than the L3 cache, and exist on a platform with CPU-less NUMA nodes. Such platforms can't currently be exposed via resctrl as not all the memory bandwidth can be controlled. Add a mount option to allow user-space to opt-in to the domain IDs for the MB resource to be the NUMA nid instead. Signed-off-by: James Morse <james.morse@arm.com> (cherry picked from commit ae8929caac02dccdc932666c1d8c906dda541bf1 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

idx is not used. Remove it to avoid a build warning. The author is James, but he did not add his Signed-off-by. (backported from commit c9b4fabe0b1b4805186d4326d47547993a02d191 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) [fenghuay: Change subject to a meaningful one. Add commit message.] Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…stead of cache-id The MB domain ids are the L3 cache-id. This is unfortunate if the memory bandwidth controls are implemented for CPU-less NUMA nodes as there is no L3 whose cache-id can be used to expose these controls to resctrl. When picking the class to use as MB, note whether it is possible for the NUMA nid to be used as the domain-id. By default the MB resource will use the cache-id. Signed-off-by: James Morse <james.morse@arm.com> (cherry picked from commit c2506e7fdb9e9de624af635f5060a1fe56a6bb80 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

… work with a set of CPUs mpam_resctrl_offline_domain_hdr() expects to take a single CPU that is going offline. Once all CPUs are offline, the domain header is removed from its parent list, and the structure can be freed. This doesn't work for NUMA nodes. Change the CPU passed to mpam_resctrl_offline_domain_hdr() and mpam_resctrl_domain_hdr_init to be a cpumask. This allows a single CPU to be passed for CPUs going offline, and cpu_possible_mask to be passed for a NUMA node going offline. Signed-off-by: James Morse <james.morse@arm.com> (cherry picked from commit 093483e5bca0aef546208b32eedf59f3aac665ff https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
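The effect of passing a mask instead of a single CPU can be modelled with sets in Python. This is a toy model, not kernel code: `offline_cpus` and the set-based "cpumask" are hypothetical stand-ins used only to show why one interface can cover both cases.

```python
def offline_cpus(domain_cpus: set, going_offline: set) -> bool:
    """Remove the CPUs in going_offline from the domain's CPU set.

    Returns True once the domain has no CPUs left, which is the point at
    which the domain header could be removed from its list and freed.
    """
    domain_cpus -= going_offline
    return not domain_cpus


# A single CPU going offline: pass a mask containing only that CPU.
domain = {0, 1, 2, 3}
still_has_cpus = not offline_cpus(domain, {2})

# A NUMA node going offline: pass every possible CPU (the analogue of
# cpu_possible_mask), which empties the domain in one call.
all_possible = set(range(8))
domain_is_empty = offline_cpus(domain, all_possible)
```

In this model the second call also works for a domain that never had any CPUs, which is the CPU-less NUMA node case the commit is preparing for.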

…domain() to have CPU and node mpam_resctrl_alloc_domain() brings a domain with CPUs online. To allow for domains that don't have any CPUs, split it into a CPU and NUMA node version. Signed-off-by: James Morse <james.morse@arm.com> (cherry picked from commit 817d04bd296871b61dd70f68d160b85837dfe9a8 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…nline/offline To expose resctrl resources that contain CPU-less NUMA domains, resctrl needs to be told when a CPU-less NUMA domain comes online. This can't be done with the cpuhp callbacks. Add a memory hotplug notifier, and use this to create and destroy resctrl domains. Signed-off-by: James Morse <james.morse@arm.com> (cherry picked from commit caf4034229d8df2c306658c2ddbe3c1ab73df109 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…UMA nid as MB domain-id Enable resctrl's use of NUMA nid as the domain-id for the MB resource. Changing this state involves changing the IDs of all the domains visible to resctrl. Writing to this list means preventing CPU and memory hotplug. Signed-off-by: James Morse <james.morse@arm.com> (cherry picked from commit a795ac909c6c050daaf095abc9043217ddf5e746 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2122432 Modified for latest MPAM. Signed-off-by: Brad Figg <bfigg@nvidia.com> Signed-off-by: Koba Ko <kobak@nvidia.com> Signed-off-by: Fenghua Yu <fenghuay@nvidia.com> (forward ported from commit 77bd02c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-6.14-next) [fenghuay: change 6.14 path to 6.17] Signed-off-by: Fenghua Yu <fenghuay@nvidia.com> Acked-by: Matt Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Koba Ko <kobak@nvidia.com>

Define the missing SHIFT definitions to fix build errors. Fixes: a76ea20 ("NVIDIA: SAUCE: arm_mpam: Add quirk framework") Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

partid ranges from 0 to partid_max, inclusive, so partid_max + 1 is outside the valid partid range. Accessing partid_max + 1 generates an error interrupt and causes MPAM to be disabled. Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
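The off-by-one behind the partid fix can be shown with a minimal Python sketch. The helper names are hypothetical and the sketch only models the range check; it does not reproduce the driver code that the commit changes.

```python
def valid_partids(partid_max: int) -> range:
    """All valid partids: 0 through partid_max, inclusive."""
    return range(partid_max + 1)


def is_valid_partid(partid: int, partid_max: int) -> bool:
    """True only for partids the hardware accepts."""
    return 0 <= partid <= partid_max


# There are partid_max + 1 valid partids, but the largest valid partid is
# partid_max itself. Using the count as an index is the out-of-range access.
partid_max = 3
count = len(valid_partids(partid_max))
count_is_a_valid_partid = is_valid_partid(count, partid_max)
```

The distinction to keep in mind is that `partid_max + 1` is a count of partids, not a partid.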

fyu1 · 2025-11-20T13:48:23Z

Based on feedback from Canonical, I updated the PR branch to remove the HACK patches, remove one untested patch, and modify the subjects and commit messages of some patches. Please see the details below.

To make the PR work, I need to backport two resctrl patch sets (Babu's and Tony's) from 6.17 upstream before I can backport the MPAM patches. These patches are unchanged.

The MPAM patches are backported from https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git, branch mpam/snapshot+extras/v6.18-rc1. The patches are formatted from the commit range 2af39084438cebc0053e8ddcc4a855873125b518^..HEAD, 140 patches in total.

To clean up the 140 MPAM patches and retain as much of the existing code as possible, I made the following changes:

Remove these two patches because 0068 reverts 0044:
0044-DROP-Makefile-fixup.patch
0068-DROP-Revert-Makefile-fixup.patch
Remove this patch because it does nothing useful:
0090-TAG-extras-branch-here.patch
Change the subjects of these patches to meaningful ones and add commit messages:
0011-DT-code-for-PREV.patch
0133-DISAPPEAR.patch
These "HACK" patches can be removed safely:
0093-HACK-make-quirks-writable.patch
0139-HACK-fs-resctrl-Add-cranky-debug-for-reading-CPU-msr.patch
0140-HACK-arm_mpam-Add-cranky-debug-for-reading-CSU-hardw.patch
Remove this untested patch. MPAM KVM won't work without it, but removing the patch is safer than keeping it:
0041-untested-KVM-arm64-Force-guest-EL1-to-use-user-space.patch
Keep these untested patches. Changing or removing them may cause conflicts or other issues, and they only change MPAM driver code.
If there is any issue in these patches (or in any MPAM patch), a workaround is to disable the MPAM driver completely with the kernel boot option "arm64.nompam".
0058-untested-arm_mpam-resctrl-pick-classes-for-use-as-mb.patch
0066-untested-arm_mpam-resctrl-Allow-monitors-to-be-confi.patch
0108-untested-mpam-Convert-pcc_channels-list-to-XArray-an.patch
0136-untested-arm_mpam-resctrl-Split-mpam_resctrl_alloc_d.patch
0138-untested-arm_mpam-resctrl-Allow-resctrl-to-enable-NU.patch

clsotog · 2025-11-20T18:30:45Z

@fyu1 with the new commits, would that fix the mount/unmount issue Colin saw last time? I saw it again today.

nvmochs · 2025-11-20T20:12:07Z

I re-reviewed the latest updates to the PR (0dfe3c7b2d47^..6b47273c9904) and confirmed 165 out of 170 match their source exactly. I manually reviewed the remaining 5 patches and have no issues with them; I also see no issues with the strategy proposed here to address issues with the patches that were previously flagged.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

clsotog · 2025-11-20T21:29:09Z

> @fyu1 with the new commits, would that fix the mount/unmount issue Colin saw last time? I saw it again today.

Sorry for the noise. I booted the wrong kernel; with the latest changes I do not see the issue. Thanks.

clsotog

Acked-by: Carol L Soto <csoto@nvidia.com>

fyu1 changed the title ~~Please pull 24.04 linux nvidia 6.17 next.mpam.extras~~ Please pull MPAM 24.04 linux nvidia 6.17 next.mpam.extras Oct 30, 2025

fyu1 force-pushed the 24.04_linux-nvidia-6.17-next.mpam.extras branch from ebd7620 to 8c698aa Compare October 30, 2025 02:56

fyu1 force-pushed the 24.04_linux-nvidia-6.17-next.mpam.extras branch 2 times, most recently from fff8a68 to 42ab8ea Compare October 30, 2025 21:47

nvmochs self-requested a review October 30, 2025 22:19

nvmochs approved these changes Oct 30, 2025

clsotog self-requested a review October 31, 2025 04:04

clsotog approved these changes Oct 31, 2025

abhsahu and others added 15 commits November 14, 2025 21:05

James Morse and others added 20 commits November 20, 2025 12:43

NVIDIA: SAUCE: arm_mpam: Fix missing SHIFT definitions

b082ef8

Define the missing SHIFT definitions to fix build errors. Fixes: a76ea20 ("NVIDIA: SAUCE: arm_mpam: Add quirk framework") Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

NVIDIA: SAUCE: Fix partid_max range issue

6b47273

partid ranges from 0 to partid_max, inclusive, so partid_max + 1 is outside the valid partid range. Accessing partid_max + 1 generates an error interrupt and causes MPAM to be disabled. Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

fyu1 force-pushed the 24.04_linux-nvidia-6.17-next.mpam.extras branch from 42ab8ea to 6b47273 Compare November 20, 2025 13:30

clsotog self-requested a review November 20, 2025 21:27

clsotog approved these changes Nov 20, 2025

tdavenvidia mentioned this pull request Dec 16, 2025

24.04 linux nvidia 6.17 next.mpam.extras #265

Open

nvidia-bfigg force-pushed the 24.04_linux-nvidia-6.17-next branch 2 times, most recently from c7fca69 to 6a9a932 Compare December 18, 2025 13:01

20 participants