Message-ID: <196146e6-6366-b09b-c367-facf10af6b9b@huawei.com>
Date: Tue, 3 Feb 2026 17:22:24 +0800
From: Zeng Heng <zengheng4@...wei.com>
To: Ben Horgan <ben.horgan@....com>, <james.morse@....com>,
	<fenghuay@...dia.com>, <reinette.chatre@...el.com>
CC: <linux-kernel@...r.kernel.org>, <wangkefeng.wang@...wei.com>, Jonathan
 Cameron <Jonathan.Cameron@...wei.com>
Subject: Re: [PATCH] arm64/mpam: Support partial-core boot for MPAM



On 2026/2/2 20:46, Zeng Heng wrote:
> Hi Ben,
> 
> On 2026/2/2 19:34, Ben Horgan wrote:
>> Hi Zeng,
>>
>> On 2/2/26 09:16, Zeng Heng wrote:
>>>
>>>
>>> On 2026/2/2 16:41, Zeng Heng wrote:
>>>>
>>>>
>>>> On 2026/1/29 18:11, Ben Horgan wrote:
>>>>> Hi Zeng,
>>>>>
>>>>> I think I've just managed to whitelist your email address. So, all 
>>>>> being
>>>>> well I'll get your emails in my inbox.
>>>>>
>>>>> On 1/7/26 03:13, Zeng Heng wrote:
>>>>>> Some MPAM MSCs (like the L2 MSC) share the same power domain as their
>>>>>> associated CPUs. Therefore, in scenarios where only some of the cores
>>>>>> power up, the MSCs belonging to the un-powered cores need not and must
>>>>>> not be accessed; otherwise a bus-access fault would occur.
>>>>>
>>>>> The MPAM driver intentionally waits until all MSCs have been
>>>>> discovered before allowing MPAM to be used, so that it can check the
>>>>> properties of all the MSCs and determine the configuration with full
>>>>> knowledge. Once a CPU affine with each MSC has been enabled, MPAM
>>>>> will be enabled and usable.
>>>>>
>>>>> Suppose we weren't to access all MSCs in an asymmetric configuration.
>>>>> E.g. if different L2s had different lengths of cache portion bit maps
>>>>> and MPAM was enabled with only the CPUs sharing the same L2, then the
>>>>> driver wouldn't know, and we'd end up with a bad configuration which
>>>>> would become a problem when the other CPUs are eventually turned on.
>>>>>
>>>>> Hence, I think we should retain the restriction that MPAM is only
>>>>> enabled once all MSCs are probed. Is this a particularly onerous
>>>>> restriction for you?
>>>>>
>>>>
>>>> I have no objection to the restriction that "MPAM is only enabled once
>>>> all MSC are probed." This constraint ensures the driver has complete
>>>> knowledge of all Memory System Components before establishing the
>>>> configuration.
>>>>
>>>>
>>>> However, this patch is specifically designed to address CPU core
>>>> isolation scenarios (such as adding the 'isolcpus=xx' kernel command
>>>> line parameter).
>>
>> In the isolation scenario, are you enabling MPAM for some CPUs and
>> using those CPUs, but not taking into account the parameters of the
>> associated MSCs?
> 
> In the CPU core isolation scenario, the CPU affinity information of each
> MSC must be reported. In fact, the ACPI MPAM table already defines a
> mechanism for reporting the affinity information of each MSC instance.
> 
> Through the "Hardware ID of linked device" and "Instance ID of linked
> device" fields, the container and container ID to which the MSC belongs
> are specified respectively, thereby providing the MSC affinity
> information.
> 
> The kernel is responsible for parsing this information and determining
> which MSCs should be initialized based on the currently online CPUs.
> 
>>
>>>>
>>>> The patch allows the MPAM driver to successfully complete the
>>>> initialization of online MSCs even when the system is booted with
>>>> certain cores isolated or disabled. The patch ensures that MPAM
>>>> initialization is decoupled from the requirement that all CPUs must be
>>>> online during the probing phase.
>>>>
>>>> CPU core isolation is indeed a common production scenario. It
>>>> requires the kernel to keep its features enabled in the presence of
>>>> faulty cores (which cannot be recovered even through a cold boot).
>>>> This ensures system reliability and availability on multi-core
>>>> processors where single-core faults occur.
>>>>
>>>> Without this patch, MPAM would fail to initialize under CPU core
>>>> isolation scenarios. Apologies for not mentioning this in the patch:
>>>> the functionality can be verified by adding 'maxcpus=1' to the boot
>>>> parameters.
>>
>> For 'maxcpus=1' I think the correct behaviour is to not enable MPAM as
>> the other CPUs can then be turned on afterwards. E.g by
>> echo 1 > /sys/devices/system/cpu/cpuX/online
>>
>> For faulty cores how would you ensure they are never turned on?
>>
> 
> The maxcpus=1 case is merely an extreme simulation scenario. In
> production environments, detected faulty cores have already been disabled
> by the BIOS firmware and cannot be brought online again.
> 
> 

Even if the faulty cores or offline CPUs are turned on later, the patch
does not affect the automatic recovery and bring-up of the MPAM MSCs.
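The initialize-only-what-is-powered behavior described in this thread can be sketched roughly as follows. This is an illustrative Python model, not the kernel code; the msc_affinity mapping and the MSC/CPU numbers are made up for the example:

```python
# Illustrative sketch: decide which MPAM MSCs are safe to probe, given the
# per-MSC CPU affinity (as reported via the ACPI MPAM table's "linked
# device" fields) and the set of currently online CPUs.
# The topology below is a hypothetical example, not real hardware data.

msc_affinity = {
    "L2_MSC_0": {0, 1, 2, 3},      # L2 MSC sharing a power domain with CPUs 0-3
    "L2_MSC_1": {4, 5, 6, 7},      # L2 MSC sharing a power domain with CPUs 4-7
    "L3_MSC_0": set(range(8)),     # L3 MSC affine to all CPUs
}

def mscs_to_probe(online_cpus):
    """An MSC may be accessed only if at least one affine CPU is powered up;
    touching an MSC whose power domain is down would fault on the bus."""
    return {msc for msc, cpus in msc_affinity.items() if cpus & online_cpus}

# With maxcpus=1 only CPU 0 is online, so L2_MSC_1 must not be touched:
print(sorted(mscs_to_probe({0})))      # ['L2_MSC_0', 'L3_MSC_0']
# Bringing CPU 4 online later makes its L2 MSC accessible again:
print(sorted(mscs_to_probe({0, 4})))   # ['L2_MSC_0', 'L2_MSC_1', 'L3_MSC_0']
```
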

With 'maxcpus=1' added to the boot parameters, testing with the patch
applied looks as follows:

  # mount -t resctrl resctrl /sys/fs/resctrl/
  # cat /sys/fs/resctrl/schemata
  L2:4=ff
  L3:1=1ffff

  # echo 1 > /sys/devices/system/cpu/cpu2/online
  # cat /sys/fs/resctrl/schemata
  L2:4=ff;7=ff
  L3:1=1ffff

  # echo 1 > /sys/devices/system/cpu/cpu16/online
  # cat /sys/fs/resctrl/schemata
  L2:4=ff;7=ff;29=ff
  L3:1=1ffff;26=1ffff

  # echo 0 > /sys/devices/system/cpu/cpu16/online
  # cat /sys/fs/resctrl/schemata
  L2:4=ff;7=ff
  L3:1=1ffff

  # echo 0 > /sys/devices/system/cpu/cpu2/online
  # cat /sys/fs/resctrl/schemata
  L2:4=ff
  L3:1=1ffff
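As an aside, the schemata lines shown above follow a simple "RESOURCE:domain=mask;..." layout; a minimal Python sketch for parsing them (an illustrative helper, not part of the kernel or the resctrl tooling) would be:

```python
# Parse a resctrl schemata line such as "L2:4=ff;7=ff" into its resource
# name and a {domain_id: bitmask} mapping, matching the session output above.

def parse_schemata_line(line):
    resource, _, domains = line.strip().partition(":")
    return resource, {
        int(dom): int(mask, 16)
        for dom, mask in (entry.split("=") for entry in domains.split(";"))
    }

res, doms = parse_schemata_line("L2:4=ff;7=ff;29=ff")
print(res, doms)   # L2 {4: 255, 7: 255, 29: 255}
```
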



Best Regards,
Zeng Heng
