lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <b6c5ea13-1d7e-48f2-a1c4-e24d0ba30a63@redhat.com>
Date: Mon, 21 Jul 2025 14:59:01 +1000
From: Gavin Shan <gshan@...hat.com>
To: James Morse <james.morse@....com>, linux-kernel@...r.kernel.org
Cc: sdonthineni@...dia.com, rohit.mathew@....com,
 carl@...amperecomputing.com, shan.gavin@...il.com, Koba Ko <kobak@...dia.com>
Subject: Re: [PATCH MPAM 6.16.rc1] arm_mpam: Enforce to recalculate mbw_min on
 configuration

Hi James,

On 7/14/25 8:15 PM, James Morse wrote:
> On 26/06/2025 10:52, Gavin Shan wrote:
>> mpam_feat_mbw_max is exposed to user by resctrlfs, but mpam_feat_mbw_min
>> should be recalculated based on the maximal memory bandwidth in every
>> configuration, which has been missed unfortunately. When a new group is
>> created, the default and maximal memory bandwidth percentage (100%) is
>> configured, the configuration instance (struct mpam_config) on the stack
>> is updated, including the minimal and maximal memory bandwidth. It's
>> copied to the domain's configuration instance. On next time when user
>> tries to configure by writing 'schemata', the minimal memory bandwidth
>> isn't never recalculated because mpam_feat_mbw_min has been seen in the
>> configuration, which inherits from the domain's instance.
>>
>> For example, the value of register MPAMCFG_MBW_MIN is never changed no
>> matter what configuration is set.
> 
> Thanks for catching this. I think this was introduced by the patch that picked
> the component copy of the configuration, modified it - and passed it back into
> the driver. That had the additional side effect of changes going missing as the
> mpam_update_config() had the same config twice...
> 
> I fixed that shortly after rc1, with that fixed I don't think this can happen -
> resctrl never sets the 'min', meaning mpam_extent_config() will always regenerate
> the value.
> 
> This shouldn't happen with the latest version of the tree.
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.16-rc4
> 

The issue still exists in v6.16-rc4 and v6.16-rc5. Here are the dumped registers
on the varied MBW configurations, and MPAMCFG_MBW_MIN doesn't alter.

   # uname -r
   6.16.0-rc5-gavin
   # mount -tresctrl none /sys/fs/resctrl
   # mkdir -p /sys/fs/resctrl/all
   # mkdir -p /sys/fs/resctrl/test
   # cd /sys/fs/resctrl/test

   # echo "MB:1=100" > schemata
   # cat /proc/dump
   MPAMCFG_PART_SEL            00000002
   MAPMF_MBW_IDR               00000c07
   MPAMCFG_MBW_MAX             0000ffff
   MPAMCFG_MBW_MIN             0000f000

   # echo "MB:1=2" > schemata
   # cat /proc/dump
   MPAMCFG_PART_SEL            00000002
   MAPMF_MBW_IDR               00000c07
   MPAMCFG_MBW_MAX             000005ff
   MPAMCFG_MBW_MIN             0000f000

The problem is mpam_feat_mbw_min is always set in 'struct mpam_config'. With this,
cfg->mbw_min won't be recalculated. I guess we probably just need to simply drop
the check to that cfg->mbw_min can be recalculated every time? Could you please
fix it up in place if the resolution looks good to you, I also can post (v2) to
include the changes if you prefer. I don't know how it can be handled since the
code isn't merged to upstream yet.

static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg)
{
	:
         /* The mpam_feat_mbw_min needs to be dropped */
	if (mpam_has_feature(mpam_feat_mbw_max, cfg) &&
             !mpam_has_feature(mpam_feat_mbw_min, cfg)) {
                 delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
                 if (cfg->mbw_max > delta)
                         min = cfg->mbw_max - delta;
                 else
                         min = 0;

                 cfg->mbw_min = max(min, min_hw_granule);
                 mpam_set_feature(mpam_feat_mbw_min, cfg);
         }

         if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, class) &&
             cfg->mbw_min <= min_hw_granule) {
                 cfg->mbw_min = min_hw_granule + 1;
                 mpam_set_feature(mpam_feat_mbw_min, cfg);
         }
}

> 
>> diff --git a/drivers/platform/arm64/mpam/mpam_devices.c
> b/drivers/platform/arm64/mpam/mpam_devices.c
>> index df8730491de2..4845dcb8e601 100644
>> --- a/drivers/platform/arm64/mpam/mpam_devices.c
>> +++ b/drivers/platform/arm64/mpam/mpam_devices.c
>> @@ -3192,6 +3192,9 @@ static void mpam_extend_config(struct mpam_class *class, struct
> mpam_config *cfg
>>   	u16 min, min_hw_granule, delta;
>>   	u16 max_hw_value, res0_bits;
>>
>> +	if (!mpam_has_feature(mpam_feat_mbw_max, cfg))
>> +		return;
> 
> N.B. - The T421 quirk needs to ensure the minimum value is never set to 0 - which is what
> mpam_reprogram_ris_partid() will do if there is no minimum config set.
> 
> The aim of the quirk is that on affected platforms the mbm_min will always be set, and
> never be zero in any configuration. Whether it then gets applied to the hardware depends
> on whether the hardware has the feature.
> This makes it simpler to think about - and easier to test on platforms that aren't
> affected by the errata.
> 
> I'd prefer not to to try and spot configurations where its going to matter as it will be
> hard to debug the workaround going wrong!
> 

Ok, thanks for the explanation. In that case, we shouldn't bail early. The mpam_feat_mbw_min
check in mpam_extend_config() needs to be dropped, as explained above.

> 
>> @@ -3211,23 +3214,17 @@ static void mpam_extend_config(struct mpam_class *class, struct
> mpam_config *cfg
>>   	 *
>>   	 * Resctrl can only configure the MAX.
>>   	 */
>> -	if (mpam_has_feature(mpam_feat_mbw_max, cfg) &&
>> -	    !mpam_has_feature(mpam_feat_mbw_min, cfg)) {
>> -		delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
>> -		if (cfg->mbw_max > delta)
>> -			min = cfg->mbw_max - delta;
>> -		else
>> -			min = 0;
>> +	delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
>> +	if (cfg->mbw_max > delta)
>> +		min = cfg->mbw_max - delta;
>> +	else
>> +		min = 0;
>>
>> -		cfg->mbw_min = max(min, min_hw_granule);
>> -		mpam_set_feature(mpam_feat_mbw_min, cfg);
>> -	}
>> +	cfg->mbw_min = max(min, min_hw_granule);
> 
> While resctrl doesn't use it - the idea is the mpam_devices interface should support any
> mpam control being set. This overwrites what the user asked for.
> 
> The devices/resctrl split might look strange - but its so the resctrl ABI implications are
> contained to one file, and another user of the mpam_devices interface can be added later.
> (KVM in-kernel MSC emulation was one proposal, another came from people who believe they
>   need to change the hardware policy more frequently than they can with a syscall ... I
>   don't like either of these!)
> 

hmm, it's intresting to know we can't tolerate the time consumed to finish a syscall.
The MPAM would have static configurations, meaning it's static after the configuration
is given. Could you share who is looking into MPAM virtualization? Actually, David
and I also looked into MPAM virtualization support, but we had everything emulated in
QEMU in the draft design.

Thanks,
Gavin

> 
> Thanks,
> 
> James
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ