[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <b6c5ea13-1d7e-48f2-a1c4-e24d0ba30a63@redhat.com>
Date: Mon, 21 Jul 2025 14:59:01 +1000
From: Gavin Shan <gshan@...hat.com>
To: James Morse <james.morse@....com>, linux-kernel@...r.kernel.org
Cc: sdonthineni@...dia.com, rohit.mathew@....com,
carl@...amperecomputing.com, shan.gavin@...il.com, Koba Ko <kobak@...dia.com>
Subject: Re: [PATCH MPAM 6.16.rc1] arm_mpam: Enforce to recalculate mbw_min on
configuration
Hi James,
On 7/14/25 8:15 PM, James Morse wrote:
> On 26/06/2025 10:52, Gavin Shan wrote:
>> mpam_feat_mbw_max is exposed to user by resctrlfs, but mpam_feat_mbw_min
>> should be recalculated based on the maximal memory bandwidth in every
>> configuration, which has been missed unfortunately. When a new group is
>> created, the default and maximal memory bandwidth percentage (100%) is
>> configured, the configuration instance (struct mpam_config) on the stack
>> is updated, including the minimal and maximal memory bandwidth. It's
>> copied to the domain's configuration instance. On next time when user
>> tries to configure by writing 'schemata', the minimal memory bandwidth
>> isn't never recalculated because mpam_feat_mbw_min has been seen in the
>> configuration, which inherits from the domain's instance.
>>
>> For example, the value of register MPAMCFG_MBW_MIN is never changed no
>> matter what configuration is set.
>
> Thanks for catching this. I think this was introduced by the patch that picked
> the component copy of the configuration, modified it - and passed it back into
> the driver. That had the additional side effect of changes going missing as the
> mpam_update_config() had the same config twice...
>
> I fixed that shortly after rc1, with that fixed I don't think this can happen -
> resctrl never sets the 'min', meaning mpam_extent_config() will always regenerate
> the value.
>
> This shouldn't happen with the latest version of the tree.
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.16-rc4
>
The issue still exists in v6.16-rc4 and v6.16-rc5. Here are the dumped registers
on the varied MBW configurations, and MPAMCFG_MBW_MIN doesn't alter.
# uname -r
6.16.0-rc5-gavin
# mount -tresctrl none /sys/fs/resctrl
# mkdir -p /sys/fs/resctrl/all
# mkdir -p /sys/fs/resctrl/test
# cd /sys/fs/resctrl/test
# echo "MB:1=100" > schemata
# cat /proc/dump
MPAMCFG_PART_SEL 00000002
MAPMF_MBW_IDR 00000c07
MPAMCFG_MBW_MAX 0000ffff
MPAMCFG_MBW_MIN 0000f000
# echo "MB:1=2" > schemata
# cat /proc/dump
MPAMCFG_PART_SEL 00000002
MAPMF_MBW_IDR 00000c07
MPAMCFG_MBW_MAX 000005ff
MPAMCFG_MBW_MIN 0000f000
The problem is mpam_feat_mbw_min is always set in 'struct mpam_config'. With this,
cfg->mbw_min won't be recalculated. I guess we probably just need to simply drop
the check to that cfg->mbw_min can be recalculated every time? Could you please
fix it up in place if the resolution looks good to you, I also can post (v2) to
include the changes if you prefer. I don't know how it can be handled since the
code isn't merged to upstream yet.
static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg)
{
:
/* The mpam_feat_mbw_min needs to be dropped */
if (mpam_has_feature(mpam_feat_mbw_max, cfg) &&
!mpam_has_feature(mpam_feat_mbw_min, cfg)) {
delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
if (cfg->mbw_max > delta)
min = cfg->mbw_max - delta;
else
min = 0;
cfg->mbw_min = max(min, min_hw_granule);
mpam_set_feature(mpam_feat_mbw_min, cfg);
}
if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, class) &&
cfg->mbw_min <= min_hw_granule) {
cfg->mbw_min = min_hw_granule + 1;
mpam_set_feature(mpam_feat_mbw_min, cfg);
}
}
>
>> diff --git a/drivers/platform/arm64/mpam/mpam_devices.c
> b/drivers/platform/arm64/mpam/mpam_devices.c
>> index df8730491de2..4845dcb8e601 100644
>> --- a/drivers/platform/arm64/mpam/mpam_devices.c
>> +++ b/drivers/platform/arm64/mpam/mpam_devices.c
>> @@ -3192,6 +3192,9 @@ static void mpam_extend_config(struct mpam_class *class, struct
> mpam_config *cfg
>> u16 min, min_hw_granule, delta;
>> u16 max_hw_value, res0_bits;
>>
>> + if (!mpam_has_feature(mpam_feat_mbw_max, cfg))
>> + return;
>
> N.B. - The T421 quirk needs to ensure the minimum value is never set to 0 - which is what
> mpam_reprogram_ris_partid() will do if there is no minimum config set.
>
> The aim of the quirk is that on affected platforms the mbm_min will always be set, and
> never be zero in any configuration. Whether it then gets applied to the hardware depends
> on whether the hardware has the feature.
> This makes it simpler to think about - and easier to test on platforms that aren't
> affected by the errata.
>
> I'd prefer not to to try and spot configurations where its going to matter as it will be
> hard to debug the workaround going wrong!
>
Ok, thanks for the explanation. In that case, we shouldn't bail early. The mpam_feat_mbw_min
check in mpam_extend_config() needs to be dropped, as explained above.
>
>> @@ -3211,23 +3214,17 @@ static void mpam_extend_config(struct mpam_class *class, struct
> mpam_config *cfg
>> *
>> * Resctrl can only configure the MAX.
>> */
>> - if (mpam_has_feature(mpam_feat_mbw_max, cfg) &&
>> - !mpam_has_feature(mpam_feat_mbw_min, cfg)) {
>> - delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
>> - if (cfg->mbw_max > delta)
>> - min = cfg->mbw_max - delta;
>> - else
>> - min = 0;
>> + delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
>> + if (cfg->mbw_max > delta)
>> + min = cfg->mbw_max - delta;
>> + else
>> + min = 0;
>>
>> - cfg->mbw_min = max(min, min_hw_granule);
>> - mpam_set_feature(mpam_feat_mbw_min, cfg);
>> - }
>> + cfg->mbw_min = max(min, min_hw_granule);
>
> While resctrl doesn't use it - the idea is the mpam_devices interface should support any
> mpam control being set. This overwrites what the user asked for.
>
> The devices/resctrl split might look strange - but its so the resctrl ABI implications are
> contained to one file, and another user of the mpam_devices interface can be added later.
> (KVM in-kernel MSC emulation was one proposal, another came from people who believe they
> need to change the hardware policy more frequently than they can with a syscall ... I
> don't like either of these!)
>
hmm, it's intresting to know we can't tolerate the time consumed to finish a syscall.
The MPAM would have static configurations, meaning it's static after the configuration
is given. Could you share who is looking into MPAM virtualization? Actually, David
and I also looked into MPAM virtualization support, but we had everything emulated in
QEMU in the draft design.
Thanks,
Gavin
>
> Thanks,
>
> James
>
Powered by blists - more mailing lists