[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6c99fe9e-4f7c-8a82-bf43-48449d553172@arm.com>
Date: Wed, 28 Sep 2022 15:09:17 +0100
From: James Morse <james.morse@....com>
To: Shawn Wang <shawnwang@...ux.alibaba.com>,
Reinette Chatre <reinette.chatre@...el.com>,
fenghua.yu@...el.com
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/resctrl: Clear the staged configs when destroying
schemata list
Hi Shawn,
On 28/09/2022 14:37, Shawn Wang wrote:
> On 9/28/22 5:21 AM, Reinette Chatre wrote:
>> On 9/27/2022 6:06 AM, James Morse wrote:
>>> On 27/09/2022 03:54, Shawn Wang wrote:
>>>> Array staged_config in struct rdt_domain still maintains the original value when
>>>> resctrl is unmounted. If resctrl is mounted with cdp option and then remounted
>>>> without cdp option, field have_new_ctrl in staged_config[CDP_CODE] and
>>>> staged_config[CDP_DATA] will still be true.
>>>
>>> staged_config[CDP_DATA] is an array - its always 'true'. I think you mean
>>> staged_config[CDP_DATA].have_new_ctrl, which will still be true because it is only
>>> memset() when the schemata file is written to.
>>>
>>>
>>>> Since resctrl_arch_update_domains()
>>>> traverses all resctrl_conf_type, it will continue to update CDP_CODE and
>>>> CDP_DATA configurations, which can cause overflow problem.
>>>
>>> Only if its called with a stale staged config, and it should only be called when the
>>> schemata file is written to, which would memset() the staged config first.
>>>
>>>
>>>> The problem can be reproduced by the following commands:
>>>> # A system with 16 usable closids and mba disabled
>>>> mount -t resctrl resctrl -o cdp /sys/fs/resctrl
>>>> mkdir /sys/fs/resctrl/p{1..7}
>>>> umount /sys/fs/resctrl/
>>>> mount -t resctrl resctrl /sys/fs/resctrl
>>>> mkdir /sys/fs/resctrl/p{1..8}
>>>
>>> Thanks for the reproducer - but I don't see what could set have_new_ctrl in this sequence.
>>> You can't call apply_config() to set CPUs in the mask without that being set.
>>>
>>> Creating a new control group, (your mkdir step) shouldn't touch the hardware at all, as it
>>> should be left in its reset state from the last umount(), or setup.
>>
>> There is an attempt to configure the hardware in the mkdir path:
>> rdtgroup_mkdir()->rdtgroup_mkdir_ctrl_mon()->rdtgroup_init_alloc()->resctrl_arch_update_domains()
>>
>>
>> This is required in support of the different resource group modes. When a new
>> resource group is created via mkdir the configuration should respect any
>> exclusive resource groups that exist at that point.
>>
>>>
>>> I can't reproduce this on v6.0-rc7.
>>> Even if I dirty the configuration by writing to the schemata file, I can't reproduce this.
>>>
>>
>> From what I can tell the reproducer is dependent on (a) whether hardware
>> supports CDP, and (b) the number of CLOSIDs supported by the system. The reproducer
>> works for hardware that has 16 CLOSIDs (see later).
>>
>>> (I have mba enabled, but all this should affect is the number of closid available)
> I reproduce this on v6.0-rc6. The key to reproduction is to ensure that the number of
> usable groups is different between CDP enabled and CDP disabled.
>
> The system I use has 16 CLOSIDs for L3 CAT and 8 CLOSIDs for MBA. MBA limits the max
> number of groups to 8, even if CDP is disabled. This is the reason why I disable MBA.
bingo - could that detail be included in the commit message?
[..]
>> What do you think about clearing the staged config within resctrl_arch_update_domains()
>> after the configuration is complete and there is no more need for it? That may reduce
>> complexity where each caller no longer need to remember to do so.
>> I see "staged_config" as a temporary storage and it my help to understand the code better
>> if it is treated as such.
>>
>
> I think this is better. I have tested it and will give a new version.
Great, thanks!
Could you mention have_new_ctrl in the commit message, and this path via
rdtgroup_init_alloc().
I think you also need:
Fixes: 75408e43509ed ("x86/resctrl: Allow different CODE/DATA configurations to be staged")
Cc: <stable@...r.kernel.org>
Thanks,
James
Powered by blists - more mailing lists