[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com>
Date: Fri, 6 Dec 2024 08:14:56 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Peter Newman <peternewman@...gle.com>
CC: Fenghua Yu <fenghua.yu@...el.com>, James Morse <james.morse@....com>,
Stephane Eranian <eranian@...gle.com>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, Dave Hansen
<dave.hansen@...ux.intel.com>, <x86@...nel.org>, "H. Peter Anvin"
<hpa@...or.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
<juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Uros Bizjak
<ubizjak@...il.com>, Mike Rapoport <rppt@...nel.org>, "Kirill A. Shutemov"
<kirill.shutemov@...ux.intel.com>, Rick Edgecombe
<rick.p.edgecombe@...el.com>, Xin Li <xin3.li@...el.com>, Babu Moger
<babu.moger@....com>, Shaopeng Tan <tan.shaopeng@...itsu.com>, "Maciej
Wieczor-Retman" <maciej.wieczor-retman@...el.com>, Jens Axboe
<axboe@...nel.dk>, Christian Brauner <brauner@...nel.org>, Oleg Nesterov
<oleg@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>, Tycho Andersen
<tandersen@...flix.com>, Nicholas Piggin <npiggin@...il.com>, Beau Belgrave
<beaub@...ux.microsoft.com>, "Matthew Wilcox (Oracle)" <willy@...radead.org>,
<linux-kernel@...r.kernel.org>, Tony Luck <tony.luck@...el.com>
Subject: Re: [PATCH v1 3/6] x86/resctrl: Disallow mongroup rename on MPAM
Hi Peter,
On 12/6/24 2:23 AM, Peter Newman wrote:
> On Fri, Dec 6, 2024 at 5:14 AM Reinette Chatre
> <reinette.chatre@...el.com> wrote:
>> On 12/5/24 7:03 AM, Peter Newman wrote:
>>> On Sat, Apr 6, 2024 at 12:10 AM Peter Newman <peternewman@...gle.com> wrote:
>>>> On Thu, Apr 4, 2024 at 4:11 PM Reinette Chatre
>>>> <reinette.chatre@...el.com> wrote:
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> On 3/25/2024 10:27 AM, Peter Newman wrote:
>>>>>> Moving a monitoring group to a different parent control assumes that the
>>>>>> monitors will not be impacted. This is not the case on MPAM where the
>>>>>> PMG is an extension of the PARTID.
>>>>>>
>>>>>> Detect this situation by requiring the change in CLOSID not to affect
>>>>>> the result of resctrl_arch_rmid_idx_encode(), otherwise return
>>>>>> -EOPNOTSUPP.
>>>>>>
>>>>>
>>>>> Thanks for catching this. This seems out of place in this series. It sounds
>>>>> more like an independent fix that should go in separately.
>>>>
>>>> I asserted in a comment in a patch later in the series that the
>>>> mongroup parent pointer never changes on MPAM, then decided to follow
>>>> up on whether it was actually true, so it's only here because this
>>>> series depends on it. I'll post it again separately with the fix you
>>>> requested below.
>>>
>>> I'm preparing to finally re-post this patch, but another related
>>> question came up...
>>>
>>> It turns out the check added to mongroup rename in this patch is an
>>> important property that the user needs in order to correctly interpret
>>> the value returned by info/L3_MON/num_rmids.
>>>
>>> I had told some Google users to attempt to move a monitoring group to
>>> a new parent to determine whether the monitoring groups are
>>> independent of their parent ctrl_mon groups. This approach proved
>>> unwieldy when used on a system where the maximum number of allowed
>>> groups has already been created. (In their problem case, a
>>> newly-created process wanted to determine this property independently
>>> of the older process which had originally mounted resctrl.)
>>
>> Could you please elaborate why users need the information that monitoring
>> groups are independent of their parent ctrl_mon group?
>> I understood from [2] that Arm can be expected to support moving monitor
>> groups so I am concerned about userspace building some assumptions like
>> "if the monitoring groups are dependent on their parent ctrl_mon groups then
>> monitor groups cannot be moved".
>
> It's to make the high-level decision of whether ctrl_mon groups or
> mon_groups are to be used as the direct container of a job's tasks.
>
> Given the semantics of PARTIDs vs PMGs on MPAM, the ctrl_mon groups
> will always be chosen as the container for tasks, so I don't have a
> use case for moving a monitoring group on MPAM.
>
> Perhaps determining whether moving a monitoring group is allowed in
> order to decide whether to use the model that requires it is
> backwards.
>
>>
>>>
>>> Also, this information does not require an active rdtgroup pointer or
>>> CLOSID/RMID. The following will also work:
>>>
>>> resctrl_arch_rmid_idx_encode(0, 0) != resctrl_arch_rmid_idx_encode(1, 0);
>>>
>>> I propose adding a new info file returning the result of this
>>> expression to be placed beside num_rmids. I would name it
>>> "mon_id_includes_control_id", as it implies that the hardware
>>> logically interprets the monitoring ID as concatenated with its parent
>>> control ID. This tells the user whether num_rmids defines the number
>>> of monitoring groups (and ctrl_mon groups) which can be created system
>>> wide or whether it's one more than the number of monitoring groups
>>> which can be created below every ctrl_mon group.
>>
>> Is the goal purely to guide interpretation of "info/L3_MON/num_rmids"?
>
> Partially, I suppose. It is necessary to know how many monitoring
> groups can be created.
>
>>
>> The "mon_id_includes_control_id" seems unique to a use case and if I understand
>> correctly needs additional interpretation from user space to reach the actual
>> data of interest, which I understand to be the number of monitor groups that
>> can be created.
>>
>> A while ago James proposed [1] adding a new "num_groups" inside each "mon_groups"
>> directory, on x86 it will be the same as num_rmids, on Arm it will be the maximum PMG bits.
>> At a high level(*) I think something like this may be an intuitive way to address this
>> issue. What do you think?
>
> On ARM, it would mean num_groups child mon groups can be created,
> while on x86, it would mean between 0 and num_groups child mon_groups
> can be created.
>
> In either case, all instances of the file would return the same value
> at any time.
hm, yes.
The documentation currently states:
"num_rmids":
The number of RMIDs available. This is the
upper bound for how many "CTRL_MON" + "MON"
groups can be created.
On x86 the "upper bound for how many "CTRL_MON" + "MON"
groups can be created" is the number of RMIDs, while on Arm it is
"<num PARTIDs> * <num PMGs>".
It seems reasonable to have a global file that indicates total number of monitoring
groups that can be created while a local file indicates number of monitoring groups per
control group.
We could thus follow your proposal to start referring to the global "num_rmids"
as "num_mon_ids" that for x86 would contain number of RMIDs and for Arm would
contain "<num PARTIDs> * <num PMGs>".
Partnering the global "num_mon_ids" file with a per CTRL_MON "num_mon_groups" file,
where "num_mon_groups" contains number of RMIDs on x86 and number of PMG on Arm, may
give necessary insight to user space to make needed decisions?
I do not know what would be more useful here ... if expectation is that the
local "num_mon_groups" would just be the same value for all groups then it could be a
global file also (with more appropriate name). Another alternative is if it is local it
could be a dynamic value of how many monitoring groups are available for creation. This
may be complicated with the limbo handling but worth considering when thinking what would
be most useful to user space.
>> (*) If considering this I do think it may be more intuitive to have the file
>> be at the top level of the control group and be named "num_mon_groups" (or something better)
>> to support the idea that it includes the CTRL_MON group self and not that all monitor groups
>> are within the mon_groups directory.
>>
>> Also please keep in mind that during the discussions about moving monitoring groups
>> there were some ideas of needing to provide additional information to user space about
>> whether counters are reset on a monitor group move. I think that if we are starting
>> to look at these random properties as files in resctrl (like "mon_id_includes_control_id"
>> and maybe "counter_reset_on_monitor_group_move") then we should consider some alternatives
>> to present these flags to prevent resctrl's info directory from turning into a mess.
>
> In the meantime, num_rmids > num_closids will probably be a good
> enough heuristic to correctly guess the ID model.
While it works it is not ideal how user space is required to be familiar with implementation
details. Interesting is that I notice that while the documentation already links num_rmids with
the number of monitoring groups that can be created, it does not do the same for num_closids and
control groups. Closest is the vague "creation of a "CTRL_MON" directory *may* fail if we run out
of either CLOSID or RMID" (emphasis added by me).
Reinette
Powered by blists - more mailing lists