Message-ID: <32a588e2-7b09-4257-b838-4268583a724d@intel.com>
Date: Mon, 26 Feb 2024 13:20:50 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: <babu.moger@....com>, James Morse <james.morse@....com>, <corbet@....net>,
<fenghua.yu@...el.com>, <tglx@...utronix.de>, <mingo@...hat.com>,
<bp@...en8.de>, <dave.hansen@...ux.intel.com>
CC: <x86@...nel.org>, <hpa@...or.com>, <paulmck@...nel.org>,
<rdunlap@...radead.org>, <tj@...nel.org>, <peterz@...radead.org>,
<yanjiewtw@...il.com>, <kim.phillips@....com>, <lukas.bulwahn@...il.com>,
<seanjc@...gle.com>, <jmattson@...gle.com>, <leitao@...ian.org>,
<jpoimboe@...nel.org>, <rick.p.edgecombe@...el.com>,
<kirill.shutemov@...ux.intel.com>, <jithu.joseph@...el.com>,
<kai.huang@...el.com>, <kan.liang@...ux.intel.com>,
<daniel.sneddon@...ux.intel.com>, <pbonzini@...hat.com>,
<sandipan.das@....com>, <ilpo.jarvinen@...ux.intel.com>,
<peternewman@...gle.com>, <maciej.wieczor-retman@...el.com>,
<linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<eranian@...gle.com>
Subject: Re: [PATCH v2 00/17] x86/resctrl : Support AMD Assignable Bandwidth
Monitoring Counters (ABMC)
Hi Babu,
On 2/26/2024 9:59 AM, Moger, Babu wrote:
> On 2/23/24 16:21, Reinette Chatre wrote:
>> On 2/23/2024 12:11 PM, Moger, Babu wrote:
>>> On 2/23/24 11:17, Reinette Chatre wrote:
>>>>
>>>>
>>>> On 2/20/2024 12:48 PM, Moger, Babu wrote:
>>>>> On 2/20/24 09:21, James Morse wrote:
>>>>>> On 19/01/2024 18:22, Babu Moger wrote:
>>>>
>>>>>>> e. Enable ABMC mode.
>>>>>>>
>>>>>>> #echo 1 > /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>>>>>>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>>>>>>> 1
>>>>>>
>>>>>> Why does this mode need enabling? Can't it be enabled automatically on hardware that
>>>>>> supports it, or enabled implicitly when the first assignment attempt arrives?
>>>>>>
>>>>>> I guess this is really needed for a reset - could we implement that instead? This way
>>>>>> there isn't an extra step user-space has to do to make the assignments work.
>>>>>
>>>>> Mostly the new features are added as an opt-in method. So, kept it that
>>>>> way. If we enable this feature automatically, then we have to provide an
>>>>> option to disable it.
>>>>>
>>>>
>>>> At the same time it sounds to me like ABMC can improve current users'
>>>> experience without requiring them to do anything. This sounds appealing.
>>>> For example, if I understand correctly, it may be possible to start resctrl
>>>> with ABMC enabled by default and the number of monitoring groups (currently
>>>> exposed to user space via "num_rmids") limited to the number of counters
>>>> supported by ABMC. Existing users would then by default obtain better behavior
>>>> of counters not resetting.
>>>
>>> Yes, I like the idea. But it will break compatibility with pqos
>>> tool(intel_cmt_cat utility). pqos tool monitoring will not work without
>>> supporting ABMC enablement in the tool. ABMC feature requires an extra
>>> step to assign the counters for monitor to work.
>>
>> I am considering two scenarios, the "default behavior" is what a user will
>> experience when booting resctrl on an ABMC system and the "new feature
>> behavior" where a user can take full advantage of all that ABMC (and soft
>> RMID, and MPAM) can offer.
>>
>> So, first, on an ABMC system in the "default behavior" scenario I expect
>> that resctrl can do required ABMC counter configuration automatically at
>> the time a monitor group is created. In this "default behavior" scenario
>> resctrl would expose "num_rmids" to be half of the number of assignable
>> counters. When a user then creates a monitor group two counters will be
>> used and configured to count the local and total bytes respectively. If
>> two counters are not available then ENOSPC is returned, just like when the system
>> is out of closid/rmid. With this "default behavior" user space thus gets
>> improved behavior without making any changes on its part. I do not have
>
> We can automatically assign the h/w counter when monitor group is created
> until we run out of h/w counters. That is good idea. By default user will
> not notice any difference in ABMC mode.
>
>> insight into how many counters ABMC could be expected to expose though ...
>> so some users may be surprised at how few monitor groups can be created
>> with new hardware? This may not be an issue since that would accurately
>> reflect how many _reliable_ monitor groups can be created and if user needs
>> more monitor groups then that would be a time to explore the "new feature"
>> that requires changes in how user interacts with resctrl.
>
> Currently, 32 h/w counters are available to configure. With two counters
> for each group, we can create 16 groups(15 new groups plus the default
> group). That should be fine as pqos tool creates only 16 groups when it is
> started.
User space can never assume that a certain number of groups can
be created.
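To make the arithmetic above concrete, here is a minimal sketch of the "default behavior" allocation: two counters per monitor group, and ENOSPC once the pool is exhausted. It is a user-space toy model only, not resctrl code. The class `CounterPool`, its methods, and the group names are hypothetical, and the value 32 is taken from the counter count quoted above.

```python
import errno

COUNTERS_PER_GROUP = 2  # one counter for total bytes, one for local bytes


class CounterPool:
    """Toy model of a fixed pool of assignable counters (hypothetical)."""

    def __init__(self, num_counters):
        self.num_counters = num_counters
        self.free = list(range(num_counters))
        self.assigned = {}  # group name -> (total counter id, local counter id)

    def max_groups(self):
        # What "num_rmids" would report in the "default behavior" scenario.
        return self.num_counters // COUNTERS_PER_GROUP

    def create_group(self, name):
        # Mirror the ENOSPC behavior described for running out of counters.
        if len(self.free) < COUNTERS_PER_GROUP:
            raise OSError(errno.ENOSPC, "no assignable counters left")
        total, local = self.free.pop(0), self.free.pop(0)
        self.assigned[name] = (total, local)
        return total, local

    def remove_group(self, name):
        # Removing a group returns both of its counters to the pool.
        self.free.extend(self.assigned.pop(name))


pool = CounterPool(32)
pool.create_group("default")  # the default group consumes a pair as well
created = 0
try:
    while True:
        pool.create_group(f"mon{created}")
        created += 1
except OSError as err:
    assert err.errno == errno.ENOSPC
print(pool.max_groups(), created)  # 16 15
```

The loop discovers the limit by hitting ENOSPC rather than by assuming 16, which is the only safe approach for user space.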
>> Apart from the "default behavior" there are two options to consider ...
>> (a) the "original" behavior(? I do not know what to call it) - this would be
>> where user space wants(?) to have the current non-ABMC behavior on an ABMC
>> system, where the previous "num_rmids" monitor groups can be created but
>> the counters are reset unpredictably ... should this still be supported
>> on ABMC systems though?
>
> I would say yes. If for some reason (hardware or software issues) a user is not
> able to use ABMC mode, they have the option to go back to legacy mode.
I see. Should this perhaps be protected behind the resctrl "debug" mount option?
>> (b) the "new feature" behavior where user space gets full benefit of ABMC
>> that allows user space to create any number of monitor groups but then
>> user space needs to let hardware (via resctrl) know which
>> events should be counted.
>
> Is this "new feature" enabled by default when ABMC is available?
Not in this design, no. In these scenarios ABMC will be available and enabled
in both the "default" and "new feature" behavior. The difference is that no
user space changes are needed in the "default" scenario, where resctrl limits
the number of monitor groups so that every monitor group is backed by hardware
counters.
When "new feature" is enabled on a system where ABMC is available and enabled,
user space is able to create more monitor groups than there are hardware
counters, and a new user interface is required to manage associating counters
with monitor events.
>
> Or we need to provide an interface to enable this feature?
Yes, an interface will be needed to enable this feature.
>
>
>>
>> I expect that only (b) above would require user space change. Considering
>> that per documentation, "num_rmids" means "This is the upper bound for how
>> many "CTRL_MON" + "MON" groups can be created" I expect that "num_rmids"
>> becomes undefined when "new feature" is enabled. When this new feature is enabled
>> then user space is no longer limited by number of RMIDs on how many monitor
>
> With ABMC, we will have a new field "mbm_assignable_counters". We don't
> have to change the definition of "num_rmids".
The problem here is that "num_rmids" is (as per Documentation/arch/x86/resctrl.rst)
documented to be an upper bound for how many monitor groups can be created.
As I understand, when ABMC is enabled and its full capability exposed to user
space then there is no limit to how many monitor groups can be created, no?
For example, if I understand correctly, theoretically, when ABMC is enabled then
"num_rmids" can be U32_MAX (after a quick look it is not clear to me why r->num_rmid
is not unsigned, tbd if number of directories may also be limited by kernfs).
User space could theoretically create more monitor groups than the number of
rmids that a resource claims to support using current upstream enumeration.
Instead, it is the "mbm_assignable_counters" that is of interest: that is what
user space uses to determine how many of the (potentially very large number of)
monitor groups/monitor events can be counted at any particular time.
>> groups can be created and this is the point that the user interface that you
>> and Peter have ideas about comes into play. Specifically, user space needing
>> a way to specify:
>> (a) "let me create more monitor groups than the hardware can support"/"let me
>> control which events/monitor groups are counted"
>> (like the "mbm_assign" file in your proposal)
>> (b) "here are the events that need to be counted"
>> (like the "monitor_state" and "mbm_{local,total}_bytes_assigned" proposals)
>
> With the global assignment option out of the way for now (it may be introduced later),
> we can provide two interfaces.
>
> 1. /sys/fs/resctrl/info/L3_MON/mbm_assign
> This will be enabled by default when ABMC is available. Users can disable
> this option to go back to legacy mode.
Potentially (all names are placeholders, and each will only be visible on systems
that actually support the particular mode):
legacy [default] new_feature soft_rmid
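As an illustration of how user space might consume such a file, here is a minimal sketch that parses the placeholder line above. It assumes that the square brackets mark the currently selected mode, in the style of other kernel files that list choices this way; that reading, and the helper name `parse_modes`, are assumptions and not part of any agreed interface.

```python
def parse_modes(line):
    """Return (active_mode, all_modes) for a line such as
    'legacy [default] new_feature soft_rmid'.

    Assumption: the bracketed token is the currently selected mode.
    """
    active = None
    modes = []
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        modes.append(token)
    return active, modes


active, modes = parse_modes("legacy [default] new_feature soft_rmid")
print(active)  # default
print(modes)   # ['legacy', 'default', 'new_feature', 'soft_rmid']
```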
>
> 2. /sys/fs/resctrl/monitor_state.
> This can be used to individually assign or unassign the counters in each group.
>
> When assigned:
> #cat /sys/fs/resctrl/monitor_state
> 0=total-assign,local-assign;1=total-assign,local-assign
>
> When unassigned:
> #cat /sys/fs/resctrl/monitor_state
> 0=total-unassign,local-unassign;1=total-unassign,local-unassign
>
>
> Thoughts?
How do you expect this interface to be used? I understand the mechanics
of this interface but on a higher level, do you expect user space to
once in a while assign a new counter to a single event or monitor group
(for which a fine grained interface works) or do you expect user space to
shift multiple counters across several monitor events at intervals?
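For reference while discussing usage, the mechanics of the "monitor_state" format quoted above can be sketched as a small parser and formatter. This is only an illustration of the proposed text format. The function names are hypothetical, and treating the number before "=" as a domain id (as in the schemata file) is an assumption.

```python
def parse_monitor_state(text):
    """Parse '0=total-assign,local-assign;1=...' into
    {domain: {event: is_assigned}}."""
    state = {}
    for entry in text.strip().split(";"):
        if not entry:
            continue
        domain, _, flags = entry.partition("=")
        events = {}
        for flag in flags.split(","):
            # 'total-assign' -> event 'total', action 'assign'
            event, _, action = flag.partition("-")
            events[event] = action == "assign"
        state[int(domain)] = events
    return state


def format_monitor_state(state):
    """Inverse of parse_monitor_state()."""
    entries = []
    for domain in sorted(state):
        flags = ",".join(
            f"{event}-{'assign' if on else 'unassign'}"
            for event, on in state[domain].items()
        )
        entries.append(f"{domain}={flags}")
    return ";".join(entries)


state = parse_monitor_state("0=total-assign,local-assign;1=total-assign,local-assign")
state[1]["local"] = False  # drop the local counter in domain 1 only
print(format_monitor_state(state))
# 0=total-assign,local-assign;1=total-assign,local-unassign
```

Note that changing one event in one group takes a read, a modify, and a write per group, which is the fine-grained usage pattern in question.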
Across resctrl's lifetime we have seen examples of user space wanting
to accomplish more with a single resctrl interaction. For example, the
support you added for moving multiple tasks to a group, and the feature
from Peter for moving a monitor group.
I thus think that it would be valuable to consider more efficient
interfaces from the beginning. I do not think that this is the type
of optimization that should be delayed until an unspecified later
time. Instead, the different ways the interface may be used can be
considered from the start so that the most suitable interface is created
from the beginning. Specifically, why does resctrl need to be "extended"
at a later time to support a global assignment as proposed by Peter? Why
can it not be done as the original and (ideally) only mechanism?
Reinette