Message-ID: <9b94b97e-4a8c-415e-af7a-d3f832592cf9@intel.com>
Date: Fri, 23 Feb 2024 14:21:12 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: <babu.moger@....com>, James Morse <james.morse@....com>, <corbet@....net>,
<fenghua.yu@...el.com>, <tglx@...utronix.de>, <mingo@...hat.com>,
<bp@...en8.de>, <dave.hansen@...ux.intel.com>
CC: <x86@...nel.org>, <hpa@...or.com>, <paulmck@...nel.org>,
<rdunlap@...radead.org>, <tj@...nel.org>, <peterz@...radead.org>,
<yanjiewtw@...il.com>, <kim.phillips@....com>, <lukas.bulwahn@...il.com>,
<seanjc@...gle.com>, <jmattson@...gle.com>, <leitao@...ian.org>,
<jpoimboe@...nel.org>, <rick.p.edgecombe@...el.com>,
<kirill.shutemov@...ux.intel.com>, <jithu.joseph@...el.com>,
<kai.huang@...el.com>, <kan.liang@...ux.intel.com>,
<daniel.sneddon@...ux.intel.com>, <pbonzini@...hat.com>,
<sandipan.das@....com>, <ilpo.jarvinen@...ux.intel.com>,
<peternewman@...gle.com>, <maciej.wieczor-retman@...el.com>,
<linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<eranian@...gle.com>
Subject: Re: [PATCH v2 00/17] x86/resctrl : Support AMD Assignable Bandwidth
Monitoring Counters (ABMC)

Hi Babu,

On 2/23/2024 12:11 PM, Moger, Babu wrote:
> On 2/23/24 11:17, Reinette Chatre wrote:
>>
>>
>> On 2/20/2024 12:48 PM, Moger, Babu wrote:
>>> On 2/20/24 09:21, James Morse wrote:
>>>> On 19/01/2024 18:22, Babu Moger wrote:
>>
>>>>> e. Enable ABMC mode.
>>>>>
>>>>> #echo 1 > /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>>>>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>>>>> 1
>>>>
>>>> Why does this mode need enabling? Can't it be enabled automatically on hardware that
>>>> supports it, or enabled implicitly when the first assignment attempt arrives?
>>>>
>>>> I guess this is really needed for a reset - could we implement that instead? This way
>>>> there isn't an extra step user-space has to do to make the assignments work.
>>>
>>> Most new features are added as opt-in, so I kept it that
>>> way. If we enable this feature automatically, then we have to provide an
>>> option to disable it.
>>>
>>
>> At the same time it sounds to me like ABMC can improve current users'
>> experience without requiring them to do anything. This sounds appealing.
>> For example, if I understand correctly, it may be possible to start resctrl
>> with ABMC enabled by default and the number of monitoring groups (currently
>> exposed to user space via "num_rmids") limited to the number of counters
>> supported by ABMC. Existing users would then by default obtain better behavior
>> of counters not resetting.
>
> Yes, I like the idea. But it will break compatibility with the pqos
> tool (intel_cmt_cat utility). pqos monitoring will not work unless the
> tool adds support for enabling ABMC. The ABMC feature requires an extra
> step to assign the counters before monitoring works.

I am considering two scenarios: the "default behavior", which is what a user
will experience when booting resctrl on an ABMC system, and the "new feature
behavior", where a user can take full advantage of all that ABMC (and soft
RMID, and MPAM) can offer.

So, first, on an ABMC system in the "default behavior" scenario I expect
that resctrl can do the required ABMC counter configuration automatically at
the time a monitor group is created. In this "default behavior" scenario
resctrl would expose "num_rmids" as half the number of assignable
counters. When a user then creates a monitor group, two counters will be
used and configured to count the local and total bytes respectively. If
two counters are not available then ENOSPC is returned, just like when the
system is out of CLOSIDs/RMIDs. With this "default behavior" user space thus
gets improved behavior without making any changes on its part. I do not have
insight into how many counters ABMC could be expected to expose though ...
so some users may be surprised at how few monitor groups can be created
with new hardware? This may not be an issue since that would accurately
reflect how many _reliable_ monitor groups can be created, and if users need
more monitor groups then that would be the time to explore the "new feature"
that requires changes in how user space interacts with resctrl.
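
To make the "default behavior" accounting concrete, below is a minimal
userspace C model of it. This is not resctrl code: every name (struct
abmc_pool, abmc_mon_group_create(), ABMC_MAX_COUNTERS, ...) is hypothetical,
and the counter bound is an assumption since how many counters ABMC exposes
is not known here. It only shows "num_rmids" reported as half the assignable
counters, two counters (local, total) claimed per monitor group, and
-ENOSPC when two are not free:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the proposed ABMC "default behavior". None of
 * these names exist in resctrl; the counter bound is an assumption.
 */
#define ABMC_MAX_COUNTERS      64  /* assumed bound for this sketch */
#define COUNTERS_PER_MON_GROUP 2   /* mbm_local_bytes + mbm_total_bytes */

struct abmc_pool {
	int num_counters;                /* assignable counters, <= ABMC_MAX_COUNTERS */
	bool in_use[ABMC_MAX_COUNTERS];
};

struct mon_group_counters {
	int local_ctr;  /* counter configured for local bytes */
	int total_ctr;  /* counter configured for total bytes */
};

/* Value "num_rmids" would report: half the number of assignable counters. */
static int abmc_default_num_rmids(const struct abmc_pool *pool)
{
	return pool->num_counters / COUNTERS_PER_MON_GROUP;
}

/* Return the first free counter other than @skip, or -1 if none is free. */
static int abmc_find_free(const struct abmc_pool *pool, int skip)
{
	for (int i = 0; i < pool->num_counters; i++)
		if (!pool->in_use[i] && i != skip)
			return i;
	return -1;
}

/*
 * Monitor group creation: claim two counters or fail with -ENOSPC, as when
 * the system is out of CLOSIDs/RMIDs. Nothing is claimed on failure.
 */
static int abmc_mon_group_create(struct abmc_pool *pool,
				 struct mon_group_counters *grp)
{
	int local = abmc_find_free(pool, -1);
	int total = local < 0 ? -1 : abmc_find_free(pool, local);

	if (total < 0)  /* also covers local < 0 */
		return -ENOSPC;

	pool->in_use[local] = true;
	pool->in_use[total] = true;
	grp->local_ctr = local;
	grp->total_ctr = total;
	return 0;
}

/* Monitor group removal: return both counters to the pool. */
static void abmc_mon_group_free(struct abmc_pool *pool,
				const struct mon_group_counters *grp)
{
	assert(pool->in_use[grp->local_ctr] && pool->in_use[grp->total_ctr]);
	pool->in_use[grp->local_ctr] = false;
	pool->in_use[grp->total_ctr] = false;
}
```

With four assignable counters, for example, this model reports "num_rmids"
as 2, and a third monitor group creation fails with -ENOSPC until one of the
first two groups is removed.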

Apart from the "default behavior" there are two options to consider ...
(a) the "original" behavior (? I do not know what to call it): this would be
where user space wants(?) to have the current non-ABMC behavior on an ABMC
system, where the previous "num_rmids" monitor groups can be created but
the counters are reset unpredictably ... should this still be supported
on ABMC systems though?
(b) the "new feature" behavior where user space gets full benefit of ABMC
that allows user space to create any number of monitor groups but then
user space needs to let hardware (via resctrl) know which
events should be counted.
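
A rough way to compare the three behaviors side by side is the hypothetical
C sketch below. The enum, struct and function names are made up for
illustration and are not proposed resctrl interfaces; -1 stands for "not
bounded by RMIDs or counters", following the description of (b) above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the behaviors discussed in this thread. */
enum abmc_mode {
	ABMC_DEFAULT,      /* resctrl assigns two counters per monitor group */
	ABMC_ORIGINAL,     /* option (a): non-ABMC behavior on ABMC hardware */
	ABMC_NEW_FEATURE,  /* option (b): user space chooses what is counted */
};

/* What monitor group creation looks like under each mode. */
struct mon_group_policy {
	int max_groups;       /* -1: not bounded by RMIDs or counters */
	bool auto_assign;     /* resctrl configures counters at mkdir time */
	bool counts_reliable; /* counted events are not reset unpredictably */
};

static struct mon_group_policy abmc_policy(enum abmc_mode mode,
					   int hw_rmids, int hw_counters)
{
	assert(hw_rmids > 0 && hw_counters >= 0);

	switch (mode) {
	case ABMC_DEFAULT:
		return (struct mon_group_policy){ hw_counters / 2, true, true };
	case ABMC_ORIGINAL:
		return (struct mon_group_policy){ hw_rmids, false, false };
	case ABMC_NEW_FEATURE:
	default:
		/* Only events user space assigns get counters at all. */
		return (struct mon_group_policy){ -1, false, true };
	}
}
```

The point of the table-like function is that only the default and (b)
give reliable counts, and only (b) moves the assignment decision to user
space.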

I expect that only (b) above would require user space changes. Considering
that, per the documentation, "num_rmids" means "This is the upper bound for how
many "CTRL_MON" + "MON" groups can be created", I expect that "num_rmids"
becomes undefined when the "new feature" is enabled. When this new feature is
enabled, user space is no longer limited by the number of RMIDs in how many
monitor groups can be created, and this is the point where the user interface
that you and Peter have ideas about comes into play. Specifically, user space
needs a way to specify:
(a) "let me create more monitor groups than the hardware can support"/"let me
control which events/monitor groups are counted"
(like the "mbm_assign" file in your proposal)
(b) "here are the events that need to be counted"
(like the "monitor_state" and "mbm_{local,total}_bytes_assigned" proposals)
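
The two user space requirements above could be modeled roughly as in the
C sketch below. It is only an illustration under assumptions: the
user_assign flag stands in for (a) (something like the proposed
"mbm_assign" file), mon_group_assign() stands in for (b), and all names are
hypothetical rather than the interface in the patches. Creating a monitor
group claims no counters, only assigned events are counted, and "num_rmids"
is treated as undefined once user assignment is enabled:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical event IDs mirroring the mbm_{local,total}_bytes files. */
enum mbm_event { MBM_LOCAL_BYTES, MBM_TOTAL_BYTES, MBM_NUM_EVENTS };

struct counter_pool {
	bool user_assign;  /* (a): user space controls assignment (assumed flag) */
	int free;          /* hardware counters not yet assigned */
};

struct mon_group {
	bool assigned[MBM_NUM_EVENTS];  /* is this event being counted? */
};

/* Creating a monitor group claims no counters in this mode. */
static void mon_group_init(struct mon_group *g)
{
	for (int e = 0; e < MBM_NUM_EVENTS; e++)
		g->assigned[e] = false;
}

/* (b): "here are the events that need to be counted". */
static int mon_group_assign(struct counter_pool *p, struct mon_group *g,
			    enum mbm_event e)
{
	assert(p->user_assign && (unsigned int)e < MBM_NUM_EVENTS);
	if (g->assigned[e])
		return 0;        /* already counted */
	if (p->free == 0)
		return -ENOSPC;  /* user space must unassign something first */
	p->free--;
	g->assigned[e] = true;
	return 0;
}

static void mon_group_unassign(struct counter_pool *p, struct mon_group *g,
			       enum mbm_event e)
{
	assert((unsigned int)e < MBM_NUM_EVENTS);
	if (!g->assigned[e])
		return;
	g->assigned[e] = false;
	p->free++;
}

/* Once user space controls assignment, "num_rmids" no longer bounds mkdir. */
static bool num_rmids_defined(const struct counter_pool *p)
{
	return !p->user_assign;
}
```

In this model, running out of counters no longer blocks creating a monitor
group; it only blocks counting another event until user space frees a
counter elsewhere.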
Reinette