[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <05b4e345-ad14-4ea9-a13f-2c9b3a6eb422@intel.com>
Date: Thu, 1 Aug 2024 14:49:43 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: <babu.moger@....com>, <corbet@....net>, <fenghua.yu@...el.com>,
<tglx@...utronix.de>, <mingo@...hat.com>, <bp@...en8.de>,
<dave.hansen@...ux.intel.com>
CC: <x86@...nel.org>, <hpa@...or.com>, <paulmck@...nel.org>,
<rdunlap@...radead.org>, <tj@...nel.org>, <peterz@...radead.org>,
<yanjiewtw@...il.com>, <kim.phillips@....com>, <lukas.bulwahn@...il.com>,
<seanjc@...gle.com>, <jmattson@...gle.com>, <leitao@...ian.org>,
<jpoimboe@...nel.org>, <rick.p.edgecombe@...el.com>,
<kirill.shutemov@...ux.intel.com>, <jithu.joseph@...el.com>,
<kai.huang@...el.com>, <kan.liang@...ux.intel.com>,
<daniel.sneddon@...ux.intel.com>, <pbonzini@...hat.com>,
<sandipan.das@....com>, <ilpo.jarvinen@...ux.intel.com>,
<peternewman@...gle.com>, <maciej.wieczor-retman@...el.com>,
<linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<eranian@...gle.com>, <james.morse@....com>
Subject: Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth
Monitoring Counters (ABMC)
Hi Babu,
On 7/17/24 10:19 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 7/12/24 17:03, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 7/3/24 2:48 PM, Babu Moger wrote:
>>> # Linux Implementation
>>>
>>> Linux resctrl subsystem provides the interface to count maximum of two
>>> memory bandwidth events per group, from a combination of available total
>>> and local events. Keeping the current interface, users can enable a maximum
>>> of 2 ABMC counters per group. User will also have the option to enable only
>>> one counter to the group. If the system runs out of assignable ABMC
>>> counters, kernel will display an error. Users need to disable an already
>>> enabled counter to make space for new assignments.
>>
>> The implementation appears to be converging on an interface that can
>> be generic enough to be used by other features discussed along the way.
>> "Linux implementation" summary can thus add:
>>
>> Create a generic interface aimed to support user space assignment
>> of scarce counters used for monitoring. First usage of interface
>> is by ABMC with option to expand usage to "soft-RMID" and MPAM
>> counters in future.
>
> Sure.
>
>>
>>
>>> # Examples
>>>
>>> a. Check if ABMC support is available
>>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>>
>>> #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>> [abmc]
>>> legacy
>>>
>>> Linux kernel detected ABMC feature and it is enabled.
>>
>> How about renaming "abmc" to "mbm_cntrs"? This will match the num_mbm_cntrs
>> info file and be the final step to make this generic so that another
>> architecture
>> can more easily support assignining hardware counters without needing to call
>> the feature AMD's "abmc".
>
> I think we aleady settled this with "mbm_cntr_assignable".
>
> For soft-RMID" it will be mbm_sw_assignable.
Maybe getting a bit long but how about "mbm_cntr_sw_assignable" to match
with the term "mbm_cntr" in accompanying "num_mbm_cntrs"?
>> Expanding on this it may be possible to add a new "sw_mbm_cntrs" feature that
>> will be the "soft-RMID" feature while also reflecting the "mbm_cntrs" name
>> so that when user space enables that feature its properties can be found in
>> "num_mbm_cntrs".
>>
>> The "abmc" kernel parameter remains but that does seem separate from this
>> resctrl fs feature since it is explicitly tied to X86_FEATURE_ABMC surely
>> making it architecture specific.
>>
>>>
>>> b. Check how many ABMC counters are available.
>>>
>>> #cat /sys/fs/resctrl/info/L3_MON/num_cntrs
>>> 32
>>
>> This is now num_mbm_cntrs
>
> Sure.
>
>>
>>>
>>> c. Create few resctrl groups.
>>>
>>> # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>>> # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>>> # mkdir
>>> /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>>>
>>>
>>> d. This series adds a new interface file
>>> /sys/fs/resctrl/info/L3_MON/mbm_control
>>> to list and modify the group's monitoring states. File provides
>>> single place
>>> to list monitoring states of all the resctrl groups. It makes it
>>> easier for
>>> user space to learn about the counters are used without needing to
>>> traverse
>>> all the groups thus reducing the number of filesystem calls.
>>>
>>> The list follows the following format:
>>>
>>> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>>
>>> Format for specific type of groups:
>>>
>>> * Default CTRL_MON group:
>>> "//<domain_id>=<flags>"
>>>
>>> * Non-default CTRL_MON group:
>>> "<CTRL_MON group>//<domain_id>=<flags>"
>>>
>>> * Child MON group of default CTRL_MON group:
>>> "/<MON group>/<domain_id>=<flags>"
>>>
>>> * Child MON group of non-default CTRL_MON group:
>>> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>>
>>> Flags can be one of the following:
>>>
>>> t MBM total event is enabled.
>>> l MBM local event is enabled.
>>> tl Both total and local MBM events are enabled.
>>> _ None of the MBM events are enabled
>>
>> The language needs to be changed here (and in the many copied places) to
>> be specific about what setting the flag accomplishes. For example, in
>> "legacy" mode user space can be expected to find all events enabled, no?
>> Needing a new feature to set a flag to accomplish something that is
>> possible in legacy mode can thus cause confusion.
>
> Yes. It is possible to do it. But I feel unnessassary.
>
>>
>> If I understand the implementation reading "mbm_control" will fail
>> if system is ABMC capable but it is disabled. Why can "mbm_control" not
>> always be displayed to user space? For example, what if "mbm_control" is
>> always available to user space and it can provide specific information to
>> user space. For example:
>> t MBM total event is enabled but may not always be counted.
>> T MBM total event is enabled and being counted.
>>
>> On AMD systems resource groups will have "t" associated with monitor
>> groups when ABMC disabled, "T" when ABMC enabled and a counter assigned.
>> On Intel systems monitor groups will always have "T".
>
> I think more flags will add more confusion.
>
>>
>> For "soft-RMID" the flag could possible continue to be "T"?
>>
>> I am trying to find ways to communicate to user space consistently
>> and clearly and any insights will be appreciated. We really do not want
>> to add this interface and then find that it just causes confusion.
>>
>> It is not quite obvious to me when the new files should be visible and
>> what they should present to the user. "mbm_mode" is now always visible.
>> Should "num_mbm_cntrs" not also always be visible? Right now "num_mbm_cntrs"
>> appears to be only associated to ABMC, should it not also, for example,
>> be the file that "soft-RMID" may use to share how many counters are
>> available? Its contents will thus be dynamic based on which "MBM mode" is
>> active, begging the question, what should it contain when "legacy" mode is
>> enabled, should "num_mbm_cntrs" perhaps show "0" to user space when
>> "legacy" mode is active?
>
> Its good we have this discussion.
>
> How about we go with simple way for now. The mbm_mode will only available
> when ABMC or Soft_RMID(MPAM feature) is supported. Same way for the
> num_mbm_cntrs.
If ABMC or Soft_RMID is supported then user can still enable "legacy" instead.
What will num_mbm_cntrs and mbm_control display when user enables
"legacy"?
Reinette
Powered by blists - more mailing lists