[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <715aed9d-dbfd-4b7b-a70d-b69262553c1a@intel.com>
Date: Wed, 12 Feb 2025 15:40:48 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Dave Martin <Dave.Martin@....com>, Babu Moger <babu.moger@....com>,
<peternewman@...gle.com>
CC: <corbet@....net>, <tglx@...utronix.de>, <mingo@...hat.com>,
<bp@...en8.de>, <dave.hansen@...ux.intel.com>, <tony.luck@...el.com>,
<x86@...nel.org>, <hpa@...or.com>, <paulmck@...nel.org>,
<akpm@...ux-foundation.org>, <thuth@...hat.com>, <rostedt@...dmis.org>,
<xiongwei.song@...driver.com>, <pawan.kumar.gupta@...ux.intel.com>,
<daniel.sneddon@...ux.intel.com>, <jpoimboe@...nel.org>,
<perry.yuan@....com>, <sandipan.das@....com>, <kai.huang@...el.com>,
<xiaoyao.li@...el.com>, <seanjc@...gle.com>, <xin3.li@...el.com>,
<andrew.cooper3@...rix.com>, <ebiggers@...gle.com>,
<mario.limonciello@....com>, <james.morse@....com>,
<tan.shaopeng@...itsu.com>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <maciej.wieczor-retman@...el.com>,
<eranian@...gle.com>
Subject: Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth
Monitoring Counters (ABMC)
-Fenghua (his email address does not work anymore)
On 2/12/25 3:33 PM, Reinette Chatre wrote:
> Hi Dave,
>
> On 2/12/25 9:46 AM, Dave Martin wrote:
>> Hi there,
>>
>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>
>>> This series adds the support for Assignable Bandwidth Monitoring Counters
>>> (ABMC). It is also called QoS RMID Pinning feature
>>>
>>> Series is written such that it is easier to support other assignable
>>> features supported from different vendors.
>>>
>>> The feature details are documented in the APM listed below [1].
>>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>>> Monitoring (ABMC). The documentation is available at
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>>
>>> The patches are based on top of commit
>>> d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx'
>>>
>>> # Introduction
>>
>> [...]
>>
>>> # Examples
>>>
>>> a. Check if ABMC support is available
>>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>>
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>> [mbm_cntr_assign]
>>> default
>>
>> (Nit: can this be called "mbm_counter_assign"? The name is already
>> long, so I wonder whether anything is gained by using a cryptic
>> abbreviation for "counter". Same with all the "cntrs" elsewhere.
>> This is purely cosmetic, though -- the interface works either way.)
>>
>>> ABMC feature is detected and it is enabled.
>>>
>>> b. Check how many ABMC counters are available.
>>>
>>> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>>> 32
>>
>> Is this file needed?
>>
>> With MPAM, it is more difficult to promise that the same number of
>> counters will be available everywhere.
>>
>> Rather than lie, or report a "safe" value here that may waste some
>> counters, can we just allow the number of counters to be be discovered
>> per domain via available_mbm_cntrs?
>
> This sounds reasonable to me. I think us having trouble with the
> user documentation of this file so late in development should also have been
> a sign to rethink its value.
>
> For a user to discover the number of counters supported via available_mbm_cntrs
> would require the file's contents to be captured right after mount. Since we've
> had scenarios where new userspace needs to discover an up-and-running system's
> configuration this may not be possible. I thus wonder instead of removing
> num_mbm_cntrs, it could be modified to return the per-domain supported counters
> instead of a single value?
>
>> num_closids and num_rmids are already problematic for MPAM, so it would
>> be good to avoid any more parameters of this sort from being reported
>> to userspace unless there is a clear understanding of why they are
>> needed.
>
> Yes. Appreciate your help in identifying what could be problematic for MPAM.
>
>>
>> Reporting number of counters per monitoring domain is a more natural
>> fit for MPAM, as below:
>>
>>> c. Check how many ABMC counters are available in each domain.
>>>
>>> # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
>>> 0=30;1=30
>>
>> For MPAM, this seems supportable. Each monitoring domain will have
>> some counters, and a well-defined number of them will be available for
>> allocation at any one time.
>>
>>> d. Create few resctrl groups.
>>>
>>> # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>>> # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>>> # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>>>
>>> e. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>> to list and modify any group's monitoring states. File provides single place
>>> to list monitoring states of all the resctrl groups. It makes it easier for
>>> user space to learn about the used counters without needing to traverse all
>>> the groups thus reducing the number of file system calls.
>>>
>>> The list follows the following format:
>>>
>>> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>>
>>> Format for specific type of groups:
>>>
>>> * Default CTRL_MON group:
>>> "//<domain_id>=<flags>"
>>
>> [...]
>>
>>> Flags can be one of the following:
>>>
>>> t MBM total event is enabled.
>>> l MBM local event is enabled.
>>> tl Both total and local MBM events are enabled.
>>> _ None of the MBM events are enabled
>>>
>>> Examples:
>>
>> [...]
>>
>> I think that this basically works for MPAM.
>>
>> The local/total distinction doesn't map in a consistent way onto MPAM,
>> but this problem is not specific to ABMC. It feels sensible for ABMC
>> to be built around the same concepts that resctrl already has elsewhere
>> in the interface. MPAM will do its best to fit (as already).
>>
>> Regarding Peter's use case of assiging multiple counters to a
>> monitoring group [1], I feel that it's probably good enough to make
>> sure that the ABMC interface can be extended in future in a backwards
>> compatible way so as to support this, without trying to support it
>> immediately.
>>
>> [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
>>
>
> I do not think that resctrl's current support of the mbm_total_bytes and
> mbm_local_bytes should be considered as the "only" two available "slots"
> into which all possible events should be forced into. "mon_features" exists
> to guide user space to which events are supported and as I see it new events
> can be listed here to inform user space of their availability, with their
> associated event files available in the resource groups.
>
>>
>> For example, if we added new generic "letters" -- say, "0" to "9",
>> combined with new counter files in resctrlfs, that feels like a
>> possible approach. ABMC (as in this series) should just reject such
>> such assignments, and the new counter files wouldn't exist.
>>
>> Availability of this feature could also be reported as a distinct mode
>> in mbm_assign_mode, say "mbm_cntr_generic", or whatever.
>>
>>
>> A _sketch_ of this follows. This is NOT a proposal -- the key
>> question is whether we are confident that we can extend the interface
>> in this way in the future without breaking anything.
>>
>> If "yes", then the ABMC interface (as proposed by this series) works as
>> a foundation to build on.
>>
>> --8<--
>>
>> [artists's impression]
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> mbm_cntr_generic
>> [mbm_cntr_assign]
>> default
>>
>> # echo mbm_cntr_generic >/sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> # echo '//0=01;1=23' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter0_bytes_type
>> # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter1_bytes_type
>> # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter2_bytes_type
>> # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter3_bytes_type
>>
>> ...
>>
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_counter1_bytes
>>
>> etc.
>>
>
> It is not clear to me what additional features such an interface enables. It
> also looks like user space will need to track and manage counter IDs?
>
> It sounds to me as though the issue starts with your statement
> "The local/total distinction doesn't map in a consistent way onto MPAM". To
> address this I expect that an MPAM system will not support nor list
> mbm_total_bytes and/or mbm_local_bytes in its mon_features file (*)? Instead,
> it would list the events that are appropriate to the system? Trying to match
> with what Peter said [1] in the message you refer to, this may be possible:
>
> # cat /sys/fs/resctrl/info/L3_MON/mon_features
> mbm_local_read_bytes
> mbm_local_write_bytes
> mbm_local_bytes
>
> (*) I am including mbm_local_bytes since it could be an event that can be software
> defined as a sum of mbm_local_read_bytes and mbm_local_write_bytes when they are both
> counted.
>
> I see the support for MPAM events distinct from the support of assignable counters.
> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
> Please help me understand if you see it differently.
>
> Doing so would need to come up with alphabetical letters for these events,
> which seems to be needed for your proposal also? If we use possible flags of:
>
> mbm_local_read_bytes a
> mbm_local_write_bytes b
>
> Then mbm_assign_control can be used as:
> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
> <value>
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>
> One issue would be when resctrl needs to support more than 26 events (no more flags available),
> assuming that upper case would be used for "shared" counters (unless this interface is defined
> differently and only few uppercase letters used for it). Would this be too low of a limit?
>
> Reinette
>
> [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
Powered by blists - more mailing lists