[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9e849476-7c4b-478b-bd2a-185024def3a3@intel.com>
Date: Wed, 12 Feb 2025 15:33:31 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Dave Martin <Dave.Martin@....com>, Babu Moger <babu.moger@....com>,
<peternewman@...gle.com>
CC: <corbet@....net>, <tglx@...utronix.de>, <mingo@...hat.com>,
<bp@...en8.de>, <dave.hansen@...ux.intel.com>, <tony.luck@...el.com>,
<fenghua.yu@...el.com>, <x86@...nel.org>, <hpa@...or.com>,
<paulmck@...nel.org>, <akpm@...ux-foundation.org>, <thuth@...hat.com>,
<rostedt@...dmis.org>, <xiongwei.song@...driver.com>,
<pawan.kumar.gupta@...ux.intel.com>, <daniel.sneddon@...ux.intel.com>,
<jpoimboe@...nel.org>, <perry.yuan@....com>, <sandipan.das@....com>,
<kai.huang@...el.com>, <xiaoyao.li@...el.com>, <seanjc@...gle.com>,
<xin3.li@...el.com>, <andrew.cooper3@...rix.com>, <ebiggers@...gle.com>,
<mario.limonciello@....com>, <james.morse@....com>,
<tan.shaopeng@...itsu.com>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <maciej.wieczor-retman@...el.com>,
<eranian@...gle.com>
Subject: Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth
Monitoring Counters (ABMC)
Hi Dave,
On 2/12/25 9:46 AM, Dave Martin wrote:
> Hi there,
>
> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>
>> This series adds the support for Assignable Bandwidth Monitoring Counters
>> (ABMC). It is also called QoS RMID Pinning feature
>>
>> Series is written such that it is easier to support other assignable
>> features supported from different vendors.
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC). The documentation is available at
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>> The patches are based on top of commit
>> d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx'
>>
>> # Introduction
>
> [...]
>
>> # Examples
>>
>> a. Check if ABMC support is available
>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> [mbm_cntr_assign]
>> default
>
> (Nit: can this be called "mbm_counter_assign"? The name is already
> long, so I wonder whether anything is gained by using a cryptic
> abbreviation for "counter". Same with all the "cntrs" elsewhere.
> This is purely cosmetic, though -- the interface works either way.)
>
>> ABMC feature is detected and it is enabled.
>>
>> b. Check how many ABMC counters are available.
>>
>> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>> 32
>
> Is this file needed?
>
> With MPAM, it is more difficult to promise that the same number of
> counters will be available everywhere.
>
> Rather than lie, or report a "safe" value here that may waste some
> counters, can we just allow the number of counters to be be discovered
> per domain via available_mbm_cntrs?
This sounds reasonable to me. I think us having trouble with the
user documentation of this file so late in development should also have been
a sign to rethink its value.
For a user to discover the number of counters supported via available_mbm_cntrs
would require the file's contents to be captured right after mount. Since we've
had scenarios where new userspace needs to discover an up-and-running system's
configuration this may not be possible. I thus wonder instead of removing
num_mbm_cntrs, it could be modified to return the per-domain supported counters
instead of a single value?
> num_closids and num_rmids are already problematic for MPAM, so it would
> be good to avoid any more parameters of this sort from being reported
> to userspace unless there is a clear understanding of why they are
> needed.
Yes. Appreciate your help in identifying what could be problematic for MPAM.
>
> Reporting number of counters per monitoring domain is a more natural
> fit for MPAM, as below:
>
>> c. Check how many ABMC counters are available in each domain.
>>
>> # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
>> 0=30;1=30
>
> For MPAM, this seems supportable. Each monitoring domain will have
> some counters, and a well-defined number of them will be available for
> allocation at any one time.
>
>> d. Create few resctrl groups.
>>
>> # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>> # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>> # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>>
>> e. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> to list and modify any group's monitoring states. File provides single place
>> to list monitoring states of all the resctrl groups. It makes it easier for
>> user space to learn about the used counters without needing to traverse all
>> the groups thus reducing the number of file system calls.
>>
>> The list follows the following format:
>>
>> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>> Format for specific type of groups:
>>
>> * Default CTRL_MON group:
>> "//<domain_id>=<flags>"
>
> [...]
>
>> Flags can be one of the following:
>>
>> t MBM total event is enabled.
>> l MBM local event is enabled.
>> tl Both total and local MBM events are enabled.
>> _ None of the MBM events are enabled
>>
>> Examples:
>
> [...]
>
> I think that this basically works for MPAM.
>
> The local/total distinction doesn't map in a consistent way onto MPAM,
> but this problem is not specific to ABMC. It feels sensible for ABMC
> to be built around the same concepts that resctrl already has elsewhere
> in the interface. MPAM will do its best to fit (as already).
>
> Regarding Peter's use case of assiging multiple counters to a
> monitoring group [1], I feel that it's probably good enough to make
> sure that the ABMC interface can be extended in future in a backwards
> compatible way so as to support this, without trying to support it
> immediately.
>
> [1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
>
I do not think that resctrl's current support of the mbm_total_bytes and
mbm_local_bytes should be considered as the "only" two available "slots"
into which all possible events should be forced into. "mon_features" exists
to guide user space to which events are supported and as I see it new events
can be listed here to inform user space of their availability, with their
associated event files available in the resource groups.
>
> For example, if we added new generic "letters" -- say, "0" to "9",
> combined with new counter files in resctrlfs, that feels like a
> possible approach. ABMC (as in this series) should just reject such
> such assignments, and the new counter files wouldn't exist.
>
> Availability of this feature could also be reported as a distinct mode
> in mbm_assign_mode, say "mbm_cntr_generic", or whatever.
>
>
> A _sketch_ of this follows. This is NOT a proposal -- the key
> question is whether we are confident that we can extend the interface
> in this way in the future without breaking anything.
>
> If "yes", then the ABMC interface (as proposed by this series) works as
> a foundation to build on.
>
> --8<--
>
> [artists's impression]
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> mbm_cntr_generic
> [mbm_cntr_assign]
> default
>
> # echo mbm_cntr_generic >/sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> # echo '//0=01;1=23' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
> # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter0_bytes_type
> # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter1_bytes_type
> # echo t >/sys/fs/resctrl/info/L3_MON/mbm_counter2_bytes_type
> # echo l >/sys/fs/resctrl/info/L3_MON/mbm_counter3_bytes_type
>
> ...
>
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_counter1_bytes
>
> etc.
>
It is not clear to me what additional features such an interface enables. It
also looks like user space will need to track and manage counter IDs?
It sounds to me as though the issue starts with your statement
"The local/total distinction doesn't map in a consistent way onto MPAM". To
address this I expect that an MPAM system will not support nor list
mbm_total_bytes and/or mbm_local_bytes in its mon_features file (*)? Instead,
it would list the events that are appropriate to the system? Trying to match
with what Peter said [1] in the message you refer to, this may be possible:
# cat /sys/fs/resctrl/info/L3_MON/mon_features
mbm_local_read_bytes
mbm_local_write_bytes
mbm_local_bytes
(*) I am including mbm_local_bytes since it could be an event that can be software
defined as a sum of mbm_local_read_bytes and mbm_local_write_bytes when they are both
counted.
I see the support for MPAM events distinct from the support of assignable counters.
Once the MPAM events are sorted, I think that they can be assigned with existing interface.
Please help me understand if you see it differently.
Doing so would need to come up with alphabetical letters for these events,
which seems to be needed for your proposal also? If we use possible flags of:
mbm_local_read_bytes a
mbm_local_write_bytes b
Then mbm_assign_control can be used as:
# echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
<value>
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
<sum of mbm_local_read_bytes and mbm_local_write_bytes>
One issue would be when resctrl needs to support more than 26 events (no more flags available),
assuming that upper case would be used for "shared" counters (unless this interface is defined
differently and only few uppercase letters used for it). Would this be too low of a limit?
Reinette
[1] https://lore.kernel.org/lkml/CALPaoCjY-3f2tWvBjuaQPfoPhxveWxxCxHqQMn4BEaeBXBa0bA@mail.gmail.com/
Powered by blists - more mailing lists