[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0b689e0e-7742-407e-a01e-99788eb5cbb9@intel.com>
Date: Thu, 22 May 2025 13:44:31 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Babu Moger <babu.moger@....com>, <corbet@....net>, <tony.luck@...el.com>,
<tglx@...utronix.de>, <mingo@...hat.com>, <bp@...en8.de>,
<dave.hansen@...ux.intel.com>
CC: <james.morse@....com>, <dave.martin@....com>, <fenghuay@...dia.com>,
<x86@...nel.org>, <hpa@...or.com>, <paulmck@...nel.org>,
<akpm@...ux-foundation.org>, <thuth@...hat.com>, <rostedt@...dmis.org>,
<ardb@...nel.org>, <gregkh@...uxfoundation.org>,
<daniel.sneddon@...ux.intel.com>, <jpoimboe@...nel.org>,
<alexandre.chartre@...cle.com>, <pawan.kumar.gupta@...ux.intel.com>,
<thomas.lendacky@....com>, <perry.yuan@....com>, <seanjc@...gle.com>,
<kai.huang@...el.com>, <xiaoyao.li@...el.com>, <kan.liang@...ux.intel.com>,
<xin3.li@...el.com>, <ebiggers@...gle.com>, <xin@...or.com>,
<sohil.mehta@...el.com>, <andrew.cooper3@...rix.com>,
<mario.limonciello@....com>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <peternewman@...gle.com>,
<maciej.wieczor-retman@...el.com>, <eranian@...gle.com>,
<Xiaojian.Du@....com>, <gautham.shenoy@....com>
Subject: Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth
Monitoring Counters (ABMC)
Hi Babu,
On 5/15/25 3:51 PM, Babu Moger wrote:
>
> This series adds the support for Assignable Bandwidth Monitoring Counters
> (ABMC). It is also called QoS RMID Pinning feature
>
> Series is written such that it is easier to support other assignable
> features supported from different vendors.
>
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>
> The patches are based on top of commit
> 92a09c47464d0 (tag: v6.15-rc5, tip/irq/merge) Linux 6.15-rc5
> plus
> https://lore.kernel.org/lkml/20250515165855.31452-1-james.morse@arm.com/
>
> It is very clear these patches will go after James's resctrl FS/ARCH
> restructure. Hoping to avoid one review cycle due to the merge.
>
> # Introduction
>
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, bandwidth monitoring feature on AMD system only guarantees that
"bandwidth monitoring feature on AMD system" -> " the bandwidth monitoring
feature on AMD systems"? or "the bandwidth monitoring feature on an AMD system".
Not sure.
> RMIDs currently assigned to a processor will be tracked by hardware.
> The counters of any other RMIDs which are no longer being tracked will be
> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
> that are not tracked by hardware. So, there can be only limited number of
> groups that can give guaranteed monitoring numbers. With ever changing
> configurations there is no way to definitely know which of these groups
> are being tracked for certain point of time. Users do not have the option
"for certain point of time" -> "during a particular/certain(?) time"?
> to monitor a group or set of groups for certain period of time without
"for certain period of time" -> "for a certain period of time"?
This series contains many duplicate snippets. When you update one, please
check that all the duplicates are updated also.
> worrying about counter being reset in between.
>
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as it is
> assigned. The assigned RMID will be tracked by the hardware until the user
> unassigns it manually. There is no need to worry about counters being reset
> during this period. Additionally, the user can specify a bitmask identifying
> the specific bandwidth types from the given source to track with the counter.
Instead of tacking it on as an "additionally" I see this capability now as essential
to this new implementation. I tried to give this series a thorough review to help finalize
this work but I kept being turned around by all the descriptions and finally it dawned
that all the descriptions are at their code still based on the original "event ID"
based implementation with either a small append or as little change as possible to
adjust to the "extended event ID" based implementation.
The previous implementation still used (and copy&pasted many times) in these descriptions
as "assign a hardware counter to an RMID, event pair" can only be accurate for this new
implementation if an event is re-defined ... it is no longer the original constrained
"event IDs" but instead an MBM event has become a generic name that identifies the
configurable "bandwidth types" (but, see note about terminology later) to be monitored.
This is never done.
I assume "the given source" is the assigned RMID? If so I think it will help to
understand if this is specific: "bandwidth types from the assigned RMID ..."
I find this series to use several terms for the same concept,
for example, "bandwidth types", "memory transactions", "types of L3 transactions",
"bandwidth sources", etc. This work will be easier to consume if it uses consistent
and specific terminology.
> Without ABMC enabled, monitoring will work in current 'default' mode without
> assignment option.
>
> # History
>
> Earlier implementation of ABMC had dependancy on BMEC (Bandwidth Monitoring
> Event Configuration). Peter had concerns with that implementation because
> it may be not be compatible with ARM's MPAM.
>
> Here are the threads discussing the concerns and new interface to address the concerns.
> https://lore.kernel.org/lkml/CALPaoCg97cLVVAcacnarp+880xjsedEWGJPXhYpy4P7=ky4MZw@mail.gmail.com/
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>
> Here are the finalized requirements based on the discussion:
>
> * Remove BMEC dependency on the ABMC feature.
Even stronger, BMEC and ABMC are now "incompatible" in that resctrl will not let them be used
at the same time.
>
> * Eliminate global assignment listing. The interface
> /sys/fs/resctrl/info/L3_MON/mbm_assign_control is no longer required.
>
> * Create the configuration directories at /sys/fs/resctrl/info/L3_MON/counter_configs/.
> The configuration file names should be free-form, allowing users to create them as needed.
>
> * Perform assignment listing at the group level by introducing mbm_L3_assignments
"the group level" -> "the monitoring group level"
> in each monitoring group. The listing should provide the following details:
>
> Event Configuration: Specifies the event configuration applied. This will be crucial
> when "mkdir" on event configuration is added in the future, leading to the creation
> of mon_data/mon_l3_*/<event configuration>.
hmmm ... sounds like it has become more natural to refer to it as "event configuration", which is
a good match for what the purpose is. This thus sounds like good motivation to change "counter_configs"
to "event_configs".
>
> Domains: Identifies the domains where the configuration is applied, supporting multi-domain setups.
>
> Assignment Type: Indicates whether the assignment is Exclusive (e or d), Shared (s), or Unassigned (_).
Could you please add definition of what "exclusive" and "shared" means?
>
> * Provide option to enable or disable auto assignment when new group is created.
>
> This series tries to address all the requirements listed above.
>
> # Implementation details
>
> Create a generic interface aimed to support user space assignment of scarce
> counters used for monitoring. First usage of interface is by ABMC with option
> to expand usage to "soft-ABMC" and MPAM counters in future.
>
> Feature adds following interface files:
>
> /sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
> monitoring features supported. The enclosed brackets indicate which
> feature is enabled.
>
> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> counters available for assignment.
Please aim to use consistent and clear terms to help understand this work. It is
confusing that above uses "available" in description for num_mbm_cntrs and then below
there is a new interface "available_mbm_cntrs" that uses the "available" term in name
but not description.
>
> /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: Reports the number of monitoring
> counters free in each domain.
>
> /sys/fs/resctrl/info/L3_MON/counter_configs : Directory to hold the counter configuration.
Everywhere else seems to refer to this as "event configurations". Please just stick to one,
"event configuration" seems most appropriate.
>
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter : Default configuration
> for MBM total events.
I think "default" should be dropped to make it clear that this is the actual configuration
that is always used, not a static "default" that may be used in "some" circumstances.
>
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter : Default configuration
> for MBM local events.
Same wrt "default"
>
> /sys/fs/resctrl/mbm_L3_assignments: Interface to list or modify assignment states on each group.
"Per monitor group interface to list or modify counters assigned to the group."? (Please improve.)
>
> # Examples
>
> a. Check if ABMC support is available
Please drop the "ABMC" from all the descriptions since this is intended to be a generic interface.
> #mount -t resctrl resctrl /sys/fs/resctrl/
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> [mbm_cntr_assign]
> default
>
I believe the naming has been finalized in
https://lore.kernel.org/lkml/7628cec8-5914-4895-8289-027e7821777e@amd.com/.
> ABMC feature is detected and it is enabled.
>
> b. Check how many ABMC counters are available.
available -> supported? This will help distinguish it from the
next interface file named "available_mbm_cntrs".
>
> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> 32
Please update to reflect what implementation does.
>
> c. Check how many ABMC counters are available in each domain.
"available" -> "available for assignment"
>
> # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
> 0=30;1=30
>
> d. Check default counter configuration.
>
> # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
> local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
> local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>
> # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> local_reads, local_non_temporal_writes, local_reads_slow_memory
Does not look like this matches implementation wrt spacing?
>
> e. Series adds a new interface file "mbm_L3_assignments" in each monitoring group
> to list and modify any group's monitoring states.
"any group's" -> "that group's"
>
> The list is displayed in the following format:
>
> <Event configuration>:<Domain id>=<Assignment type>
>
> Event configuration: A valid event configuration listed in the
> /sys/fs/resctrl/info/L3_MON/counter_configs directory.
>
> Domain ID: A valid domain ID number.
"A valid domain ID number" -> "A valid domain ID"
>
> Assignment types:
>
> _ : No event configuration assigned
>
> e : Event configuration assigned in exclusive mode
>
> To list the default group states:
> # cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=e
> mbm_local_bytes:0=e;1=e
>
> To unassign the configuration of mbm_total_bytes on domain 0:
This unassigns a counter, as opposed to a configuration, no? How about
"To unassign the counter associated with the mbm_total_bytes event"?
> #echo "mbm_total_bytes:0=_" > mbm_L3_assignments
> #cat mbm_L3_assignments
(May help to follow if the examples consistently uses full path.)
> mbm_total_bytes:0=_;1=e
> mbm_local_bytes:0=e;1=e
>
> To unassign the mbm_total_bytes configuration on all domains:
same wrt unassigning a counter
> $echo "mbm_total_bytes:*=_" > mbm_L3_assignments
> $cat mbm_L3_assignments
# prompt is usually used for administrator and $ for user without
administrator privileges. Switching between # and $ in these examples
is confusing.
> mbm_total_bytes:0=_;1=_
> mbm_local_bytes:0=e;1=e
>
> To assign the mbm_total_bytes configuration on all domains in exclusive mode:
same wrt unassigning a counter
> $echo "mbm_total_bytes:*=e" > mbm_L3_assignments
> $cat mbm_L3_assignments
> mbm_total_bytes:0=e;1=e
> mbm_local_bytes:0=e;1=e
>
> g. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
> There is no change in reading the events with ABMC. If the event is unassigned
> when reading, then the read will come back as "Unassigned".
>
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 779247936
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> 765207488
>
> h. Check the default event configurations.
>
> #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
> local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
> local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>
> #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> local_reads, local_non_temporal_writes, local_reads_slow_memory
>
> i. Change the event configuration for mbm_local_bytes.
>
> #echo "local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads" >
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>
> #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads
>
> This will update the assignments where mbm_local_bytes are configured.
"This will update all (across all domains of all monitor groups) counter assignments
associated with the mbm_local_bytes event." (Please improve).
>
> j. Now read the total event again. The first read may come back with "Unavailable"
> status. The subsequent read of mbm_total_bytes will display only the read events.
Was this intended to be example of reading *local* bytes after modification in previous step?
>
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> Unavailable
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 314101
>
> k. Users will have the option to go back to 'default' mbm_assign_mode if required.
"Users will have the option" -> "Users have the option"
> This can be done using the following command. Note that switching the
> mbm_assign_mode will reset all the MBM counters of all resctrl groups.
"all the MBM counters " -> "all the MBM counters (and thus all MBM events)"?
>
> # echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> mbm_cntr_assign
> [default]
>
> l. Unmount the resctrl
>
> #umount /sys/fs/resctrl/
> ---
Reinette
Powered by blists - more mailing lists