Message-ID: <e911e9c0-be02-4e9d-93e6-0c04ae717905@intel.com>
Date: Tue, 24 Jun 2025 14:25:29 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Babu Moger <babu.moger@....com>, <corbet@....net>, <tony.luck@...el.com>,
	<Dave.Martin@....com>, <james.morse@....com>, <tglx@...utronix.de>,
	<mingo@...hat.com>, <bp@...en8.de>, <dave.hansen@...ux.intel.com>
CC: <x86@...nel.org>, <hpa@...or.com>, <akpm@...ux-foundation.org>,
	<rostedt@...dmis.org>, <paulmck@...nel.org>, <thuth@...hat.com>,
	<ardb@...nel.org>, <gregkh@...uxfoundation.org>, <seanjc@...gle.com>,
	<thomas.lendacky@....com>, <pawan.kumar.gupta@...ux.intel.com>,
	<manali.shukla@....com>, <perry.yuan@....com>, <kai.huang@...el.com>,
	<peterz@...radead.org>, <xiaoyao.li@...el.com>, <kan.liang@...ux.intel.com>,
	<mario.limonciello@....com>, <xin3.li@...el.com>, <gautham.shenoy@....com>,
	<xin@...or.com>, <chang.seok.bae@...el.com>, <fenghuay@...dia.com>,
	<peternewman@...gle.com>, <maciej.wieczor-retman@...el.com>,
	<eranian@...gle.com>, <linux-doc@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v14 00/32] fs,x86/resctrl: Support AMD Assignable
 Bandwidth Monitoring Counters (ABMC)

Hi Babu,

On 6/13/25 2:04 PM, Babu Moger wrote:
> 
> This series adds support for Assignable Bandwidth Monitoring Counters
> (ABMC), also called the QoS RMID Pinning feature.
> 
> The series is written so that it is easier to support other assignable
> features from different vendors.
> 
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> 
> The patches are based on top of commit
> b75dc5e1399df (tip/master) Merge branch into tip/master: 'sched/core'
> 
> # Introduction
> 
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, the bandwidth monitoring feature on AMD systems only guarantees
> that RMIDs currently assigned to a processor will be tracked by hardware.
> The counters of any other RMIDs which are no longer being tracked will be
> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
> that are not tracked by hardware. So, there can be only a limited number of
> groups that can give guaranteed monitoring numbers. With ever changing
> configurations there is no way to definitely know which of these groups
> are being tracked during a particular time. Users do not have the option
> to monitor a group or set of groups for a certain period of time without
> worrying about counter being reset in between.

"about counter" -> "about counters" ?

>     
> The ABMC feature allows users to assign a hardware counter ID to an RMID,
> event pair and monitor bandwidth usage as long as it is assigned. The
> hardware continues to track the assigned counter until it is explicitly
> unassigned by the user. Additionally, the user can specify the type of
> memory transactions (e.g., reads, writes) to be tracked by the counter
> for the assigned RMID.
> 
> Without ABMC enabled, monitoring works in the current 'default' mode without
> the assignment option.
> 
> # History
> 
> An earlier implementation of ABMC had a dependency on BMEC (Bandwidth Monitoring
> Event Configuration). Peter had concerns with that implementation because
> it may not be compatible with ARM's MPAM.
> 
> Here are the threads discussing the concerns and new interface to address the concerns.
> https://lore.kernel.org/lkml/CALPaoCg97cLVVAcacnarp+880xjsedEWGJPXhYpy4P7=ky4MZw@mail.gmail.com/
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
> 
> Here are the finalized requirements based on the discussion:
> 
> *   BMEC and ABMC are incompatible with each other. They need to be mutually exclusive.
> 
> *   Eliminate global assignment listing. The interface
>     /sys/fs/resctrl/info/L3_MON/mbm_assign_control is no longer required.
> 
> *   Create the configuration directories at /sys/fs/resctrl/info/L3_MON/counter_configs/.
>     The configuration file names should be free-form, allowing users to create them as needed.
> 
> *   Perform assignment listing at the group level by introducing mbm_L3_assignments
>     in each monitoring group level. The listing should provide the following details:
> 
>     Event Configuration: Specifies the event configuration applied. This will be crucial
>     when "mkdir" on event configuration is added in the future, leading to the creation
>     of mon_data/mon_l3_*/<event configuration>.
> 
>     Domains: Identifies the domains where the configuration is applied, supporting multi-domain setups.
> 
>     Assignment Type: Indicates whether the assignment is Exclusive (e or d), Shared (s), or Unassigned (_).
> 
>     Exclusive assignment: Assign the counter ID to the RMID, event pair exclusively.
>     
>     Shared assignment: A shared assignment applies to both soft-ABMC and ABMC. A user can designate a
>                        "counter" (could be hardware counter or "active" RMID) as shared and that means
>                        the counter within that domain is shared between different monitor groups and actual
>                        assignment is scheduled by resctrl.  
> 
>     Unassigned: No longer assigned.
> 
> *   Provide an option to enable or disable auto assignment when a new group is created.
> 
> *   Keep the flexibility to support future assignment options like Soft-ABMC etc.
>     https://lore.kernel.org/lkml/7f10fa69-d1fe-4748-b10c-fa0c9b60bd66@intel.com/
>     
> 
> This series tries to address all the requirements listed above.

Please drop the "tries to". Also, please do not say "address all requirements" when this
is not the case. This series does not address all the requirements listed
(no dynamic event configurations via mkdir and no shared assignment). Please be specific
about what this series addresses and what it leaves for the "future", but highlight that
while this series does not implement all requirements, it does create a framework
to support their future implementation.

> 
> # Implementation details
> 
> Create a generic interface aimed to support user space assignment of scarce

drop "aimed"

> counters used for monitoring. The first user of the interface is ABMC, with the
> option to expand usage to "soft-ABMC" and MPAM counters in the future.
> 
> The feature adds the following interface files:
> 
> /sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
> monitoring features supported. The enclosed brackets indicate which
> feature is enabled.
> 
> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: The maximum number of monitoring counters
> (total of available and assigned counters) in each domain when the system supports
> mbm_assign_mode. 
> 
> /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: The number of monitoring counters
> available for assignment in each domain when mbm_event mode is enabled on the system.

Why is "num_mbm_cntrs" connected to mbm_assign_mode while "available_mbm_cntrs" is
connected to mbm_event mode? Perhaps both can be "mbm_event" mode to reduce confusion?
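
Whichever mode name the descriptions settle on, the quoted text defines
num_mbm_cntrs as the total of available and assigned counters in each domain.
A minimal shell sketch of deriving the per-domain assigned count from the two
files follows. The helper name cntrs_in_use is hypothetical and not part of
this series, and the sketch only assumes the "<domain>=<count>;..." string form
shown in examples b and c of the cover letter.

```shell
# Hypothetical helper (not part of the series): given the contents of
# num_mbm_cntrs ($1) and available_mbm_cntrs ($2), print the number of
# assigned counters per domain as "<domain>=<assigned>".
cntrs_in_use() (
    set -f      # disable pathname expansion while splitting
    IFS=';'     # entries are separated by ';'
    for pair in $1; do
        dom=${pair%%=*}
        total=${pair#*=}
        for apair in $2; do
            if [ "${apair%%=*}" = "$dom" ]; then
                printf '%s=%s\n' "$dom" "$((total - ${apair#*=}))"
            fi
        done
    done
)

# Values taken from examples b and c in the cover letter.
cntrs_in_use "0=32;1=32" "0=30;1=30"
```

On a system with resctrl mounted, the two arguments would presumably come from
"$(cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs)" and
"$(cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs)".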

> 
> /sys/fs/resctrl/info/L3_MON/event_configs: Contains sub-directory for each MBM event
> 					   that can be assigned to a counter.
> 
> /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter: The type of
> 			memory transactions tracked by the event mbm_total_bytes.
> 
> /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter: The type of
> 			memory transactions tracked by the event mbm_local_bytes.
> 
> /sys/fs/resctrl/mbm_L3_assignments: Per monitor group interface to list or modify
> 				    counters assigned to the group.
> 
> # Examples
> 
> a. Check if MBM assign support is available
> 	# mount -t resctrl resctrl /sys/fs/resctrl/
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 	[mbm_event]
> 	default
> 
> 	mbm_event feature is detected and it is enabled.
> 
> b. Check how many assignable counters are supported. 
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs 
> 	0=32;1=32
> 
> c. Check how many assignable counters are available for assignment in each domain.
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs 
> 	0=30;1=30
> 
> d. Check default event configuration.
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter 
> 	local_reads,remote_reads,local_non_temporal_writes,remote_non_temporal_writes,
>         local_reads_slow_memory,remote_reads_slow_memory,dirty_victim_writes_all
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter 
> 	local_reads,local_non_temporal_writes,local_reads_slow_memory
> 
> e. Series adds a new interface file "mbm_L3_assignments" in each monitoring group
>    to list and modify that group's monitoring states.
> 
> 	The list is displayed in the following format:
> 
>         <Event>:<Domain ID>=<Assignment type>

Suggest adding multiple domains to the example. The above creates the impression that each
domain is listed on its own line (until the example below clears that up).
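
To illustrate why the multi-domain form matters to anything consuming this file,
here is a rough shell sketch that splits one line into per-domain entries. The
function name parse_assignments is hypothetical, and the sketch assumes only the
"<Event>:<Domain ID>=<Assignment type>" form with ';' between domains, as in the
listings further down in the cover letter.

```shell
# Hypothetical helper: split one mbm_L3_assignments line into
# "<event> <domain> <type>" triples, one per line.
parse_assignments() (
    set -f              # keep a literal "*" domain from being glob-expanded
    event=${1%%:*}      # text before the first ':'
    IFS=';'             # domains are separated by ';'
    for pair in ${1#*:}; do
        printf '%s %s %s\n' "$event" "${pair%%=*}" "${pair#*=}"
    done
)

parse_assignments "mbm_total_bytes:0=_;1=e"
# prints:
# mbm_total_bytes 0 _
# mbm_total_bytes 1 e
```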

> 
>         Event: A valid MBM event listed in the
>         /sys/fs/resctrl/info/L3_MON/event_configs directory.
> 
>         Domain ID: A valid domain ID.
> 
>         Assignment types:
> 
>         _ : No counter assigned.
> 
>         e : Counter assigned exclusively.
> 
> 	To list the default group states:
> 	# cat /sys/fs/resctrl/mbm_L3_assignments
> 	mbm_total_bytes:0=e;1=e
> 	mbm_local_bytes:0=e;1=e
> 
> 	To unassign the counter associated with the mbm_total_bytes event on domain 0:
> 	# echo "mbm_total_bytes:0=_" > /sys/fs/resctrl/mbm_L3_assignments
> 	# cat /sys/fs/resctrl/mbm_L3_assignments
> 	mbm_total_bytes:0=_;1=e
> 	mbm_local_bytes:0=e;1=e
> 
> 	To unassign the counter associated with the mbm_total_bytes event on all domains:
> 	# echo "mbm_total_bytes:*=_" > /sys/fs/resctrl/mbm_L3_assignments
> 	# cat /sys/fs/resctrl/mbm_L3_assignments
> 	mbm_total_bytes:0=_;1=_
> 	mbm_local_bytes:0=e;1=e
> 
> 	To assign a counter associated with the mbm_total_bytes event on all domains in exclusive mode:
> 	# echo "mbm_total_bytes:*=e" > /sys/fs/resctrl/mbm_L3_assignments
> 	# cat /sys/fs/resctrl/mbm_L3_assignments
> 	mbm_total_bytes:0=e;1=e
> 	mbm_local_bytes:0=e;1=e
> 
> g. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
>    There is no change in reading the events with the assignment.  If the event is unassigned
>    when reading, then the read will come back as "Unassigned".
> 	
> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	779247936
> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
> 	765207488
> 	
> h. Check the default event configurations.
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
> 	local_reads,remote_reads,local_non_temporal_writes,remote_non_temporal_writes,
> 	local_reads_slow_memory,remote_reads_slow_memory,dirty_victim_writes_all
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
> 	local_reads,local_non_temporal_writes,local_reads_slow_memory
> 
> i. Change the event configuration for mbm_local_bytes.
> 
> 	# echo "local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads" > \
> 	/sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
> 	local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads

Note that the examples are inconsistent wrt spacing in the output of this file. The examples
are expected to match how the implementation in this series does the spacing.
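
Until the examples are consistent, anybody comparing filter lists by hand may
want to ignore the spacing. A small sketch of one way to do that is below. The
helper name normalize_filter is hypothetical, and this says nothing about which
spacing the implementation actually emits.

```shell
# Hypothetical helper: strip spaces and tabs so that differently spaced
# event_filter lists compare equal.
normalize_filter() {
    printf '%s\n' "$1" | tr -d ' \t'
}

a="local_reads,local_non_temporal_writes,local_reads_slow_memory"
b="local_reads, local_non_temporal_writes, local_reads_slow_memory"
[ "$(normalize_filter "$a")" = "$(normalize_filter "$b")" ] && echo "same filter"
```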

> 	
> 	This will update all (across all domains of all monitor groups) counter assignments 
>         associated with the mbm_local_bytes event.
> 
> j. Now read the local event again. The first read may come back with "Unavailable"
>    status. The subsequent read of mbm_local_bytes will display only the read events.

The above specifies "will display only the read events" while the previous step added
"local_non_temporal_writes" to the memory transactions. What is meant by "only the read events"?

> 	
> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> 	Unavailable
> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> 	314101
> 
> k. Users have the option to go back to 'default' mbm_assign_mode if required.
>    This can be done using the following command. Note that switching the
>    mbm_assign_mode will reset all the MBM counters (and thus all MBM events) of all
>    the resctrl groups.
> 
> 	# echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> 	mbm_event
> 	[default]
> 	
> l. Unmount the resctrl filesystem.
> 
> 	# umount /sys/fs/resctrl/
> ---

Reinette

