Message-ID: <e04b721a-b5cb-4089-a3ad-125a6247e5b8@intel.com>
Date: Fri, 12 Jul 2024 15:03:42 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Babu Moger <babu.moger@....com>, <corbet@....net>, <fenghua.yu@...el.com>,
	<tglx@...utronix.de>, <mingo@...hat.com>, <bp@...en8.de>,
	<dave.hansen@...ux.intel.com>
CC: <x86@...nel.org>, <hpa@...or.com>, <paulmck@...nel.org>,
	<rdunlap@...radead.org>, <tj@...nel.org>, <peterz@...radead.org>,
	<yanjiewtw@...il.com>, <kim.phillips@....com>, <lukas.bulwahn@...il.com>,
	<seanjc@...gle.com>, <jmattson@...gle.com>, <leitao@...ian.org>,
	<jpoimboe@...nel.org>, <rick.p.edgecombe@...el.com>,
	<kirill.shutemov@...ux.intel.com>, <jithu.joseph@...el.com>,
	<kai.huang@...el.com>, <kan.liang@...ux.intel.com>,
	<daniel.sneddon@...ux.intel.com>, <pbonzini@...hat.com>,
	<sandipan.das@....com>, <ilpo.jarvinen@...ux.intel.com>,
	<peternewman@...gle.com>, <maciej.wieczor-retman@...el.com>,
	<linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<eranian@...gle.com>, <james.morse@....com>
Subject: Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth
 Monitoring Counters (ABMC)

Hi Babu,

On 7/3/24 2:48 PM, Babu Moger wrote:
> # Linux Implementation
> 
> Linux resctrl subsystem provides the interface to count maximum of two
> memory bandwidth events per group, from a combination of available total
> and local events. Keeping the current interface, users can enable a maximum
> of 2 ABMC counters per group. User will also have the option to enable only
> one counter to the group. If the system runs out of assignable ABMC
> counters, kernel will display an error. Users need to disable an already
> enabled counter to make space for new assignments.

The implementation appears to be converging on an interface that is
generic enough to be used by other features discussed along the way.
The "Linux Implementation" summary can thus add:

	Create a generic interface to support user space assignment
	of scarce counters used for monitoring. The first user of the
	interface is ABMC, with the option to expand usage to "soft-RMID"
	and MPAM counters in the future.


> # Examples
> 
> a. Check if ABMC support is available
> 	#mount -t resctrl resctrl /sys/fs/resctrl/
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> 	[abmc]
> 	legacy
> 
> 	Linux kernel detected ABMC feature and it is enabled.

How about renaming "abmc" to "mbm_cntrs"? This will match the num_mbm_cntrs
info file and be the final step to make this generic so that another architecture
can more easily support assigning hardware counters without needing to call
the feature AMD's "abmc".

Expanding on this it may be possible to add a new "sw_mbm_cntrs" feature that
will be the "soft-RMID" feature while also reflecting the "mbm_cntrs" name
so that when user space enables that feature its properties can be found in
"num_mbm_cntrs".

The "abmc" kernel parameter remains but that does seem separate from this
resctrl fs feature since it is explicitly tied to X86_FEATURE_ABMC surely
making it architecture specific.

> 
> b. Check how many ABMC counters are available.
> 
> 	#cat /sys/fs/resctrl/info/L3_MON/num_cntrs
> 	32

This is now num_mbm_cntrs

> 
> c. Create few resctrl groups.
> 
> 	# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
> 	# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
> 
> 
> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_control
>     to list and modify the group's monitoring states. File provides single place
>     to list monitoring states of all the resctrl groups. It makes it easier for
>     user space to learn about the counters are used without needing to traverse
>     all the groups thus reducing the number of filesystem calls.
> 
> 	The list follows the following format:
> 
> 	"<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> 
> 	Format for specific type of groups:
> 
> 	* Default CTRL_MON group:
> 	 "//<domain_id>=<flags>"
> 
>         * Non-default CTRL_MON group:
>                 "<CTRL_MON group>//<domain_id>=<flags>"
> 
>         * Child MON group of default CTRL_MON group:
>                 "/<MON group>/<domain_id>=<flags>"
> 
>         * Child MON group of non-default CTRL_MON group:
>                 "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
> 
>         Flags can be one of the following:
> 
>          t  MBM total event is enabled.
>          l  MBM local event is enabled.
>          tl Both total and local MBM events are enabled.
>          _  None of the MBM events are enabled

The language needs to be changed here (and in the many copied places) to
be specific about what setting the flag accomplishes. For example, in
"legacy" mode user space can be expected to find all events enabled, no?
Needing a new feature to set a flag to accomplish something that is
possible in legacy mode can thus cause confusion.

If I understand the implementation, reading "mbm_control" will fail
if the system is ABMC capable but ABMC is disabled. Why can "mbm_control" not
always be displayed to user space? What if "mbm_control" is always
available and provides specific information to user space? For example:
	t  MBM total event is enabled but may not always be counted.
	T  MBM total event is enabled and being counted.

On AMD systems resource groups will have "t" associated with monitor
groups when ABMC is disabled, and "T" when ABMC is enabled and a counter is
assigned. On Intel systems monitor groups will always have "T".

For "soft-RMID" the flag could possibly continue to be "T"?

I am trying to find ways to communicate to user space consistently
and clearly and any insights will be appreciated. We really do not want
to add this interface and then find that it just causes confusion.

It is not quite obvious to me when the new files should be visible and
what they should present to the user. "mbm_mode" is now always visible.
Should "num_mbm_cntrs" not also always be visible? Right now "num_mbm_cntrs"
appears to be associated only with ABMC. Should it not also, for example,
be the file that "soft-RMID" may use to share how many counters are
available? Its contents will thus be dynamic based on which "MBM mode" is
active. That raises the question of what it should contain when "legacy"
mode is enabled: should "num_mbm_cntrs" perhaps show "0" to user space when
"legacy" mode is active?


> 
> 	Examples:
> 
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
> 	non_default_ctrl_mon_grp//0=tl;1=tl;
> 	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> 	//0=tl;1=tl;
> 	/child_default_mon_grp/0=tl;1=tl;
> 	
> 	There are four groups and all the groups have local and total
> 	event enabled on domain 0 and 1.

"local and total event" is vague. Can it be made specific with, for example,
"local and total MBM events"?
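
To make the read format concrete, here is a minimal sketch of how user
space might parse one line of the "mbm_control" listing shown above. It
assumes the format stays exactly as in this cover letter (two '/'
separators, ';'-separated "<domain_id>=<flags>" items, optional trailing
';'). GroupState and parse_mbm_control_line() are hypothetical names
used only for illustration; they are not part of the series.

```python
from typing import Dict, NamedTuple


class GroupState(NamedTuple):
    """Hypothetical container for one line of the mbm_control listing."""
    ctrl_mon: str            # "" means the default CTRL_MON group
    mon: str                 # "" means the CTRL_MON group itself
    domains: Dict[int, str]  # domain id -> flags, e.g. {0: "tl", 1: "_"}


def parse_mbm_control_line(line: str) -> GroupState:
    """Parse '<CTRL_MON group>/<MON group>/<domain_id>=<flags>;...'."""
    # Group names are directory names and so cannot contain '/'; the
    # first two '/' therefore always delimit the two group fields.
    ctrl_mon, mon, assignments = line.strip().split("/", 2)
    domains: Dict[int, str] = {}
    for item in assignments.split(";"):
        if not item:  # the trailing ';' leaves an empty item
            continue
        domain_id, flags = item.split("=", 1)
        domains[int(domain_id)] = flags
    return GroupState(ctrl_mon, mon, domains)
```

A single read of the file followed by one call per line is then enough
to learn the state of every group, which is the stated motivation for
having one file instead of traversing all the group directories.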

> 
> 	=tl means both total and local events are enabled.

Same here (and all copied places in this series)

> 
> 	"//" - This is a default CTRL_MON group
> 
> 	"non_default_ctrl_mon_grp//" - This is non-default CTRL_MON group
> 
> 	"/child_default_mon_grp/"  - This is Child MON group of the defult group

Same typos as in previous version of cover letter.

> 
> 	"non_default_ctrl_mon_grp/child_non_default_mon_grp/" - This is child
> 	MON group of the non-default group
> 
> e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_control.
> 
> 	The write format is similar to the above list format with addition of
> 	op-code for the assignment operation.
> 	
> 	* Default CTRL_MON group:
> 	        "//<domain_id><op-code><flags>"
> 	
> 	* Non-default CTRL_MON group:
> 	        "<CTRL_MON group>//<domain_id><op-code><flags>"
> 	
> 	* Child MON group of default CTRL_MON group:
> 	        "/<MON group>/<domain_id><op-code><flags>"
> 	
> 	* Child MON group of non-default CTRL_MON group:
> 	        "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
> 	
> 	Op-code can be one of the following:
> 	
> 	= Update the assignment to match the flag.
> 	+ Assign a new state.
> 	- Unassign a new state.

Please be consistent with terminology. The above switches between "flag"
and "state", and the text below then uses "event". Also,
"Unassign a _new_ state" is unexpected; it should probably be an
_existing_ (not "new") state/flag/event?
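
For what it is worth, this is how I read the intended semantics of the
three op-codes, written as a small user space model. It is only a
sketch under the assumption that '=' replaces the set of flags, '+' adds
to it, and '-' removes from it, and that '_' is only valid with '='. It
does not model running out of counters. apply_request() is a
hypothetical helper, not kernel code.

```python
import re

# "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
_WRITE_RE = re.compile(r"^([^/]*)/([^/]*)/(\d+)([=+-])([tl]+|_)$")
_FLAG_ORDER = "tl"  # print total before local, as in the listing


def apply_request(current: str, request: str) -> str:
    """Return the flags of the addressed domain after applying request.

    current is the domain's present flags as shown by mbm_control
    ("t", "l", "tl" or "_").
    """
    match = _WRITE_RE.match(request)
    if match is None:
        raise ValueError(f"malformed request: {request!r}")
    op, flags = match.group(4), match.group(5)
    if flags == "_" and op != "=":
        raise ValueError("'_' only works with the '=' op-code")

    have = set(current) - {"_"}
    want = set(flags) - {"_"}
    if op == "=":
        have = want          # update to match exactly
    elif op == "+":
        have |= want         # assign in addition to what is there
    else:
        have -= want         # unassign what is currently assigned
    return "".join(f for f in _FLAG_ORDER if f in have) or "_"
```

Feeding the four writes from the examples below through this model
gives the same results as the listings in the cover letter, for example
apply_request("tl", "/child_default_mon_grp/1-t") returns "l".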

> 
> 	Flags can be one of the following:
> 
>          t  MBM total event.
>          l  MBM local event.
>          tl Both total and local MBM events.
>          _  None of the MBM events. Only works with '=' op-code.
> 	
> 	Initial group status:
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
> 	non_default_ctrl_mon_grp//0=tl;1=tl;
> 	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> 	//0=tl;1=tl;
> 	/child_default_mon_grp/0=tl;1=tl;
> 
> 	To update the default group to enable only total event on domain 0:
> 	# echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
> 
> 	Assignment status after the update:
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
> 	non_default_ctrl_mon_grp//0=tl;1=tl;
> 	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> 	//0=t;1=tl;
> 	/child_default_mon_grp/0=tl;1=tl;
> 
> 	To update the MON group child_default_mon_grp to remove total event on domain 1:
> 	# echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
> 
> 	Assignment status after the update:
> 	$ cat /sys/fs/resctrl/info/L3_MON/mbm_control
> 	non_default_ctrl_mon_grp//0=tl;1=tl;
> 	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> 	//0=t;1=tl;
> 	/child_default_mon_grp/0=tl;1=l;
> 
> 	To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
> 	remove both local and total events on domain 1:
> 	# echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
> 	       /sys/fs/resctrl/info/L3_MON/mbm_control
> 
> 	Assignment status after the update:
> 	non_default_ctrl_mon_grp//0=tl;1=tl;
> 	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> 	//0=t;1=tl;
> 	/child_default_mon_grp/0=tl;1=l;
> 
> 	To update the default group to add a local event domain 0.
> 	# echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
> 
> 	Assignment status after the update:
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_control
> 	non_default_ctrl_mon_grp//0=tl;1=tl;
> 	non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> 	//0=tl;1=tl;
> 	/child_default_mon_grp/0=tl;1=l;
> 
> 
> f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
>     There is no change in reading the events with ABMC. If the event is unassigned
>     when reading, then the read will come back as "Unassigned".
> 	
> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	779247936
> 	# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> 	765207488
> 	
> g. Users will have the option to go back to legacy mbm_mode if required.
>     This can be done using the following command. Note that switching the
>     mbm_mode will reset all the mbm counters of all resctrl groups.

mbm -> MBM (throughout)

> 
> 	# echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
> 	# cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> 	abmc
> 	[legacy]
> 
> h. Check the bandwidth configuration for the group. Note that bandwidth
>     configuration has a domain scope. Total event defaults to 0x7F (to
>     count all the events) and local event defaults to 0x15 (to count all
>     the local numa events). The event bitmap decoding is available at
>     https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>     in section "mbm_total_bytes_config", "mbm_local_bytes_config":
> 	
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 	0=0x7f;1=0x7f
> 	
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 	0=0x15;1=0x15
> 	
> j. Change the bandwidth source for domain 0 for the total event to count only reads.
>     Note that this change effects total events on the domain 0.
> 	
> 	#echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 	0=0x33;1=0x7F
> 	
> k. Now read the total event again. The first read will come back with "Unavailable"
>     status. The subsequent read of mbm_total_bytes will display only the read events.
> 	
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	Unavailable
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	314101
> 	
> l. Unmount the resctrl
> 	
> 	#umount /sys/fs/resctrl/
> 
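
As an aside on the bandwidth configuration values in the last few steps,
below is a minimal sketch for decoding them in user space. The bit
descriptions are written down as I recall them from the
"mbm_total_bytes_config" section of the resctrl documentation and
should be treated as an assumption to verify there. parse_config() and
decode() are hypothetical helper names.

```python
from typing import Dict, List

# Assumed bit meanings; check against the resctrl documentation.
EVENT_BITS = {
    0: "reads to memory in the local NUMA domain",
    1: "reads to memory in the non-local NUMA domain",
    2: "non-temporal writes to local NUMA domain",
    3: "non-temporal writes to non-local NUMA domain",
    4: "reads to slow memory in the local NUMA domain",
    5: "reads to slow memory in the non-local NUMA domain",
    6: "dirty victims from the QOS domain to all types of memory",
}


def parse_config(text: str) -> Dict[int, int]:
    """Parse '0=0x7f;1=0x7f' into {domain_id: bitmask}."""
    result: Dict[int, int] = {}
    for item in text.strip().split(";"):
        if not item:
            continue
        domain_id, value = item.split("=", 1)
        result[int(domain_id)] = int(value, 16)  # accepts the '0x' prefix
    return result


def decode(mask: int) -> List[str]:
    """List the descriptions of all bits set in mask, lowest bit first."""
    return [EVENT_BITS[bit] for bit in sorted(EVENT_BITS) if mask & (1 << bit)]
```

With these assumed bit meanings, 0x15 decodes to the three local
sources and 0x33 to the four read sources, which matches the cover
letter's description of "local" and "only reads".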

Reinette