Message-ID: <CALPaoChSzzU5mzMZsdT6CeyEn0WD1qdT9fKCoNW_ty4tojtrkw@mail.gmail.com>
Date: Mon, 19 May 2025 17:59:17 +0200
From: Peter Newman <peternewman@...gle.com>
To: Babu Moger <babu.moger@....com>
Cc: corbet@....net, tony.luck@...el.com, reinette.chatre@...el.com, 
	tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, 
	dave.hansen@...ux.intel.com, james.morse@....com, dave.martin@....com, 
	fenghuay@...dia.com, x86@...nel.org, hpa@...or.com, paulmck@...nel.org, 
	akpm@...ux-foundation.org, thuth@...hat.com, rostedt@...dmis.org, 
	ardb@...nel.org, gregkh@...uxfoundation.org, daniel.sneddon@...ux.intel.com, 
	jpoimboe@...nel.org, alexandre.chartre@...cle.com, 
	pawan.kumar.gupta@...ux.intel.com, thomas.lendacky@....com, 
	perry.yuan@....com, seanjc@...gle.com, kai.huang@...el.com, 
	xiaoyao.li@...el.com, kan.liang@...ux.intel.com, xin3.li@...el.com, 
	ebiggers@...gle.com, xin@...or.com, sohil.mehta@...el.com, 
	andrew.cooper3@...rix.com, mario.limonciello@....com, 
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, 
	maciej.wieczor-retman@...el.com, eranian@...gle.com, Xiaojian.Du@....com, 
	gautham.shenoy@....com
Subject: Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth
 Monitoring Counters (ABMC)

Hi Babu,

On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@....com> wrote:
>
>
> This series adds support for Assignable Bandwidth Monitoring Counters
> (ABMC), also called the QoS RMID Pinning feature.
>
> The series is written so that it is easier to support other assignable
> features from different vendors.
>
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>
> The patches are based on top of commit
> 92a09c47464d0 (tag: v6.15-rc5, tip/irq/merge) Linux 6.15-rc5
> plus
> https://lore.kernel.org/lkml/20250515165855.31452-1-james.morse@arm.com/
>
> It is very clear these patches will go after James's resctrl FS/ARCH
> restructure. Hoping to avoid one review cycle due to the merge.
>
> # Introduction
>
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, the bandwidth monitoring feature on AMD systems only guarantees
> that RMIDs currently assigned to a processor will be tracked by hardware.
> The counters of any other RMIDs which are no longer being tracked will be
> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
> that are not tracked by hardware. So, only a limited number of groups can
> give guaranteed monitoring numbers. With ever-changing configurations,
> there is no way to know definitively which of these groups are being
> tracked at a given point in time. Users do not have the option to monitor
> a group or set of groups for a certain period of time without worrying
> about counters being reset in between.
>
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID, event pair and monitor the bandwidth as long as it is
> assigned.  The assigned RMID will be tracked by the hardware until the user
> unassigns it manually. There is no need to worry about counters being reset
> during this period. Additionally, the user can specify a bitmask identifying
> the specific bandwidth types from the given source to track with the counter.
>
> Without ABMC enabled, monitoring will work in the current 'default' mode
> without the assignment option.
>
> # History
>
> The earlier implementation of ABMC had a dependency on BMEC (Bandwidth
> Monitoring Event Configuration). Peter had concerns with that implementation
> because it may not be compatible with ARM's MPAM.
>
> Here are the threads discussing the concerns and new interface to address the concerns.
> https://lore.kernel.org/lkml/CALPaoCg97cLVVAcacnarp+880xjsedEWGJPXhYpy4P7=ky4MZw@mail.gmail.com/
> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>
> Here are the finalized requirements based on the discussion:
>
> *   Remove BMEC dependency on the ABMC feature.
>
> *   Eliminate global assignment listing. The interface
>     /sys/fs/resctrl/info/L3_MON/mbm_assign_control is no longer required.
>
> *   Create the configuration directories at /sys/fs/resctrl/info/L3_MON/counter_configs/.
>     The configuration file names should be free-form, allowing users to create them as needed.
>
> *   Perform assignment listing at the group level by introducing mbm_L3_assignments
>     in each monitoring group. The listing should provide the following details:
>
>     Event Configuration: Specifies the event configuration applied. This will be crucial
>     when "mkdir" on event configuration is added in the future, leading to the creation
>     of mon_data/mon_l3_*/<event configuration>.
>
>     Domains: Identifies the domains where the configuration is applied, supporting multi-domain setups.
>
>     Assignment Type: Indicates whether the assignment is Exclusive (e or d), Shared (s), or Unassigned (_).
>
> *   Provide an option to enable or disable auto assignment when a new group is created.

So far I was able to reenable MBM on AMD implementations (for some
users) while deferring on the counter assignment interface discussion
by just making shared assignment the default for newly-created groups.
Until they want to upgrade assignments to exclusive or break down
traffic with multiple counters to watch a particular group more
closely, they won't need to change any assignments.

Just pointing out that this turned out to be a useful first step in
deploying ABMC support.

>
> This series tries to address all the requirements listed above.
>
> # Implementation details
>
> Create a generic interface intended to support user space assignment of scarce
> counters used for monitoring. The first user of the interface is ABMC, with the
> option to expand usage to "soft-ABMC" and MPAM counters in the future.

I'll try to identify any issues I've encountered with "soft-ABMC".
Hopefully I'll be able to share a sample implementation based on these
patches soon.

There's now more interest at Google in allowing explicit control of
where RMIDs are assigned on Intel platforms. Even though the number of
RMIDs implemented by hardware tends to be roughly the number of
containers they want to support, they often still need to create
containers when all RMIDs have already been allocated, which is not
currently allowed. Once the container has been created and starts
running, it's no longer possible to move its threads into a monitoring
group once RMIDs become available again, so it's important
for resctrl to maintain an accurate task list for a container even
when RMIDs are not available.

>
> The feature adds the following interface files:
>
> /sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
> monitoring features supported. The enclosed brackets indicate which
> feature is enabled.
>
> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> counters available for assignment.

Earlier I discussed with Reinette[1] what num_mbm_cntrs should
represent in a "soft-ABMC" implementation where assignment is
implemented by assigning an RMID, which would result in all events
being assigned at once.

My main concern is how many "counters" you can assign by assigning
RMIDs. I recall Reinette proposed reporting the number of groups which
can be assigned separately from counters which can be assigned.

>
> /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: Reports the number of monitoring
> counters free in each domain.
>
> /sys/fs/resctrl/info/L3_MON/counter_configs : Directory to hold the counter configuration.
>
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter : Default configuration
> for MBM total events.
>
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter : Default configuration
> for MBM local events.

IIUC, this needs to be implemented now so you can drop BMEC with this series?

>
> /sys/fs/resctrl/mbm_L3_assignments: Interface to list or modify assignment states on each group.
>
> # Examples
>
> a. Check if ABMC support is available
>         # mount -t resctrl resctrl /sys/fs/resctrl/
>
>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>         [mbm_cntr_assign]
>         default
>
>         ABMC feature is detected and it is enabled.
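
For reference, a minimal sketch of how a user space tool might read this file, assuming the format stays as shown above: one mode per line, with the enabled mode enclosed in brackets. The function name is hypothetical and not part of the series.

```python
def parse_assign_mode(text):
    """Return (enabled_mode, all_modes) from mbm_assign_mode contents.

    Assumes one mode per line with the enabled mode in [brackets],
    as in the example output quoted above.
    """
    enabled = None
    modes = []
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            enabled = name
        modes.append(name)
    return enabled, modes


if __name__ == "__main__":
    sample = "[mbm_cntr_assign]\ndefault\n"
    enabled, modes = parse_assign_mode(sample)
    print("enabled:", enabled)
    print("supported:", modes)
```
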
>
> b. Check how many ABMC counters are available.
>
>         # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>         32
>
> c. Check how many ABMC counters are available in each domain.
>
>         # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
>         0=30;1=30
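
As an illustration only, the per-domain list could be parsed as below, assuming the "<domain id>=<count>" pairs stay separated by ';' as in the sample output. The helper name is made up for this sketch.

```python
def parse_domain_counts(text):
    """Parse '0=30;1=30' into {0: 30, 1: 30}.

    Assumes ';'-separated '<domain id>=<free counters>' pairs, which is
    the format shown in the example above.
    """
    counts = {}
    for item in text.strip().split(";"):
        if not item:
            continue
        domain, _, value = item.partition("=")
        counts[int(domain)] = int(value)
    return counts


if __name__ == "__main__":
    free = parse_domain_counts("0=30;1=30\n")
    for domain, count in sorted(free.items()):
        print(f"domain {domain}: {count} counters free")
```
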
>
> d. Check default counter configuration.
>
>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>         local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>         local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>
>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>         local_reads, local_non_temporal_writes, local_reads_slow_memory
>
> e. Series adds a new interface file "mbm_L3_assignments" in each monitoring group
>    to list and modify any group's monitoring states.

To confirm, would we have "mbm_<resource_name>_assignments" for each
resource where MBM-ish events could be assigned?

>
>         The list is displayed in the following format:
>
>         <Event configuration>:<Domain id>=<Assignment type>

For soft-ABMC assignment, is there just a single event configuration
representing all the events tracked by the RMID?

>
>         Event configuration: A valid event configuration listed in the
>         /sys/fs/resctrl/info/L3_MON/counter_configs directory.
>
>         Domain ID: A valid domain ID number.
>
>         Assignment types:
>
>         _ : No event configuration assigned
>
>         e : Event configuration assigned in exclusive mode
>
>         To list the default group states:
>         # cat /sys/fs/resctrl/mbm_L3_assignments
>         mbm_total_bytes:0=e;1=e
>         mbm_local_bytes:0=e;1=e
>
>         To unassign the configuration of mbm_total_bytes on domain 0:
>         # echo "mbm_total_bytes:0=_" > mbm_L3_assignments
>         # cat mbm_L3_assignments
>         mbm_total_bytes:0=_;1=e
>         mbm_local_bytes:0=e;1=e
>
>         To unassign the mbm_total_bytes configuration on all domains:
>         # echo "mbm_total_bytes:*=_" > mbm_L3_assignments
>         # cat mbm_L3_assignments
>         mbm_total_bytes:0=_;1=_
>         mbm_local_bytes:0=e;1=e
>
>         To assign the mbm_total_bytes configuration on all domains in exclusive mode:
>         # echo "mbm_total_bytes:*=e" > mbm_L3_assignments
>         # cat mbm_L3_assignments
>         mbm_total_bytes:0=e;1=e
>         mbm_local_bytes:0=e;1=e
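
To make the format concrete, here is a small model of the listing and of the single-item writes shown above. It is only a sketch under the assumption that a write has the form "<event configuration>:<domain id>=<type>" with '*' meaning all domains and that only '_' and 'e' are valid types. It does not touch resctrl, and all function names are hypothetical.

```python
VALID_TYPES = ("_", "e")  # unassigned, exclusive (as listed above)


def parse_assignments(text):
    """Parse mbm_L3_assignments contents into {config: {domain_id: type}}."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        config, _, domains = line.partition(":")
        state[config] = {}
        for item in domains.split(";"):
            dom, _, kind = item.partition("=")
            state[config][int(dom)] = kind
    return state


def apply_request(state, request):
    """Apply one write such as 'mbm_total_bytes:*=_' to a parsed state."""
    config, _, item = request.strip().partition(":")
    dom, _, kind = item.partition("=")
    if config not in state:
        raise KeyError(f"unknown event configuration: {config}")
    if kind not in VALID_TYPES:
        raise ValueError(f"unknown assignment type: {kind}")
    targets = list(state[config]) if dom == "*" else [int(dom)]
    for d in targets:
        if d not in state[config]:
            raise KeyError(f"unknown domain: {d}")
        state[config][d] = kind
    return state


def format_assignments(state):
    """Render the state back into the listing format."""
    return "\n".join(
        f"{config}:" + ";".join(f"{d}={k}" for d, k in sorted(doms.items()))
        for config, doms in state.items()
    )


if __name__ == "__main__":
    state = parse_assignments("mbm_total_bytes:0=e;1=e\nmbm_local_bytes:0=e;1=e\n")
    apply_request(state, "mbm_total_bytes:0=_")
    print(format_assignments(state))
```

Keeping the parsed state as a nested dict keyed by event configuration would also leave room for an 's' (shared) type later without changing the parser.
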
>
> f. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
>    There is no change in reading the events with ABMC. If the event is unassigned
>    when reading, then the read will come back as "Unassigned".
>
>         # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>         779247936
>         # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>         765207488
>
> g. Check the default event configurations.
>
>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>         local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>         local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>
>         # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>         local_reads, local_non_temporal_writes, local_reads_slow_memory

These look like the BMEC event names converted from camel case. Will
event filter programming be portable?
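
To illustrate what I mean, here is a rough sketch that maps the names above onto a bitmask. The bit positions are my assumption: they follow the order of the BMEC bits documented for mbm_total_bytes_config and mbm_local_bytes_config, and nothing in this series says the names must map this way. Under that assumption the two default filters reduce to the existing BMEC defaults.

```python
# Assumed bit positions, taken from the order of the BMEC bit definitions
# in the existing resctrl documentation; not defined by this series.
EVENT_BITS = {
    "local_reads": 0,
    "remote_reads": 1,
    "local_non_temporal_writes": 2,
    "remote_non_temporal_writes": 3,
    "local_reads_slow_memory": 4,
    "remote_reads_slow_memory": 5,
    "dirty_victim_writes_all": 6,
}


def parse_event_filter(text):
    """Split event_filter contents (comma-separated, may wrap) into names."""
    return [n.strip() for n in text.replace("\n", ",").split(",") if n.strip()]


def filter_to_mask(names):
    """OR together the assumed bit for each event name."""
    mask = 0
    for name in names:
        mask |= 1 << EVENT_BITS[name]
    return mask


if __name__ == "__main__":
    total = parse_event_filter(
        "local_reads, remote_reads, local_non_temporal_writes, "
        "remote_non_temporal_writes,\n"
        "local_reads_slow_memory, remote_reads_slow_memory, "
        "dirty_victim_writes_all\n"
    )
    local = parse_event_filter(
        "local_reads, local_non_temporal_writes, local_reads_slow_memory\n"
    )
    print(hex(filter_to_mask(total)), hex(filter_to_mask(local)))
```

A name-to-bit table like this is exactly the AMD-specific part, so an architecture without the same traffic classes would need its own set of names or a different encoding.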

Thanks,
-Peter


[1] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/
