lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALPaoCjZ3oLdKymJjASt0aqtd0GGOme7LavvYOtPYTb_rA-mYQ@mail.gmail.com>
Date: Wed, 1 May 2024 10:48:35 -0700
From: Peter Newman <peternewman@...gle.com>
To: Babu Moger <babu.moger@....com>
Cc: corbet@....net, fenghua.yu@...el.com, reinette.chatre@...el.com, 
	tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, 
	dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, 
	paulmck@...nel.org, rdunlap@...radead.org, tj@...nel.org, 
	peterz@...radead.org, yanjiewtw@...il.com, kim.phillips@....com, 
	lukas.bulwahn@...il.com, seanjc@...gle.com, jmattson@...gle.com, 
	leitao@...ian.org, jpoimboe@...nel.org, rick.p.edgecombe@...el.com, 
	kirill.shutemov@...ux.intel.com, jithu.joseph@...el.com, kai.huang@...el.com, 
	kan.liang@...ux.intel.com, daniel.sneddon@...ux.intel.com, 
	pbonzini@...hat.com, sandipan.das@....com, ilpo.jarvinen@...ux.intel.com, 
	maciej.wieczor-retman@...el.com, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org, eranian@...gle.com, james.morse@....com
Subject: Re: [RFC PATCH v3 00/17] x86/resctrl : Support AMD Assignable
 Bandwidth Monitoring Counters (ABMC)

Hi Babu,

On Thu, Mar 28, 2024 at 6:07 PM Babu Moger <babu.moger@....com> wrote:
>
>
> This series adds the support for Assignable Bandwidth Monitoring Counters
> (ABMC). It is also called QoS RMID Pinning feature
>
> The feature details are documented in the  APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>
> The patches are based on top of commit
> cd80c2c94699913f9334414189487ff3f93cf0b5 (tip/master)
>
> # Introduction
>
> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
> feature only guarantees that RMIDs currently assigned to a processor will
> be tracked by hardware. The counters of any other RMIDs which are no longer
> being tracked will be reset to zero. The MBM event counters return
> "Unavailable" for the RMIDs that are not active.
>
> Users can create 256 or more monitor groups. But there can be only limited
> number of groups that can give guaranteed monitoring numbers. With ever
> changing configurations there is no way to definitely know which of these
> groups will be active for certain point of time. Users do not have the
> option to monitor a group or set of groups for certain period of time
> without worrying about RMID being reset in between.
>
> The ABMC feature provides an option to the user to assign an RMID to the
> hardware counter and monitor the bandwidth for a longer duration.
> The assigned RMID will be active until the user unassigns it manually.
> There is no need to worry about counters being reset during this period.
> Additionally, the user can specify a bitmask identifying the specific
> bandwidth types from the given source to track with the counter.
>
> Without ABMC enabled, monitoring will work in current mode without
> assignment option.
>
> # Linux Implementation
>
> Linux resctrl subsystem provides the interface to count maximum of two
> memory bandwidth events per group, from a combination of available total
> and local events. Keeping the current interface, users can assign a maximum
> of 2 ABMC counters per group. User will also have the option to assign only
> one counter to the group. If the system runs out of assignable ABMC
> counters, kernel will display an error. Users need to unassign an already
> assigned counter to make space for new assignments.
>
>
> # Examples
>
> a. Check if ABMC support is available
>         #mount -t resctrl resctrl /sys/fs/resctrl/
>
>         #cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>         [abmc]
>         legacy_mbm
>
>         Linux kernel detected ABMC feature and it is enabled.
>
> b. Check how many ABMC counters are available.
>
>         #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_cntrs
>         32
>
> c. Create few resctrl groups.
>
>         # mkdir /sys/fs/resctrl/mon_groups/default_mon1
>         # mkdir /sys/fs/resctrl/non_defult_group
>         # mkdir /sys/fs/resctrl/non_defult_group/mon_groups/non_default_mon1
>
> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>    to list and modify the group's assignment states.
>
>    The list follows the following format:
>
>        * Default CTRL_MON group:
>                "//<domain_id>=<assignment_flags>"
>
>        * Non-default CTRL_MON group:
>                "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>
>        * Child MON group of default CTRL_MON group:
>                "/<MON group>/<domain_id>=<assignment_flags>"
>
>        * Child MON group of non-default CTRL_MON group:
>                "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>
>        Assignment flags can be one of the following:
>
>         t  MBM total event is assigned
>         l  MBM local event is assigned
>         tl Both total and local MBM events are assigned
>         _  None of the MBM events are assigned
>

I was able to successfully build a kernel where this interface is
adapted to work with both real ABMC on hardware that supports it and
my software workaround for older hardware.

My prototype is based on a refactored version of the codebase
supporting MPAM, but the capabilities of the MPAM hardware look
similar enough to ABMC that I'm not concerned about the feasibility.

The FS layer is informed by the arch layer (through rdt_resource
fields) how many assignable monitors are available and whether a
monitor is assigned to an entire group or a single event in a group.
Also, the FS layer can assume that monitors are indexed contiguously,
allowing it to host the data structures managing FS-level view of
monitor usage.

I used the following resctrl_arch-interfaces to propagate assignments
to the implementation:

void resctrl_arch_assign_monitor(struct rdt_domain *d, u32 mon_id, u32
closid, u32 rmid, int evtid);
void resctrl_arch_unassign_monitor(struct rdt_domain *d, u32 mon_id);

I chose to allow reassigning an assigned monitor without calling
unassign first. This is important when monitors are unassigned and
assigned in a single write to mbm_assign_control, as it allows all
updates to be performed in a single round of parallel IPIs to the
domains.


>
> g. Users will have the option to go back to legacy_mbm mode if required.
>    This can be done using the following command.
>
>         # echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
>         # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>         abmc
>         [legacy_mbm]

I chose to make this a mount option to simplify the management of the
monitor tracking data structures. They are simply allocated at mount
time and deallocated and unmount.

I called the option "mon_assign": The mount option parser calls
resctrl_arch_mon_assign_enable() to determine whether the
implementation supports assignment in some form. If it returns an
error, the mount fails. When successful, the assignable monitor count
is made non-zero in the appropriate rdt_resource, triggering the
behavior change in the FS layer.

I'm still not sure if it's a good idea to enable monitor assignment by
default. This would be a major disruption in the MBM usage model
triggered by moving software between AMD CPU models. I thought the
safest option was to disallow creating more monitoring groups than
monitors unless the option is selected. Given that nobody else
complained about monitoring HW limitations on the mailing list, I
assumed few users create enough monitoring groups to be impacted.

Thanks!
-Peter

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ