Message-ID: <4dbcea13-382e-4af2-960d-0e66652cc2f5@amd.com>
Date: Tue, 20 May 2025 10:28:41 -0500
From: "Moger, Babu" <babu.moger@....com>
To: Peter Newman <peternewman@...gle.com>
Cc: corbet@....net, tony.luck@...el.com, reinette.chatre@...el.com,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, james.morse@....com, dave.martin@....com,
fenghuay@...dia.com, x86@...nel.org, hpa@...or.com, paulmck@...nel.org,
akpm@...ux-foundation.org, thuth@...hat.com, rostedt@...dmis.org,
ardb@...nel.org, gregkh@...uxfoundation.org, daniel.sneddon@...ux.intel.com,
jpoimboe@...nel.org, alexandre.chartre@...cle.com,
pawan.kumar.gupta@...ux.intel.com, thomas.lendacky@....com,
perry.yuan@....com, seanjc@...gle.com, kai.huang@...el.com,
xiaoyao.li@...el.com, kan.liang@...ux.intel.com, xin3.li@...el.com,
ebiggers@...gle.com, xin@...or.com, sohil.mehta@...el.com,
andrew.cooper3@...rix.com, mario.limonciello@....com,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
maciej.wieczor-retman@...el.com, eranian@...gle.com, Xiaojian.Du@....com,
gautham.shenoy@....com
Subject: Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth
Monitoring Counters (ABMC)
Hi Peter,
Thanks for trying the series.
On 5/19/25 10:59, Peter Newman wrote:
> Hi Babu,
>
> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@....com> wrote:
>>
>>
>> This series adds support for Assignable Bandwidth Monitoring Counters
>> (ABMC), also called the QoS RMID Pinning feature.
>>
>> The series is written so that it is easier to support other assignable
>> features from different vendors.
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC). The documentation is available at
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>> The patches are based on top of commit
>> 92a09c47464d0 (tag: v6.15-rc5, tip/irq/merge) Linux 6.15-rc5
>> plus
>> https://lore.kernel.org/lkml/20250515165855.31452-1-james.morse@arm.com/
>>
>> It is very clear these patches will go after James's resctrl FS/ARCH
>> restructure. Hoping to avoid one review cycle due to the merge.
>>
>> # Introduction
>>
>> Users can create as many monitor groups as there are RMIDs supported by the
>> hardware. However, the bandwidth monitoring feature on AMD systems only
>> guarantees that RMIDs currently assigned to a processor will be tracked by
>> hardware. The counters of any other RMIDs which are no longer being tracked
>> will be reset to zero. The MBM event counters return "Unavailable" for the
>> RMIDs that are not tracked by hardware. So only a limited number of groups
>> can give guaranteed monitoring numbers. With ever-changing configurations
>> there is no way to know definitively which of these groups are being tracked
>> at a given point in time. Users do not have the option to monitor a group or
>> set of groups for a certain period of time without worrying about counters
>> being reset in between.
>>
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID, event pair and monitor the bandwidth as long as it is
>> assigned. The assigned RMID will be tracked by the hardware until the user
>> unassigns it manually. There is no need to worry about counters being reset
>> during this period. Additionally, the user can specify a bitmask identifying
>> the specific bandwidth types from the given source to track with the counter.
>>
>> Without ABMC enabled, monitoring works in the current 'default' mode, without
>> the assignment option.
>>
>> # History
>>
>> An earlier implementation of ABMC had a dependency on BMEC (Bandwidth Monitoring
>> Event Configuration). Peter had concerns with that implementation because
>> it may not be compatible with ARM's MPAM.
>>
>> Here are the threads discussing the concerns and new interface to address the concerns.
>> https://lore.kernel.org/lkml/CALPaoCg97cLVVAcacnarp+880xjsedEWGJPXhYpy4P7=ky4MZw@mail.gmail.com/
>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>
>> Here are the finalized requirements based on the discussion:
>>
>> * Remove BMEC dependency on the ABMC feature.
>>
>> * Eliminate global assignment listing. The interface
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control is no longer required.
>>
>> * Create the configuration directories at /sys/fs/resctrl/info/L3_MON/counter_configs/.
>> The configuration file names should be free-form, allowing users to create them as needed.
>>
>> * Perform assignment listing at the group level by introducing mbm_L3_assignments
>> in each monitoring group. The listing should provide the following details:
>>
>> Event Configuration: Specifies the event configuration applied. This will be crucial
>> when "mkdir" on event configuration is added in the future, leading to the creation
>> of mon_data/mon_l3_*/<event configuration>.
>>
>> Domains: Identifies the domains where the configuration is applied, supporting multi-domain setups.
>>
>> Assignment Type: Indicates whether the assignment is Exclusive (e or d), Shared (s), or Unassigned (_).
>>
>> * Provide option to enable or disable auto assignment when new group is created.
>
> So far I was able to reenable MBM on AMD implementations (for some
> users) while deferring on the counter assignment interface discussion
> by just making shared assignment the default for newly-created groups.
> Until they want to upgrade assignments to exclusive or break down
> traffic with multiple counters to watch a particular group more
> closely, they won't need to change any assignments.
>
> Just pointing out that this turned out to be a useful first step in
> deploying ABMC support.
Thank you.
>
>>
>> This series tries to address all the requirements listed above.
>>
>> # Implementation details
>>
>> Create a generic interface to support user space assignment of scarce
>> counters used for monitoring. The first user of the interface is ABMC, with
>> the option to expand usage to "soft-ABMC" and MPAM counters in the future.
>
> I'll try to identify any issues I've encountered with "soft-ABMC".
> Hopefully I'll be able to share a sample implementation based on these
> patches soon.
That would be wonderful.
>
> There's now more interest in Google for allowing explicit control of
> where RMIDs are assigned on Intel platforms. Even though the number of
> RMIDs implemented by hardware tends to be roughly the number of
> containers they want to support, they often still need to create
> containers when all RMIDs have already been allocated, which is not
> currently allowed. Once the container has been created and starts
> running, it's no longer possible to move its threads into a monitoring
> group whenever RMIDs should become available again, so it's important
> for resctrl to maintain an accurate task list for a container even
> when RMIDs are not available.
>
>>
>> The feature adds the following interface files:
>>
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_mode: Reports the list of assignable
>> monitoring features supported. The enclosed brackets indicate which
>> feature is enabled.
>>
>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
>> counters available for assignment.
>
> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
> represent in a "soft-ABMC" implementation where assignment is
> implemented by assigning an RMID, which would result in all events
> being assigned at once.
>
> My main concern is how many "counters" you can assign by assigning
> RMIDs. I recall Reinette proposed reporting the number of groups which
> can be assigned separately from counters which can be assigned.
More context may be needed here. Currently, num_mbm_cntrs indicates the
number of counters available per domain, which is 32.
At the moment, we can assign 2 counters to each group, meaning each RMID
can be associated with 2 hardware counters. In theory, it's possible to
assign all 32 hardware counters to a group—allowing one RMID to be linked
with up to 32 counters. However, we currently lack the interface to
support that level of assignment.
For now, the plan is to support basic assignment and expand functionality
later once we have the necessary data structure and requirements.
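
As a rough illustration of the arithmetic above, here is a minimal Python
sketch. The helper names (parse_per_domain, groups_fully_assignable) are
made up for this example and are not part of the series. It assumes the
"0=30;1=30" format shown for available_mbm_cntrs and the current limit of
2 counters per group.

```python
def parse_per_domain(text):
    """Parse a line such as '0=30;1=30' into {domain_id: free_counters}."""
    result = {}
    for item in text.strip().split(";"):
        domain, value = item.split("=")
        result[int(domain)] = int(value)
    return result


def groups_fully_assignable(free_counters, counters_per_group=2):
    """Number of groups that can still get all of their counters.

    counters_per_group=2 is an assumption taken from the current plan
    (one counter each for the total and local events).
    """
    return free_counters // counters_per_group


# Sample text copied from the available_mbm_cntrs example in the cover letter.
free = parse_per_domain("0=30;1=30")
per_domain = {d: groups_fully_assignable(n) for d, n in free.items()}
print(per_domain)  # {0: 15, 1: 15}
```

With all 32 counters free in a domain, the same calculation gives 16 groups
per domain.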
>
>>
>> /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs: Reports the number of monitoring
>> counters free in each domain.
>>
>> /sys/fs/resctrl/info/L3_MON/counter_configs : Directory to hold the counter configuration.
>>
>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter : Default configuration
>> for MBM total events.
>>
>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter : Default configuration
>> for MBM local events.
>
> IIUC, this needs to be implemented now so you can drop BMEC with this series?
This series hides the configuration files (mbm_local_bytes_config and
mbm_total_bytes_config) required for BMEC when ABMC is enabled.
When the user switches back to "default" mode, BMEC becomes available
again. I believe it's a good approach to keep it this way.
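
To illustrate the visibility rule, a minimal Python sketch follows. The
function names are hypothetical, and it assumes the mbm_assign_mode output
format shown in example (a), where the enabled mode is enclosed in
brackets. It also assumes a system that supports BMEC in the first place.

```python
# BMEC configuration files named in the reply above.
BMEC_FILES = ("mbm_total_bytes_config", "mbm_local_bytes_config")


def parse_assign_mode(text):
    """Return (enabled_mode, all_modes) from mbm_assign_mode contents."""
    enabled = None
    modes = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets marking the enabled mode
            enabled = token
        modes.append(token)
    return enabled, modes


def visible_bmec_files(enabled_mode):
    """BMEC files are hidden while counter assignment mode is enabled."""
    return () if enabled_mode == "mbm_cntr_assign" else BMEC_FILES


enabled, modes = parse_assign_mode("[mbm_cntr_assign]\ndefault\n")
print(enabled, visible_bmec_files(enabled))
```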
>
>>
>> /sys/fs/resctrl/mbm_L3_assignments: Interface to list or modify assignment states on each group.
>>
>> # Examples
>>
>> a. Check if ABMC support is available
>> # mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> [mbm_cntr_assign]
>> default
>>
>> ABMC feature is detected and it is enabled.
>>
>> b. Check how many ABMC counters are available.
>>
>> # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
>> 32
>>
>> c. Check how many ABMC counters are free in each domain.
>>
>> # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
>> 0=30;1=30
>>
>> d. Check default counter configuration.
>>
>> # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>> local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>> local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>>
>> # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>> local_reads, local_non_temporal_writes, local_reads_slow_memory
>>
>> e. Series adds a new interface file "mbm_L3_assignments" in each monitoring group
>> to list and modify any group's monitoring states.
>
> To confirm, would we have "mbm_<resource_name>_assignments" for each
> resource where MBM-ish events could be assigned?
This is a group-level property—it resides within each group and is not
related to any specific resource.
>
>>
>> The list is displayed in the following format:
>>
>> <Event configuration>:<Domain id>=<Assignment type>
>
> For soft-ABMC assignment, is there just a single event configuration
> representing all the events tracked by the RMID?
I’m not sure about the details of how soft-ABMC will be supported. It’s
not available at the moment, but I believe it can be added once soft-ABMC
support is in place.
>
>>
>> Event configuration: A valid event configuration listed in the
>> /sys/fs/resctrl/info/L3_MON/counter_configs directory.
>>
>> Domain ID: A valid domain ID number.
>>
>> Assignment types:
>>
>> _ : No event configuration assigned
>>
>> e : Event configuration assigned in exclusive mode
>>
>> To list the default group states:
>> # cat /sys/fs/resctrl/mbm_L3_assignments
>> mbm_total_bytes:0=e;1=e
>> mbm_local_bytes:0=e;1=e
>>
>> To unassign the configuration of mbm_total_bytes on domain 0:
>> # echo "mbm_total_bytes:0=_" > mbm_L3_assignments
>> # cat mbm_L3_assignments
>> mbm_total_bytes:0=_;1=e
>> mbm_local_bytes:0=e;1=e
>>
>> To unassign the mbm_total_bytes configuration on all domains:
>> # echo "mbm_total_bytes:*=_" > mbm_L3_assignments
>> # cat mbm_L3_assignments
>> mbm_total_bytes:0=_;1=_
>> mbm_local_bytes:0=e;1=e
>>
>> To assign the mbm_total_bytes configuration on all domains in exclusive mode:
>> # echo "mbm_total_bytes:*=e" > mbm_L3_assignments
>> # cat mbm_L3_assignments
>> mbm_total_bytes:0=e;1=e
>> mbm_local_bytes:0=e;1=e
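
For tooling that consumes this file, a minimal Python sketch of the
<Event configuration>:<Domain id>=<Assignment type> format may help. The
helper names are hypothetical and not part of the series. The sketch only
handles strings in the format shown in the examples above and does not
touch resctrl itself.

```python
def parse_assignments(text):
    """Parse mbm_L3_assignments output into {config: {domain_id: state}}."""
    table = {}
    for line in text.strip().splitlines():
        config, _, domains = line.strip().partition(":")
        table[config] = {}
        for item in domains.split(";"):
            domain, _, state = item.partition("=")
            table[config][int(domain)] = state
    return table


def format_request(config, state, domain="*"):
    """Build the string to write; '*' addresses all domains."""
    return f"{config}:{domain}={state}"


listing = "mbm_total_bytes:0=_;1=e\nmbm_local_bytes:0=e;1=e\n"
table = parse_assignments(listing)

# Domains where mbm_total_bytes currently has no counter assigned.
unassigned = [d for d, s in table["mbm_total_bytes"].items() if s == "_"]
print(unassigned)  # [0]
print(format_request("mbm_total_bytes", "e", unassigned[0]))
```

The string returned by format_request() is what would be written to
mbm_L3_assignments, as in the echo commands above.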
>>
>> g. Read the events mbm_total_bytes and mbm_local_bytes of the default group.
>> There is no change in reading the events with ABMC. If the event is unassigned
>> when reading, then the read will come back as "Unassigned".
>>
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 779247936
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> 765207488
>>
>> h. Check the default event configurations.
>>
>> # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes/event_filter
>> local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,
>> local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all
>>
>> # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>> local_reads, local_non_temporal_writes, local_reads_slow_memory
>
> These look like the BMEC event names converted from camel case. Will
> event filter programming be portable?
Yes, that’s correct. The event types (reads, writes, etc.) supported by
both BMEC and ABMC are the same, so I’ve used generalized names here.
As for portability, I can’t comment, since I’m not familiar with how event
configuration is handled in MPAM or other architectures.
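
To show how the generalized names could be consumed, here is a minimal
Python sketch. It assumes the comma-separated event_filter format from the
examples above, which may wrap across lines. parse_event_filter is a
hypothetical helper, not an interface provided by the series.

```python
def parse_event_filter(text):
    """Split event_filter contents into a set of event type names."""
    names = text.replace("\n", ",").split(",")
    return {name.strip() for name in names if name.strip()}


# Sample text copied from the default configurations in the cover letter.
total = parse_event_filter(
    "local_reads, remote_reads, local_non_temporal_writes, remote_non_temporal_writes,\n"
    "local_reads_slow_memory, remote_reads_slow_memory, dirty_victim_writes_all\n"
)
local = parse_event_filter(
    "local_reads, local_non_temporal_writes, local_reads_slow_memory\n"
)

# The default local filter is a subset of the default total filter.
print(local <= total)          # True
print(sorted(total - local))   # event types counted only by mbm_total_bytes
```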
>
> Thanks,
> -Peter
>
>
> [1] https://lore.kernel.org/lkml/b3babdac-da08-4dfd-9544-47db31d574f5@intel.com/
--
Thanks
Babu Moger