Message-ID: <1122252b-55b9-4337-8e95-a95d3be95503@amd.com>
Date: Wed, 26 Feb 2025 11:12:07 -0600
From: "Moger, Babu" <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>,
 Peter Newman <peternewman@...gle.com>
Cc: "Moger, Babu" <bmoger@....com>, Dave Martin <Dave.Martin@....com>,
 corbet@....net, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com, tony.luck@...el.com, x86@...nel.org,
 hpa@...or.com, paulmck@...nel.org, akpm@...ux-foundation.org,
 thuth@...hat.com, rostedt@...dmis.org, xiongwei.song@...driver.com,
 pawan.kumar.gupta@...ux.intel.com, daniel.sneddon@...ux.intel.com,
 jpoimboe@...nel.org, perry.yuan@....com, sandipan.das@....com,
 kai.huang@...el.com, xiaoyao.li@...el.com, seanjc@...gle.com,
 xin3.li@...el.com, andrew.cooper3@...rix.com, ebiggers@...gle.com,
 mario.limonciello@....com, james.morse@....com, tan.shaopeng@...itsu.com,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 maciej.wieczor-retman@...el.com, eranian@...gle.com
Subject: Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth
 Monitoring Counters (ABMC)

Hi Peter/Reinette,

On 2/26/25 10:25, Reinette Chatre wrote:
> Hi Peter,
> 
> On 2/26/25 5:27 AM, Peter Newman wrote:
>> Hi Babu,
>>
>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@....com> wrote:
>>>
>>> Hi Peter,
>>>
>>> On 2/25/25 11:11, Peter Newman wrote:
>>>> Hi Reinette,
>>>>
>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>> <reinette.chatre@...el.com> wrote:
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>> <reinette.chatre@...el.com> wrote:
>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>> <reinette.chatre@...el.com> wrote:
>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>> <reinette.chatre@...el.com> wrote:
>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>> <reinette.chatre@...el.com> wrote:
>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>
>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>
>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>> for.
>>>>>>>>>>>
>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>
>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>> customers.
>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>
>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>> event names.
>>>>>>>>>
>>>>>>>>> Thank you for clarifying.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>
>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>
>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>
>>>>>>>>>> (per domain)
>>>>>>>>>> group 0:
>>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> group 1:
>>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>> configuration is a requirement?
>>>>>>>>
>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>> there's less pressure on the counters.
>>>>>>>>
>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>> many counters the group needs in each domain.
>>>>>>>
>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>> of the hardware.
>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>> earlier example copied below:
>>>>>>>
>>>>>>>>>> (per domain)
>>>>>>>>>> group 0:
>>>>>>>>>>  counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>  counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> group 1:
>>>>>>>>>>  counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>  counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>> ...
>>>>>>>
>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>> I understand it:
>>>>>>>
>>>>>>> group 0:
>>>>>>>  domain 0:
>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>  domain 1:
>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>  domain 0:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>  domain 1:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>
>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>> in domain 1, resulting in:
>>>>>>>
>>>>>>> group 0:
>>>>>>>  domain 0:
>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>  domain 0:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>  domain 1:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>
>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>
>>>>>>> group 0:
>>>>>>>  domain 0:
>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>> group 1:
>>>>>>>  domain 0:
>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>  domain 1:
>>>>>>>   counter 0: LclFill,RmtFill
>>>>>>>   counter 1: LclNTWr,RmtNTWr
>>>>>>>   counter 2: LclSlowFill,RmtSlowFill
>>>>>>>   counter 3: VictimBW
>>>>>>>
>>>>>>> The counters are shown with different per-domain configurations that seems to
>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>> understand the above does contradict global counter configuration though.
>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>
>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>> system configuration, the user will settle on a handful of useful
>>>>>> groupings to count.
>>>>>>
>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>
>>>>>>  # define global configurations (in ABMC terms), not necessarily in this
>>>>>>  # syntax and probably not in the mbm_assign_control file.
>>>>>>
>>>>>>  r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>  w=VictimBW,LclNTWr,RmtNTWr
>>>>>>
>>>>>>  # legacy "total" configuration, effectively r+w
>>>>>>  t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>
>>>>>>  /group0/0=t;1=t
>>>>>>  /group1/0=t;1=t
>>>>>>  /group2/0=_;1=t
>>>>>>  /group3/0=rw;1=_
>>>>>>
>>>>>> - group2 is restricted to domain 0
>>>>>> - group3 is restricted to domain 1
>>>>>> - the rest are unrestricted
>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>
>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>>>>
>>>>>
>>>>> I see. Thank you for the example.
>>>>>
>>>>> resctrl supports per-domain configurations with the following possible when
>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>
>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>
>>>>>    /group0/0=t;1=t
>>>>>    /group1/0=t;1=t
>>>>>
>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>> be configured differently in each domain.
>>>>>
>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>> reasonable to carry this forward to what will be supported next.
>>>>
>>>> The hardware supports both a per-domain mode, where all groups in a
>>>> domain use the same configurations and are limited to two events per
>>>> group and a per-group mode where every group can be configured and
>>>> assigned freely. This series is using the legacy counter access mode
>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>> in the domain can be read. If we chose to read the assigned counter
>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>> rather than asking the hardware to find the counter by RMID, we would
>>>> not be limited to 2 counters per group/domain and the hardware would
>>>> have the same flexibility as on MPAM.
>>>
>>> In extended mode, the contents of a specific counter can be read by
>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>> QM_CTR will then return the contents of the specified counter.
>>>
>>> It is documented below.
>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>  Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>
>>> We previously discussed this with you (off the public list) and I
>>> initially proposed the extended assignment mode.
>>>
>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>> counters to be assigned to the same group, rather than being limited to
>>> just two.
>>>
>>> However, the challenge is that we currently lack the necessary interfaces
>>> to configure multiple events per group. Without these interfaces, the
>>> extended mode is not practical at this time.
>>>
>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>> require modifications to the existing interface, allowing us to continue
>>> using it as is.
>>>
>>>>
>>>> (I might have said something confusing in my last messages because I
>>>> had forgotten that I switched to the extended assignment mode when
>>>> prototyping with soft-ABMC and MPAM.)
>>>>
>>>> Forcing all groups on a domain to share the same 2 counter
>>>> configurations would not be acceptable for us, as the example I gave
>>>> earlier is one I've already been asked about.
>>>
>>> I don’t see this as a blocker. It should be considered an extension to the
>>> current ABMC series. We can easily build on top of this series once we
>>> finalize how to configure the multiple event interface for each group.
>>
>> I don't think it is, either. Only being able to use ABMC to assign
>> counters is fine for our use as an incremental step. My longer-term
>> concern is the domain-scoped mbm_total_bytes_config and
>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>> there's already an expectation that the files are present when BMEC is
>> supported.

It's good that we at least know about this concern now. Let's take a step
back and figure out how we can address it.

>>
>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>> ABMC when only the BMEC-style event configuration interface exists.
> 
> ABMC currently depends on BMEC making the current implementation the
> one you are concerned about?
> https://lore.kernel.org/lkml/e4111779ebb0e7004dbedc258eeae2677f578ab1.1737577229.git.babu.moger@amd.com/

I think it is more than that.

The ABMC feature allows event configuration by writing to L3_QOS_ABMC_CFG,
where we can set cntr_id, RMID, and event configuration. Currently, we
derive event configuration from BMEC settings (either
mbm_total_bytes_config or mbm_local_bytes_config).
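
For reference, a minimal sketch of how a BMEC-style configuration value
breaks down into the BwType names used earlier in this thread. It assumes
the bit order from the BMEC table in the kernel's resctrl documentation
(bit 0 = reads to local memory ... bit 6 = dirty victims); the mapping of
those bits onto the short names and the helper evt_cfg_to_str() are
illustrative assumptions, not code from this series.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit order (bit 0 first); names as used in this thread. */
static const char *const bw_type_names[] = {
	"LclFill", "RmtFill", "LclNTWr", "RmtNTWr",
	"LclSlowFill", "RmtSlowFill", "VictimBW",
};

/*
 * Hypothetical helper: write a comma-separated list of the BwType names
 * set in cfg into buf. Returns the number of characters written; stops
 * early (keeping only complete names) if buf is too small.
 */
static size_t evt_cfg_to_str(uint32_t cfg, char *buf, size_t len)
{
	size_t used = 0;
	unsigned int i;

	if (len)
		buf[0] = '\0';

	for (i = 0; i < sizeof(bw_type_names) / sizeof(bw_type_names[0]); i++) {
		int n;

		if (!(cfg & (1u << i)))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "," : "", bw_type_names[i]);
		if (n < 0 || (size_t)n >= len - used) {
			/* Out of space: drop the partially written name. */
			if (len)
				buf[used] = '\0';
			return used;
		}
		used += (size_t)n;
	}
	return used;
}
```

Under that assumption, 0x15 (the documented default for
mbm_local_bytes_config) has bits 0, 2 and 4 set and would decode to
"LclFill,LclNTWr,LclSlowFill".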

If we don’t use BMEC values, we would need to require users to manually
specify event configuration settings.

struct mbm_cntr_cfg {
        enum resctrl_event_id   evtid;
        struct rdtgroup         *rdtgrp;
};

With the structure above, we determine the RMID from the rdtgroup and the
event type, while the event configuration relies on BMEC.


To make event configuration independent of BMEC, we can include an
explicit event configuration field:

struct mbm_cntr_cfg {
        enum resctrl_event_id   evtid;
        u32                     evt_cfg;  /* User-provided config value */
        struct rdtgroup         *rdtgrp;
};
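
As a rough sketch only (user-space C with reduced stand-in types, not the
kernel code), recording an assignment with an explicit configuration could
look like the following. The helper cntr_assign(), the flat counter table,
and the rmid member of struct rdtgroup are hypothetical and exist only to
make the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel definitions, reduced for illustration. */
enum resctrl_event_id {
	QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
};

struct rdtgroup {
	uint32_t rmid;			/* hypothetical reduction */
};

struct mbm_cntr_cfg {
	enum resctrl_event_id	evtid;
	uint32_t		evt_cfg;	/* user-provided config value */
	struct rdtgroup		*rdtgrp;	/* NULL means the counter is free */
};

/*
 * Hypothetical helper: claim the first free counter in tbl and record the
 * group, event and explicit event configuration in it. Returns the counter
 * index, or -1 if every counter is already in use.
 */
static int cntr_assign(struct mbm_cntr_cfg *tbl, size_t num_cntrs,
		       struct rdtgroup *grp, enum resctrl_event_id evtid,
		       uint32_t evt_cfg)
{
	size_t i;

	for (i = 0; i < num_cntrs; i++) {
		if (tbl[i].rdtgrp)
			continue;
		tbl[i].evtid = evtid;
		tbl[i].evt_cfg = evt_cfg;	/* no lookup of BMEC settings */
		tbl[i].rdtgrp = grp;
		return (int)i;
	}
	return -1;
}
```

The point of the sketch is only that the configuration travels with the
counter entry, so nothing has to be read back from mbm_total_bytes_config
or mbm_local_bytes_config at assignment time.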

Key Considerations

1. Counter Management: Managing counters globally (like CLOSID
management) would be simpler than handling them at the domain level,
though domain-level management is feasible.

2. User Input: Users will need to specify event configuration when
assigning events.


Here is a quick example using our current interface:
a. List the group.

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=t:0x1F,l:0x15;1=t:0x1F,l:0x15

b. Unassign an Event:

# echo "//0-l" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
//0=t:0x1F;1=t:0x1F,l:0x15

c. Assign an Event:

# echo "//0+l:0x15" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
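
To make the "flag:config" form concrete, here is a small user-space sketch
of parsing one per-domain entry as it appears in the listing above, for
example "0=t:0x1F,l:0x15". The struct evt_assign and parse_domain_spec()
names are hypothetical, and the sketch only covers the "=" listing form,
not the "+" and "-" modifiers; it is not the parser from this series.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type: one event flag and its configuration. */
struct evt_assign {
	char		flag;	/* e.g. 't' or 'l' */
	uint32_t	cfg;	/* event configuration value */
};

/*
 * Parse "<domain>=<flag>:<hex>[,<flag>:<hex>...]".
 * Stores the domain id in *domain and up to max entries in out.
 * Returns the number of entries parsed, or -1 on malformed input.
 */
static int parse_domain_spec(const char *s, int *domain,
			     struct evt_assign *out, int max)
{
	char *end;
	long d = strtol(s, &end, 10);
	int n = 0;

	if (end == s || *end != '=' || d < 0)
		return -1;
	*domain = (int)d;
	s = end + 1;

	while (*s) {
		unsigned long cfg;

		/* Expect a single flag character followed by ':'. */
		if (n == max || s[1] != ':')
			return -1;

		cfg = strtoul(s + 2, &end, 16);	/* accepts an optional 0x */
		if (end == s + 2)
			return -1;

		out[n].flag = s[0];
		out[n].cfg = (uint32_t)cfg;
		n++;

		if (*end == ',') {
			s = end + 1;
			if (!*s)		/* reject a trailing comma */
				return -1;
		} else if (*end == '\0') {
			s = end;
		} else {
			return -1;
		}
	}
	return n;
}
```

With this sketch, "0=t:0x1F,l:0x15" yields domain 0 with two entries,
('t', 0x1F) and ('l', 0x15), while an entry without a configuration such
as "1=t" is rejected.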

Note that I don't want to rush here.

Peter, can you please spend some time and propose the interface you are
thinking of, based on both ABMC and MPAM?

> 
>> The scope of my issue is just whether enabling "full" ABMC support
>> will require an additional opt-in, since that could remove the BMEC
>> interface. If it does, it's something we can live with.
> 
> 
> Reinette
> 
> 

-- 
Thanks
Babu Moger
