Message-ID: <1db8ad73-5194-4821-844a-8fd7cac72ad4@intel.com>
Date: Wed, 12 Mar 2025 10:14:05 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: <babu.moger@....com>, "Moger, Babu" <bmoger@....com>, "Luck, Tony"
<tony.luck@...el.com>
CC: Peter Newman <peternewman@...gle.com>, Dave Martin <Dave.Martin@....com>,
<corbet@....net>, <tglx@...utronix.de>, <mingo@...hat.com>, <bp@...en8.de>,
<dave.hansen@...ux.intel.com>, <x86@...nel.org>, <hpa@...or.com>,
<paulmck@...nel.org>, <akpm@...ux-foundation.org>, <thuth@...hat.com>,
<rostedt@...dmis.org>, <xiongwei.song@...driver.com>,
<pawan.kumar.gupta@...ux.intel.com>, <daniel.sneddon@...ux.intel.com>,
<jpoimboe@...nel.org>, <perry.yuan@....com>, <sandipan.das@....com>,
<kai.huang@...el.com>, <xiaoyao.li@...el.com>, <seanjc@...gle.com>,
<xin3.li@...el.com>, <andrew.cooper3@...rix.com>, <ebiggers@...gle.com>,
<mario.limonciello@....com>, <james.morse@....com>,
<tan.shaopeng@...itsu.com>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <maciej.wieczor-retman@...el.com>,
<eranian@...gle.com>
Subject: Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth
Monitoring Counters (ABMC)

Hi Babu,

On 3/12/25 9:03 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 3/12/25 10:07, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 3/11/25 1:35 PM, Moger, Babu wrote:
>>> Hi All,
>>>
>>> On 3/10/25 22:51, Reinette Chatre wrote:
>>>>
>>>>
>>>> On 3/10/25 6:44 PM, Moger, Babu wrote:
>>>>> Hi Tony,
>>>>>
>>>>> On 3/10/2025 6:22 PM, Luck, Tony wrote:
>>>>>> On Mon, Mar 10, 2025 at 05:48:44PM -0500, Moger, Babu wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> On 3/5/2025 1:34 PM, Moger, Babu wrote:
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> On 3/5/25 04:40, Peter Newman wrote:
>>>>>>>>> Hi Babu,
>>>>>>>>>
>>>>>>>>> On Tue, Mar 4, 2025 at 10:49 PM Moger, Babu <babu.moger@....com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Peter,
>>>>>>>>>>
>>>>>>>>>> On 3/4/25 10:44, Peter Newman wrote:
>>>>>>>>>>> On Mon, Mar 3, 2025 at 8:16 PM Moger, Babu <babu.moger@....com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Peter/Reinette,
>>>>>>>>>>>>
>>>>>>>>>>>> On 2/26/25 07:27, Peter Newman wrote:
>>>>>>>>>>>>> Hi Babu,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Feb 25, 2025 at 10:31 PM Moger, Babu <babu.moger@....com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2/25/25 11:11, Peter Newman wrote:
>>>>>>>>>>>>>>> Hi Reinette,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Feb 21, 2025 at 11:43 PM Reinette Chatre
>>>>>>>>>>>>>>> <reinette.chatre@...el.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2/21/25 5:12 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>> On Thu, Feb 20, 2025 at 7:36 PM Reinette Chatre
>>>>>>>>>>>>>>>>> <reinette.chatre@...el.com> wrote:
>>>>>>>>>>>>>>>>>> On 2/20/25 6:53 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>> On Wed, Feb 19, 2025 at 7:21 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>> <reinette.chatre@...el.com> wrote:
>>>>>>>>>>>>>>>>>>>> On 2/19/25 3:28 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>> On Tue, Feb 18, 2025 at 6:50 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@...el.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> On 2/17/25 2:26 AM, Peter Newman wrote:
>>>>>>>>>>>>>>>>>>>>>>> On Fri, Feb 14, 2025 at 8:18 PM Reinette Chatre
>>>>>>>>>>>>>>>>>>>>>>> <reinette.chatre@...el.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On 2/14/25 10:31 AM, Moger, Babu wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> On 2/14/2025 12:26 AM, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/13/25 9:37 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Feb 12, 2025 at 03:33:31PM -0800, Reinette Chatre wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 2/12/25 9:46 AM, Dave Martin wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 22, 2025 at 02:20:08PM -0600, Babu Moger wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> (quoting relevant parts with goal to focus discussion on new possible syntax)
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I see the support for MPAM events distinct from the support of assignable counters.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Once the MPAM events are sorted, I think that they can be assigned with existing interface.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please help me understand if you see it differently.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Doing so would need to come up with alphabetical letters for these events,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> which seems to be needed for your proposal also? If we use possible flags of:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_read_bytes a
>>>>>>>>>>>>>>>>>>>>>>>>>>>> mbm_local_write_bytes b
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Then mbm_assign_control can be used as:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> # echo '//0=ab;1=b' >/sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_read_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <value>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <sum of mbm_local_read_bytes and mbm_local_write_bytes>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> One issue would be when resctrl needs to support more than 26 events (no more flags available),
>>>>>>>>>>>>>>>>>>>>>>>>>>>> assuming that upper case would be used for "shared" counters (unless this interface is defined
>>>>>>>>>>>>>>>>>>>>>>>>>>>> differently and only few uppercase letters used for it). Would this be too low of a limit?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> As mentioned above, one possible issue with existing interface is that
>>>>>>>>>>>>>>>>>>>>>>>> it is limited to 26 events (assuming only lower case letters are used). The limit
>>>>>>>>>>>>>>>>>>>>>>>> is low enough to be of concern.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The events which can be monitored by a single counter on ABMC and MPAM
>>>>>>>>>>>>>>>>>>>>>>> so far are combinable, so 26 counters per group today means it limits
>>>>>>>>>>>>>>>>>>>>>>> breaking down MBM traffic for each group 26 ways. If a user complained
>>>>>>>>>>>>>>>>>>>>>>> that a 26-way breakdown of a group's MBM traffic was limiting their
>>>>>>>>>>>>>>>>>>>>>>> investigation, I would question whether they know what they're looking
>>>>>>>>>>>>>>>>>>>>>>> for.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The key here is "so far" as well as the focus on MBM only.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It is impossible for me to predict what we will see in a couple of years
>>>>>>>>>>>>>>>>>>>>>> from Intel RDT, AMD PQoS, and Arm MPAM that now all rely on resctrl interface
>>>>>>>>>>>>>>>>>>>>>> to support their users. Just looking at the Intel RDT spec the event register
>>>>>>>>>>>>>>>>>>>>>> has space for 32 events for each "CPU agent" resource. That does not take into
>>>>>>>>>>>>>>>>>>>>>> account the "non-CPU agents" that are enumerated via ACPI. Tony already mentioned
>>>>>>>>>>>>>>>>>>>>>> that he is working on patches [1] that will add new events and shared the idea
>>>>>>>>>>>>>>>>>>>>>> that we may be trending to support "perf" like events associated with RMID. I
>>>>>>>>>>>>>>>>>>>>>> expect AMD PQoS and Arm MPAM to provide related enhancements to support their
>>>>>>>>>>>>>>>>>>>>>> customers.
>>>>>>>>>>>>>>>>>>>>>> This all makes me think that resctrl should be ready to support more events than 26.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I was thinking of the letters as representing a reusable, user-defined
>>>>>>>>>>>>>>>>>>>>> event-set for applying to a single counter rather than as individual
>>>>>>>>>>>>>>>>>>>>> events, since MPAM and ABMC allow us to choose the set of events each
>>>>>>>>>>>>>>>>>>>>> one counts. Wherever we define the letters, we could use more symbolic
>>>>>>>>>>>>>>>>>>>>> event names.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thank you for clarifying.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In the letters as events model, choosing the events assigned to a
>>>>>>>>>>>>>>>>>>>>> group wouldn't be enough information, since we would want to control
>>>>>>>>>>>>>>>>>>>>> which events should share a counter and which should be counted by
>>>>>>>>>>>>>>>>>>>>> separate counters. I think the amount of information that would need
>>>>>>>>>>>>>>>>>>>>> to be encoded into mbm_assign_control to represent the level of
>>>>>>>>>>>>>>>>>>>>> configurability supported by hardware would quickly get out of hand.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Maybe as an example, one counter for all reads, one counter for all
>>>>>>>>>>>>>>>>>>>>> writes in ABMC would look like...
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> (L3_QOS_ABMC_CFG.BwType field names below)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think this may also be what Dave was heading towards in [2] but in that
>>>>>>>>>>>>>>>>>>>> example and above the counter configuration appears to be global. You do mention
>>>>>>>>>>>>>>>>>>>> "configurability supported by hardware" so I wonder if per-domain counter
>>>>>>>>>>>>>>>>>>>> configuration is a requirement?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If it's global and we want a particular group to be watched by more
>>>>>>>>>>>>>>>>>>> counters, I wouldn't want this to result in allocating more counters
>>>>>>>>>>>>>>>>>>> for that group in all domains, or allocating counters in domains where
>>>>>>>>>>>>>>>>>>> they're not needed. I want to encourage my users to avoid allocating
>>>>>>>>>>>>>>>>>>> monitoring resources in domains where a job is not allowed to run so
>>>>>>>>>>>>>>>>>>> there's less pressure on the counters.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In Dave's proposal it looks like global configuration means
>>>>>>>>>>>>>>>>>>> globally-defined "named counter configurations", which works because
>>>>>>>>>>>>>>>>>>> it's really per-domain assignment of the configurations to however
>>>>>>>>>>>>>>>>>>> many counters the group needs in each domain.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think I am becoming lost. Would a global configuration not break your
>>>>>>>>>>>>>>>>>> view of "event-set applied to a single counter"? If a counter is configured
>>>>>>>>>>>>>>>>>> globally then it would not make it possible to support the full configurability
>>>>>>>>>>>>>>>>>> of the hardware.
>>>>>>>>>>>>>>>>>> Before I add more confusion, let me try with an example that builds on your
>>>>>>>>>>>>>>>>>> earlier example copied below:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> (per domain)
>>>>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>>>>   counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>   counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>>>>   counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>>>>   counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Since the above states "per domain" I rewrite the example to highlight that as
>>>>>>>>>>>>>>>>>> I understand it:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>   domain 0:
>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>   domain 1:
>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>   domain 0:
>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>   domain 1:
>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You mention that you do not want counters to be allocated in domains that they
>>>>>>>>>>>>>>>>>> are not needed in. So, let's say group 0 does not need counter 0 and counter 1
>>>>>>>>>>>>>>>>>> in domain 1, resulting in:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>   domain 0:
>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>   domain 0:
>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>   domain 1:
>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> With counter 0 and counter 1 available in domain 1, these counters could
>>>>>>>>>>>>>>>>>> theoretically be configured to give group 1 more data in domain 1:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> group 0:
>>>>>>>>>>>>>>>>>>   domain 0:
>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 1: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>> group 1:
>>>>>>>>>>>>>>>>>>   domain 0:
>>>>>>>>>>>>>>>>>>     counter 2: LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 3: VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>   domain 1:
>>>>>>>>>>>>>>>>>>     counter 0: LclFill,RmtFill
>>>>>>>>>>>>>>>>>>     counter 1: LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>>     counter 2: LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>>>     counter 3: VictimBW
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The counters are shown with different per-domain configurations that seem to
>>>>>>>>>>>>>>>>>> match with earlier goals of (a) choose events counted by each counter and
>>>>>>>>>>>>>>>>>> (b) do not allocate counters in domains where they are not needed. As I
>>>>>>>>>>>>>>>>>> understand the above does contradict global counter configuration though.
>>>>>>>>>>>>>>>>>> Or do you mean that only the *name* of the counter is global and then
>>>>>>>>>>>>>>>>>> that it is reconfigured as part of every assignment?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, I meant only the *name* is global. I assume based on a particular
>>>>>>>>>>>>>>>>> system configuration, the user will settle on a handful of useful
>>>>>>>>>>>>>>>>> groupings to count.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Perhaps mbm_assign_control syntax is the clearest way to express an example...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # define global configurations (in ABMC terms), not necessarily in this
>>>>>>>>>>>>>>>>> # syntax and probably not in the mbm_assign_control file.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> r=LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>>>>>>> w=VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # legacy "total" configuration, effectively r+w
>>>>>>>>>>>>>>>>> t=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> /group0/0=t;1=t
>>>>>>>>>>>>>>>>> /group1/0=t;1=t
>>>>>>>>>>>>>>>>> /group2/0=_;1=t
>>>>>>>>>>>>>>>>> /group3/0=rw;1=_
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - group2 is restricted to domain 0
>>>>>>>>>>>>>>>>> - group3 is restricted to domain 1
>>>>>>>>>>>>>>>>> - the rest are unrestricted
>>>>>>>>>>>>>>>>> - In group3, we decided we need to separate read and write traffic
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This consumes 4 counters in domain 0 and 3 counters in domain 1.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I see. Thank you for the example.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> resctrl supports per-domain configurations with the following possible when
>>>>>>>>>>>>>>>> using mbm_total_bytes_config and mbm_local_bytes_config:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> t(domain 0)=LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>> t(domain 1)=LclFill,RmtFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /group0/0=t;1=t
>>>>>>>>>>>>>>>> /group1/0=t;1=t
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Even though the flags are identical in all domains, the assigned counters will
>>>>>>>>>>>>>>>> be configured differently in each domain.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> With this supported by hardware and currently also supported by resctrl it seems
>>>>>>>>>>>>>>>> reasonable to carry this forward to what will be supported next.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The hardware supports both a per-domain mode, where all groups in a
>>>>>>>>>>>>>>> domain use the same configurations and are limited to two events per
>>>>>>>>>>>>>>> group and a per-group mode where every group can be configured and
>>>>>>>>>>>>>>> assigned freely. This series is using the legacy counter access mode
>>>>>>>>>>>>>>> where only counters whose BwType matches an instance of QOS_EVT_CFG_n
>>>>>>>>>>>>>>> in the domain can be read. If we chose to read the assigned counter
>>>>>>>>>>>>>>> directly (QM_EVTSEL[ExtendedEvtID]=1, QM_EVTSEL[EvtID]=L3CacheABMC)
>>>>>>>>>>>>>>> rather than asking the hardware to find the counter by RMID, we would
>>>>>>>>>>>>>>> not be limited to 2 counters per group/domain and the hardware would
>>>>>>>>>>>>>>> have the same flexibility as on MPAM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In extended mode, the contents of a specific counter can be read by
>>>>>>>>>>>>>> setting the following fields in QM_EVTSEL: [ExtendedEvtID]=1,
>>>>>>>>>>>>>> [EvtID]=L3CacheABMC and setting [RMID] to the desired counter ID. Reading
>>>>>>>>>>>>>> QM_CTR will then return the contents of the specified counter.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is documented below.
>>>>>>>>>>>>>> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
>>>>>>>>>>>>>> Section: 19.3.3.3 Assignable Bandwidth Monitoring (ABMC)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We previously discussed this with you (off the public list) and I
>>>>>>>>>>>>>> initially proposed the extended assignment mode.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, the extended mode allows greater flexibility by enabling multiple
>>>>>>>>>>>>>> counters to be assigned to the same group, rather than being limited to
>>>>>>>>>>>>>> just two.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, the challenge is that we currently lack the necessary interfaces
>>>>>>>>>>>>>> to configure multiple events per group. Without these interfaces, the
>>>>>>>>>>>>>> extended mode is not practical at this time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Therefore, we ultimately agreed to use the legacy mode, as it does not
>>>>>>>>>>>>>> require modifications to the existing interface, allowing us to continue
>>>>>>>>>>>>>> using it as is.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (I might have said something confusing in my last messages because I
>>>>>>>>>>>>>>> had forgotten that I switched to the extended assignment mode when
>>>>>>>>>>>>>>> prototyping with soft-ABMC and MPAM.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Forcing all groups on a domain to share the same 2 counter
>>>>>>>>>>>>>>> configurations would not be acceptable for us, as the example I gave
>>>>>>>>>>>>>>> earlier is one I've already been asked about.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don’t see this as a blocker. It should be considered an extension to the
>>>>>>>>>>>>>> current ABMC series. We can easily build on top of this series once we
>>>>>>>>>>>>>> finalize how to configure the multiple event interface for each group.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think it is, either. Only being able to use ABMC to assign
>>>>>>>>>>>>> counters is fine for our use as an incremental step. My longer-term
>>>>>>>>>>>>> concern is the domain-scoped mbm_total_bytes_config and
>>>>>>>>>>>>> mbm_local_bytes_config files, but they were introduced with BMEC, so
>>>>>>>>>>>>> there's already an expectation that the files are present when BMEC is
>>>>>>>>>>>>> supported.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On ABMC hardware that also supports BMEC, I'm concerned about enabling
>>>>>>>>>>>>> ABMC when only the BMEC-style event configuration interface exists.
>>>>>>>>>>>>> The scope of my issue is just whether enabling "full" ABMC support
>>>>>>>>>>>>> will require an additional opt-in, since that could remove the BMEC
>>>>>>>>>>>>> interface. If it does, it's something we can live with.
>>>>>>>>>>>>
>>>>>>>>>>>> As you know, this series is currently blocked without further feedback.
>>>>>>>>>>>>
>>>>>>>>>>>> I’d like to begin reworking these patches to incorporate Peter’s feedback.
>>>>>>>>>>>> Any input or suggestions would be appreciated.
>>>>>>>>>>>>
>>>>>>>>>>>> Here’s what we’ve learned so far:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Assignments should be independent of BMEC.
>>>>>>>>>>>> 2. We should be able to specify multiple event types to a counter (e.g.,
>>>>>>>>>>>>    read, write, VictimBW, etc.). This is also called a shared counter.
>>>>>>>>>>>> 3. There should be an option to assign events per domain.
>>>>>>>>>>>> 4. Currently, only two counters can be assigned per group, but the design
>>>>>>>>>>>> should allow flexibility to assign more in the future as the interface
>>>>>>>>>>>> evolves.
>>>>>>>>>>>> 5. Utilize the extended RMID read mode.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Here is my proposal using Peter's earlier example:
>>>>>>>>>>>>
>>>>>>>>>>>> # define event configurations
>>>>>>>>>>>>
>>>>>>>>>>>> ==== =========== ==================================================
>>>>>>>>>>>> Bits Mnemonics   Description
>>>>>>>>>>>> ==== =========== ==================================================
>>>>>>>>>>>> 6    VictimBW    Dirty Victims from all types of memory
>>>>>>>>>>>> 5    RmtSlowFill Reads to slow memory in the non-local NUMA domain
>>>>>>>>>>>> 4    LclSlowFill Reads to slow memory in the local NUMA domain
>>>>>>>>>>>> 3    RmtNTWr     Non-temporal writes to non-local NUMA domain
>>>>>>>>>>>> 2    LclNTWr     Non-temporal writes to local NUMA domain
>>>>>>>>>>>> 1    RmtFill     Reads to memory in the non-local NUMA domain
>>>>>>>>>>>> 0    LclFill     Reads to memory in the local NUMA domain
>>>>>>>>>>>> ==== =========== ==================================================
>>>>>>>>>>>>
>>>>>>>>>>>> #Define flags based on combination of above event types.
>>>>>>>>>>>>
>>>>>>>>>>>> t = LclFill,RmtFill,LclSlowFill,RmtSlowFill,VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>> l = LclFill, LclNTWr, LclSlowFill
>>>>>>>>>>>> r = LclFill,RmtFill,LclSlowFill,RmtSlowFill
>>>>>>>>>>>> w = VictimBW,LclNTWr,RmtNTWr
>>>>>>>>>>>> v = VictimBW
>>>>>>>>>>>>
>>>>>>>>>>>> Peter suggested the following format earlier :
>>>>>>>>>>>>
>>>>>>>>>>>> /group0/0=t;1=t
>>>>>>>>>>>> /group1/0=t;1=t
>>>>>>>>>>>> /group2/0=_;1=t
>>>>>>>>>>>> /group3/0=rw;1=_
>>>>>>>>>>>
>>>>>>>>>>> After some inquiries within Google, it sounds like nobody has invested
>>>>>>>>>>> much into the current mbm_assign_control format yet, so it would be
>>>>>>>>>>> best to drop it and distribute the configuration around the filesystem
>>>>>>>>>>> hierarchy[1], which should allow us to produce something more flexible
>>>>>>>>>>> and cleaner to implement.
>>>>>>>>>>>
>>>>>>>>>>> Roughly what I had in mind:
>>>>>>>>>>>
>>>>>>>>>>> Use mkdir in a info/<resource>_MON subdirectory to create free-form
>>>>>>>>>>> names for the assignable configurations rather than being restricted
>>>>>>>>>>> to single letters. In the resulting directory, populate a file where
>>>>>>>>>>> we can specify the set of events the config should represent. I think
>>>>>>>>>>> we should use symbolic names for the events rather than raw BMEC field
>>>>>>>>>>> values. Moving forward we could come up with portable names for common
>>>>>>>>>>> events and only support the BMEC names on AMD machines for users who
>>>>>>>>>>> want specific events and don't care about portability.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I’m still processing this. Let me start with some initial questions.
>>>>>>>>>>
>>>>>>>>>> So, we are creating event configurations here, which seems reasonable.
>>>>>>>>>>
>>>>>>>>>> Yes, we should use portable names and are not limited to BMEC names.
>>>>>>>>>>
>>>>>>>>>> How many configurations should we allow? Do we know?
>>>>>>>>>
>>>>>>>>> Do we need an upper limit?
>>>>>>>>
>>>>>>>> I think so. This needs to be maintained in some data structure. We can
>>>>>>>> start with 2 default configurations for now.
>>>>
>>>> There is a big difference between no upper limit and 2. The hardware is
>>>> capable of supporting per-domain configurations so more flexibility is
>>>> certainly possible. Consider the example presented by Peter in:
>>>> https://lore.kernel.org/lkml/CALPaoCi0mFZ9TycyNs+SCR+2tuRJovQ2809jYMun4HtC64hJmA@mail.gmail.com/
>>>>
>>>>>>>>>>> Next, put assignment-control file nodes in per-domain directories
>>>>>>>>>>> (i.e., mon_data/mon_L3_00/assign_{exclusive,shared}). Writing a
>>>>>>>>>>> counter-configuration name into the file would then allocate a counter
>>>>>>>>>>> in the domain, apply the named configuration, and monitor the parent
>>>>>>>>>>> group-directory. We can also put a group/resource-scoped assign_* file
>>>>>>>>>>> higher in the hierarchy to make it easier for users who want to
>>>>>>>>>>> configure all domains the same for a group.
>>>>>>>>>>
>>>>>>>>>> What is the difference between shared and exclusive?
>>>>>>>>>
>>>>>>>>> Shared assignment[1] means that non-exclusively-assigned counters in
>>>>>>>>> each domain will be scheduled round-robin to the groups requesting
>>>>>>>>> shared access to a counter. In my tests, I assigned the counters long
>>>>>>>>> enough to produce a single 1-second MB/s sample for the per-domain
>>>>>>>>> aggregation files[2].
>>>>>>>>>
>>>>>>>>> These do not need to be implemented immediately, but knowing that they
>>>>>>>>> work addresses the overhead and scalability concerns of reassigning
>>>>>>>>> counters and reading their values.
>>>>>>>>
>>>>>>>> Ok. Let's focus on exclusive assignments for now.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Having three files—assign_shared, assign_exclusive, and unassign—for each
>>>>>>>>>> domain seems excessive. In a system with 32 groups and 12 domains, this
>>>>>>>>>> results in 32 × 12 × 3 files, which is quite large.
>>>>>>>>>>
>>>>>>>>>> There should be a more efficient way to handle this.
>>>>>>>>>>
>>>>>>>>>> Initially, we started with a group-level file for this interface, but it
>>>>>>>>>> was rejected due to the high number of sysfs calls, making it inefficient.
>>>>>>>>>
>>>>>>>>> I had rejected it due to the high-frequency of access of a large
>>>>>>>>> number of files, which has since been addressed by shared assignment
>>>>>>>>> (or automatic reassignment) and aggregated mbps files.
>>>>>>>>
>>>>>>>> I think we should address this as well. Creating three extra files for
>>>>>>>> each group isn’t ideal when there are more efficient alternatives.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Additionally, how can we list all assignments with a single sysfs call?
>>>>>>>>>>
>>>>>>>>>> That was another problem we need to address.
>>>>>>>>>
>>>>>>>>> This is not a requirement I was aware of. If the user forgot where
>>>>>>>>> they assigned counters (or forgot to disable auto-assignment), they
>>>>>>>>> can read multiple sysfs nodes to remind themselves.
>>>>>>>>
>>>>>>>> I suggest we provide users with an option to list the assignments
>>>>>>>> of all groups in a single command. As the number of groups increases, it
>>>>>>>> becomes cumbersome to query each group individually.
>>>>>>>>
>>>>>>>> To achieve this, we can reuse our existing mbm_assign_control interface
>>>>>>>> for this purpose. More details on this below.
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The configuration names listed in assign_* would result in files of
>>>>>>>>>>> the same name in the appropriate mon_data domain directories from
>>>>>>>>>>> which the count values can be read.
>>>>>>>>>>>
>>>>>>>>>>> # mkdir info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>>> # echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>> # echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>> # echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>> # cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
>>>>>>>>>>> LclFill
>>>>>>>>>>> LclNTWr
>>>>>>>>>>> LclSlowFill
>>>>>>>>>>
>>>>>>>>>> I feel we can just have the configs. event_filter file is not required.
>>>>>>>>>
>>>>>>>>> That's right, I forgot that we can implement kernfs_ops::open(). I was
>>>>>>>>> only looking at struct kernfs_syscall_ops
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> #cat info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>>>> LclFill <-rename these to generic names.
>>>>>>>>>> LclNTWr
>>>>>>>>>> LclSlowFill
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think portable and non-portable event names should both be available
>>>>>>>>> as options. There are simple bandwidth measurement mechanisms that
>>>>>>>>> will be applied in general, but when they turn up an issue, it can
>>>>>>>>> often lead to a more focused investigation, requiring more precise
>>>>>>>>> events.
>>>>>>>>
>>>>>>>> I agree. We should provide both portable and non-portable event names.
>>>>>>>>
>>>>>>>> Here is my draft proposal based on the discussion so far and reusing some
>>>>>>>> of the current interface. The idea here is to start with a basic assignment
>>>>>>>> feature with options to enhance it in the future. Feel free to
>>>>>>>> comment/suggest.
>>>>>>>>
>>>>>>>> 1. Event configurations will be in
>>>>>>>> /sys/fs/resctrl/info/L3_MON/counter_configs/.
>>>>>>>>
>>>>>>>> There will be two pre-defined configurations by default.
>>>>>>>>
>>>>>>>> #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
>>>>>>>> LclFill,LclNTWr,LclSlowFill,VictimBW,RmtSlowFill,RmtNTWr,RmtFill
>>>>>>>>
>>>>>>>> #cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>> LclFill, LclNTWr, LclSlowFill
>>>>>>>>
>>>>>>>> 2. Users will have options to update these configurations.
>>>>>>>>
>>>>>>>> #echo "LclFill, LclNTWr, RmtFill" >
>>>>>>>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>
>>>>>> This part seems odd to me. Now the "mbm_local_bytes" files aren't
>>>>>> reporting "local_bytes" any more. They report something different,
>>>>>> and users only know if they come to check the options currently
>>>>>> configured in this file. Changing the contents without changing
>>>>>> the name seems confusing to me.
>>>>>
>>>>> It is the same behaviour right now with BMEC. It is configurable.
>>>>> By default it is mbm_local_bytes, but users can configure whatever they want to monitor using /info/L3_MON/mbm_local_bytes_config.
>>>>>
>>>>> We can continue the same behaviour with ABMC, but the configuration will be in /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes.
>>>>
>>>> This could be supported by following Peter's original proposal where the name
>>>> of the counter configuration is provided by the user via a mkdir:
>>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>>
>>>> As he mentioned there could be pre-populated mbm_local_bytes/mbm_total_bytes.
>>>
>>> Sure. We can do that. I was thinking in the first phase, just provide the
>>> default pre-defined configuration and option to update the configuration.
>>>
>>> We can add the mkdir support later. That way we can provide basic ABMC
>>> support without too much code complexity with mkdir support.
>>
>> It is not clear to me how you envision the "first phase". Is it what you
>> proposed above, for example:
>> #echo "LclFill, LclNTWr, RmtFill" >
>> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>
>> In above the counter configuration name is a file.
>
> Yes. That is correct.
>
> There will be two configuration files by default when resctrl is mounted
> when ABMC is enabled.
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_total_bytes
> /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>
>>
>> How could mkdir support be added to this later if there are already files present?
>
> We already have these directories when resctrl is mounted.
> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>
> We don't need "mkdir" support for default configurations.
I was referring to the "mkdir" support for additional configurations that
I understood you are thinking about adding later. For example,
(copied from Peter's message
https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/):
# mkdir info/L3_MON/counter_configs/mbm_local_bytes
# echo LclFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
# echo LclNTWr > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
# echo LclSlowFill > info/L3_MON/counter_configs/mbm_local_bytes/event_filter
# cat info/L3_MON/counter_configs/mbm_local_bytes/event_filter
LclFill
LclNTWr
LclSlowFill
Any "later" work needs to be backward compatible with the first phase.
If the first phase starts with a file:
/sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
... I do not see how the second phase can be backward compatible when that work
needs a directory with the same name that contains a file for configuration:
/sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
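To illustrate the naming conflict, here is a minimal sketch that uses a scratch
directory as a stand-in for info/L3_MON/counter_configs. It assumes only ordinary
filesystem semantics, where one name cannot be both a file and a directory. The
"tmp" directory and the two result variables are made up for the illustration
and are not part of any resctrl interface.

```shell
#!/bin/sh
# Scratch directory standing in for info/L3_MON/counter_configs (not real resctrl).
tmp=$(mktemp -d)

# First phase: the counter configuration name is a regular file.
echo "LclFill, LclNTWr, RmtFill" > "$tmp/mbm_local_bytes"

# Second phase would need a directory with that same name.
if mkdir "$tmp/mbm_local_bytes" 2>/dev/null; then
    mkdir_result="created"
else
    mkdir_result="failed"
fi
echo "mkdir mbm_local_bytes: $mkdir_result"

# The second phase path cannot exist while the first phase file is in place,
# because a path component is a regular file rather than a directory.
if [ -e "$tmp/mbm_local_bytes/event_filter" ]; then
    nested_result="present"
else
    nested_result="absent"
fi
echo "mbm_local_bytes/event_filter: $nested_result"

rm -rf "$tmp"
```

User space written against the first phase would open mbm_local_bytes as a file,
so switching that name to a directory later would break it.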
sidenote: I think interactions with the "event_filter" file need more
description since it is not clear from the provided example how user space
may want to interact with the file when adding vs replacing event configurations.
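To illustrate the ambiguity, here is a minimal sketch on a regular file in a
scratch directory (not resctrl). On a regular file, three writes with ">" leave
only the last value, while three writes with ">>" accumulate. The example from
Peter's message uses ">" yet reads back three lines, so it appears to assume that
every write adds an event. As far as I understand, for a resctrl file it would be
the kernel's write handler, not the shell redirection, that decides what a write
means, so how to replace or remove an event would need to be spelled out. The
variable names below are illustrative only.

```shell
#!/bin/sh
# Scratch file standing in for an "event_filter" file (ordinary file semantics).
tmp=$(mktemp -d)
f="$tmp/event_filter"

# Writes with ">" truncate first: only the last event survives.
echo LclFill > "$f"
echo LclNTWr > "$f"
echo LclSlowFill > "$f"
replace_result=$(cat "$f")
echo "after three '>' writes: $replace_result"

# Writes with ">>" append: all three events are kept.
: > "$f"
echo LclFill >> "$f"
echo LclNTWr >> "$f"
echo LclSlowFill >> "$f"
append_count=$(grep -c . "$f")
echo "lines after three '>>' writes: $append_count"

rm -rf "$tmp"
```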
>
> My plan was to support only the default configurations in the first phase.
> That way there is no difference in the usage model with ABMC when mounted.
>
>
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes
>>>>>>>> LclFill, LclNTWr, RmtFill
>>>>>>>>
>>>>>>>> 3. The default configurations will be used when user mounts the resctrl.
>>>>>>>>
>>>>>>>> mount -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>> mkdir /sys/fs/resctrl/test/
>>>>>>>>
>>>>>>>> 4. The resctrl group/domains can be in one of these assignment states.
>>>>>>>> e: Exclusive
>>>>>>>> s: Shared
>>>>>>>> u: Unassigned
>>>>>>>>
>>>>>>>> Exclusive mode is supported now. Shared mode will be supported in the
>>>>>>>> future.
>>>>>>>>
>>>>>>>> 5. We can use the current /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>> to list the assignment state of all the groups.
>>>>>>>>
>>>>>>>> Format:
>>>>>>>> "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>>
>>>>>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>> test//mbm_total_bytes:0=e;1=e
>>>>>>>> test//mbm_local_bytes:0=e;1=e
>>>>>>>> //mbm_total_bytes:0=e;1=e
>>>>>>>> //mbm_local_bytes:0=e;1=e
>>>>
>>>> This would make mbm_assign_control even more unwieldy and quicker to exceed a
>>>> page of data (these examples never seem to reflect those AMD systems with the many
>>>> L3 domains). How to handle resctrl files larger than 4KB needs to be well understood
>>>> and solved when/if going this route.
>>>
>>> This problem is not specific to this series. I feel it is a generic problem
>>> for many similar interfaces. I don't know how it is addressed. I may
>>> have to investigate this. Any pointers would be helpful.
>>
>> Dave Martin already did a lot of analysis here. What other pointers do you need?
>>
>>>
>>>
>>>>
>>>> There seem to be two opinions about this file at the moment. Would it be possible to
>>>> summarize the discussion with pros/cons raised to make an informed selection?
>>>> I understand that Google as represented by Peter no longer requires/requests this
>>>> file but the motivation for this change seems new and does not seem to reduce the
>>>> original motivation for this file. We may also want to separate requirements for reading
>>>> from and writing to this file.
>>>
>>> Yea. We can just use mbm_assign_control for reading the assignment states.
>>>
>>> Summary: We have two proposals.
>>>
>>> First one from Peter:
>>>
>>> https://lore.kernel.org/lkml/CALPaoCiii0vXOF06mfV=kVLBzhfNo0SFqt4kQGwGSGVUqvr2Dg@mail.gmail.com/
>>>
>>>
>>> Pros
>>> a. Allows flexible creation of free-form names for assignable
>>> configurations, stored in info/L3_MON/counter_configs/.
>>>
>>> b. Events can be accessed using corresponding free-form names in the
>>> mon_data directory, making it clear to users what each event represents.
>>>
>>>
>>> Cons:
>>> a. Requires three separate files for assignment in each group
>>> (assign_exclusive, assign_shared, unassign), which might be excessive.
>>>
>>> b. No built-in listing support, meaning users must query each group
>>> individually to check assignment states.
>>>
>>>
>>> Second Proposal (Mine)
>>>
>>> https://lore.kernel.org/lkml/a4ab53b5-03be-4299-8853-e86270d46f2e@amd.com/
>>>
>>> Pros:
>>>
>>> a. Maintains the flexibility of free-form names for assignable
>>> configurations (info/L3_MON/counter_configs/).
>>>
>>> b. Events remain accessible via free-form names in mon_data, ensuring
>>> clarity on their purpose.
>>>
>>> c. Adds the ability to list assignment states for all groups in a single
>>> command.
>>>
>>> Cons:
>>> a. Potential buffer overflow issues when handling a large number of
>>> groups and domains and code complexity to fix the issue.
>>>
>>>
>>> Third Option: A Hybrid Approach
>>>
>>> We could combine elements from both proposals:
>>>
>>> a. Retain the free-form naming approach for assignable configurations in
>>> info/L3_MON/counter_configs/.
>>>
>>> b. Use the assignment method from the first proposal:
>>> $mkdir test
>>> $echo mbm_local_bytes > test/mon_data/mon_L3_00/assign_exclusive
>>>
>>> c. Introduce listing support via the info/L3_MON/mbm_assign_control
>>> interface, enabling users to read assignment states for all groups in one
>>> place. Only reading support.
>>>
>>>
>>>>
>>>>>>>>
>>>>>>>> 6. Users can modify the assignment state by writing to mbm_assign_control.
>>>>>>>>
>>>>>>>> Format:
>>>>>>>> "<CTRL_MON group>/<MON group>/<configuration>:<domain_id>=<assign state>"
>>>>>>>>
>>>>>>>> #echo "test//mbm_local_bytes:0=e;1=e" >
>>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>
>>>>>>>> #echo "test//mbm_local_bytes:0=u;1=u" >
>>>>>>>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>
>>>>>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>> test//mbm_total_bytes:0=u;1=u
>>>>>>>> test//mbm_local_bytes:0=u;1=u
>>>>>>>> //mbm_total_bytes:0=e;1=e
>>>>>>>> //mbm_local_bytes:0=e;1=e
>>>>>>>>
>>>>>>>> The corresponding events will be read in
>>>>>>>>
>>>>>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>> /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>> /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_total_bytes
>>>>>>>> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_total_bytes
>>>>>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>> /sys/fs/resctrl/test/mon_data/mon_L3_01/mbm_local_bytes
>>>>>>>>
>>>>>>>> 7. In the first stage, only two configurations(mbm_total_bytes and
>>>>>>>> mbm_local_bytes) will be supported.
>>>>>>>>
>>>>>>>> 8. In the future, there will be options to create multiple configurations
>>>>>>>> and a corresponding directory will be created in
>>>>>>>> /sys/fs/resctrl/test/mon_data/mon_L3_00/<configuration name>.
>>>>>>
>>>>>> Would this be done by creating a new file in the /sys/fs/resctrl/info/L3_MON/counter_configs
>>>>>> directory? Like this:
>>>>>>
>>>>>> # echo "LclFill, LclNTWr, RmtFill" >
>>>>>> /sys/fs/resctrl/info/L3_MON/counter_configs/cache_stuff
>>>>>>
>>>>>> This seems OK (dependent on the user picking meaningful names for
>>>>>> the set of attributes picked ... but if they want to name this
>>>>>> monitor file "brian" then they have to live with any confusion
>>>>>> that they bring on themselves).
>>>>>>
>>>>>> Would this involve an extension to kernfs? I don't see a function
>>>>>> pointer callback for file creation in kernfs_syscall_ops.
>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> I know you are all busy with multiple series going on in parallel. I am still
>>>>>>> waiting for the inputs on this. It will be great if you can spend some time
>>>>>>> on this to see if we can find common ground on the interface.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Babu
>>>>>>
>>>>>> -Tony
>>>>>>
>>>>>
>>>>>
>>>>> thanks
>>>>> Babu
>>>>
>>>> Reinette
>>>>
>>>>
>>>
>>
>>
>