lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <51c60991-eb10-40e8-b3ab-676b92b0c662@amd.com>
Date: Thu, 8 Feb 2024 11:29:50 -0600
From: "Moger, Babu" <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, corbet@....net,
 fenghua.yu@...el.com, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com
Cc: x86@...nel.org, hpa@...or.com, paulmck@...nel.org, rdunlap@...radead.org,
 tj@...nel.org, peterz@...radead.org, yanjiewtw@...il.com,
 kim.phillips@....com, lukas.bulwahn@...il.com, seanjc@...gle.com,
 jmattson@...gle.com, leitao@...ian.org, jpoimboe@...nel.org,
 rick.p.edgecombe@...el.com, kirill.shutemov@...ux.intel.com,
 jithu.joseph@...el.com, kai.huang@...el.com, kan.liang@...ux.intel.com,
 daniel.sneddon@...ux.intel.com, pbonzini@...hat.com, sandipan.das@....com,
 ilpo.jarvinen@...ux.intel.com, peternewman@...gle.com,
 maciej.wieczor-retman@...el.com, linux-doc@...r.kernel.org,
 linux-kernel@...r.kernel.org, eranian@...gle.com,
 "james.morse@....com" <james.morse@....com>
Subject: Re: [PATCH v2 00/17] x86/resctrl : Support AMD Assignable Bandwidth
 Monitoring Counters (ABMC)

Hi Reinette,

I am trying to propose few things here to move forward based on my
assumptions. Please point me if I missed something.

On 2/5/24 16:38, Reinette Chatre wrote:
> Hi Babu,
> 
> On 2/2/2024 1:57 PM, Moger, Babu wrote:
>> On 2/1/2024 10:09 PM, Reinette Chatre wrote:
>>> On 1/19/2024 10:22 AM, Babu Moger wrote:
>>>> These series adds the support for Assignable Bandwidth Monitoring Counters
>>> Not a good start ([1]).
>>
>> Yea. My bad.
>>
>>>
>>>> (ABMC). It is also called QoS RMID Pinning feature
>>>>
>>>> The feature details are documented in the  APM listed below [1].
>>>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>>>> Monitoring (ABMC). The documentation is available at
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>>>
>>>> The patches are based on top of commit
>>>> 1ac6b49423e83af2abed9be7fbdf2e491686c66b (tip/master)
>>>>
>>>> # Introduction
>>>>
>>>> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
>>>> feature only guarantees that RMIDs currently assigned to a processor will
>>>> be tracked by hardware. The counters of any other RMIDs which are no longer
>>>> being tracked will be reset to zero. The MBM event counters return
>>>> "Unavailable" for the RMIDs that are not active.
>>>>      Users can create 256 or more monitor groups. But there can be only limited
>>>> number of groups that can be give guaranteed monitoring numbers.  With ever
>>> "can be given"?
>>
>> "can give guaranteed monitoring numbers."
>>
>> I feel this looks better.
> 
> Sounds good. Thank you.
> 
>>
>>>
>>>> changing configurations there is no way to definitely know which of these
>>>> groups will be active for certain point of time. Users do not have the
>>>> option to monitor a group or set of groups for certain period of time
>>>> without worrying about RMID being reset in between.
>>>>      The ABMC feature provides an option to the user to assign an RMID to the
>>>> hardware counter and monitor the bandwidth for a longer duration.
>>>> The assigned RMID will be active until the user unassigns it manually.
>>>> There is no need to worry about counters being reset during this period.
>>>> Additionally, the user can specify a bitmask identifying the specific
>>>> bandwidth types from the given source to track with the counter.
>>>>
>>>> Without ABMC enabled, monitoring will work in current mode without
>>>> assignment option.
>>>>
>>>> # Linux Implementation
>>>>
>>>> Linux resctrl subsystem provides the interface to count maximum of two
>>>> memory bandwidth events per group, from a combination of available total
>>>> and local events. Keeping the current interface, users can assign a maximum
>>>> of 2 ABMC counters per group. User will also have the option to assign only
>>>> one counter to the group. If the system runs out of assignable ABMC
>>>> counters, kernel will display an error. Users need to unassign an already
>>>> assigned counter to make space for new assignments.
>>>>
>>>>
>>>> # Examples
>>>>
>>>> a. Check if ABMC support is available
>>>>     #mount -t resctrl resctrl /sys/fs/resctrl/
>>>>
>>>>     #cat /sys/fs/resctrl/info/L3_MON/mon_features
>>>>     llc_occupancy
>>>>     mbm_total_bytes
>>>>     mbm_total_bytes_config
>>>>     mbm_local_bytes
>>>>     mbm_local_bytes_config
>>>>     mbm_assign_capable ←  Linux kernel detected ABMC feature
>>>>
>>>> b. Check if ABMC is enabled. By default, ABMC feature is disabled.
>>>>     Monitoring works in legacy monitor mode when ABMC is not enabled.
>>>>
>>>>     #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>>>>     0
>>>>
>>> With the introduction of "mbm_assign_enable" the entry in mon_features seems
>>> to provide duplicate information.
>>
>> ok. We can remove the text in mon_features and keep mbm_assign_enable. We need this to enable and disable the feature.
> 
> This could be improved beyond a binary "enable"/"disable" interface to user space.
> For example, the hardware can discover which "mbm counter assign" related feature
> (I'm counting the "soft RMID" here as one of the "mbm counter assign" related
> features) is supported on the platform and it can be presented to the user like:
> 
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
> [feature_1] feature_2 feature_3

How about this?
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign
 ABMC:Capable
 SOFT-RMID:Capable

To enable ABMC
# echo ABMC:enable > /sys/fs/resctrl/info/L3_MON/mbm_assign

When ABMC is enabled:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign
 ABMC:Enable
 SOFT-RMID:Capable

> The output indicates which features are supported by the platform and the brackets indicate
> which feature is enabled. 
> 
> 
>>>> c. There will be new file "monitor_state" for each monitor group when ABMC
>>>>     feature is supported. However, monitor_state is not available if ABMC is
>>>>     disabled.
>>>>     
>>>>     #cat /sys/fs/resctrl/monitor_state
>>>>     Unsupported
>>> This sounds potentially confusing since users will still be able to monitor
>>> the groups ...
>> How about "Assignment-Unsupported"?
> 
> (please see later)
> 
>>>
>>>>     
>>>> d. Read the event mbm_total_bytes and mbm_local_bytes. Without ABMC
>>>>     enabled, monitoring will work in current mode without assignment option.
>>>>     
>>>>     # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>     779247936
>>>>     # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>     765207488
>>>>     
>>>> e. Enable ABMC mode.
>>>>
>>>>     #echo 1 > /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>>>>          #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>>>>          1
>>>>
>>>> f. Read the monitor states. By default, both total and local MBM
>>>>     events are in "unassign" state.
>>>>     
>>>>     #cat /sys/fs/resctrl/monitor_state
>>>>     total=unassign;local=unassign
>>> This interface does not seem to take into account that hardware
>>> can support assignment per domain. I understand that this is
>>> not something you want to implement at this time but the user interface
>>> has to accommodate such an enhancement. This was already mentioned, and
>>> you did acknowledge the point [3] to this new version that does not
>>> reflect this is unexpected.
>>
>> Yea. Domain level assignment is not supported at this point. Do you want me to explicitly mention here?
>>
>> Please elaborate what you meant here.
> 
> You have made it clear on several occasions that you do not intend to support
> domain level assignment. That may be ok but the interface you create should
> not prevent future support of domain level assignment.
> 
> If my point is not clear, could you please share how this interface is able to
> support domain level assignment in the future?
> 
> I am starting to think that we need a file similar to the schemata file
> for group and domain level monitor configurations.

Something like this?

By default
#cat /sys/fs/resctrl/monitor_state
default:0=total=assign,local=assign;1=total=assign,local=assign

With ABMC,
#cat /sys/fs/resctrl/monitor_state
ABMC:0=total=unassign,local=unassign;1=total=unassign,local=unassign

> 
>>> My previous suggestions do seem to still stand and and I also am not able to
>>> see how Peter's requests [2] were considered. This same interface needs to
>>> accommodate usages apart from ABMC. For example, how to use this interface
>>> to address the same counter issue on AMD hardware without ABMC, and MPAM
>>> (pending James's feedback).
>>
>> Yea. Agree. Peter's comments are not addressed. I am not all clear
>> about details of Peters and James requirement.
> 
> Peter listed his requirements in [1]. That email thread is a worthwhile read
> for the use cases.
> 
> I believe that James is aware of this work and do hope to hear from him. 
> 
>>
>> With respect to ABMC here are my requirements.
>>
>> a.  Assignment needs to be done at group level.
>>
>> b. User should be able to assign each event individually. Multiple events assignment(in one command) should be supported.
>>
>> c. I have no plans to implement domain level assignment. It is done at system level.
>>
>> d. We need only couple of states.  Assigned and unassigned.
>>
>> e. monitor_state is name of file for user interface. We can change that based on comments.
>>
>> Peter, James,
>>
>> Please comment on what you want achieve in "assignment" based on the features you are working on.
>>
>> Do you want to add new states?
>>
>>>
>>> I understand that until we hear from Arm we do not know all the requirements
>>> that this interface needs to support, but I do expect this interface to
>>> at least consider requirements and usage scenarios that are already known.
>>
>> Sure. Will try that in the next version. Lets continue the discussion.
>>
>>
>>>> g. Read the event mbm_total_bytes and mbm_local_bytes. In ABMC mode,
>>>>     the MBA events are not available until the user assigns the events
>>>>     explicitly.
>>>>     
>>>>     #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>     Unsupported
>>>>     
>>>>     #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>     Unsupported
>>>>
>>> This needs some more thought to accommodate Peter's scenario where the counter
>>> can be expected to return the final count after the counter is disabled.
>>
>> I am not sure how to achieve this with ABMC. This may be applicable
>> to soft rmid only. In case of "soft rmid", previous readings are
>> saved in the soft rmid state.
> 
> Right. Please consider this work in two parts, first, there is a generic
> interface that aims to support ABMC, "soft RMID", and MPAM. Second, there
> is using this interface to support ABMC.

Yea. But it is tough without knowing all the details of the other features.

How about?

#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
ABMC:Unassigned
#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
ABMC:Unassigned

> 
>>>> h. The event llc_occupancy is not affected by ABMC mode. Users can still
>>>>     read the llc_occupancy.
>>>>
>>>>     #cat /sys/fs/resctrl/mon_data/mon_L3_00/llc_occupancy
>>>>     557056
>>>>
>>>> i. Now assign the total event and read the monitor_state.
>>>>     
>>>>     #echo total=assign > /sys/fs/resctrl/monitor_state
>>>>     #cat /sys/fs/resctrl/monitor_state
>>>>     total=assign;local=unassign
>>>>     
>>> I do not see the "global assign/unassign" scenario addressed.
>>
>> I am not all clear on meaning of "global assign/unassign".  Does it
>> mean looping thru all the groups and assign the RMIDs?
> 
> Please see [1].
> 
>  
>> It may not work in many cases. In case of ABMC, we have only limited
>> number of hw counters. It will fail after hardware runs out of
>> counters. It is better done selectively based on which group user is
>> interested in.
> 
> Right. This is one more item where the generic interface needs to
> accommodate different hardware implementations. Perhaps this could
> be one of the "features" exposed by (global) mbm_assign that the
> user can "enable"/"disable" on demand?
> 
>> But it can be done later if we find a use case for that.
> 
> There already exists a use case as presented by Peter in support
> of AMD hardware without ABMC, no? 

Yes. There is a use case. But seems like the use case is mostly applicable
to soft-rmid feature.

We can tie the global assign only to soft-rmid.

# echo SOFT-RMID:enable > /sys/fs/resctrl/info/L3_MON/mbm_assign

Because this is soft-rmid, call global assign method.

# echo ABMC:enable > /sys/fs/resctrl/info/L3_MON/mbm_assign

Because this is ABMC, do the steps required just to enable ABMC.
Don't do individual assignment

> 
>>> This version seems to ignore (without discussion) a lot of earlier
>>> feedback.
>>
>> Please feel free comment. There are various threads of discussion. I may have missed.
>>
> 
> 
> Reinette
> 
> [1] https://lore.kernel.org/lkml/CALPaoCiRD6j_Rp7ffew+PtGTF4rWDORwbuRQqH2i-cY5SvWQBg@mail.gmail.com/

-- 
Thanks
Babu Moger

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ