Message-ID: <369ab28a-f3fa-4359-8e73-4dcf214c9b6e@amd.com>
Date: Thu, 29 Feb 2024 14:37:07 -0600
From: "Moger, Babu" <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>,
 James Morse <james.morse@....com>, corbet@....net, fenghua.yu@...el.com,
 tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com, Peter Newman <peternewman@...gle.com>
Cc: x86@...nel.org, hpa@...or.com, paulmck@...nel.org, rdunlap@...radead.org,
 tj@...nel.org, peterz@...radead.org, yanjiewtw@...il.com,
 kim.phillips@....com, lukas.bulwahn@...il.com, seanjc@...gle.com,
 jmattson@...gle.com, leitao@...ian.org, jpoimboe@...nel.org,
 rick.p.edgecombe@...el.com, kirill.shutemov@...ux.intel.com,
 jithu.joseph@...el.com, kai.huang@...el.com, kan.liang@...ux.intel.com,
 daniel.sneddon@...ux.intel.com, pbonzini@...hat.com, sandipan.das@....com,
 ilpo.jarvinen@...ux.intel.com, peternewman@...gle.com,
 maciej.wieczor-retman@...el.com, linux-doc@...r.kernel.org,
 linux-kernel@...r.kernel.org, eranian@...gle.com
Subject: Re: [PATCH v2 00/17] x86/resctrl : Support AMD Assignable Bandwidth
 Monitoring Counters (ABMC)

Hi Reinette,

On 2/28/24 14:04, Reinette Chatre wrote:
> Hi Babu,
> 
> On 2/28/2024 9:59 AM, Moger, Babu wrote:
>> On 2/27/24 17:50, Reinette Chatre wrote:
>>> On 2/27/2024 10:12 AM, Moger, Babu wrote:
>>>> On 2/26/24 15:20, Reinette Chatre wrote:
>>>>> On 2/26/2024 9:59 AM, Moger, Babu wrote:
>>>>>> On 2/23/24 16:21, Reinette Chatre wrote:
>>>
> 
>>>>> For example, if I understand correctly, theoretically, when ABMC is enabled then
>>>>> "num_rmids" can be U32_MAX (after a quick look it is not clear to me why r->num_rmid
>>>>> is not unsigned, tbd if number of directories may also be limited by kernfs).
>>>>> User space could theoretically create more monitor groups than the number of
>>>>> rmids that a resource claims to support using current upstream enumeration.
>>>>
>>>> CPU or task association still uses PQR_ASSOC(MSR C8Fh). There are only 11
>>>> bits(depends on specific h/w) to represent RMIDs. So, we cannot create
>>>> more than this limit(r->num_rmid).
>>>>
>>>> In case of ABMC, h/w uses another counter(mbm_assignable_counters) with
>>>> RMID to assign the monitoring. So, assignment limit is
>>>> mbm_assignable_counters. The number of mon groups limit is still r->num_rmid.
>>>
>>> I see. Thank you for clarifying. This does make enabling simpler and one
>>> less user interface item that needs changing.
>>>
>>> ...
>>>
>>>>>> 2. /sys/fs/resctrl/monitor_state.
>>>>>> This can be used to individually assign or unassign the counters in each group.
>>>>>>
>>>>>> When assigned:
>>>>>> #cat /sys/fs/resctrl/monitor_state
>>>>>> 0=total-assign,local-assign;1=total-assign,local-assign
>>>>>>
>>>>>> When unassigned:
>>>>>> #cat /sys/fs/resctrl/monitor_state
>>>>>> 0=total-unassign,local-unassign;1=total-unassign,local-unassign
>>>>>>
>>>>>>
>>>>>> Thoughts?
>>>>>
>>>>> How do you expect this interface to be used? I understand the mechanics
>>>>> of this interface but on a higher level, do you expect user space to
>>>>> once in a while assign a new counter to a single event or monitor group
>>>>> (for which a fine grained interface works) or do you expect user space to
>>>>> shift multiple counters across several monitor events at intervals?
>>>>
>>>> I think we should provide both the options. I was thinking of providing
>>>> fine grained interface first.
>>>
>>> Could you please provide a motivation for why two interfaces, one inefficient
>>> and one not, should be created and maintained? Users can still do fine grained
>>> assignment with a global assignment interface.
>>
>> Let's consider them one by one.
>>
>> 1. Fine grained assignment.
>>
>> It will be part of the mongroup(or control mongroup). User has the access
>> to the group and can query the group's current status before assigning or
>> unassigning.
>>
>>    $cd /sys/fs/resctrl/ctrl_mon1
>>    $cat /sys/fs/resctrl/ctrl_mon1/monitor_state
>>        0=total-unassign,local-unassign;1=total-unassign,local-unassign;
>>
>> Assign the total event
>>
>>   $echo 0=total-assign > /sys/fs/resctrl/ctrl_mon1/monitor_state
>>
>> Assign the local event
>>
>>    $echo 0=local-assign > /sys/fs/resctrl/ctrl_mon1/monitor_state
>>
>> Assign both events:
>>
>>    $echo 0=total-assign,local-assign > /sys/fs/resctrl/ctrl_mon1/monitor_state
>>
>> Check the assignment status.
>>
>>    $cat /sys/fs/resctrl/ctrl_mon1/monitor_state
>>        0=total-assign,local-assign;1=total-unassign,local-unassign;
>>
>> -User interface is simple.
> 
> This should not be the only motivation. Please do not sacrifice efficiency
> and usability just to have a simple interface. One can also argue that this
> interface can only be considered simple from the kernel implementation perspective,
> from user space it seems complicated. For example, as James pointed out earlier [1],
> user space would need to walk the entire resctrl to find out where counters are
> assigned. Peter also pointed out how the multiple syscalls needed when adjusting
> hundreds of monitor groups is inefficient. Please take all feedback into account.
> 
> You consider "simple interface" as a motivation, there seems to be at least two
> arguments against this interface. Please consider these in your comparison
> between interfaces. These are things that should be noted and make their way to
> the cover letter.
> 
>>
>> -Assignment will fail if all the h/w counters are exhausted. The user needs to
>> unassign a counter from another group and use that counter here. This can
>> be done by just querying the monitor state of another group.
> 
> Right ... and as you state there can be hundreds of monitor groups that
> user space would need to walk and query to get this information.
> 
>>
>> -Monitor group's details(cpus, tasks) are part of the group. So, it is
>> better to have assignment state inside the group.
> 
> The assignment state should be clear from the event file.
> 
>> Note: Used interface names here just to give example.
>>
>>
>> 2. global assignment:
>>
>> I would assume the interface file will be in /sys/fs/resctrl/info/L3_MON/
>> directory.
>>
>> In case there are 100 mongroups, we need to have a way to list current
>> assignment status for these groups. I am not sure how to list status of
>> these 100 groups.
> 
> The kernel has many examples of interfaces that manages status of a large
> number of entities. I am thinking, for example, we can learn a lot from
> how dynamic debug works. On my system I see:
> 
> $ wc -l /sys/kernel/debug/dynamic_debug/control
> 5359 /sys/kernel/debug/dynamic_debug/control
> 
>>
>> If the user wants to assign the local event (or total) in a specific group
>> in this list of 100 groups, I am not sure how to provide an interface for
>> that. Should we pass the name of the mongroup? That will involve looping
>> through using the call kernfs_walk_and_get. This may be ok if we are
>> dealing with a very small number of groups.
>>
> 
> What is your concern when needing to modify a large number of groups?
> Are you concerned about the size of the writes needing to be parsed? It looks
> like kernfs does support writes of larger than PAGE_SIZE, but it is not clear
> to me that such large sizes will be required.   
> 
> There is also kernfs_find_and_get() that may be more convenient to use.

I will look at this. There are also kernfs_name() and kernfs_path().

> I believe user space needs to provide control group name for a global
> interface (the same name can be used by monitor groups belonging to
> different control groups), and that can be used to narrow search.
> 
> Reading your message I do not find any motivation _against_ a global
> interface, except that it is not obvious to you how such interface may look
> or work. That is fair. Peter seems to have ideas and a working implementation
> that can be used as reference. So far I have only seen one comment [2] from James
> that was skeptical about the global interface but the reason notes that MPAM
> allocates counters per domain, which is the same as ABMC so we will need more
> information from James here on what is required since he did not respond to
> Peter.
> 
> Below is a *hypothetical* interface to start a discussion that explores how
> to support fine grained assignment in an interface that aims to be easy to use
> by user space. Obviously Peter is also working on something so there
> are many viewpoints to consider.
> 
> File info/L3_MON/mbm_assign_control:
> #control_group/mon_group/flags
> ctrl_a/mon_a/00=_;01=_
> ctrl_a/mon_b/00=l;01=t
> ctrl_b/mon_c/00=lt;01=lt
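
For illustration only, here is a minimal user-space sketch (Python, not
kernel code) of how the hypothetical flag format quoted above could be
decoded. The function name and the FLAGS table are assumptions made for
this example; the meaning of "l", "t" and "_" follows the explanation
given with the proposal (local MBM, total MBM, nothing assigned).

```python
# Hypothetical decoder for the proposed "control_group/mon_group/flags" lines.
# "l" = local MBM counter assigned, "t" = total MBM counter assigned,
# "_" = no counter assigned in that domain.
FLAGS = {"l": "local", "t": "total"}


def decode_line(line):
    """Split 'ctrl/mon/00=lt;01=_' into (ctrl, mon, {domain: set_of_events})."""
    ctrl, mon, domains = line.strip().split("/", 2)
    assigned = {}
    for field in domains.split(";"):
        dom, flags = field.split("=")
        assigned[int(dom)] = {FLAGS[f] for f in flags if f != "_"}
    return ctrl, mon, assigned


sample = """ctrl_a/mon_a/00=_;01=_
ctrl_a/mon_b/00=l;01=t
ctrl_b/mon_c/00=lt;01=lt"""

for entry in sample.splitlines():
    print(decode_line(entry))
```

The sketch does no error handling; a real parser would have to reject
unknown flags and malformed lines.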

I think you left out a few things here (like the default control_mon group).

To make it clearer, let me list all the groups here based on this.

When none of the counters are assigned:

$cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
resctrl/00=none,none;01=none,none (#default control_mon group)
resctrl/mon_a/00=none,none;01=none,none (#mon group)
resctrl/ctrl_a/00=none,none;01=none,none (#control_mon group)
resctrl/ctrl_a/mon_ab/00=none,none;01=none,none (#mon group)


When some counters are assigned:

$echo "resctrl/00=total,local" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
(#assigning counters to the default group)

$echo "resctrl/mon_a/00=total;01=total" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
(#assigning counters to a mon group)

$echo "resctrl/ctrl_a/00=local;01=local" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

$echo "resctrl/ctrl_a/mon_ab/00=total,local;01=total,local" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control

$cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
resctrl/00=total,local;01=none,none (#default control_mon group)
resctrl/mon_a/00=total,none;01=total,none (#mon group)
resctrl/ctrl_a/00=none,local;01=none,local (#control_mon group)
resctrl/ctrl_a/mon_ab/00=total,local;01=total,local (#mon group)
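
To show what a single read of such a file would give user space, here
is a rough Python sketch that parses the listing above. It is only an
illustration of the proposed text format, not an existing tool: the
function names are made up, and counting one counter per assigned event
per domain is an assumption of this example.

```python
def parse_assignments(text):
    """Return {group_path: {domain: (total_assigned, local_assigned)}}."""
    result = {}
    for raw in text.splitlines():
        # Drop the trailing "(#...)" annotation used in the listing.
        entry = raw.split("(#", 1)[0].strip()
        if not entry:
            continue
        group, domains = entry.rsplit("/", 1)
        per_domain = {}
        for field in domains.split(";"):
            dom, events = field.split("=")
            names = events.split(",")
            per_domain[int(dom)] = ("total" in names, "local" in names)
        result[group] = per_domain
    return result


def counters_in_use(parsed):
    """Count assigned events per domain (assumes one counter per event)."""
    used = {}
    for per_domain in parsed.values():
        for dom, (total, local) in per_domain.items():
            used[dom] = used.get(dom, 0) + int(total) + int(local)
    return used


listing = """\
resctrl/00=total,local;01=none,none (#default control_mon group)
resctrl/mon_a/00=total,none;01=total,none (#mon group)
resctrl/ctrl_a/00=none,local;01=none,local (#control_mon group)
resctrl/ctrl_a/mon_ab/00=total,local;01=total,local (#mon group)
"""

parsed = parse_assignments(listing)
print(parsed["resctrl/ctrl_a"])
print(counters_in_use(parsed))
```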


A few comments about this approach:
1. This will involve a lot of text processing in the kernel. We will need to
figure out which calls to use for this processing.

2. In this approach there is no way to list the assignment of a single
group (like group resctrl/ctrl_a/mon_ab alone).

3. This is similar to the fine grained approach we discussed, but at the
global level.
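
On point 2, user space could still filter the global listing itself.
A minimal Python sketch, assuming the line format shown earlier in this
mail (the helper name group_lines is hypothetical):

```python
def group_lines(listing, group):
    """Return the entries of `listing` that belong to exactly `group`."""
    matches = []
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The first field is "<group path>/<domain assignments>".
        path = fields[0].rsplit("/", 1)[0]
        if path == group:
            matches.append(fields[0])
    return matches


listing = """\
resctrl/00=total,local;01=none,none (#default control_mon group)
resctrl/mon_a/00=total,none;01=total,none (#mon group)
resctrl/ctrl_a/00=none,local;01=none,local (#control_mon group)
resctrl/ctrl_a/mon_ab/00=total,local;01=total,local (#mon group)
"""

print(group_lines(listing, "resctrl/ctrl_a/mon_ab"))
```

This still reads the whole file, so it does not remove the cost of
generating the full listing in the kernel.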

I would like to get Peter's and James's comments about this approach.

> 
> Above file displays to user:
> * No counters are assigned to monitor group mon_a within control group ctrl_a
> * Counter for local MBM is assigned to domain 0 of monitor group mon_b within
>   control group ctrl_a 
> * Counter for total MBM is assigned to domain 1 of monitor group mon_b within
>   control group ctrl_a 
> * Counters for local and total MBM are assigned to both domains of monitor
>   group mon_c within control group ctrl_b
> 
> With above interface user space can, with a single read, get insight into
> how counters are assigned across all monitor groups.
> User space can write to the file to modify the flags. If assigning a new
> counter when no more counters are available then the write will fail.
> Potentially, if changes are made in order provided by the user then
> the user will be able to unassign counters from one group and re-assign to
> another group with a single write.
> 
> I provide this purely to generate some ideas and gather more thoughts on
> a global interface.
> 
> Reinette
> 
> [1] https://lore.kernel.org/lkml/2f373abf-f0c0-4f5d-9e22-1039a40a57f0@arm.com/
> [2] https://lore.kernel.org/lkml/1a8c1cd6-a1ce-47a2-bc87-d4cccc84519b@arm.com/
> 
> 
> 
> 
> 

-- 
Thanks
Babu Moger
