linux-kernel - Re: [PATCH v9 14/26] x86/resctrl: Introduce interface to display number of free counters

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <10a197e4-bfa8-4d42-a229-b755378781b0@intel.com>
Date: Mon, 2 Dec 2024 13:47:41 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: <babu.moger@....com>, Peter Newman <peternewman@...gle.com>
CC: <corbet@....net>, <tglx@...utronix.de>, <mingo@...hat.com>,
	<bp@...en8.de>, <dave.hansen@...ux.intel.com>, <fenghua.yu@...el.com>,
	<x86@...nel.org>, <hpa@...or.com>, <thuth@...hat.com>, <paulmck@...nel.org>,
	<rostedt@...dmis.org>, <akpm@...ux-foundation.org>,
	<xiongwei.song@...driver.com>, <pawan.kumar.gupta@...ux.intel.com>,
	<daniel.sneddon@...ux.intel.com>, <perry.yuan@....com>,
	<sandipan.das@....com>, <kai.huang@...el.com>, <xiaoyao.li@...el.com>,
	<seanjc@...gle.com>, <jithu.joseph@...el.com>, <brijesh.singh@....com>,
	<xin3.li@...el.com>, <ebiggers@...gle.com>, <andrew.cooper3@...rix.com>,
	<mario.limonciello@....com>, <james.morse@....com>,
	<tan.shaopeng@...itsu.com>, <tony.luck@...el.com>,
	<linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<maciej.wieczor-retman@...el.com>, <eranian@...gle.com>,
	<jpoimboe@...nel.org>, <thomas.lendacky@....com>
Subject: Re: [PATCH v9 14/26] x86/resctrl: Introduce interface to display
 number of free counters

Hi Babu,

On 12/2/24 1:28 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 12/2/24 15:09, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 12/2/24 12:42 PM, Moger, Babu wrote:
>>> Hi Reinette,
>>>
>>> On 12/2/24 14:15, Reinette Chatre wrote:
>>>> Hi Babu,
>>>>
>>>> On 12/2/24 11:48 AM, Moger, Babu wrote:
>>>>> On 12/2/24 12:33, Reinette Chatre wrote:
>>>>>> On 11/29/24 9:06 AM, Moger, Babu wrote:
>>>>>>> On 11/29/2024 3:59 AM, Peter Newman wrote:
>>>>>>>> On Thu, Nov 28, 2024 at 8:35 PM Moger, Babu <bmoger@....com> wrote:
>>>>>>>>> On 11/28/2024 5:10 AM, Peter Newman wrote:
>>>>>>>>>> On Wed, Nov 27, 2024 at 8:05 PM Reinette Chatre
>>>>>>>>>> <reinette.chatre@...el.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Babu,
>>>>>>>>>>>
>>>>>>>>>>> On 11/27/24 6:57 AM, Moger, Babu wrote:
>>>>>>>>
>>>>>>>>>>>> 1. Each group needs to remember counter ids in each domain for each event.
>>>>>>>>>>>>      For example:
>>>>>>>>>>>>      Resctrl group mon1
>>>>>>>>>>>>       Total event
>>>>>>>>>>>>       dom 0 cntr_id 1,
>>>>>>>>>>>>       dom 1 cntr_id 10
>>>>>>>>>>>>       dom 2 cntr_id 11
>>>>>>>>>>>>
>>>>>>>>>>>>      Local event
>>>>>>>>>>>>       dom 0 cntr_id 2,
>>>>>>>>>>>>       dom 1 cntr_id 15
>>>>>>>>>>>>       dom 2 cntr_id 10
>>>>>>>>>>>
>>>>>>>>>>> Indeed. The challenge here is that domains may come and go so it cannot be a simple
>>>>>>>>>>> static array. As an alternative it can be an xarray indexed by the domain ID with
>>>>>>>>>>> pointers to a struct like below to contain the counters associated with the monitor
>>>>>>>>>>> group:
>>>>>>>>>>>           struct cntr_id {
>>>>>>>>>>>                   u32     mbm_total;
>>>>>>>>>>>                   u32     mbm_local;
>>>>>>>>>>>           }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thinking more about how this array needs to be managed made me wonder how the
>>>>>>>>>>> current implementation deals with domains that come and go. I do not think
>>>>>>>>>>> this is currently handled. For example, if a new domain comes online and
>>>>>>>>>>> monitoring groups had counters dynamically assigned, then these counters are
>>>>>>>>>>> not configured to the newly online domain.
>>>>>>>>>
>>>>>>>>> I am trying to understand the details of your approach here.
>>>>>>>>>>
>>>>>>>>>> In my prototype, I allocated a counter id-indexed array to each
>>>>>>>>>> monitoring domain structure for tracking the counter allocations,
>>>>>>>>>> because the hardware counters are all domain-scoped. That way the
>>>>>>>>>> tracking data goes away when the hardware does.
>>>>>>>>>>
>>>>>>>>>> I was focused on allowing all pending counter updates to a domain
>>>>>>>>>> resulting from a single mbm_assign_control write to be batched and
>>>>>>>>>> processed in a single IPI, so I structured the counter tracker
>>>>>>>>>> something like this:
>>>>>>>>>
>>>>>>>>> Not sure what you meant here. How are you batching two IPIs for two domains?
>>>>>>>>>
>>>>>>>>> #echo "//0=t;1=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>
>>>>>>>>> This is still a single write. Two IPIs are sent separately, one for each
>>>>>>>>> domain.
>>>>>>>>>
>>>>>>>>> Are you doing something different?
>>>>>>>>
>>>>>>>> I said "all pending counter updates to a domain", whereby I meant
>>>>>>>> targeting a single domain.
>>>>>>>>
>>>>>>>> Depending on the CPU of the caller, your example write requires 1 or 2 IPIs.
>>>>>>>>
>>>>>>>> What is important is that the following write also requires 1 or 2 IPIs:
>>>>>>>>
>>>>>>>> (assuming /sys/fs/resctrl/mon_groups/[g1-g31] exist, line breaks added
>>>>>>>> for readability)
>>>>>>>>
>>>>>>>> echo $'//0=t;1=t\n
>>>>>>>> /g1/0=t;1=t\n
>>>>>>>> /g2/0=t;1=t\n
>>>>>>>> /g3/0=t;1=t\n
>>>>>>>> /g4/0=t;1=t\n
>>>>>>>> /g5/0=t;1=t\n
>>>>>>>> /g6/0=t;1=t\n
>>>>>>>> /g7/0=t;1=t\n
>>>>>>>> /g8/0=t;1=t\n
>>>>>>>> /g9/0=t;1=t\n
>>>>>>>> /g10/0=t;1=t\n
>>>>>>>> /g11/0=t;1=t\n
>>>>>>>> /g12/0=t;1=t\n
>>>>>>>> /g13/0=t;1=t\n
>>>>>>>> /g14/0=t;1=t\n
>>>>>>>> /g15/0=t;1=t\n
>>>>>>>> /g16/0=t;1=t\n
>>>>>>>> /g17/0=t;1=t\n
>>>>>>>> /g18/0=t;1=t\n
>>>>>>>> /g19/0=t;1=t\n
>>>>>>>> /g20/0=t;1=t\n
>>>>>>>> /g21/0=t;1=t\n
>>>>>>>> /g22/0=t;1=t\n
>>>>>>>> /g23/0=t;1=t\n
>>>>>>>> /g24/0=t;1=t\n
>>>>>>>> /g25/0=t;1=t\n
>>>>>>>> /g26/0=t;1=t\n
>>>>>>>> /g27/0=t;1=t\n
>>>>>>>> /g28/0=t;1=t\n
>>>>>>>> /g29/0=t;1=t\n
>>>>>>>> /g30/0=t;1=t\n
>>>>>>>> /g31/0=t;1=t\n'
>>>>>>>>
>>>>>>>> My ultimate goal is for a thread bound to a particular domain to be
>>>>>>>> able to unassign and reassign the local domain's 32 counters in a
>>>>>>>> single write() with no IPIs at all. And when IPIs are required, then
>>>>>>>> no more than one per domain, regardless of the number of groups
>>>>>>>> updated.
>>>>>>>>
>>>>>>>
>>>>>>> Yes. I think I got the idea. Thanks.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> struct resctrl_monitor_cfg {
>>>>>>>>>>       int closid;
>>>>>>>>>>       int rmid;
>>>>>>>>>>       int evtid;
>>>>>>>>>>       bool dirty;
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>> This mirrors the info needed in whatever register configures the
>>>>>>>>>> counter, plus a dirty flag to skip over the ones that don't need to be
>>>>>>>>>> updated.
>>>>>>>>>
>>>>>>>>> This is what my understanding of your implementation.
>>>>>>>>>
>>>>>>>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>>>>>>>> index d94abba1c716..9cebf065cc97 100644
>>>>>>>>> --- a/include/linux/resctrl.h
>>>>>>>>> +++ b/include/linux/resctrl.h
>>>>>>>>> @@ -94,6 +94,13 @@ struct rdt_ctrl_domain {
>>>>>>>>>           u32                             *mbps_val;
>>>>>>>>>    };
>>>>>>>>>
>>>>>>>>> +struct resctrl_monitor_cfg {
>>>>>>>>> +    int closid;
>>>>>>>>> +    int rmid;
>>>>>>>>> +    int evtid;
>>>>>>>>> +    bool dirty;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>>    /**
>>>>>>>>>     * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor
>>>>>>>>> resource
>>>>>>>>>     * @hdr:               common header for different domain types
>>>>>>>>> @@ -116,6 +123,7 @@ struct rdt_mon_domain {
>>>>>>>>>           struct delayed_work             cqm_limbo;
>>>>>>>>>           int                             mbm_work_cpu;
>>>>>>>>>           int                             cqm_work_cpu;
>>>>>>>>> +     /* Allocate num_mbm_cntrs entries in each domain */
>>>>>>>>> +       struct resctrl_monitor_cfg      *mon_cfg;
>>>>>>>>>    };
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> When a user requests an assignment for total event to the default group
>>>>>>>>> for domain 0, you go search in rdt_mon_domain(dom 0) for empty mon_cfg
>>>>>>>>> entry.
>>>>>>>>>
>>>>>>>>> If there is an empty entry, then use that entry for assignment and
>>>>>>>>> update closid, rmid, evtid and dirty = 1. We can get all these
>>>>>>>>> information from default group here.
>>>>>>>>>
>>>>>>>>> Does this make sense?
>>>>>>>>
>>>>>>>> Yes, sounds correct.
>>>>>>>
>>>>>>> I will probably add cntr_id in resctrl_monitor_cfg structure and
>>>>>>> initialize during the allocation. And rename the field 'dirty' to
>>>>>>> 'active'(or something similar) to hold the assign state for that
>>>>>>> entry. That way we have all the information required for assignment
>>>>>>> at one place. We don't need to update the rdtgroup structure.
>>>>>>>
>>>>>>> Reinette, What do you think about this approach?
>>>>>>
>>>>>> I think this approach is in the right direction. Thanks to Peter for
>>>>>> the guidance here.
>>>>>> I do not think that it is necessary to add cntr_id to resctrl_monitor_cfg
>>>>>> though, I think the cntr_id would be the index to the array instead?
>>>>>
>>>>> Yes. I think We can use the index as cntn_id. Will let you know otherwise.
>>>>>
>>>>>
>>>>>>
>>>>>> It may also be worthwhile to consider using a pointer to the resource
>>>>>> group instead of storing closid and rmid directly. If used to indicate
>>>>>> initialization then an initialized pointer is easier to distinguish than
>>>>>> the closid/rmid that may have zero as valid values.
>>>>>
>>>>> Sure. Sounds good.
>>>>>
>>>>>>
>>>>>> I expect evtid will be enum resctrl_event_id and that raises the question
>>>>>> of whether "0" can indeed be used as an "uninitialized" value since doing
>>>>>> so would change the meaning of the enum. It may indeed keep things
>>>>>> separated by maintaining evtid as an enum resctrl_event_id and note the
>>>>>> initialization differently ... either via a pointer to a resource group
>>>>>> or entirely separately as Babu indicates later.
>>>>>
>>>>> Sure. Will add evtid as enum resctrl_event_id and use the "state" to
>>>>> indicate assign/unassign/dirty status.
>>>>
>>>> Is "assign/unassign" state needed? If resctrl_monitor_cfg contains a pointer
>>>> to the resource group to which the counter has been assigned then I expect NULL
>>>> means unassigned and a value means assigned?
>>>
>>> Yes. We use the rdtgroup pointer to check the assign/unassign state.
>>>
>>> I will drop the 'state' field. Peter can add state when he wants use it
>>> for optimization later.
>>>
>>> I think we need to have the 'cntr_id" field here in resctrl_monitor_cfg.
>>> When we access the pointer from mbm_state, we wont know what is cntr_id
>>> index it came from.
>>>
>>
>> oh, good point. I wonder how Peter addressed this in his PoC. As an alternative,
>> could the cntr_id be used in mbm_state instead of a pointer? 
>>
> 
> Yes. It can be done.
> 
> I thought it would be better to have everything at once place.
> 
> struct resctrl_monitor_cfg {
>   unsigned int            cntr_id;
>   enum resctrl_event_id   evtid;
>   struct rdtgroup         *rgtgrp;
> };
> 
> This will have everything required to assign/unassign the event.
> 

The "everything in one place" argument is not clear to me since the cntr_id
is indeed present already as the index to the array that stores this structure.
Including the cntr_id seems redundant to me. This is similar to several
other data structures in resctrl that are indexed either by closid or rmid,
without needing to store closid/rmid in these data structures self.

Reinette