[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0b0d0627-07e4-4fe4-8de5-f6e5a5148dc7@amd.com>
Date: Mon, 2 Dec 2024 16:06:58 -0600
From: "Moger, Babu" <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>,
Peter Newman <peternewman@...gle.com>
Cc: corbet@....net, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, fenghua.yu@...el.com, x86@...nel.org,
hpa@...or.com, thuth@...hat.com, paulmck@...nel.org, rostedt@...dmis.org,
akpm@...ux-foundation.org, xiongwei.song@...driver.com,
pawan.kumar.gupta@...ux.intel.com, daniel.sneddon@...ux.intel.com,
perry.yuan@....com, sandipan.das@....com, kai.huang@...el.com,
xiaoyao.li@...el.com, seanjc@...gle.com, jithu.joseph@...el.com,
brijesh.singh@....com, xin3.li@...el.com, ebiggers@...gle.com,
andrew.cooper3@...rix.com, mario.limonciello@....com, james.morse@....com,
tan.shaopeng@...itsu.com, tony.luck@...el.com, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, maciej.wieczor-retman@...el.com,
eranian@...gle.com, jpoimboe@...nel.org, thomas.lendacky@....com
Subject: Re: [PATCH v9 14/26] x86/resctrl: Introduce interface to display
number of free counters
Hi Reinette,
On 12/2/24 15:47, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/2/24 1:28 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 12/2/24 15:09, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 12/2/24 12:42 PM, Moger, Babu wrote:
>>>> Hi Reinette,
>>>>
>>>> On 12/2/24 14:15, Reinette Chatre wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On 12/2/24 11:48 AM, Moger, Babu wrote:
>>>>>> On 12/2/24 12:33, Reinette Chatre wrote:
>>>>>>> On 11/29/24 9:06 AM, Moger, Babu wrote:
>>>>>>>> On 11/29/2024 3:59 AM, Peter Newman wrote:
>>>>>>>>> On Thu, Nov 28, 2024 at 8:35 PM Moger, Babu <bmoger@....com> wrote:
>>>>>>>>>> On 11/28/2024 5:10 AM, Peter Newman wrote:
>>>>>>>>>>> On Wed, Nov 27, 2024 at 8:05 PM Reinette Chatre
>>>>>>>>>>> <reinette.chatre@...el.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Babu,
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/27/24 6:57 AM, Moger, Babu wrote:
>>>>>>>>>
>>>>>>>>>>>>> 1. Each group needs to remember counter ids in each domain for each event.
>>>>>>>>>>>>> For example:
>>>>>>>>>>>>> Resctrl group mon1
>>>>>>>>>>>>> Total event
>>>>>>>>>>>>> dom 0 cntr_id 1,
>>>>>>>>>>>>> dom 1 cntr_id 10
>>>>>>>>>>>>> dom 2 cntr_id 11
>>>>>>>>>>>>>
>>>>>>>>>>>>> Local event
>>>>>>>>>>>>> dom 0 cntr_id 2,
>>>>>>>>>>>>> dom 1 cntr_id 15
>>>>>>>>>>>>> dom 2 cntr_id 10
>>>>>>>>>>>>
>>>>>>>>>>>> Indeed. The challenge here is that domains may come and go so it cannot be a simple
>>>>>>>>>>>> static array. As an alternative it can be an xarray indexed by the domain ID with
>>>>>>>>>>>> pointers to a struct like below to contain the counters associated with the monitor
>>>>>>>>>>>> group:
>>>>>>>>>>>> struct cntr_id {
>>>>>>>>>>>> u32 mbm_total;
>>>>>>>>>>>> u32 mbm_local;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thinking more about how this array needs to be managed made me wonder how the
>>>>>>>>>>>> current implementation deals with domains that come and go. I do not think
>>>>>>>>>>>> this is currently handled. For example, if a new domain comes online and
>>>>>>>>>>>> monitoring groups had counters dynamically assigned, then these counters are
>>>>>>>>>>>> not configured to the newly online domain.
>>>>>>>>>>
>>>>>>>>>> I am trying to understand the details of your approach here.
>>>>>>>>>>>
>>>>>>>>>>> In my prototype, I allocated a counter id-indexed array to each
>>>>>>>>>>> monitoring domain structure for tracking the counter allocations,
>>>>>>>>>>> because the hardware counters are all domain-scoped. That way the
>>>>>>>>>>> tracking data goes away when the hardware does.
>>>>>>>>>>>
>>>>>>>>>>> I was focused on allowing all pending counter updates to a domain
>>>>>>>>>>> resulting from a single mbm_assign_control write to be batched and
>>>>>>>>>>> processed in a single IPI, so I structured the counter tracker
>>>>>>>>>>> something like this:
>>>>>>>>>>
>>>>>>>>>> Not sure what you meant here. How are you batching two IPIs for two domains?
>>>>>>>>>>
>>>>>>>>>> #echo "//0=t;1=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>>>>>>>>>
>>>>>>>>>> This is still a single write. Two IPIs are sent separately, one for each
>>>>>>>>>> domain.
>>>>>>>>>>
>>>>>>>>>> Are you doing something different?
>>>>>>>>>
>>>>>>>>> I said "all pending counter updates to a domain", whereby I meant
>>>>>>>>> targeting a single domain.
>>>>>>>>>
>>>>>>>>> Depending on the CPU of the caller, your example write requires 1 or 2 IPIs.
>>>>>>>>>
>>>>>>>>> What is important is that the following write also requires 1 or 2 IPIs:
>>>>>>>>>
>>>>>>>>> (assuming /sys/fs/resctrl/mon_groups/[g1-g31] exist, line breaks added
>>>>>>>>> for readability)
>>>>>>>>>
>>>>>>>>> echo $'//0=t;1=t\n
>>>>>>>>> /g1/0=t;1=t\n
>>>>>>>>> /g2/0=t;1=t\n
>>>>>>>>> /g3/0=t;1=t\n
>>>>>>>>> /g4/0=t;1=t\n
>>>>>>>>> /g5/0=t;1=t\n
>>>>>>>>> /g6/0=t;1=t\n
>>>>>>>>> /g7/0=t;1=t\n
>>>>>>>>> /g8/0=t;1=t\n
>>>>>>>>> /g9/0=t;1=t\n
>>>>>>>>> /g10/0=t;1=t\n
>>>>>>>>> /g11/0=t;1=t\n
>>>>>>>>> /g12/0=t;1=t\n
>>>>>>>>> /g13/0=t;1=t\n
>>>>>>>>> /g14/0=t;1=t\n
>>>>>>>>> /g15/0=t;1=t\n
>>>>>>>>> /g16/0=t;1=t\n
>>>>>>>>> /g17/0=t;1=t\n
>>>>>>>>> /g18/0=t;1=t\n
>>>>>>>>> /g19/0=t;1=t\n
>>>>>>>>> /g20/0=t;1=t\n
>>>>>>>>> /g21/0=t;1=t\n
>>>>>>>>> /g22/0=t;1=t\n
>>>>>>>>> /g23/0=t;1=t\n
>>>>>>>>> /g24/0=t;1=t\n
>>>>>>>>> /g25/0=t;1=t\n
>>>>>>>>> /g26/0=t;1=t\n
>>>>>>>>> /g27/0=t;1=t\n
>>>>>>>>> /g28/0=t;1=t\n
>>>>>>>>> /g29/0=t;1=t\n
>>>>>>>>> /g30/0=t;1=t\n
>>>>>>>>> /g31/0=t;1=t\n'
>>>>>>>>>
>>>>>>>>> My ultimate goal is for a thread bound to a particular domain to be
>>>>>>>>> able to unassign and reassign the local domain's 32 counters in a
>>>>>>>>> single write() with no IPIs at all. And when IPIs are required, then
>>>>>>>>> no more than one per domain, regardless of the number of groups
>>>>>>>>> updated.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes. I think I got the idea. Thanks.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> struct resctrl_monitor_cfg {
>>>>>>>>>>> int closid;
>>>>>>>>>>> int rmid;
>>>>>>>>>>> int evtid;
>>>>>>>>>>> bool dirty;
>>>>>>>>>>> };
>>>>>>>>>>>
>>>>>>>>>>> This mirrors the info needed in whatever register configures the
>>>>>>>>>>> counter, plus a dirty flag to skip over the ones that don't need to be
>>>>>>>>>>> updated.
>>>>>>>>>>
>>>>>>>>>> This is what my understanding of your implementation.
>>>>>>>>>>
>>>>>>>>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>>>>>>>>> index d94abba1c716..9cebf065cc97 100644
>>>>>>>>>> --- a/include/linux/resctrl.h
>>>>>>>>>> +++ b/include/linux/resctrl.h
>>>>>>>>>> @@ -94,6 +94,13 @@ struct rdt_ctrl_domain {
>>>>>>>>>> u32 *mbps_val;
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>> +struct resctrl_monitor_cfg {
>>>>>>>>>> + int closid;
>>>>>>>>>> + int rmid;
>>>>>>>>>> + int evtid;
>>>>>>>>>> + bool dirty;
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>>>>> /**
>>>>>>>>>> * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor
>>>>>>>>>> resource
>>>>>>>>>> * @hdr: common header for different domain types
>>>>>>>>>> @@ -116,6 +123,7 @@ struct rdt_mon_domain {
>>>>>>>>>> struct delayed_work cqm_limbo;
>>>>>>>>>> int mbm_work_cpu;
>>>>>>>>>> int cqm_work_cpu;
>>>>>>>>>> + /* Allocate num_mbm_cntrs entries in each domain */
>>>>>>>>>> + struct resctrl_monitor_cfg *mon_cfg;
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> When a user requests an assignment for total event to the default group
>>>>>>>>>> for domain 0, you go search in rdt_mon_domain(dom 0) for empty mon_cfg
>>>>>>>>>> entry.
>>>>>>>>>>
>>>>>>>>>> If there is an empty entry, then use that entry for assignment and
>>>>>>>>>> update closid, rmid, evtid and dirty = 1. We can get all these
>>>>>>>>>> information from default group here.
>>>>>>>>>>
>>>>>>>>>> Does this make sense?
>>>>>>>>>
>>>>>>>>> Yes, sounds correct.
>>>>>>>>
>>>>>>>> I will probably add cntr_id in resctrl_monitor_cfg structure and
>>>>>>>> initialize during the allocation. And rename the field 'dirty' to
>>>>>>>> 'active'(or something similar) to hold the assign state for that
>>>>>>>> entry. That way we have all the information required for assignment
>>>>>>>> at one place. We don't need to update the rdtgroup structure.
>>>>>>>>
>>>>>>>> Reinette, What do you think about this approach?
>>>>>>>
>>>>>>> I think this approach is in the right direction. Thanks to Peter for
>>>>>>> the guidance here.
>>>>>>> I do not think that it is necessary to add cntr_id to resctrl_monitor_cfg
>>>>>>> though, I think the cntr_id would be the index to the array instead?
>>>>>>
>>>>>> Yes. I think We can use the index as cntn_id. Will let you know otherwise.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> It may also be worthwhile to consider using a pointer to the resource
>>>>>>> group instead of storing closid and rmid directly. If used to indicate
>>>>>>> initialization then an initialized pointer is easier to distinguish than
>>>>>>> the closid/rmid that may have zero as valid values.
>>>>>>
>>>>>> Sure. Sounds good.
>>>>>>
>>>>>>>
>>>>>>> I expect evtid will be enum resctrl_event_id and that raises the question
>>>>>>> of whether "0" can indeed be used as an "uninitialized" value since doing
>>>>>>> so would change the meaning of the enum. It may indeed keep things
>>>>>>> separated by maintaining evtid as an enum resctrl_event_id and note the
>>>>>>> initialization differently ... either via a pointer to a resource group
>>>>>>> or entirely separately as Babu indicates later.
>>>>>>
>>>>>> Sure. Will add evtid as enum resctrl_event_id and use the "state" to
>>>>>> indicate assign/unassign/dirty status.
>>>>>
>>>>> Is "assign/unassign" state needed? If resctrl_monitor_cfg contains a pointer
>>>>> to the resource group to which the counter has been assigned then I expect NULL
>>>>> means unassigned and a value means assigned?
>>>>
>>>> Yes. We use the rdtgroup pointer to check the assign/unassign state.
>>>>
>>>> I will drop the 'state' field. Peter can add state when he wants use it
>>>> for optimization later.
>>>>
>>>> I think we need to have the 'cntr_id" field here in resctrl_monitor_cfg.
>>>> When we access the pointer from mbm_state, we wont know what is cntr_id
>>>> index it came from.
>>>>
>>>
>>> oh, good point. I wonder how Peter addressed this in his PoC. As an alternative,
>>> could the cntr_id be used in mbm_state instead of a pointer?
>>>
>>
>> Yes. It can be done.
>>
>> I thought it would be better to have everything at once place.
>>
>> struct resctrl_monitor_cfg {
>> unsigned int cntr_id;
>> enum resctrl_event_id evtid;
>> struct rdtgroup *rgtgrp;
>> };
>>
>> This will have everything required to assign/unassign the event.
>>
>
> The "everything in one place" argument is not clear to me since the cntr_id
> is indeed present already as the index to the array that stores this structure.
> Including the cntr_id seems redundant to me. This is similar to several
> other data structures in resctrl that are indexed either by closid or rmid,
> without needing to store closid/rmid in these data structures self.
>
Ok. That is fine. Will remove cntr_id index from resctrl_monitor_cfg.
Will add it in mbm_state. That should be good.
--
Thanks
Babu Moger
Powered by blists - more mailing lists