[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e787e4e7-d3ee-46f0-8be1-9ccc1e45a7b1@amd.com>
Date: Thu, 26 Sep 2024 11:56:06 -0500
From: "Moger, Babu" <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, corbet@....net,
fenghua.yu@...el.com, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com
Cc: x86@...nel.org, hpa@...or.com, paulmck@...nel.org, rdunlap@...radead.org,
tj@...nel.org, peterz@...radead.org, yanjiewtw@...il.com,
kim.phillips@....com, lukas.bulwahn@...il.com, seanjc@...gle.com,
jmattson@...gle.com, leitao@...ian.org, jpoimboe@...nel.org,
rick.p.edgecombe@...el.com, kirill.shutemov@...ux.intel.com,
jithu.joseph@...el.com, kai.huang@...el.com, kan.liang@...ux.intel.com,
daniel.sneddon@...ux.intel.com, pbonzini@...hat.com, sandipan.das@....com,
ilpo.jarvinen@...ux.intel.com, peternewman@...gle.com,
maciej.wieczor-retman@...el.com, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, eranian@...gle.com, james.morse@....com
Subject: Re: [PATCH v7 17/24] x86/resctrl: Add the interface to unassign a MBM
counter
Hi Reinette,
On 9/19/24 12:26, Reinette Chatre wrote:
> Hi Babu,
>
> On 9/4/24 3:21 PM, Babu Moger wrote:
>> The mbm_cntr_assign mode provides a limited number of hardware counters
>> that can be assigned to an RMID-event pair to monitor bandwidth while
>> assigned. If all counters are in use, the kernel will show an error
>> message: "Out of MBM assignable counters" when a new assignment is
>> requested. To make space for a new assignment, users must unassign an
>> already assigned counter.
>>
>> Introduce an interface that allows for the unassignment of counter IDs
>> from both the group and the domain. Additionally, ensure that the global
>> counter is released if it is no longer assigned to any domains.
>>
>> Signed-off-by: Babu Moger <babu.moger@....com>
>> ---
>> v7: Merged rdtgroup_unassign_cntr and rdtgroup_free_cntr functions.
>> Renamed rdtgroup_mbm_cntr_test() to rdtgroup_mbm_cntr_is_assigned().
>> Reworded the commit log little bit.
>>
>> v6: Removed mbm_cntr_free from this patch.
>> Added counter test in all the domains and free if it is not assigned to
>> any domains.
>>
>> v5: Few name changes to match cntr_id.
>> Changed the function names to rdtgroup_unassign_cntr
>> More comments on commit log.
>>
>> v4: Added domain specific unassign feature.
>> Few name changes.
>>
>> v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
>> The function is not called directly from user anymore. These
>> changes are related to global assignment interface.
>>
>> v2: No changes.
>> ---
>> arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 49 ++++++++++++++++++++++++++
>> 2 files changed, 51 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 6a90fc20be5b..9a65a13ccbe9 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -707,6 +707,8 @@ int resctrl_arch_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> u32 cntr_id, bool assign);
>> int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>> +int rdtgroup_unassign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> + struct rdt_mon_domain *d, enum resctrl_event_id evtid);
>> void rdt_staged_configs_clear(void);
>> bool closid_allocated(unsigned int closid);
>> int resctrl_find_cleanest_closid(void);
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 1d45120ff2b5..21b9ca4ce493 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1944,6 +1944,55 @@ int rdtgroup_assign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> return 0;
>> }
>>
>> +static int rdtgroup_mbm_cntr_is_assigned(struct rdt_resource *r, u32 cntr_id)
>
> Should this return bool?
Sure.
>
> With function prefix of "rdtgroup" I would expect that an rdtgroup would be one of its
> parameters but that is not the case ... this is nothing to do with a rdtgroup.
> Maybe something like "mbm_cntr_assigned_to_domain()"?
Sure.
>
>> +{
>> + struct rdt_mon_domain *d;
>> +
>> + list_for_each_entry(d, &r->mon_domains, hdr.list)
>
> Based on function name it is unexpected that it checks the global bitmap and not the
> domain lists. The function really needs a more appropriate name to reflect what it
> actually does.
ok. The name mbm_cntr_assigned_to_domain() should be fine now.
>
>> + if (test_bit(cntr_id, d->mbm_cntr_map))
>> + return 1;
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * Unassign a hardware counter from the domain and the group. Global
>> + * counter will be freed once it is unassigned from all the domains.
>
> Could this also get a similar comment as partner function about special
> meaning of NULL domain?
Sure.
>
>> + */
>> +int rdtgroup_unassign_cntr(struct rdt_resource *r, struct rdtgroup *rdtgrp,
>> + struct rdt_mon_domain *d,
>> + enum resctrl_event_id evtid)
>> +{
>> + int index = MBM_EVENT_ARRAY_INDEX(evtid);
>> + int cntr_id = rdtgrp->mon.cntr_id[index];
>> +
>> + if (cntr_id != MON_CNTR_UNSET) {
>
> Function can exit early after the MON_CNTR_UNSET check to reduce level of
> indentation in rest of function.
Sure.
>
>> + if (!d) {
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + resctrl_arch_assign_cntr(r, d, evtid,
>> + rdtgrp->mon.rmid,
>> + rdtgrp->closid,
>> + cntr_id, false);
>> + clear_bit(cntr_id, d->mbm_cntr_map);
>> + }
>> + } else {
>> + resctrl_arch_assign_cntr(r, d, evtid,
>> + rdtgrp->mon.rmid,
>> + rdtgrp->closid,
>> + cntr_id, false);
>> + clear_bit(cntr_id, d->mbm_cntr_map);
>> + }
>> +
>> + /* Update the counter bitmap */
>> + if (!rdtgroup_mbm_cntr_is_assigned(r, cntr_id)) {
>> + mbm_cntr_free(r, cntr_id);
>> + rdtgrp->mon.cntr_id[index] = MON_CNTR_UNSET;
>> + }
>> + }
>> +
>> + return 0;
>
> This function is called many times and there are always paths adding complexity
> to handle error from this function ... yet it always returns 0. I expect that it should
> actually do error checking of the arch callback that could actually fail on other archs, that
> should impact this function's return value and make the need for error handling apparent.
Sure. Will do it.
>
>> +}
>> +
>> /* rdtgroup information files for one cache resource. */
>> static struct rftype res_common_files[] = {
>> {
>
>
> Reinette
>
--
Thanks
Babu Moger
Powered by blists - more mailing lists