[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <58d214b6-9bd0-42dc-a00d-cc80bccc8fdd@amd.com>
Date: Wed, 25 Jun 2025 20:31:49 -0500
From: "Moger, Babu" <bmoger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>,
Babu Moger <babu.moger@....com>, corbet@....net, tony.luck@...el.com,
Dave.Martin@....com, james.morse@....com, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com
Cc: x86@...nel.org, hpa@...or.com, akpm@...ux-foundation.org,
rostedt@...dmis.org, paulmck@...nel.org, thuth@...hat.com, ardb@...nel.org,
gregkh@...uxfoundation.org, seanjc@...gle.com, thomas.lendacky@....com,
pawan.kumar.gupta@...ux.intel.com, manali.shukla@....com,
perry.yuan@....com, kai.huang@...el.com, peterz@...radead.org,
xiaoyao.li@...el.com, kan.liang@...ux.intel.com, mario.limonciello@....com,
xin3.li@...el.com, gautham.shenoy@....com, xin@...or.com,
chang.seok.bae@...el.com, fenghuay@...dia.com, peternewman@...gle.com,
maciej.wieczor-retman@...el.com, eranian@...gle.com,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v14 13/32] fs/resctrl: Introduce mbm_cntr_cfg to track
assignable counters per domain
Hi Reinette,
On 6/24/2025 6:31 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/13/25 2:04 PM, Babu Moger wrote:
>> The "mbm_event" mode allows users to assign a hardware counter ID to an
>
> "hardware counter ID" -> "hardware counter" (I'll stop pointing these out)
Sure.
>
>> RMID, event pair and monitor bandwidth usage as long as it is assigned.
>> The hardware continues to track the assigned counter until it is
>> explicitly unassigned by the user. Counters are assigned/unassigned at
>> monitoring domain level.
>>
>> Manage a monitoring domain's hardware counters using a per monitoring
>> domain array of struct mbm_cntr_cfg that is indexed by the hardware
>> counter ID. A hardware counter's configuration contains the MBM event
>> ID and points to the monitoring group that it is assigned to, with a
>> NULL pointer meaning that the hardware counter is available for assignment.
>>
>> There is no direct way to determine which hardware counters are assigned
>> to a particular monitoring group. Check every entry of every hardware
>> counter configuration array in every monitoring domain to query which
>> MBM events of a monitoring group is tracked by hardware. Such queries are
>> acceptable because of a very small number of assignable counters (32
>> to 64).
>>
>> Suggested-by: Peter Newman <peternewman@...gle.com>
>> Signed-off-by: Babu Moger <babu.moger@....com>
>> ---
> ...
>
>> ---
>> fs/resctrl/rdtgroup.c | 8 ++++++++
>> include/linux/resctrl.h | 19 +++++++++++++++++++
>> 2 files changed, 27 insertions(+)
>>
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 967e4df62a19..90b52593ef29 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -4084,6 +4084,7 @@ static void rdtgroup_setup_default(void)
>>
>> static void domain_destroy_mon_state(struct rdt_mon_domain *d)
>> {
>> + kfree(d->cntr_cfg);
>> bitmap_free(d->rmid_busy_llc);
>> for (int i = 0; i < QOS_NUM_L3_MBM_EVENTS; i++) {
>> kfree(d->mbm_states[i]);
>> @@ -4167,6 +4168,13 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
>> goto cleanup;
>> }
>>
>> + if (resctrl_is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
>> + tsize = sizeof(*d->cntr_cfg);
>> + d->cntr_cfg = kcalloc(r->mon.num_mbm_cntrs, tsize, GFP_KERNEL);
>> + if (!d->cntr_cfg)
>> + goto cleanup;
>> + }
>> +
>
> Please see my earlier comment https://lore.kernel.org/lkml/b761e6ec-a874-4d06-8437-a3a717a91abb@intel.com/
> Before this addition the "cleanup" goto label can only be called when
> (a) idx is guaranteed to be initialized and (b) d->mbm_states[idx] == NULL.
> Using that goto label in snippet above cannot guarantee either.
Yes. Tony took care of this.
cleanup:
bitmap_free(d->rmid_busy_llc);
for_each_mbm_idx(idx) {
kfree(d->mbm_states[idx]);
d->mbm_states[idx] = NULL;
}
return -ENOMEM;
}
>
>> return 0;
>> cleanup:
>> bitmap_free(d->rmid_busy_llc);
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index f078ef24a8ad..468a4ebabc64 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -156,6 +156,22 @@ struct rdt_ctrl_domain {
>> u32 *mbps_val;
>> };
>>
>> +/**
>> + * struct mbm_cntr_cfg - Assignable counter configuration.
>> + * @evtid: MBM event to which the counter is assigned. Only valid
>> + * if @rdtgroup is not NULL.
>> + * @evt_cfg: Event configuration created using the READS_TO_LOCAL_MEM,
>> + * READS_TO_REMOTE_MEM, etc. bits that represent the memory
>> + * transactions being counted.
>> + * @rdtgrp: resctrl group assigned to the counter. NULL if the
>> + * counter is free.
>> + */
>> +struct mbm_cntr_cfg {
>> + enum resctrl_event_id evtid;
>> + u32 evt_cfg;
>
> It is not clear to me why the event configuration needs to be duplicated
> between mbm_cntr_cfg::evt_cfg and mon_evt::evt_cfg (done in patch #16).
> I think there should be only one "source of truth" and mon_evt::evt_cfg
> seems most appropriate since then it can be shared with BMEC.
>
> It also seems unnecessary to make so many copies of the event configuration
> if it can just be determined from the event ID.
>
> Looking ahead at how this is used, for example in event_filter_write()
> introduced in patch #25:
> ret = resctrl_process_configs(buf, &evt_cfg);
> if (!ret && mevt->evt_cfg != evt_cfg) {
> mevt->evt_cfg = evt_cfg;
> resctrl_assign_cntr_allrdtgrp(r, mevt);
> }
>
> After user provides new event configuration the mon_evt::evt_cfg is
> updated. Since there is this initial check to determine if counters need
> to be updated I think it is unnecessary to have a second copy of mbm_cntr_cfg::evt_cfg
> that needs to be checked again. The functions called by resctrl_assign_cntr_allrdtgrp(r, mevt)
> should just update the counters without any additional comparison.
>
> For example, rdtgroup_assign_cntr() can be simplified to:
> rdtgroup_assign_cntr() {
> ...
> list_for_each_entry(d, &r->mon_domains, hdr.list) {
> cntr_id = mbm_cntr_get(r, d, rdtgrp, mevt->evtid);
> if (cntr_id >= 0)
> resctrl_arch_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
> rdtgrp->closid, cntr_id, true);
> }
> }
>
>
Actually, this interaction works as intended.
It serves as an optimization for cases where the user repeatedly tries
to assign the same event to a group. Since we have no way of knowing
whether the event is up-to-date, this mechanism helps us avoid
unnecessary MSR writes.
For example:
mbm_L3_assignments_write() → resctrl_assign_cntr_event() →
resctrl_alloc_config_cntr() → resctrl_config_cntr() →
resctrl_arch_config_cntr()
resctrl_alloc_config_cntr()
{
..
/*
* Skip reconfiguration if the event setup is current; otherwise,
* update and apply the new configuration to the domain.
*/
if (mevt->evt_cfg != d->cntr_cfg[cntr_id].evt_cfg) {
d->cntr_cfg[cntr_id].evt_cfg = mevt->evt_cfg;
resctrl_config_cntr(r, d, mevt->evtid, rdtgrp->mon.rmid,
rdtgrp->closid, cntr_id, true);
}
}
Thanks
Babu
Powered by blists - more mailing lists