Message-ID: <aS9NEJAnx0MWoyaT@agluck-desk3>
Date: Tue, 2 Dec 2025 12:33:20 -0800
From: "Luck, Tony" <tony.luck@...el.com>
To: Reinette Chatre <reinette.chatre@...el.com>
CC: Fenghua Yu <fenghuay@...dia.com>, Maciej Wieczor-Retman
<maciej.wieczor-retman@...el.com>, Peter Newman <peternewman@...gle.com>,
James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>, "Drew
Fustini" <dfustini@...libre.com>, Dave Martin <Dave.Martin@....com>, Chen Yu
<yu.c.chen@...el.com>, <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<patches@...ts.linux.dev>
Subject: Re: [PATCH v14 07/32] x86,fs/resctrl: Use struct rdt_domain_hdr when
reading counters
On Tue, Dec 02, 2025 at 08:06:47AM -0800, Reinette Chatre wrote:
> Hi Tony,
> > +static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *rr)
> > +{
> > + int cpu = smp_processor_id();
> > + u32 closid = rdtgrp->closid;
> > + u32 rmid = rdtgrp->mon.rmid;
> > + struct rdt_mon_domain *d;
> > + int cntr_id = -ENOENT;
> > + u64 tval = 0;
> > + int err, ret;
> >
> > /* Summing domains that share a cache, must be on a CPU for that cache. */
> > if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map))
> > @@ -480,7 +494,7 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
> > err = resctrl_arch_cntr_read(rr->r, d, closid, rmid, cntr_id,
> > rr->evtid, &tval);
>
> This is not safe. The current __mon_event_count() implementation being refactored by this series
> ensures that if rr->is_mbm_cntr is true then cntr_id is valid. This patch places the code doing so
> in __l3_mon_event_count() without an equivalent in the new __l3_mon_event_count_sum(). From what I
> can tell, since __l3_mon_event_count_sum() sets cntr_id to -ENOENT and never initializes it correctly,
> resctrl_arch_cntr_read() will be called with an invalid cntr_id that it is not able to handle.
>
> There is no overlap in support for SNC and assignable counters. Do you expect that this is something that
> should be supported? Even if it is, SNC is model specific so it may be reasonable to expect that when/if
> a system supporting both features arrives it would need enabling anyway. I thus propose for simplicity
> that the handling of assignable counters by __l3_mon_event_count_sum() be dropped, albeit with a loud
> complaint if it is ever called with rr->is_mbm_cntr set.
>
Reinette,
Agreed. I see little likelihood that SNC and assignable counters will
meet on a system.
How does this look for the "loud complaint":
static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *rr)
{
	int cpu = smp_processor_id();
	u32 closid = rdtgrp->closid;
	u32 rmid = rdtgrp->mon.rmid;
	struct rdt_mon_domain *d;
	u64 tval = 0;
	int err, ret;

	/*
	 * Summing across domains is only done for systems that implement
	 * Sub-NUMA Cluster. There is no overlap with systems that support
	 * assignable counters.
	 */
	if (rr->is_mbm_cntr) {
		pr_warn_once("Assignable counter on SNC system!\n");
		rr->err = -EINVAL;
		return -EINVAL;
	}

	/* Summing domains that share a cache, must be on a CPU for that cache. */
	if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map))
		return -EINVAL;

	/*
	 * Legacy files must report the sum of an event across all
	 * domains that share the same L3 cache instance.
	 * Report success if a read from any domain succeeds, -EINVAL
	 * (translated to "Unavailable" for user space) if reading from
	 * all domains fail for any reason.
	 */
	ret = -EINVAL;
	list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
		if (d->ci_id != rr->ci->id)
			continue;
		err = resctrl_arch_rmid_read(rr->r, &d->hdr, closid, rmid,
					     rr->evtid, &tval, rr->arch_mon_ctx);
		if (!err) {
			rr->val += tval;
			ret = 0;
		}
	}

	if (ret)
		rr->err = ret;

	return ret;
}
-Tony
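
For readers who want to try the behavior outside the kernel, here is a
minimal user-space sketch of the same two ideas: reject the unsupported
assignable-counter case with a one-time warning, then sum across matching
domains and report success if any single read succeeds. Everything named
fake_* is hypothetical and exists only for illustration; these are not
resctrl types or functions, the one-time warning is a plain static flag
rather than pr_warn_once(), and the CPU affinity check is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one monitoring domain; not a kernel type. */
struct fake_domain {
	int ci_id;		/* id of the L3 cache instance this domain shares */
	int read_err;		/* 0 if a read succeeds, negative errno otherwise */
	uint64_t count;		/* value returned by a successful read */
};

/* Hypothetical stand-in for the fields of struct rmid_read used here. */
struct fake_read {
	int ci_id;		/* cache instance being summed */
	bool is_mbm_cntr;	/* assignable-counter request (unsupported here) */
	uint64_t val;		/* accumulated sum */
	int err;		/* error reported back to the caller */
};

/* Simulated per-domain read: fills *tval only on success. */
static int fake_domain_read(const struct fake_domain *d, uint64_t *tval)
{
	if (d->read_err)
		return d->read_err;
	*tval = d->count;
	return 0;
}

/*
 * Sum an event over every domain that shares rr->ci_id.
 * Returns 0 if at least one read succeeded, -EINVAL otherwise.
 */
static int fake_count_sum(const struct fake_domain *doms, size_t n,
			  struct fake_read *rr)
{
	static bool warned;	/* user-space analogue of a warn-once flag */
	uint64_t tval = 0;
	int ret = -EINVAL;
	size_t i;

	assert(doms != NULL && rr != NULL);

	/* Complain once, then fail, if the unsupported combination shows up. */
	if (rr->is_mbm_cntr) {
		if (!warned) {
			fprintf(stderr, "Assignable counter on SNC system!\n");
			warned = true;
		}
		rr->err = -EINVAL;
		return -EINVAL;
	}

	for (i = 0; i < n; i++) {
		if (doms[i].ci_id != rr->ci_id)
			continue;
		if (!fake_domain_read(&doms[i], &tval)) {
			rr->val += tval;
			ret = 0;	/* one good read is enough for success */
		}
	}

	if (ret)
		rr->err = ret;

	return ret;
}
```

With three domains on cache instance 0, one of which fails its read, the
call still returns 0 and rr->val holds the sum of the two successful reads.
Only when no matching domain can be read does the caller see -EINVAL.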