Message-ID: <ee83f3aa-9d89-4520-98bf-78f371d214b0@intel.com>
Date: Tue, 2 Dec 2025 14:24:46 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>
CC: Fenghua Yu <fenghuay@...dia.com>, Maciej Wieczor-Retman
<maciej.wieczor-retman@...el.com>, Peter Newman <peternewman@...gle.com>,
James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>, "Drew
Fustini" <dfustini@...libre.com>, Dave Martin <Dave.Martin@....com>, Chen Yu
<yu.c.chen@...el.com>, <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<patches@...ts.linux.dev>
Subject: Re: [PATCH v14 07/32] x86,fs/resctrl: Use struct rdt_domain_hdr when
reading counters
Hi Tony,
On 12/2/25 12:33 PM, Luck, Tony wrote:
> On Tue, Dec 02, 2025 at 08:06:47AM -0800, Reinette Chatre wrote:
>> Hi Tony,
>>> +static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *rr)
>>> +{
>>> + int cpu = smp_processor_id();
>>> + u32 closid = rdtgrp->closid;
>>> + u32 rmid = rdtgrp->mon.rmid;
>>> + struct rdt_mon_domain *d;
>>> + int cntr_id = -ENOENT;
>>> + u64 tval = 0;
>>> + int err, ret;
>>>
>>> /* Summing domains that share a cache, must be on a CPU for that cache. */
>>> if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map))
>>> @@ -480,7 +494,7 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
>>> err = resctrl_arch_cntr_read(rr->r, d, closid, rmid, cntr_id,
>>> rr->evtid, &tval);
>>
>> This is not safe. The current __mon_event_count() implementation being refactored by this series
>> ensures that if rr->is_mbm_cntr is true then cntr_id is valid. This patch places the code doing so
>> in __l3_mon_event_count() without an equivalent in the new __l3_mon_event_count_sum(). From what I
>> can tell, since __l3_mon_event_count_sum() sets cntr_id to -ENOENT and never initializes it correctly,
>> resctrl_arch_cntr_read() will be called with an invalid cntr_id that it is not able to handle.
>>
>> There is no overlap in support for SNC and assignable counters. Do you expect that this is something that
>> should be supported? Even if it is, SNC is model specific so it may be reasonable to expect that when/if
>> a system supporting both features arrives it would need enabling anyway. I thus propose for simplicity
>> that the handling of assignable counters by __l3_mon_event_count_sum() be dropped, albeit with a loud
>> complaint if it is ever called with rr->is_mbm_cntr set.
>>
>
> Reinette,
>
> Agreed. I see little likelihood that SNC and assignable counters will
> meet on a system.
>
> How does this look for the "loud complaint":
>
>
> static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *rr)
> {
> 	int cpu = smp_processor_id();
> 	u32 closid = rdtgrp->closid;
> 	u32 rmid = rdtgrp->mon.rmid;
> 	struct rdt_mon_domain *d;
> 	u64 tval = 0;
> 	int err, ret;
>
> 	/*
> 	 * Summing across domains is only done for systems that implement
> 	 * Sub-NUMA Cluster. There is no overlap with systems that support
> 	 * assignable counters.
> 	 */
> 	if (rr->is_mbm_cntr) {
> 		pr_warn_once("Assignable counter on SNC system!\n");
> 		rr->err = -EINVAL;
> 		return -EINVAL;
> 	}
Thank you. At a high level this looks good to me. I am concerned about the architecture-specific
"SNC" term creeping deeper into fs code when the fs code aims to generalize it as
"summing domains". Mentioning SNC in the comment may still be useful, though, since being
specific helps readers understand what is going on.
As a nit, could the user space message avoid referring to SNC? How about something like:
"Summing domains using assignable counters is not supported."
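To make the suggested wording concrete, here is a minimal userspace sketch of the guard alone. It is not kernel code: struct mock_rmid_read, mock_warn_once() and mock_sum_guard() are hypothetical stand-ins for struct rmid_read, pr_warn_once() and the top of __l3_mon_event_count_sum(), and the domain-summing loop is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct rmid_read. */
struct mock_rmid_read {
	bool is_mbm_cntr;	/* true when the read targets an assignable counter */
	int err;		/* error reported back to the caller */
};

/* Counts emitted warnings so the once-only behavior can be observed. */
static int mock_warn_count;

/*
 * Stand-in for pr_warn_once(): print the message on the first call only.
 * The real macro keeps one flag per call site; this mock has a single
 * flag, which is enough for the single call site below.
 */
static void mock_warn_once(const char *msg)
{
	static bool warned;

	if (!warned) {
		warned = true;
		mock_warn_count++;
		fprintf(stderr, "%s", msg);
	}
}

/* Guard only: reject assignable counters before any summing would start. */
static int mock_sum_guard(struct mock_rmid_read *rr)
{
	assert(rr);

	if (rr->is_mbm_cntr) {
		mock_warn_once("Summing domains using assignable counters is not supported.\n");
		rr->err = -EINVAL;
		return -EINVAL;
	}

	return 0;
}
```

Calling mock_sum_guard() repeatedly with is_mbm_cntr set returns -EINVAL every time but prints the message once, which is the "loud complaint" behavior discussed above.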
>
> 	/* Summing domains that share a cache, must be on a CPU for that cache. */
> 	if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map))
> 		return -EINVAL;
>
> 	/*
> 	 * Legacy files must report the sum of an event across all
> 	 * domains that share the same L3 cache instance.
> 	 * Report success if a read from any domain succeeds, -EINVAL
> 	 * (translated to "Unavailable" for user space) if reading from
> 	 * all domains fails for any reason.
> 	 */
> 	ret = -EINVAL;
> 	list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
> 		if (d->ci_id != rr->ci->id)
> 			continue;
> 		err = resctrl_arch_rmid_read(rr->r, &d->hdr, closid, rmid,
> 					     rr->evtid, &tval, rr->arch_mon_ctx);
> 		if (!err) {
> 			rr->val += tval;
> 			ret = 0;
> 		}
> 	}
>
> 	if (ret)
> 		rr->err = ret;
>
> 	return ret;
> }
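The "success if any domain read succeeds" rule in the comment above can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: struct mock_domain and mock_sum_domains() are hypothetical, an array replaces the kernel's domain list, and a per-domain read_err field replaces the call to resctrl_arch_rmid_read().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one monitoring domain. */
struct mock_domain {
	unsigned int ci_id;	/* id of the L3 cache instance the domain belongs to */
	int read_err;		/* 0 if a read would succeed, negative errno otherwise */
	uint64_t count;		/* value a successful read would return */
};

/*
 * Add the counts of all domains that share cache instance ci_id to *val.
 * Returns 0 if at least one matching domain was read successfully and
 * -EINVAL if every matching read failed or no domain matched.
 */
static int mock_sum_domains(const struct mock_domain *doms, size_t n,
			    unsigned int ci_id, uint64_t *val)
{
	int ret = -EINVAL;
	size_t i;

	assert(doms && val);

	for (i = 0; i < n; i++) {
		/* Skip domains that belong to a different cache instance. */
		if (doms[i].ci_id != ci_id)
			continue;
		/* A failed read is ignored; a successful one contributes. */
		if (!doms[i].read_err) {
			*val += doms[i].count;
			ret = 0;
		}
	}

	return ret;
}
```

With domains {ci 0, ok, 10}, {ci 0, failing}, {ci 1, ok, 5} and {ci 0, ok, 7}, summing for cache instance 0 returns 0 and adds 17; the failing domain is skipped without turning the overall result into an error.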
Reinette