[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a1bb0fda-22ed-4510-b89f-73d5aa07110f@intel.com>
Date: Wed, 4 Jun 2025 16:40:51 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>
CC: Fenghua Yu <fenghuay@...dia.com>, Maciej Wieczor-Retman
<maciej.wieczor-retman@...el.com>, Peter Newman <peternewman@...gle.com>,
James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>, "Drew
Fustini" <dfustini@...libre.com>, Dave Martin <Dave.Martin@....com>, "Anil
Keshavamurthy" <anil.s.keshavamurthy@...el.com>, Chen Yu
<yu.c.chen@...el.com>, <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<patches@...ts.linux.dev>
Subject: Re: [PATCH v5 06/29] x86,fs/resctrl: Improve domain type checking
Hi Tony,
On 6/4/25 3:58 PM, Luck, Tony wrote:
> On Tue, Jun 03, 2025 at 08:31:07PM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 5/21/25 3:50 PM, Tony Luck wrote:
>>> The rdt_domain_hdr structure is used in both control and monitor
>>> domain structures to provide common methods for operations such as
>>> adding a CPU to a domain, removing a CPU from a domain, accessing
>>> the mask of all CPUs in a domain.
>>>
>>> The "type" field provides a simple check whether a domain is a
>>> control or monitor domain so that programming errors operating
>>> on domains will be quickly caught.
>>>
>>> To prepare for additional domain types that depend on the rdt_resource
>>> to which they are connected add the resource id into the header
>>> and check that in addition to the type.
>>>
>>> Signed-off-by: Tony Luck <tony.luck@...el.com>
>>> ---
>>> include/linux/resctrl.h | 9 +++++++++
>>> arch/x86/kernel/cpu/resctrl/core.c | 10 ++++++----
>>> fs/resctrl/ctrlmondata.c | 2 +-
>>> 3 files changed, 16 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>> index 40f2d0d48d02..d6b09952ef92 100644
>>> --- a/include/linux/resctrl.h
>>> +++ b/include/linux/resctrl.h
>>> @@ -131,15 +131,24 @@ enum resctrl_domain_type {
>>> * @list: all instances of this resource
>>> * @id: unique id for this instance
>>> * @type: type of this instance
>>> + * @rid: index of resource for this domain
>>> * @cpu_mask: which CPUs share this resource
>>> */
>>> struct rdt_domain_hdr {
>>> struct list_head list;
>>> int id;
>>> enum resctrl_domain_type type;
>>> + enum resctrl_res_level rid;
>>> struct cpumask cpu_mask;
>>> };
>>>
>>> +static inline bool domain_header_is_valid(struct rdt_domain_hdr *hdr,
>>> + enum resctrl_domain_type type,
>>> + enum resctrl_res_level rid)
>>> +{
>>> + return !WARN_ON_ONCE(hdr->type != type || hdr->rid != rid);
>>> +}
>>> +
>>> /**
>>> * struct rdt_ctrl_domain - group of CPUs sharing a resctrl control resource
>>> * @hdr: common header for different domain types
>>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>>> index 4403a820db12..4983f6f81218 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>>> @@ -456,7 +456,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
>>>
>>> hdr = resctrl_find_domain(&r->ctrl_domains, id, &add_pos);
>>> if (hdr) {
>>> - if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
>>> + if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
>>> return;
>
> This type check was added as part of the split of the rdt_domain
> structure into sepaarte ctrl and mon structures. I think the concern
> was that some code might look at the wrong rdt_resource list and
> try to operate on a ctrl domain structure that is actually a mon
> structure (or vice versa). This felt like a real possibility.
>
> Extending this to save and check the resource id seemed like a
> natural extension at the time. But I'm starting to doubt the value
> of doing so.
>
> For this new check to ever fail we would have to somehow add
> a domain for some resource type to a list on a different
> rdt_resource structure. I'm struggling to see how such an
I disagree with this statement. I do not see the failure as related
to the list to which the domain belongs but instead related to how
functions interpret a domain passed to it. There are a couple of functions
that are provided a domain pointer and the function is hardcoded to expect
the domain pointed to to be of a specific type.
For example, rdtgroup_mondata_show() is hardcoded to work with an
L3 resource domain. If my suggestion here is followed then
rdtgroup_mondata_show() would contain the specific check below because
it interprets the domain as belonging to a L3 resource:
domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)
With the check used as above the current issue would be exposed.
> error could ever occur. Domains are only added to an rdt_resource
> list by one of domain_add_cpu_ctrl() or domain_add_cpu_mon().
> But these same functions are the ones to allocate the domain
> structure and initialize the "d->hdr.id" field a dozen or so
> lines earlier in the function.
>
> Note that I'm not disputing your comments where my patches
> are still passing a rdt_l3_mon_domain structure down through
> several levels of function calls only to do:
>
> if (r->rid == RDT_RESOURCE_PERF_PKG)
> return intel_aet_read_event(d->hdr.id, rmid, eventid, val);
>
> revealing that it wasn't an rdt_l3_mon_domain at all!
>
> But these domain_header_is_valid() checks didn't help uncover
> that.
This is not because of the check itself but how it is used in this version
... it essentially gave the check the wrong value to check against.
> Bottom line: I'd like to just keep the "type" check and not
> extend to check the resource id.
Pointers to domains of different types are passed around (irrespective
of the list they belong to) but required to be of particular type
when acted on. The way I see it this check is required if this
design continues. If used correctly in this implementation it will help
to expose those places where L3 domain specific functions are used as
"generic" to operate on PERF_PKG domains.
Reinette
Powered by blists - more mailing lists