linux-kernel - Re: [PATCH v13 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <e466dab3-be67-494e-bd73-41652108151c@intel.com>
Date: Tue, 18 Nov 2025 10:11:17 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>
CC: Fenghua Yu <fenghuay@...dia.com>, Maciej Wieczor-Retman
	<maciej.wieczor-retman@...el.com>, Peter Newman <peternewman@...gle.com>,
	James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>, "Drew
 Fustini" <dfustini@...libre.com>, Dave Martin <Dave.Martin@....com>, Chen Yu
	<yu.c.chen@...el.com>, <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
	<patches@...ts.linux.dev>
Subject: Re: [PATCH v13 25/32] x86/resctrl: Handle number of RMIDs supported
 by RDT_RESOURCE_PERF_PKG

Hi Tony,

On 11/18/25 9:35 AM, Luck, Tony wrote:
> On Tue, Nov 18, 2025 at 08:48:18AM -0800, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 11/17/25 10:52 AM, Luck, Tony wrote:
>>> On Mon, Nov 17, 2025 at 09:31:41AM -0800, Reinette Chatre wrote:
>>>> On 11/17/25 8:37 AM, Luck, Tony wrote:
>>>>> On Fri, Nov 14, 2025 at 03:26:42PM -0800, Reinette Chatre wrote:
>>>>>> On 11/14/25 1:55 PM, Luck, Tony wrote:
>>>>>> How a system with two guid of the same feature type would work is not clear to me though. Looks
>>>>>> like they cannot share events at all since an event is uniquely associated with a struct pmt_event
>>>>>> that can belong to only one event group. If they may share events then enable_events()->resctrl_enable_mon_event()
>>>>>> will complain loudly but still proceed and allow the event group to be enabled.
>>>>>
>>>>> I can't see a good reason why the same event would be enabled under
>>>>> different guids present on the same system. We can revisit my assumption
>>>>> if the "Duplicate enable for event" message shows up.
>>>>
>>>> This would be difficult to handle at that time, no? From what I can tell this would enable
>>>> an unusable event group to actually be enabled resulting in untested and invalid flows.
>>>> I think it will be safer to not enable an event group in this scenario and seems to math your
>>>> expectation that this would be unexpected. The "Duplicate enable for event" message will still
>>>> appear and we can still revisit those assumptions when they do, but the systems encountering
>>>> them will not be running with enabled event groups that are not actually fully enabled.
>>>
>>> There's a hardware cost to including an event in an aggregator.
>>> Inclusing the same event in mutliple aggregators described by
>>> different guids is really something that should never happen.
>>> Just printing a warning and skipping the event seems an adequate
>>> defense.
>>
>> My concern is that after skipping the event there is a deceiving message that the event group was
>> enabled successfully.
> 
> I can change resctrl_enable_mon_event() to return a "bool" to say
> whether each event was successfully enabled.
> 
> Then change to:
> 
> 	int skipped_events = 0;
> 
> 	for (int j = 0; j < e->num_events; j++) {
> 		if (!resctrl_enable_mon_event(e->evts[j].id, true,
> 					      e->evts[j].bin_bits, &e->evts[j]))
> 			skipped_events++;
> 	}
> 
> 	if (e->num_events == skipped_events) {
> 		pr_info("No events enabled in %s %s:0x%x\n", r->name, e->name, e->guid);
> 		return false;
> 	}
> 
> 	if (skipped_events)
> 		pr_info("%s %s:0x%x monitoring detected (skipped %d events)\n", r->name,
> 			r->name, e->name, e->guid, skipped_events);
> 	else
> 		pr_info("%s %s:0x%x monitoring detected\n", r->name, e->name, e->guid);

This looks good to me. Thank you. I am not able to tell from this snippet but since enabling of
an event group can fail at this point I think the r->mon.num_rmid initialization should
now be moved later so that a failing event group will not impact the number of RMIDs the
resource can use.

Reinette