Message-ID: <cdbc23f6-cf4b-4e83-a037-0aaf7c076e8c@intel.com>
Date: Mon, 17 Nov 2025 09:31:41 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>
CC: Fenghua Yu <fenghuay@...dia.com>, Maciej Wieczor-Retman
<maciej.wieczor-retman@...el.com>, Peter Newman <peternewman@...gle.com>,
James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>, "Drew
Fustini" <dfustini@...libre.com>, Dave Martin <Dave.Martin@....com>, Chen Yu
<yu.c.chen@...el.com>, <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<patches@...ts.linux.dev>
Subject: Re: [PATCH v13 25/32] x86/resctrl: Handle number of RMIDs supported
by RDT_RESOURCE_PERF_PKG
Hi Tony,
On 11/17/25 8:37 AM, Luck, Tony wrote:
> On Fri, Nov 14, 2025 at 03:26:42PM -0800, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 11/14/25 1:55 PM, Luck, Tony wrote:
>>>
>>> resctrl: Feature energy guid=0x26696143 not enabled due to insufficient RMIDs
>>>
>>>
>>> static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
>>> {
>>> 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
>>> 	bool warn_disable = false;
>>>
>>> 	if (!group_has_usable_regions(e, p))
>>> 		return false;
>>>
>>> 	/* Disable feature if insufficient RMIDs */
>>> 	if (!all_regions_have_sufficient_rmid(e, p)) {
>>> 		warn_disable = true;
>>> 		rdt_set_feature_disabled(e->name);
>>> 	}
>>>
>>> 	/* User can override above disable from kernel command line */
>>> 	if (!rdt_is_feature_enabled(e->name)) {
>>> 		if (warn_disable)
>>> 			pr_info("Feature %s guid=0x%x not enabled due to insufficient RMIDs\n",
>>> 				e->name, e->guid);
>>> 		return false;
>>> 	}
>>> 	...
>>> }
>>
>> Thank you for considering. This looks good to me.
>>
>> I now realize that if a system supports, for example, two energy guids and only one has insufficient
>> RMIDs, then one or both may be disabled by default depending on which one resctrl attempts to enable
>> first. This is arbitrary, based on where the event group appears in the array.
>
> intel_pmt_get_regions_by_feature() does return arrays of telemetry_region
> with different guids today, but not currently for the "RMID" features.
> So this could be a problem in the future.
>
> I think I need to drop the "rdt=perf,!energy" command line control as
> being too coarse. Instead add a new boot argument. E.g.
>
> rdtguid=0x26696143,!0x26557651
>
> to give the user control per-guid instead of per-pmt_feature_id. Users
> can discover which guids are supported on a system by looking in
> /sys/bus/auxiliary/devices/intel_vsec.discovery.*/intel_pmt/features*/per_rmid*
> where there are "guids" and "num_rmids" files.
Should disable/enable be per RMID telemetry feature? I do not see anything preventing a system from
using the same guid for different RMID telemetry features.
I think it will be useful to look at how other kernel parameters distinguish different
categories of parameters so that resctrl can be consistent here. It looks like an underscore is
the most useful, and it is also flexible since it allows both a dash and an underscore to be used.
Another alternative that is common in kernel parameters is to use ":". For example,
rdt=energy:0x26696143
With something like the above, a user can, for example, use just "energy" to disable all RMID energy
telemetry, or be specific about which guid should be disabled. This seems to fit well with the existing
rdt parameters and is quite flexible.
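For illustration only, a rough userspace C sketch of how a "[!]feature[:guid]" list such as
"!energy,energy:0x26696143" could be parsed and queried. All names here (struct rdt_opt,
parse_rdt_opts(), rdt_opt_lookup()) are hypothetical and this is not the existing resctrl
parser. It assumes that a guid-specific entry takes precedence over a feature-wide one, and
that "no entry" (-1) leaves the decision to the kernel's own default, such as the
insufficient-RMID check in enable_events().

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RDT_OPTS 16

/* Hypothetical representation of one "[!]feature[:guid]" token. */
struct rdt_opt {
	char name[32];
	unsigned long guid;	/* only meaningful when has_guid is true */
	bool has_guid;
	bool enable;
};

static struct rdt_opt rdt_opts[MAX_RDT_OPTS];
static int nr_rdt_opts;

/* Parse e.g. "!energy,energy:0x26696143". Returns 0 on success, -1 on error. */
static int parse_rdt_opts(const char *arg)
{
	char buf[256];
	char *tok;

	nr_rdt_opts = 0;
	if (strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);

	for (tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
		struct rdt_opt *o;
		char *colon, *end;

		if (nr_rdt_opts == MAX_RDT_OPTS)
			return -1;
		o = &rdt_opts[nr_rdt_opts];
		memset(o, 0, sizeof(*o));
		o->enable = true;
		if (*tok == '!') {		/* "!" disables, as in rdt=cmt,!mba */
			o->enable = false;
			tok++;
		}
		colon = strchr(tok, ':');
		if (colon) {			/* optional ":guid" narrows the match */
			*colon = '\0';
			o->guid = strtoul(colon + 1, &end, 0);
			if (end == colon + 1 || *end != '\0')
				return -1;
			o->has_guid = true;
		}
		if (*tok == '\0' || strlen(tok) >= sizeof(o->name))
			return -1;
		strcpy(o->name, tok);
		nr_rdt_opts++;
	}
	return 0;
}

/*
 * Return 1 if forced on, 0 if forced off, or -1 if the command line says
 * nothing about (@name, @guid). A guid-specific entry beats a feature-wide
 * one; among equally specific entries the last one wins.
 */
static int rdt_opt_lookup(const char *name, unsigned long guid)
{
	int feature_wide = -1, specific = -1;

	for (int i = 0; i < nr_rdt_opts; i++) {
		if (strcmp(rdt_opts[i].name, name) != 0)
			continue;
		if (!rdt_opts[i].has_guid)
			feature_wide = rdt_opts[i].enable;
		else if (rdt_opts[i].guid == guid)
			specific = rdt_opts[i].enable;
	}
	return specific != -1 ? specific : feature_wide;
}
```

With "!energy,energy:0x26696143", rdt_opt_lookup("energy", 0x26696143) returns 1,
rdt_opt_lookup("energy", 0x26557651) returns 0, and a feature that is not mentioned at all
returns -1. Letting the more specific entry win means the result does not depend on the
order in which event groups are probed.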
>
>> How a system with two guids of the same feature type would work is not clear to me though. Looks
>> like they cannot share events at all since an event is uniquely associated with a struct pmt_event
>> that can belong to only one event group. If they do share events then enable_events()->resctrl_enable_mon_event()
>> will complain loudly but still proceed and allow the event group to be enabled.
>
> I can't see a good reason why the same event would be enabled under
> different guids present on the same system. We can revisit my assumption
> if the "Duplicate enable for event" message shows up.
This would be difficult to handle at that time, no? From what I can tell, this would allow an
unusable event group to be enabled, resulting in untested and invalid flows.
I think it will be safer to not enable an event group in this scenario, and this seems to match your
expectation that the scenario should not occur. The "Duplicate enable for event" message will still
appear and we can still revisit those assumptions when it does, but the systems encountering
it will not be running with event groups that appear enabled but are not actually fully enabled.
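A rough userspace C sketch of the "all or nothing" behavior argued for here, assuming
resctrl_enable_mon_event() were changed to return success/failure: an event group is only
reported as enabled if every one of its events could be enabled, and any events it already
turned on are rolled back otherwise. The event ids, enable_mon_event() and
enable_event_group() are hypothetical stand-ins, not the existing resctrl code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical event ids, for illustration only. */
enum evt_id { EVT_ENERGY, EVT_ACTIVITY, EVT_OTHER, NR_EVTS };

static bool evt_enabled[NR_EVTS];

/*
 * Hypothetical stand-in for a resctrl_enable_mon_event() that reports
 * failure instead of only warning: a duplicate enable is refused.
 */
static bool enable_mon_event(enum evt_id id)
{
	if (evt_enabled[id]) {
		fprintf(stderr, "Duplicate enable for event %d\n", (int)id);
		return false;
	}
	evt_enabled[id] = true;
	return true;
}

struct event_group {
	const char *name;
	unsigned long guid;
	const enum evt_id *evts;
	size_t num_evts;
};

/* Enable every event in @g, or none of them. */
static bool enable_event_group(const struct event_group *g)
{
	size_t i;

	for (i = 0; i < g->num_evts; i++) {
		if (!enable_mon_event(g->evts[i]))
			goto rollback;
	}
	return true;

rollback:
	/* Undo only the events this group turned on before the failure. */
	while (i--)
		evt_enabled[g->evts[i]] = false;
	return false;
}
```

If a group with guid 0x26696143 first enables EVT_ENERGY and EVT_ACTIVITY, a second group
with guid 0x26557651 listing EVT_OTHER and EVT_ACTIVITY fails on the duplicate: EVT_OTHER is
turned back off and the second group stays disabled, while the "Duplicate enable for event"
message is still printed so the assumption can be revisited.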
>
>> I think the resctrl_enable_mon_event() warnings were added to support enabling of new features
>> so that the WARNs can catch issues during development ... now they may be triggered when a
>> kernel with this implementation is run on a system that supports a single feature with
>> multiple guids. Do you have more insight into how the "single feature with multiple guids" case may look to
>> better prepare resctrl to handle it?
>>
>> Should "enable_events" be split so that a feature can be disabled for all its event groups if
>> any of them cannot be enabled due to insufficient RMIDs?
>> Perhaps resctrl_enable_mon_event() should also now return success/fail so that an event group
>> cannot be enabled if its events cannot be enabled?
>> Finally, a system with two guids of the same feature type will end up printing duplicate
>> "<feature type> monitoring detected" messages that could be more descriptive?
>
> I need to add the guid to that message.
Sounds good. Thank you.
Reinette