Message-ID: <a307936b-c85d-4c88-839e-740c52d96b8d@intel.com>
Date: Fri, 3 Oct 2025 17:06:00 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghuay@...dia.com>, "Maciej
Wieczor-Retman" <maciej.wieczor-retman@...el.com>, Peter Newman
<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
<Dave.Martin@....com>, Chen Yu <yu.c.chen@...el.com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<patches@...ts.linux.dev>
Subject: Re: [PATCH v11 23/31] x86/resctrl: Handle number of RMIDs supported
by telemetry resources
Hi Tony,
(nit in subject: "resources" -> "resource", with the caveat that
the term "telemetry resource" is hardly used in this series)
On 9/25/25 1:03 PM, Tony Luck wrote:
> There are now three meanings for "number of RMIDs":
>
> 1) The number for legacy features enumerated by CPUID leaf 0xF. This
> is the maximum number of distinct values that can be loaded into the
> IA32_PQR_ASSOC MSR. Note that systems with Sub-NUMA Cluster mode enabled
"the IA32_PQR_ASSOC MSR" -> "MSR_IA32_PQR_ASSOC"
> will force scaling down the CPUID enumerated value by the number of SNC
> nodes per L3-cache.
>
> 2) The number of registers in MMIO space for each event. This
> is enumerated in the XML files and is the value initialized into
> event_group::num_rmids.
>
> 3) The number of "hardware counters" (this isn't a strictly accurate
> description of how things work, but serves as a useful analogy that
> does describe the limitations) feeding to those MMIO registers. This
> is enumerated in telemetry_region::num_rmids returned from the call to
> intel_pmt_get_regions_by_feature()
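To make meaning (1) concrete, here is a minimal sketch of the SNC scaling. The helper name and parameters are made up for illustration, and it assumes CPUID leaf 0xF reports the highest RMID as a zero-based value, so the count is that value plus one:

```c
/*
 * Hypothetical helper, not from the patch: number of RMIDs available to
 * legacy features after scaling for Sub-NUMA Cluster mode.
 *
 * cpuid_max_rmid:   highest RMID from CPUID leaf 0xF (assumed zero-based)
 * snc_nodes_per_l3: SNC nodes per L3 cache, 1 when SNC is disabled
 */
static unsigned int legacy_num_rmids(unsigned int cpuid_max_rmid,
				     unsigned int snc_nodes_per_l3)
{
	return (cpuid_max_rmid + 1) / snc_nodes_per_l3;
}
```

With these assumptions a system enumerating a maximum RMID of 255 would end up with 128 RMIDs when there are two SNC nodes per L3 cache.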
>
> Event groups with insufficient "hardware counters" to track all RMIDs
> are difficult for users to use, since the system may reassign "hardware
> counters" at any time. This means that users cannot reliably collect
> two consecutive event counts to compute the rate at which events are
> occurring.
>
> Introduce rdt_set_feature_disabled() to mark any under-resourced event
> groups (those with telemetry_region::num_rmids < event_group::num_rmids)
Would it be more accurate to say:
"(those with telemetry_region::num_rmids < event_group::num_rmids for any
of the event group's telemetry regions)"?
> as unusable. Note that the rdt_options[] structure must now be writable
> at run-time. The request to disable will be overridden if the user
"Override the request ..."?
> explicitly requests to enable using the "rdt=" Linux boot argument.
> This will result in the available number of monitoring resource groups
> being limited by the under-resourced event groups.
This needs imperative tone. How about something like the following (for the
text starting with "The request to disable ..."):
Limit an event group's number of possible monitor resource groups
to the lowest number of "hardware counters" if the user explicitly
requests to enable an under-resourced event group.
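
A minimal sketch of what "lowest number of hardware counters" could mean in code, under stated assumptions: the structures below are cut-down, hypothetical stand-ins for the kernel ones, the "skip" flag stands in for whatever skip_telem_region() decides, and usable_num_rmids() is a name made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins that keep only the fields this sketch needs. */
struct telemetry_region {
	unsigned int num_rmids;	/* "hardware counters" backing this region */
	bool skip;		/* stand-in for skip_telem_region() returning true */
};

struct event_group {
	unsigned int num_rmids;	/* MMIO registers per event, from the XML file */
};

/*
 * Number of RMIDs a force-enabled event group can support: the smallest of
 * e->num_rmids and the num_rmids of every region that is not skipped.
 */
static unsigned int usable_num_rmids(const struct event_group *e,
				     const struct telemetry_region *regions,
				     size_t count)
{
	unsigned int n = e->num_rmids;

	for (size_t i = 0; i < count; i++) {
		if (regions[i].skip)
			continue;
		if (regions[i].num_rmids < n)
			n = regions[i].num_rmids;
	}

	return n;
}
```

For example, an event group with 64 registers per event and regions providing 64 and 32 "hardware counters" would be limited to 32.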
...
> @@ -156,21 +168,57 @@ static void mark_telem_region_unusable(struct telemetry_region *tr)
> tr->addr = NULL;
> }
>
> +static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_feature_group *p)
> +{
> + struct telemetry_region *tr;
> +
> + for (int i = 0; i < p->count; i++) {
> + tr = &p->regions[i];
> + if (skip_telem_region(tr, e))
> + continue;
> +
> + if (tr->num_rmids < e->num_rmids)
> + return false;
> + }
> +
> + return true;
> +}
> +
> static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> {
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> bool usable_events = false;
>
> + /* Disable feature if insufficient RMIDs */
> + if (!all_regions_have_sufficient_rmid(e, p))
> + rdt_set_feature_disabled(e->name);
> +
> + /* User can override above disable from kernel command line */
> + if (!rdt_is_feature_enabled(e->name))
> + return false;
> +
> for (int i = 0; i < p->count; i++) {
> if (skip_telem_region(&p->regions[i], e)) {
> mark_telem_region_unusable(&p->regions[i]);
> continue;
> }
It is unexpected to me that skip_telem_region() needs to be run twice, with
the second run marking regions as unusable. I think it would be simpler to run
skip_telem_region() once to determine which telemetry regions are unusable, mark
them as such at that time, and from that point forward interact only with the
usable telemetry regions.
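
Roughly what I have in mind, as an untested sketch rather than a drop-in replacement. The structures are simplified stand-ins, "mismatch" is a hypothetical placeholder for the checks skip_telem_region() performs, and both function names are made up. The only thing taken from the patch is that an unusable region has its addr set to NULL:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct telemetry_region {
	void *addr;		/* NULL once the region is marked unusable */
	unsigned int num_rmids;
	bool mismatch;		/* placeholder for the skip_telem_region() checks */
};

struct event_group {
	unsigned int num_rmids;
};

/* One pass up front: mark unusable regions, return how many remain usable. */
static size_t mark_unusable_regions(struct telemetry_region *regions,
				    size_t count)
{
	size_t usable = 0;

	for (size_t i = 0; i < count; i++) {
		if (regions[i].mismatch) {
			regions[i].addr = NULL;
			continue;
		}
		usable++;
	}

	return usable;
}

/* Later code looks only at regions that still have an address. */
static bool usable_regions_have_sufficient_rmid(const struct event_group *e,
						const struct telemetry_region *regions,
						size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (!regions[i].addr)
			continue;
		if (regions[i].num_rmids < e->num_rmids)
			return false;
	}

	return true;
}
```

Note that this marks regions unusable before the feature enabled/disabled check, which differs from the ordering in this patch, so whether that is acceptable would need to be confirmed.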
> +
> + /*
> + * e->num_rmids only adjusted lower if user forces an unusable
> + * region to be usable
In this function "usable" and "unusable" regions have a distinct meaning that
differs from what this comment intends, since insufficient RMIDs do not make a
region "unusable" per skip_telem_region(). Perhaps something like:
e->num_rmids only adjusted lower if user (via rdt= kernel parameter) forces
an event group with insufficient RMIDs to be enabled.
Reinette