Message-ID: <a307936b-c85d-4c88-839e-740c52d96b8d@intel.com>
Date: Fri, 3 Oct 2025 17:06:00 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghuay@...dia.com>, "Maciej
 Wieczor-Retman" <maciej.wieczor-retman@...el.com>, Peter Newman
	<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
	<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
	<Dave.Martin@....com>, Chen Yu <yu.c.chen@...el.com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
	<patches@...ts.linux.dev>
Subject: Re: [PATCH v11 23/31] x86/resctrl: Handle number of RMIDs supported
 by telemetry resources
Hi Tony,
(nit in subject ... "resources" -> "resource" ... with caveat that
the term "telemetry resource" is not used much at all in this series)
On 9/25/25 1:03 PM, Tony Luck wrote:
> There are now three meanings for "number of RMIDs":
> 
> 1) The number for legacy features enumerated by CPUID leaf 0xF. This
> is the maximum number of distinct values that can be loaded into the
> IA32_PQR_ASSOC MSR. Note that systems with Sub-NUMA Cluster mode enabled
"the IA32_PQR_ASSOC MSR" -> "MSR_IA32_PQR_ASSOC"
> will force scaling down the CPUID enumerated value by the number of SNC
> nodes per L3-cache.
> 
> 2) The number of registers in MMIO space for each event. This
> is enumerated in the XML files and is the value initialized into
> event_group::num_rmids.
> 
> 3) The number of "hardware counters" (this isn't a strictly accurate
> description of how things work, but serves as a useful analogy that
> does describe the limitations) feeding to those MMIO registers. This
> is enumerated in telemetry_region::num_rmids returned from the call to
> intel_pmt_get_regions_by_feature()
> 
> Event groups with insufficient "hardware counters" to track all RMIDs
> are difficult for users to use, since the system may reassign "hardware
> counters" at any time. This means that users cannot reliably collect
> two consecutive event counts to compute the rate at which events are
> occurring.
> 
> Introduce rdt_set_feature_disabled() to mark any under-resourced event
> groups (those with telemetry_region::num_rmids < event_group::num_rmids)
Would it be more accurate to say the following?
"(those with telemetry_region::num_rmids < event_group::num_rmids for any
  of the event group's telemetry regions)"
> as unusable.  Note that the rdt_options[] structure must now be writable
> at run-time.  The request to disable will be overridden if the user
"Override the request ..."?
> explicitly requests to enable using the "rdt=" Linux boot argument.
> This will result in the available number of monitoring resource groups
> being limited by the under-resourced event groups.
This needs imperative tone. How about something like the following (for the
text starting with "The request to disable ..."):
	Limit an event group's number of possible monitor resource groups
	to the lowest number of "hardware counters" if the user explicitly
	requests to enable an under-resourced event group.
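To make the intended behavior concrete, here is a rough userspace sketch of that rule. It is not the patch's code: struct sketch_event_group, its force_enabled field, and effective_num_rmids() are hypothetical stand-ins, and the per-region "hardware counter" counts are passed in as a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for an event group: the fields here are illustrative only. */
struct sketch_event_group {
	int  num_rmids;      /* MMIO registers per event */
	bool force_enabled;  /* hypothetical: user enabled the group via "rdt=" */
};

/*
 * Return the number of RMIDs the group can offer, or 0 if it stays disabled.
 * region_rmids[] holds the "hardware counter" count of each usable region.
 */
static int effective_num_rmids(const struct sketch_event_group *e,
			       const int *region_rmids, size_t nr_regions)
{
	int lowest;

	assert(e != NULL);
	lowest = e->num_rmids;
	for (size_t i = 0; i < nr_regions; i++) {
		if (region_rmids[i] < lowest)
			lowest = region_rmids[i];
	}

	/* Fully resourced: nothing to limit. */
	if (lowest == e->num_rmids)
		return e->num_rmids;

	/* Under-resourced: disabled unless the user explicitly enabled it. */
	return e->force_enabled ? lowest : 0;
}
```

With regions offering 64, 16 and 32 counters and an event group expecting 64, this yields 0 by default and 16 once the user forces the group on.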
...
> @@ -156,21 +168,57 @@ static void mark_telem_region_unusable(struct telemetry_region *tr)
>  	tr->addr = NULL;
>  }
>  
> +static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_feature_group *p)
> +{
> +	struct telemetry_region *tr;
> +
> +	for (int i = 0; i < p->count; i++) {
> +		tr = &p->regions[i];
> +		if (skip_telem_region(tr, e))
> +			continue;
> +
> +		if (tr->num_rmids < e->num_rmids)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
>  static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
>  {
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
>  	bool usable_events = false;
>  
> +	/* Disable feature if insufficient RMIDs */
> +	if (!all_regions_have_sufficient_rmid(e, p))
> +		rdt_set_feature_disabled(e->name);
> +
> +	/* User can override above disable from kernel command line */
> +	if (!rdt_is_feature_enabled(e->name))
> +		return false;
> +
>  	for (int i = 0; i < p->count; i++) {
>  		if (skip_telem_region(&p->regions[i], e)) {
>  			mark_telem_region_unusable(&p->regions[i]);
>  			continue;
>  		}
It is unexpected to me that skip_telem_region() needs to be run twice, with the
second run marking regions as unusable. I think it would be simpler to run
skip_telem_region() just once to determine which telemetry regions are unusable,
mark them as such at that time, and from that point forward interact only with
the usable telemetry regions?
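Something along these lines is what I have in mind. This is only a userspace sketch with stand-in types: the "matches" field and sketch_skip_region() are hypothetical placeholders for whatever skip_telem_region() actually checks, and it assumes a usable region always has a non-NULL addr, as mark_telem_region_unusable() implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types: only the fields this sketch needs, not the kernel definitions. */
struct telemetry_region {
	bool  matches;    /* hypothetical: result of the checks done by the skip helper */
	int   num_rmids;
	void *addr;       /* NULL marks the region unusable */
};

struct pmt_feature_group {
	int count;
	struct telemetry_region *regions;
};

/* Hypothetical placeholder for skip_telem_region(). */
static bool sketch_skip_region(const struct telemetry_region *tr)
{
	return !tr->matches;
}

/* Single pass: mark regions that fail the checks, count the ones that remain. */
static int mark_unusable_regions(struct pmt_feature_group *p)
{
	int usable = 0;

	assert(p != NULL);
	for (int i = 0; i < p->count; i++) {
		struct telemetry_region *tr = &p->regions[i];

		if (sketch_skip_region(tr)) {
			tr->addr = NULL;
			continue;
		}
		usable++;
	}
	return usable;
}

/* Later walkers only test tr->addr; they never repeat the skip check. */
static bool usable_regions_have_sufficient_rmid(const struct pmt_feature_group *p,
						int needed)
{
	for (int i = 0; i < p->count; i++) {
		const struct telemetry_region *tr = &p->regions[i];

		if (!tr->addr)
			continue;
		if (tr->num_rmids < needed)
			return false;
	}
	return true;
}
```

The marking pass would run first, before the RMID check, so every later loop can rely on tr->addr alone.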
> +
> +		/*
> +		 * e->num_rmids only adjusted lower if user forces an unusable
> +		 * region to be usable
In this function, usable/unusable regions have a distinct meaning that differs
from what this comment intends, since insufficient RMIDs do not make a region
"unusable" per skip_telem_region(). Perhaps something like:
	e->num_rmids only adjusted lower if user (via rdt= kernel parameter) forces
	an event group with insufficient RMIDs to be enabled.
Reinette