Message-ID: <c25144f8-f6e5-407c-a6a8-f382beaabb50@intel.com>
Date: Thu, 23 Oct 2025 10:48:56 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghuay@...dia.com>, "Maciej
 Wieczor-Retman" <maciej.wieczor-retman@...el.com>, Peter Newman
	<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
	<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
	<Dave.Martin@....com>, Chen Yu <yu.c.chen@...el.com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
	<patches@...ts.linux.dev>
Subject: Re: [PATCH v12 24/31] x86/resctrl: Handle number of RMIDs supported
 by RDT_RESOURCE_PERF_PKG

Hi Tony,

On 10/13/25 3:33 PM, Tony Luck wrote:
> There are now three meanings for "number of RMIDs":
> 
> 1) The number for legacy features enumerated by CPUID leaf 0xF. This
> is the maximum number of distinct values that can be loaded into
> MSR_IA32_PQR_ASSOC. Note that systems with Sub-NUMA Cluster mode enabled
> will force scaling down the CPUID enumerated value by the number of SNC
> nodes per L3-cache.
> 
> 2) The number of registers in MMIO space for each event. This
> is enumerated in the XML files and is the value initialized into
> event_group::num_rmids.
> 
> 3) The number of "hardware counters" (this isn't a strictly accurate
> description of how things work, but serves as a useful analogy that
> does describe the limitations) feeding to those MMIO registers. This
> is enumerated in telemetry_region::num_rmids returned from the call to
> intel_pmt_get_regions_by_feature()
> 
> Event groups with insufficient "hardware counters" to track all RMIDs
> are difficult for users to use, since the system may reassign "hardware
> counters" at any time. This means that users cannot reliably collect
> two consecutive event counts to compute the rate at which events are
> occurring.
> 
> Introduce rdt_set_feature_disabled() to mark any under-resourced event groups
> (those with telemetry_region::num_rmids < event_group::num_rmids  for any of
> the event group's telemetry regions) as unusable.  Note that the rdt_options[]
> structure must now be writable at run-time.
> 
> Limit an event group's number of possible monitor resource groups
> to the lowest number of "hardware counters" if the user explicitly
> requests to enable an under-resourced event group.

How about:
	Limit an under-resourced event group's number of possible monitor
	resource groups to the lowest number of "hardware counters" if the
	user explicitly requests to enable it.


> Scan all enabled event groups and assign the RDT_RESOURCE_PERF_PKG
> resource "num_rmids" value to the smallest of these values as this value
> will be used later to compare against the number of RMIDs supported
> by other resources to determine how many monitoring resource groups
> are supported.
> 
> N.B. Change type of rdt_resource::num_rmid to u32 to match type of
> event_group::num_rmids so that min(r->num_rmid, e->num_rmids) won't
> complain about mixing signed and unsigned types.  Print r->num_rmid as
> unsigned value in rdt_num_rmids_show().

"Print r->num_rmid ..." can be dropped since that is clear from the patch.

> 
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> ---

...

> @@ -156,21 +168,59 @@ static bool skip_telem_region(struct telemetry_region *tr, struct event_group *e
>  	return false;
>  }
>  
> +/* Side effect: Detects unusable regions and marks them as unusable */

:/

> +static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_feature_group *p)
> +{
> +	struct telemetry_region *tr;
> +	bool ret = true;
> +
> +	for (int i = 0; i < p->count; i++) {
> +		tr = &p->regions[i];
> +		if (skip_telem_region(tr, e)) {
> +			mark_telem_region_unusable(tr);
> +			continue;
> +		}
> +
> +		if (tr->num_rmids < e->num_rmids)
> +			ret = false;
> +	}
> +
> +	return ret;
> +}

This does not look right. Wouldn't all_regions_have_sufficient_rmid() return "true" when there
are no usable regions? Trying to have one function do two things is not working well here.

It also seems awkward that the regions are marked as unusable here as a "side effect" while the
caller later attempts to track "usable_events" separately. This change does not look to be
integrated well.

Why not just determine which regions are usable as a first step and from then on just interact with
usable regions? 
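
To illustrate the concern, here is a minimal user-space sketch. The struct definitions and
the bodies of skip_telem_region() and mark_telem_region_unusable() are simplified stand-ins,
not the kernel's definitions: the "skip" field is hypothetical, and clearing "addr" to mark a
region unusable is an assumption based on the "!p->regions[i].addr" check used later in the
patch. Only all_regions_have_sufficient_rmid() follows the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types: only the fields this sketch needs, not the kernel layout. */
struct telemetry_region {
	void *addr;		/* assumed: NULL once the region is marked unusable */
	unsigned int num_rmids;
	bool skip;		/* hypothetical: result skip_telem_region() should return */
};

struct pmt_feature_group {
	int count;
	struct telemetry_region *regions;
};

struct event_group {
	unsigned int num_rmids;
};

/* Stand-in: the real function inspects the region, this one reads a flag. */
static bool skip_telem_region(struct telemetry_region *tr, struct event_group *e)
{
	(void)e;
	return tr->skip;
}

/* Stand-in: assumed to clear the address of the region. */
static void mark_telem_region_unusable(struct telemetry_region *tr)
{
	tr->addr = NULL;
}

/* Same logic as the function in the quoted patch. */
static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_feature_group *p)
{
	struct telemetry_region *tr;
	bool ret = true;

	for (int i = 0; i < p->count; i++) {
		tr = &p->regions[i];
		if (skip_telem_region(tr, e)) {
			mark_telem_region_unusable(tr);
			continue;
		}

		if (tr->num_rmids < e->num_rmids)
			ret = false;
	}

	return ret;
}

/* Every region is skipped, so no RMID comparison ever runs. */
static void demo_no_usable_regions(void)
{
	int dummy;
	struct telemetry_region regions[2] = {
		{ .addr = &dummy, .num_rmids = 4, .skip = true },
		{ .addr = &dummy, .num_rmids = 4, .skip = true },
	};
	struct pmt_feature_group p = { .count = 2, .regions = regions };
	struct event_group e = { .num_rmids = 64 };

	/* "Sufficient" is reported even though nothing was checked. */
	assert(all_regions_have_sufficient_rmid(&e, &p));
	/* The side effect did happen: both regions are now marked unusable. */
	assert(regions[0].addr == NULL && regions[1].addr == NULL);
}
```

With every region skipped the loop never reaches the num_rmids comparison, so "ret" keeps its
initial value and the caller cannot tell "all sufficient" apart from "nothing usable".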

> +
>  static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
>  {
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
>  	bool usable_events = false;
>  

This flow can start by determining and marking which regions are usable. Could be something like:
	if (!group_has_usable_regions(e, p))
		return false;

To reduce churn, group_has_usable_regions() can be introduced in patch #17 to replace the
open-coded equivalent in enable_events().

From this point the "feature group" is guaranteed to have at least one telemetry region usable
by the associated "event group" and all interactions can be with just the usable regions within it.
For example, all_regions_have_sufficient_rmid() can change to:

	static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_feature_group *p)
	{
		struct telemetry_region *tr;
		bool ret = true;

		for (int i = 0; i < p->count; i++) {
			if (!p->regions[i].addr)
				continue;
			tr = &p->regions[i];
			if (tr->num_rmids < e->num_rmids)
				ret = false;
		}

		return ret;
	}
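
For reference, the two-step flow could look roughly like the user-space sketch below.
group_has_usable_regions() is only named in this review, so its body here is a guess at one
possible shape rather than a proposed implementation. The struct definitions, the "skip"
field, and the bodies of skip_telem_region() and mark_telem_region_unusable() are simplified
stand-ins, with "addr == NULL" assumed to mean "unusable".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types: only the fields this sketch needs, not the kernel layout. */
struct telemetry_region {
	void *addr;		/* assumed: NULL once the region is marked unusable */
	unsigned int num_rmids;
	bool skip;		/* hypothetical: result skip_telem_region() should return */
};

struct pmt_feature_group {
	int count;
	struct telemetry_region *regions;
};

struct event_group {
	unsigned int num_rmids;
};

/* Stand-in: the real function inspects the region, this one reads a flag. */
static bool skip_telem_region(struct telemetry_region *tr, struct event_group *e)
{
	(void)e;
	return tr->skip;
}

/* Stand-in: assumed to clear the address of the region. */
static void mark_telem_region_unusable(struct telemetry_region *tr)
{
	tr->addr = NULL;
}

/* Step 1 (hypothetical body): mark unusable regions, report whether any remain. */
static bool group_has_usable_regions(struct event_group *e, struct pmt_feature_group *p)
{
	bool usable_regions = false;

	for (int i = 0; i < p->count; i++) {
		if (skip_telem_region(&p->regions[i], e)) {
			mark_telem_region_unusable(&p->regions[i]);
			continue;
		}
		usable_regions = true;
	}

	return usable_regions;
}

/* Step 2: as suggested above, only regions that are still usable are compared. */
static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_feature_group *p)
{
	struct telemetry_region *tr;
	bool ret = true;

	for (int i = 0; i < p->count; i++) {
		if (!p->regions[i].addr)
			continue;
		tr = &p->regions[i];
		if (tr->num_rmids < e->num_rmids)
			ret = false;
	}

	return ret;
}

static void demo_two_step_flow(void)
{
	int dummy;
	struct telemetry_region none[1] = {
		{ .addr = &dummy, .num_rmids = 4, .skip = true },
	};
	struct telemetry_region mixed[2] = {
		{ .addr = &dummy, .num_rmids = 4, .skip = true },
		{ .addr = &dummy, .num_rmids = 64, .skip = false },
	};
	struct pmt_feature_group p_none = { .count = 1, .regions = none };
	struct pmt_feature_group p_mixed = { .count = 2, .regions = mixed };
	struct event_group e = { .num_rmids = 64 };

	/* No usable region: the caller can bail out before any RMID check. */
	assert(!group_has_usable_regions(&e, &p_none));

	/* One usable region with enough RMIDs; the skipped one is ignored. */
	assert(group_has_usable_regions(&e, &p_mixed));
	assert(all_regions_have_sufficient_rmid(&e, &p_mixed));
}
```

Splitting the work this way keeps the marking in one place and lets the RMID check be a pure
query over regions that are already known to be usable.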

> +	/* Disable feature if insufficient RMIDs */
> +	if (!all_regions_have_sufficient_rmid(e, p))
> +		rdt_set_feature_disabled(e->name);
> +
> +	/* User can override above disable from kernel command line */
> +	if (!rdt_is_feature_enabled(e->name))
> +		return false;
> +
>  	for (int i = 0; i < p->count; i++) {
> -		if (skip_telem_region(&p->regions[i], e)) {
> -			mark_telem_region_unusable(&p->regions[i]);
> +		if (!p->regions[i].addr)
>  			continue;
> -		}
> +		/*
> +		 * e->num_rmids only adjusted lower if user (via rdt= kernel
> +		 * parameter) forces an event group with insufficient RMID
> +		 * to be enabled.
> +		 */
> +		e->num_rmids = min(e->num_rmids, p->regions[i].num_rmids);
>  		usable_events = true;
>  	}
>  
>  	if (!usable_events)
>  		return false;
>  
> +	if (r->mon.num_rmid)
> +		r->mon.num_rmid = min(r->mon.num_rmid, e->num_rmids);
> +	else
> +		r->mon.num_rmid = e->num_rmids;
> +
>  	for (int j = 0; j < e->num_events; j++)
>  		resctrl_enable_mon_event(e->evts[j].id, true,
>  					 e->evts[j].bin_bits, &e->evts[j]);
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 2238a5536f4b..f18cc5b38315 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -1135,7 +1135,7 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
>  {
>  	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
>  
> -	seq_printf(seq, "%d\n", r->mon.num_rmid);
> +	seq_printf(seq, "%u\n", r->mon.num_rmid);
>  
>  	return 0;
>  }

Reinette
