Message-ID: <c4518fe2-dce3-46d1-8d79-cd63377bdcad@intel.com>
Date: Tue, 22 Jul 2025 11:19:58 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Barret Rhoden <brho@...gle.com>, Tony Luck <tony.luck@...el.com>
CC: Dave Martin <Dave.Martin@....com>, James Morse <james.morse@....com>,
	<linux-kernel@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH] x86/resctrl: avoid divide by 0 num_rmid

Hi Barret,

On 7/21/25 11:00 AM, Barret Rhoden wrote:
> x86_cache_max_rmid's default is -1.  If the hardware or VM doesn't set
> the right cpuid bits, num_rmid can be 0.
> 
> Signed-off-by: Barret Rhoden <brho@...gle.com>
> 
> ---
> I ran into this on a VM on granite rapids.  I guess the VMM told the
> kernel it was a GNR, but didn't set all the cache/rsctl bits.
> 

The -1 default of x86_cache_max_rmid is assigned if the hardware does not
support *any* L3 monitoring. Specifically:

resctrl_cpu_detect():
	if (!cpu_has(c, X86_FEATURE_CQM_LLC)) {
		c->x86_cache_max_rmid  = -1;
		...
	}

The function modified by this patch, rdt_get_mon_l3_config(), only runs if
the hardware supports one or more of the L3 monitoring sub-features
(X86_FEATURE_CQM_OCCUP_LLC, X86_FEATURE_CQM_MBM_TOTAL, or
X86_FEATURE_CQM_MBM_LOCAL) that depend on X86_FEATURE_CQM_LLC per cpuid_deps[].

I tried to reproduce the issue on real hardware by using clearcpuid to
disable X86_FEATURE_CQM_LLC. The CPUID dependencies did the right thing:
X86_FEATURE_CQM_OCCUP_LLC, X86_FEATURE_CQM_MBM_TOTAL, and
X86_FEATURE_CQM_MBM_LOCAL were disabled automatically,
rdt_get_mon_l3_config() did not run at all, and no attempt was made to
enumerate any of the L3 monitoring details.
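
To illustrate the behaviour described above, here is a minimal userspace
sketch of dependency propagation. It is not the kernel's implementation:
the FEAT_* names, the deps[] table, and both helper functions are
hypothetical stand-ins that only mirror the relationship between
X86_FEATURE_CQM_LLC and its three sub-features.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature indices; the kernel uses X86_FEATURE_* bit numbers. */
enum feature {
	FEAT_CQM_LLC,
	FEAT_CQM_OCCUP_LLC,
	FEAT_CQM_MBM_TOTAL,
	FEAT_CQM_MBM_LOCAL,
	FEAT_COUNT,
};

struct dep {
	enum feature feature;	/* the dependent feature */
	enum feature depends;	/* the feature it requires */
};

/* Same shape as a dependency table: each sub-feature needs CQM_LLC. */
static const struct dep deps[] = {
	{ FEAT_CQM_OCCUP_LLC, FEAT_CQM_LLC },
	{ FEAT_CQM_MBM_TOTAL, FEAT_CQM_LLC },
	{ FEAT_CQM_MBM_LOCAL, FEAT_CQM_LLC },
};

/* Clear a feature, then keep clearing anything that depends on a cleared one. */
static void clear_feature(bool caps[FEAT_COUNT], enum feature f)
{
	bool changed;
	size_t i;

	caps[f] = false;
	do {
		changed = false;
		for (i = 0; i < sizeof(deps) / sizeof(deps[0]); i++) {
			if (!caps[deps[i].depends] && caps[deps[i].feature]) {
				caps[deps[i].feature] = false;
				changed = true;
			}
		}
	} while (changed);
}

/* The L3 monitoring setup is only reached if at least one sub-feature is set. */
static bool l3_mon_config_would_run(const bool caps[FEAT_COUNT])
{
	return caps[FEAT_CQM_OCCUP_LLC] || caps[FEAT_CQM_MBM_TOTAL] ||
	       caps[FEAT_CQM_MBM_LOCAL];
}
```

With all four features set, calling clear_feature(caps, FEAT_CQM_LLC)
leaves every sub-feature cleared, so l3_mon_config_would_run() returns
false. That matches what I observed with clearcpuid.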

What are the symptoms when you encounter this issue?

Would it be possible to send me the CPUID flags of leaf 7, subleaf 0 as
well as all sub-leaves of leaf 0xF?

Could you please also elaborate on what the impact of this issue is? Is this
a VM that has been released with many users impacted or something encountered
during development of this VM?

>  arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index c261558276cd..226dee05f96e 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -359,6 +359,12 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  	else if (mbm_offset > MBM_CNTR_WIDTH_OFFSET_MAX)
>  		pr_warn("Ignoring impossible MBM counter offset\n");
>  
> +	if (r->num_rmid < 1) {
> +		pr_warn("Invalid num_rmid %d, cach_max_rmid was %d\n",
> +			r->num_rmid, boot_cpu_data.x86_cache_max_rmid);
> +		r->num_rmid = 1;

I do not think enumeration of this feature should proceed/succeed if there clearly
is a configuration issue.
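
As a rough illustration of what I mean, the sketch below rejects the
configuration instead of clamping num_rmid to 1. It is a userspace
sketch under assumptions, not a proposed patch: struct mon_resource,
mon_config_sketch(), its parameters, the "max RMID + 1" derivation, and
the choice of -EINVAL are all hypothetical.

```c
#include <errno.h>

/* Hypothetical stand-in for the field of interest; not the kernel's struct. */
struct mon_resource {
	int num_rmid;
};

/*
 * Derive num_rmid from the enumerated maximum RMID (assumed to be
 * max RMID + 1) and refuse to continue when the result cannot be
 * used as a divisor.
 */
static int mon_config_sketch(struct mon_resource *r, int cache_max_rmid,
			     unsigned int cache_size,
			     unsigned int *threshold)
{
	r->num_rmid = cache_max_rmid + 1;
	if (r->num_rmid < 1)
		return -EINVAL;	/* fail enumeration rather than clamp */

	/* Safe: num_rmid is known to be at least 1 here. */
	*threshold = cache_size / (unsigned int)r->num_rmid;
	return 0;
}
```

With cache_max_rmid at its -1 default this returns an error before the
division is reached, and the caller can leave L3 monitoring disabled.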

> +	}
> +
>  	/*
>  	 * A reasonable upper limit on the max threshold is the number
>  	 * of lines tagged per RMID if all RMIDs have the same number of

Reinette
