lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ce02f1a8-870f-41bc-8650-4bd6103f9637@intel.com>
Date: Fri, 15 Mar 2024 10:58:38 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Feng Tang <feng.tang@...el.com>, Thomas Gleixner <tglx@...utronix.de>,
 Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
 "H . Peter Anvin" <hpa@...or.com>, Peter Zijlstra <peterz@...radead.org>,
 x86@...nel.org, paulmck@...nel.org, rui.zhang@...el.com,
 Waiman Long <longman@...hat.com>, linux-kernel@...r.kernel.org
Cc: Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH] x86/tsc: Use topology_max_packages() to get package
 number

On 3/15/24 04:26, Feng Tang wrote:
> Thomas' recent patchset of refactoring x86 topology code introduces
> topology_max_package(), which works well in most of the above cases.
> The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup, which
> sets up the 'nr_cpu_ids' and rejects the rest of the CPUs, and may
> cause topology_max_package() less than the real package number, but
> it's fine as it is rarely used debug option, and logical package
> number really matters in this check. So use the more accurate
> topology_max_package() to replace nr_online_nodes().

In the end, we have a bunch of hardware enumeration and then a bunch of
processing on top of it taking CPU hotplug support and kernel command
lines into account.

The hardware enumeration is relatively simple.  The processing the
kernel does on top of it is complicated.

> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 5a69a49acc96..87e7c0e89db1 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1252,15 +1252,12 @@ static void __init check_system_tsc_reliable(void)
>  	 *  - TSC which does not stop in C-States
>  	 *  - the TSC_ADJUST register which allows to detect even minimal
>  	 *    modifications
> -	 *  - not more than two sockets. As the number of sockets cannot be
> -	 *    evaluated at the early boot stage where this has to be
> -	 *    invoked, check the number of online memory nodes as a
> -	 *    fallback solution which is an reasonable estimate.
> +	 *  - not more than four sockets.
>  	 */
>  	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>  	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
>  	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> -	    nr_online_nodes <= 4)
> +	    topology_max_packages() <= 4)
>  		tsc_disable_clocksource_watchdog();
>  }

I know there's some history here, but the changelog itself is not clear
about what the problem is or how the patch solves it.

I also kinda dislike the comment talking about "sockets" and the code
talking about "packages".  I also did a big *gulp* when I saw this:

	#define topology_max_packages() (__max_logical_packages)

and:

>         /*
>          * Today neither Intel nor AMD support heterogeneous systems so
>          * extrapolate the boot cpu's data to all packages.
>          */
>         ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
>         __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);

Because Intel obviously has heterogeneous systems today.

So I'll buy that removing 'nr_online_nodes' takes NUMA out of the
picture (which is good), but I want to hear more about why
topology_max_packages() and '4' are the right things to be checking.

I suspect the real reason '4' was picked was to give the calculation
some wiggle room because it's not actually all that precise.



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ