lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Zfj0OvSG8Ox9x8Tj@feng-clx.sh.intel.com>
Date: Tue, 19 Mar 2024 10:11:06 +0800
From: Feng Tang <feng.tang@...el.com>
To: Dave Hansen <dave.hansen@...el.com>
CC: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
	Borislav Petkov <bp@...en8.de>, "H . Peter Anvin" <hpa@...or.com>, "Peter
 Zijlstra" <peterz@...radead.org>, <x86@...nel.org>, <paulmck@...nel.org>,
	<rui.zhang@...el.com>, Waiman Long <longman@...hat.com>,
	<linux-kernel@...r.kernel.org>, Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH] x86/tsc: Use topology_max_packages() to get package
 number

On Mon, Mar 18, 2024 at 09:35:57AM -0700, Dave Hansen wrote:
> On 3/15/24 04:26, Feng Tang wrote:
> > Thomas' recent patchset of refactoring x86 topology code introduces
> > topology_max_package(), which works well in most of the above cases.
> > The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup, which
> > sets up the 'nr_cpu_ids' and rejects the rest of the CPUs, and may
> > cause topology_max_package() less than the real package number, but
> > it's fine as it is rarely used debug option, and logical package
> > number really matters in this check. So use the more accurate
> > topology_max_package() to replace nr_online_nodes().
> 
> This is a lot longer than what you have in this changelog, but I hope it
> makes sense.  This is retreading some of what Thomas's set does, but I
> think it's worth re-explaining:
> 
> The boot CPU iterates through all enumerated APIC ids and either accepts
> or rejects the APIC id.  For accepted ids, it figures out which bits of
> the id map to the package number.  It tracks which package numbers have
> been seen in a bitmap.  topology_max_package() just returns the number
> of bits set in that bitmap.
> 
> 'nr_cpus=' and 'possible_cpus=' can cause more APIC ids to be rejected
> and can artificially lower the number of bits in the package bitmap and
> thus topology_max_package().
> 
> This means that, for example, a system with 8 physical packages might
> reject all the CPUs on 6 of those packages and be left with only 2
> packages and 2 bits set in the package bitmap.  It needs the TSC
> watchdog, but would disable it anyway.  This isn't ideal, but this only
> happens for debug-oriented options.
> 
> This is fixable by tracking the package numbers for rejected CPUs.  But
> it's not worth the trouble for debugging.

Will steal it for the v2 :). Many thanks!

> 
> --
> 
> The only thing that comes to mind is if we need something simple like:
> 
> 	if (topo_info.nr_rejected_cpus)
> 		pr_info("TSC might be buggered\n");
> 
> ... somewhere.

I think logically this should be together with the package checking
code in tsc.c, and we need to export the info from static data
'topo_info'. The other thought is adding in topology code:

--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -460,8 +460,11 @@ void __init topology_init_possible_cpus(void)
 	pr_info("Num. threads per package: %3u\n", __num_threads_per_package);
 
 	pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
-	if (topo_info.nr_rejected_cpus)
+	if (topo_info.nr_rejected_cpus) {
 		pr_info("Rejected CPUs %u\n", topo_info.nr_rejected_cpus);
+		if (__max_logical_packages <= 4)
+			pr_warning("TSC might be bugged due to these rejected CPUs\n");
+	}

Thanks,
Feng

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ