[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Zfj0OvSG8Ox9x8Tj@feng-clx.sh.intel.com>
Date: Tue, 19 Mar 2024 10:11:06 +0800
From: Feng Tang <feng.tang@...el.com>
To: Dave Hansen <dave.hansen@...el.com>
CC: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, "H . Peter Anvin" <hpa@...or.com>, "Peter
Zijlstra" <peterz@...radead.org>, <x86@...nel.org>, <paulmck@...nel.org>,
<rui.zhang@...el.com>, Waiman Long <longman@...hat.com>,
<linux-kernel@...r.kernel.org>, Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH] x86/tsc: Use topology_max_packages() to get package
number
On Mon, Mar 18, 2024 at 09:35:57AM -0700, Dave Hansen wrote:
> On 3/15/24 04:26, Feng Tang wrote:
> > Thomas' recent patchset of refactoring x86 topology code introduces
> > topology_max_package(), which works well in most of the above cases.
> > The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup, which
> > sets up the 'nr_cpu_ids' and rejects the rest of the CPUs, and may
> > cause topology_max_package() less than the real package number, but
> > it's fine as it is rarely used debug option, and logical package
> > number really matters in this check. So use the more accurate
> > topology_max_package() to replace nr_online_nodes().
>
> This is a lot longer than what you have in this changelog, but I hope it
> makes sense. This is retreading some of what Thomas's set does, but I
> think it's worth re-explaining:
>
> The boot CPU iterates through all enumerated APIC ids and either accepts
> or rejects the APIC id. For accepted ids, it figures out which bits of
> the id map to the package number. It tracks which package numbers have
> been seen in a bitmap. topology_max_package() just returns the number
> of bits set in that bitmap.
>
> 'nr_cpus=' and 'possible_cpus=' can cause more APIC ids to be rejected
> and can artificially lower the number of bits in the package bitmap and
> thus topology_max_package().
>
> This means that, for example, a system with 8 physical packages might
> reject all the CPUs on 6 of those packages and be left with only 2
> packages and 2 bits set in the package bitmap. It needs the TSC
> watchdog, but would disable it anyway. This isn't ideal, but this only
> happens for debug-oriented options.
>
> This is fixable by tracking the package numbers for rejected CPUs. But
> it's not worth the trouble for debugging.
Will steal it for the v2 :). Many thanks!
>
> --
>
> The only thing that comes to mind is if we need something simple like:
>
> if (topo_info.nr_rejected_cpus)
> pr_info("TSC might be buggered\n");
>
> ... somewhere.
I think logically this should be together with the package checking
code in tsc.c, and we need to export the info from static data
'topo_info'. The other thought is adding in topology code:
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -460,8 +460,11 @@ void __init topology_init_possible_cpus(void)
pr_info("Num. threads per package: %3u\n", __num_threads_per_package);
pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
- if (topo_info.nr_rejected_cpus)
+ if (topo_info.nr_rejected_cpus) {
pr_info("Rejected CPUs %u\n", topo_info.nr_rejected_cpus);
+ if (__max_logical_packages <= 4)
+ pr_warning("TSC might be bugged due to these rejected CPUs\n");
+ }
Thanks,
Feng
Powered by blists - more mailing lists