[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20171025120940.15721-4-prarit@redhat.com>
Date: Wed, 25 Oct 2017 08:09:40 -0400
From: Prarit Bhargava <prarit@...hat.com>
To: linux-kernel@...r.kernel.org
Cc: Prarit Bhargava <prarit@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
Peter Zijlstra <peterz@...radead.org>,
Andi Kleen <ak@...ux.intel.com>,
Dave Hansen <dave.hansen@...el.com>,
Piotr Luc <piotr.luc@...el.com>,
Kan Liang <kan.liang@...el.com>, Borislav Petkov <bp@...e.de>,
Stephane Eranian <eranian@...gle.com>,
Arvind Yadav <arvind.yadav.cs@...il.com>,
Andy Lutomirski <luto@...nel.org>,
Christian Borntraeger <borntraeger@...ibm.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Tom Lendacky <thomas.lendacky@....com>,
Mathias Krause <minipli@...glemail.com>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>
Subject: [PATCH 3/3 v4] x86/smpboot: Fix __max_logical_packages estimate
A system booted with a small number of cores enabled per package
panics because the estimate of __max_logical_packages is too low.
This occurs when the total number of active cores across all packages
is less than the maximum core count for a single package.
ie) On a 4 package system with 20 cores/package where only 4 cores
are enabled on each package, the value of __max_logical_packages is
calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4.
Here's an example of the panic:
smpboot: Booting Node 1, Processors #1 OK
smpboot: Package 1 of CPU 1 exceeds BIOS package data 1.
------------[ cut here ]------------
kernel BUG at arch/x86/kernel/cpu/common.c:1087!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.0-rc2+ #4
Calculate __max_logical_packages after the cpu enumeration has completed.
Use the boot cpu's data to extrapolate the number of packages.
Signed-off-by: Prarit Bhargava <prarit@...hat.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: "H. Peter Anvin" <hpa@...or.com>
Cc: x86@...nel.org
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Andi Kleen <ak@...ux.intel.com>
Cc: Dave Hansen <dave.hansen@...el.com>
Cc: Piotr Luc <piotr.luc@...el.com>
Cc: Kan Liang <kan.liang@...el.com>
Cc: Borislav Petkov <bp@...e.de>
Cc: Stephane Eranian <eranian@...gle.com>
Cc: Arvind Yadav <arvind.yadav.cs@...il.com>
Cc: Andy Lutomirski <luto@...nel.org>
Cc: Christian Borntraeger <borntraeger@...ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@....com>
Cc: Mathias Krause <minipli@...glemail.com>
Cc: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Vitaly Kuznetsov <vkuznets@...hat.com>
---
arch/x86/kernel/smpboot.c | 57 +++++++++--------------------------------------
1 file changed, 10 insertions(+), 47 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ca615bc94e82..6307e92f54ae 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -307,12 +307,6 @@ int topology_update_package_map(unsigned int pkg, unsigned int cpu)
if (new >= 0)
goto found;
- if (logical_packages >= __max_logical_packages) {
- pr_warn("Package %u of CPU %u exceeds BIOS package data %u.\n",
- logical_packages, cpu, __max_logical_packages);
- return -ENOSPC;
- }
-
new = logical_packages++;
/* Allocate and copy a new array */
@@ -335,46 +329,6 @@ int topology_update_package_map(unsigned int pkg, unsigned int cpu)
return 0;
}
-static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
-{
- unsigned int ncpus;
-
- /*
- * Today neither Intel nor AMD support heterogenous systems. That
- * might change in the future....
- *
- * While ideally we'd want '* smp_num_siblings' in the below @ncpus
- * computation, this won't actually work since some Intel BIOSes
- * report inconsistent HT data when they disable HT.
- *
- * In particular, they reduce the APIC-IDs to only include the cores,
- * but leave the CPUID topology to say there are (2) siblings.
- * This means we don't know how many threads there will be until
- * after the APIC enumeration.
- *
- * By not including this we'll sometimes over-estimate the number of
- * logical packages by the amount of !present siblings, but this is
- * still better than MAX_LOCAL_APIC.
- *
- * We use total_cpus not nr_cpu_ids because nr_cpu_ids can be limited
- * on the command line leading to a similar issue as the HT disable
- * problem because the hyperthreads are usually enumerated after the
- * primary cores.
- */
- ncpus = boot_cpu_data.x86_max_cores;
- if (!ncpus) {
- pr_warn("x86_max_cores == zero !?!?");
- ncpus = 1;
- }
-
- __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
- pr_info("Max logical packages: %u\n", __max_logical_packages);
-
- logical_packages = 0;
-
- topology_update_package_map(c->phys_proc_id, cpu);
-}
-
void __init smp_store_boot_cpu_info(void)
{
int id = 0; /* CPU 0 */
@@ -382,7 +336,7 @@ void __init smp_store_boot_cpu_info(void)
*c = boot_cpu_data;
c->cpu_index = id;
- smp_init_package_map(c, id);
+ topology_update_package_map(c->phys_proc_id, id);
}
/*
@@ -1382,7 +1336,16 @@ void __init native_smp_prepare_boot_cpu(void)
void __init native_smp_cpus_done(unsigned int max_cpus)
{
+ int ncpus;
+
pr_debug("Boot done\n");
+ /*
+ * Today neither Intel nor AMD support heterogenous systems so
+ * extrapolate the boot cpu's data to all packages.
+ */
+ ncpus = cpu_data(0).booted_cores * smp_num_siblings;
+ __max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
+ pr_info("Max logical packages: %u\n", __max_logical_packages);
if (x86_has_numa_in_package)
set_sched_topology(x86_numa_in_package_topology);
--
2.15.0.rc0.39.g2f0e14e64
Powered by blists - more mailing lists