Date:   Thu, 7 Dec 2023 02:41:34 +0000
From:   "Zhang, Rui" <rui.zhang@...el.com>
To:     "andres@...razel.de" <andres@...razel.de>
CC:     "linux-tip-commits@...r.kernel.org" 
        <linux-tip-commits@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>, "bp@...en8.de" <bp@...en8.de>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "jsperbeck@...gle.com" <jsperbeck@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        "sashal@...nel.org" <sashal@...nel.org>,
        "tip-bot2@...utronix.de" <tip-bot2@...utronix.de>
Subject: Re: [tip: x86/urgent] x86/acpi: Ignore invalid x2APIC entries

Hi, Andres,

On Tue, 2023-12-05 at 22:58 -0800, Andres Freund wrote:
> Hi,
> 
> On 2023-12-01 08:31:48 +0000, Zhang, Rui wrote:
> > As a quick fix, I'm not going to fix the "potential issue" described
> > above because we have not seen a real problem caused by this yet.
> > 
> > Can you please try the below patch to confirm if the problem is gone
> > on your system?
> > This patch falls back to the previous way as sent at
> > https://lore.kernel.org/lkml/87pm4bp54z.ffs@tglx/T/
> 
> 
> I've just spent a couple hours bisecting why upgrading to 6.7-rc4 left me with
> just a single CPU core on my dual socket workstation.
> 
> 
> before:
> [    0.000000] Linux version 6.6.0-andres-00003-g31255e072b2e ...
> ...
> [    0.022960] ACPI: Using ACPI (MADT) for SMP configuration information
> ...
> [    0.022968] smpboot: Allowing 40 CPUs, 0 hotplug CPUs
> ...
> [    0.345921] smpboot: CPU0: Intel(R) Xeon(R) Gold 5215 CPU @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x7)
> ...
> [    0.347229] .... node  #0, CPUs:        #1  #2  #3  #4  #5  #6  #7  #8  #9
> [    0.349082] .... node  #1, CPUs:   #10 #11 #12 #13 #14 #15 #16 #17 #18 #19
> [    0.003190] smpboot: CPU 10 Converting physical 0 to logical die 1
> 
> [    0.361053] .... node  #0, CPUs:   #20 #21 #22 #23 #24 #25 #26 #27 #28 #29
> [    0.363990] .... node  #1, CPUs:   #30 #31 #32 #33 #34 #35 #36 #37 #38 #39
> ...
> [    0.370886] smp: Brought up 2 nodes, 40 CPUs
> [    0.370891] smpboot: Max logical packages: 2
> [    0.370896] smpboot: Total of 40 processors activated (200000.00 BogoMIPS)
> [    0.403905] node 0 deferred pages initialised in 32ms
> [    0.408865] node 1 deferred pages initialised in 37ms
> 
> 
> after:
> [    0.000000] Linux version 6.6.0-andres-00004-gec9aedb2aa1a ...
> ...
> [    0.022935] ACPI: Using ACPI (MADT) for SMP configuration information
> ...
> [    0.022942] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
> ...
> [    0.356424] smpboot: CPU0: Intel(R) Xeon(R) Gold 5215 CPU @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x7)
> ...
> [    0.357098] smp: Bringing up secondary CPUs ...
> [    0.357107] smp: Brought up 2 nodes, 1 CPU
> [    0.357108] smpboot: Max logical packages: 1
> [    0.357110] smpboot: Total of 1 processors activated (5000.00 BogoMIPS)
> [    0.726283] node 0 deferred pages initialised in 368ms
> [    0.774704] node 1 deferred pages initialised in 418ms
> 
> 
> There does seem to be something off with the ACPI data, when booting without
> the patch,

Which patch are you referring to? The original patch in this thread?

Does the second patch fix the problem? I mean the patch at
https://lore.kernel.org/all/904ce2b870b8a7f34114f93adc7c8170420869d1.camel@intel.com/

thanks,
rui


> I do see messages like:
> [    0.715228] APIC: NR_CPUS/possible_cpus limit of 40 reached. Processor 40/0x7f00 ignored.
> [    0.715231] ACPI: Unable to map lapic to logical cpu number
> 
> But other than that, the system has worked for a couple years.
> 
> 
> It's obviously not good to regress from 2x10/20 cores/threads to a single
> core.  I guess it's at least somewhat funny to imagine a 2 socket system with
> a single core...
> 
> 
> It seems particularly worrying that this patch has apparently been selected
> for -stable:
> https://lore.kernel.org/all/20231122153212.852040-2-sashal@kernel.org/
> 
> Even if it didn't have these unintended consequences, it seems like a commit
> like this hardly is -stable material?
> 
> 
> I've attached .config, dmesg of a boot with gec9aedb2aa1a and one with
> gec9aedb2aa1a^.
> 
> Greetings,
> 
> Andres Freund
