Message-ID: <20231206065850.hs7k554v6wym7gw5@awork3.anarazel.de>
Date:   Tue, 5 Dec 2023 22:58:50 -0800
From:   Andres Freund <andres@...razel.de>
To:     "Zhang, Rui" <rui.zhang@...el.com>
Cc:     "jsperbeck@...gle.com" <jsperbeck@...gle.com>,
        "tip-bot2@...utronix.de" <tip-bot2@...utronix.de>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-tip-commits@...r.kernel.org" 
        <linux-tip-commits@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>, Sasha Levin <sashal@...nel.org>,
        stable@...r.kernel.org, Borislav Petkov <bp@...en8.de>
Subject: Re: [tip: x86/urgent] x86/acpi: Ignore invalid x2APIC entries

Hi,

On 2023-12-01 08:31:48 +0000, Zhang, Rui wrote:
> As a quick fix, I'm not going to fix the "potential issue" describes
> above because we have not seen a real problem caused by this yet.
>
> Can you please try the below patch to confirm if the problem is gone on
> your system?
> This patch falls back to the previous way as sent at
> https://lore.kernel.org/lkml/87pm4bp54z.ffs@tglx/T/


I've just spent a couple hours bisecting why upgrading to 6.7-rc4 left me with
just a single CPU core on my dual socket workstation.


before:
[    0.000000] Linux version 6.6.0-andres-00003-g31255e072b2e ...
...
[    0.022960] ACPI: Using ACPI (MADT) for SMP configuration information
...
[    0.022968] smpboot: Allowing 40 CPUs, 0 hotplug CPUs
...
[    0.345921] smpboot: CPU0: Intel(R) Xeon(R) Gold 5215 CPU @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x7)
...
[    0.347229] .... node  #0, CPUs:        #1  #2  #3  #4  #5  #6  #7  #8  #9
[    0.349082] .... node  #1, CPUs:   #10 #11 #12 #13 #14 #15 #16 #17 #18 #19
[    0.003190] smpboot: CPU 10 Converting physical 0 to logical die 1

[    0.361053] .... node  #0, CPUs:   #20 #21 #22 #23 #24 #25 #26 #27 #28 #29
[    0.363990] .... node  #1, CPUs:   #30 #31 #32 #33 #34 #35 #36 #37 #38 #39
...
[    0.370886] smp: Brought up 2 nodes, 40 CPUs
[    0.370891] smpboot: Max logical packages: 2
[    0.370896] smpboot: Total of 40 processors activated (200000.00 BogoMIPS)
[    0.403905] node 0 deferred pages initialised in 32ms
[    0.408865] node 1 deferred pages initialised in 37ms


after:
[    0.000000] Linux version 6.6.0-andres-00004-gec9aedb2aa1a ...
...
[    0.022935] ACPI: Using ACPI (MADT) for SMP configuration information
...
[    0.022942] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
...
[    0.356424] smpboot: CPU0: Intel(R) Xeon(R) Gold 5215 CPU @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x7)
...
[    0.357098] smp: Bringing up secondary CPUs ...
[    0.357107] smp: Brought up 2 nodes, 1 CPU
[    0.357108] smpboot: Max logical packages: 1
[    0.357110] smpboot: Total of 1 processors activated (5000.00 BogoMIPS)
[    0.726283] node 0 deferred pages initialised in 368ms
[    0.774704] node 1 deferred pages initialised in 418ms
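
To compare two boots like these without eyeballing the logs, the CPU counts can be pulled out of each dmesg with a few lines of Python. This is only a minimal sketch: the name cpu_summary and the regular expressions are assumptions derived solely from the message formats quoted above, and other kernel versions may word these lines differently.

```python
import re

# Patterns modeled on the smpboot/smp lines quoted in this mail (assumed formats).
ALLOWED_RE = re.compile(r"smpboot: Allowing (\d+) CPUs")
BROUGHT_UP_RE = re.compile(r"smp: Brought up (\d+) nodes?, (\d+) CPUs?")


def cpu_summary(dmesg_text):
    """Return the allowed/online CPU counts found in a dmesg dump.

    Fields are None when the corresponding line is missing.
    """
    allowed = ALLOWED_RE.search(dmesg_text)
    brought = BROUGHT_UP_RE.search(dmesg_text)
    return {
        "allowed": int(allowed.group(1)) if allowed else None,
        "nodes": int(brought.group(1)) if brought else None,
        "online": int(brought.group(2)) if brought else None,
    }


# Excerpts copied from the two boots shown above.
BEFORE = """\
[    0.022968] smpboot: Allowing 40 CPUs, 0 hotplug CPUs
[    0.370886] smp: Brought up 2 nodes, 40 CPUs
"""
AFTER = """\
[    0.022942] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[    0.357107] smp: Brought up 2 nodes, 1 CPU
"""

if __name__ == "__main__":
    print("before:", cpu_summary(BEFORE))
    print("after: ", cpu_summary(AFTER))
```

Feeding it the full attached dmesg files instead of the excerpts should work the same way, as long as those two lines are present.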


There does seem to be something off with the ACPI data: when booting without
the patch, I do see messages like:
[    0.715228] APIC: NR_CPUS/possible_cpus limit of 40 reached. Processor 40/0x7f00 ignored.
[    0.715231] ACPI: Unable to map lapic to logical cpu number

But other than that, the system has worked for a couple years.


It's obviously not good to regress from 2x10/20 cores/threads to a single
core. I guess it's at least somewhat funny to imagine a 2 socket system with
a single core...


It seems particularly worrying that this patch has apparently been selected
for -stable:
https://lore.kernel.org/all/20231122153212.852040-2-sashal@kernel.org/

Even if it didn't have these unintended consequences, it seems like a commit
like this hardly is -stable material?


I've attached .config, dmesg of a boot with gec9aedb2aa1a and one with
gec9aedb2aa1a^.

Greetings,

Andres Freund

View attachment "dmesg-6.7-ec9aedb2aa1a-onecpu" of type "text/plain" (179401 bytes)

View attachment "dmesg-6.7-ec9aedb2aa1a^-onecpu" of type "text/plain" (179396 bytes)

View attachment ".config" of type "text/plain" (169396 bytes)
