lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87r0g3hm5o.ffs@tglx>
Date: Thu, 21 Mar 2024 12:14:27 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Guenter Roeck <linux@...ck-us.net>
Cc: LKML <linux-kernel@...r.kernel.org>, x86@...nel.org, Linus Torvalds
 <torvalds@...uxfoundation.org>, Uros Bizjak <ubizjak@...il.com>,
 linux-sparse@...r.kernel.org, lkp@...el.com, oe-kbuild-all@...ts.linux.dev
Subject: Re: [patch 5/9] x86: Cure per CPU madness on UP

On Wed, Mar 20 2024 at 08:46, Guenter Roeck wrote:
> On 3/20/24 01:58, Thomas Gleixner wrote:
>> On Fri, Mar 15 2024 at 09:17, Guenter Roeck wrote:
>>> I don't know the code well enough to determine what is wrong.
>>> Please let me know what I can do to help debugging the problem.
>> 
>> Could you provide me the config and the qemu command line?
>> 
>
> defconfig-CONFIG_SMP and
>
> qemu-system-x86_64 -kernel arch/x86/boot/bzImage -cpu Haswell \
>       --append "console=ttyS0" -nographic -monitor none
>
> The cpu doesn't really matter as long as it is an Intel CPU.
> A root file system isn't needed since the boot doesn't get that far.

Now it get's interesting because I can't reproduce it with that setup at
all.

What's weird is that I saw it exactly once on 64-bit in a VM with a UP
config two days ago, but when I started to add instrumentation it never
came back even after backing the instrumentation changes out. I have
seriously no idea what's going on there.

Is it fully reproducible on your side?

If so can you please provide a full dmesg and then apply the patch below
and provide the resulting full dmesg too?

I found two other issues while trying to find a way to reproduce, but
those are completely unrelated to the problem you are observing.

Thanks,

        tglx
---
 arch/x86/kernel/cpu/topology.c |   19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -176,6 +176,8 @@ static __init void topo_register_apic(u3
 {
 	int cpu, dom;
 
+	pr_info("APIC: %x %d\n", apic_id, present);
+
 	if (present) {
 		set_bit(apic_id, phys_cpu_present_map);
 
@@ -277,10 +279,23 @@ int topology_get_logical_id(u32 apicid,
 	/* Remove the bits below @at_level to get the proper level ID of @apicid */
 	unsigned int lvlid = topo_apicid(apicid, at_level);
 
-	if (lvlid >= MAX_LOCAL_APIC)
+	pr_info("APIC logical ID: %x %x %d\n", apicid, lvlid, at_level);
+
+	if (WARN_ON_ONCE(lvlid >= MAX_LOCAL_APIC))
 		return -ERANGE;
-	if (!test_bit(lvlid, apic_maps[at_level].map))
+
+	/*
+	 * If there was no APIC registered, then the map check below would
+	 * fail. With no APIC this is guaranteed to be an UP system and
+	 * therefore all topology levels have only one entry and their
+	 * logical ID is obviously 0.
+	 */
+	if (topo_info.boot_cpu_apic_id == BAD_APICID)
+		return 0;
+
+	if (WARN_ON_ONCE(!test_bit(lvlid, apic_maps[at_level].map)))
 		return -ENODEV;
+
 	/* Get the number of set bits before @lvlid. */
 	return bitmap_weight(apic_maps[at_level].map, lvlid);
 }

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ