[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87o7bfjeae.ffs@tglx>
Date: Fri, 15 Mar 2024 23:55:37 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Linus Torvalds <torvalds@...uxfoundation.org>, Guenter Roeck
<linux@...ck-us.net>
Cc: LKML <linux-kernel@...r.kernel.org>, x86@...nel.org, Uros Bizjak
<ubizjak@...il.com>, linux-sparse@...r.kernel.org, lkp@...el.com,
oe-kbuild-all@...ts.linux.dev
Subject: Re: [patch 5/9] x86: Cure per CPU madness on UP
On Fri, Mar 15 2024 at 18:40, Thomas Gleixner wrote:
> On Fri, Mar 15 2024 at 09:42, Linus Torvalds wrote:
>> On Fri, 15 Mar 2024 at 09:17, Guenter Roeck <linux@...ck-us.net> wrote:
>> Thomas, over to you. I wonder if maybe all those topology macros
>> should just return 0 on an UP build, but that
>> topology_get_logical_id() thing looks a bit wrong regardless.
>>
>> It really shouldn't depend on local apic data for configs that may not
>> *have* a local apic.
>
> Right. Let me look.
Not really. The problem is that a SMP build can run on a UP machine w/o
APIC or command line disables the APIC and will run into the exactly
same problem. The only case where we know that it is impossible is when
APIC support is disabled, which is silly but topic for a different
discussion.
So the proper thing to do is to check for num_possible_cpus() == 1 in
that function.
Sure you can argue that we could avoid it for SMP=n builds completely,
but I think the right thing to do is to aim for removing CONFIG_SMP and
make the UP build a subset of a generic SMP capable build which has
CONFIG_NR_CPUS=1, i.e. num_possible_cpus() = 1. Why?
Because it consolidates the code and makes UP use exactly the same
mechanisms as SMP which pretty much avoids the problem we see today that
UP lacks test coverage and becomes more esoteric and untested over time.
The downside is a slightly larger kernel image for such a build.
Though if we pretend that we seriously care about that 10% larger memory
footprint or about the marginal performance benefit of SMP=n on dead
hardware, then we are just taking the wrong pills.
The point is that this very commit in question was heading deliberately
into the direction of removing the by now silly differences of UP/SMP
for correctness sake. It just happened to unearth the missing check in
topology_get_logical_id(), but that check is required independent of
SMP=y/n as I pointed out above.
Thanks,
tglx
---
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 3259b1d4fefe..118d9f7792ee 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -279,6 +279,15 @@ int topology_get_logical_id(u32 apicid, enum x86_topology_domains at_level)
if (lvlid >= MAX_LOCAL_APIC)
return -ERANGE;
+ /*
+ * Spare the exercise on UP as there is only one instance at any
+ * level and the map check below might fail because the CPU does
+ * not have a local APIC or local APIC has been disabled on the
+ * kernel command line.
+ */
+ if (num_possible_cpus() == 1)
+ return 0;
+
if (!test_bit(lvlid, apic_maps[at_level].map))
return -ENODEV;
/* Get the number of set bits before @lvlid. */
Powered by blists - more mailing lists