[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7d0be0f7-e988-4ca2-b61b-21853e4268f6@roeck-us.net>
Date: Fri, 15 Mar 2024 17:56:36 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Thomas Gleixner <tglx@...utronix.de>,
Linus Torvalds <torvalds@...uxfoundation.org>
Cc: LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
Uros Bizjak <ubizjak@...il.com>, linux-sparse@...r.kernel.org,
lkp@...el.com, oe-kbuild-all@...ts.linux.dev
Subject: Re: [patch 5/9] x86: Cure per CPU madness on UP
On 3/15/24 15:55, Thomas Gleixner wrote:
> On Fri, Mar 15 2024 at 18:40, Thomas Gleixner wrote:
>> On Fri, Mar 15 2024 at 09:42, Linus Torvalds wrote:
>>> On Fri, 15 Mar 2024 at 09:17, Guenter Roeck <linux@...ck-us.net> wrote:
>>> Thomas, over to you. I wonder if maybe all those topology macros
>>> should just return 0 on an UP build, but that
>>> topology_get_logical_id() thing looks a bit wrong regardless.
>>>
>>> It really shouldn't depend on local apic data for configs that may not
>>> *have* a local apic.
>>
>> Right. Let me look.
>
> Not really. The problem is that a SMP build can run on a UP machine w/o
> APIC or command line disables the APIC and will run into the exactly
> same problem. The only case where we know that it is impossible is when
> APIC support is disabled, which is silly but topic for a different
> discussion.
>
> So the proper thing to do is to check for num_possible_cpus() == 1 in
> that function.
>
> Sure you can argue that we could avoid it for SMP=n builds completely,
> but I think the right thing to do is to aim for removing CONFIG_SMP and
> make the UP build a subset of a generic SMP capable build which has
> CONFIG_NR_CPUS=1, i.e. num_possible_cpus() = 1. Why?
>
> Because it consolidates the code and makes UP use exactly the same
> mechanisms as SMP which pretty much avoids the problem we see today that
> UP lacks test coverage and becomes more esoteric and untested over time.
>
> The downside is a slightly larger kernel image for such a build.
>
> Though if we pretend that we seriously care about that 10% larger memory
> footprint or about the marginal performance benefit of SMP=n on dead
> hardware, then we are just taking the wrong pills.
>
FWIW, I would very much prefer for SMP=n builds to go away for x86.
I don't think anyone uses that in the real world nowadays, and I never
know if I should report problems like this one or just stop testing it.
> The point is that this very commit in question was heading deliberately
> into the direction of removing the by now silly differences of UP/SMP
> for correctness sake. It just happened to unearth the missing check in
> topology_get_logical_id(), but that check is required independent of
> SMP=y/n as I pointed out above.
>
> Thanks,
>
> tglx
>
> ---
> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index 3259b1d4fefe..118d9f7792ee 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -279,6 +279,15 @@ int topology_get_logical_id(u32 apicid, enum x86_topology_domains at_level)
>
> if (lvlid >= MAX_LOCAL_APIC)
> return -ERANGE;
> + /*
> + * Spare the exercise on UP as there is only one instance at any
> + * level and the map check below might fail because the CPU does
> + * not have a local APIC or local APIC has been disabled on the
> + * kernel command line.
> + */
> + if (num_possible_cpus() == 1)
> + return 0;
> +
That works.
Tested-by: Guenter Roeck <linux@...ck-us.net>
> if (!test_bit(lvlid, apic_maps[at_level].map))
> return -ENODEV;
> /* Get the number of set bits before @lvlid. */
Powered by blists - more mailing lists