[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87letfmk8e.fsf@linux.ibm.com>
Date: Wed, 29 Jun 2022 12:51:13 -0500
From: Nathan Lynch <nathanl@...ux.ibm.com>
To: Michael Ellerman <mpe@...erman.id.au>
Cc: linux-kernel@...r.kernel.org, npiggin@...il.com,
brking@...ux.ibm.com, srikar@...ux.vnet.ibm.com,
linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH] powerpc/smp: poll cpu_callin_map more aggressively in
__cpu_up()
Michael Ellerman <mpe@...erman.id.au> writes:
> Nathan Lynch <nathanl@...ux.ibm.com> writes:
>> Replace the outdated iteration and timeout calculations here with
>> indefinite spin_until_cond()-wrapped poll of cpu_callin_map. __cpu_up()
>> already does this when waiting for the cpu to set its online bit before
>> returning, so this change is not really making the function more brittle.
>
> I'm not sure I agree that this doesn't make the code more brittle.
>
> The existing indefinite wait you mention is later in the function, and
> happens after the CPU has successfully come into the kernel.
>
> I think it's more common that a stuck/borked CPU doesn't come into the
> kernel at all, rather than comes in and then fails to online.
>
> So I think the bail out when the CPU fails to call in is useful, I would
> guess I see that "Processor x is stuck" message multiple times a year
> while debugging various things.
Yeah I can see how my claim is too strong here.
>> Removing the msleep(1) in the hotplug path here reduces the time it takes
>> to online a CPU on a P9 PowerVM LPAR from about 30ms to 1ms when exercised
>> via thaw_secondary_cpus().
>
> That is a nice improvement.
>
> Can we do something that returns quickly in the happy case and still has
> a timeout when things go wrong? Seems like a busy loop with a
> time_after() check would do the trick.
Yes, I'll rework it like that. Thanks.
Powered by blists - more mailing lists