 
Date:   Wed, 29 Jun 2022 19:19:01 +1000
From:   Michael Ellerman <mpe@...erman.id.au>
To:     Nathan Lynch <nathanl@...ux.ibm.com>, linuxppc-dev@...ts.ozlabs.org
Cc:     linux-kernel@...r.kernel.org, npiggin@...il.com,
        brking@...ux.ibm.com, srikar@...ux.vnet.ibm.com
Subject: Re: [PATCH] powerpc/smp: poll cpu_callin_map more aggressively in
 __cpu_up()

Nathan Lynch <nathanl@...ux.ibm.com> writes:
> Replace the outdated iteration and timeout calculations here with
> an indefinite spin_until_cond()-wrapped poll of cpu_callin_map. __cpu_up()
> already does this when waiting for the cpu to set its online bit before
> returning, so this change is not really making the function more brittle.
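For comparison, the shape of the poll the patch proposes can be modelled in userspace. This is a hypothetical sketch, not kernel code: `cpu_callin_flag` is an assumed stand-in for `cpu_callin_map[cpu]`, and the empty `relax()` stands in for the `cpu_relax()` hint a kernel spin loop would use. It illustrates the concern raised in the reply: if the CPU never calls in, the caller never returns.

```c
#include <assert.h> /* used by the accompanying checks */
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for cpu_callin_map[cpu]; set by the booting CPU. */
static atomic_bool cpu_callin_flag;

/* Hypothetical stand-in for the kernel's cpu_relax() spin-loop hint. */
static inline void relax(void)
{
}

/*
 * Userspace model of spin_until_cond(cpu_callin_map[cpu]): poll with no
 * sleep and no deadline. Fast when the CPU calls in promptly, but if it
 * never does, this loop never exits and there is no "stuck" error path.
 */
static void wait_for_callin_forever(void)
{
	while (!atomic_load_explicit(&cpu_callin_flag, memory_order_acquire))
		relax();
}
```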

Sorry for the glacial response.

I'm not sure I agree that this doesn't make the code more brittle.

The existing indefinite wait you mention is later in the function, and
happens after the CPU has successfully come into the kernel.

I think it's more common that a stuck/borked CPU doesn't come into the
kernel at all, rather than comes in and then fails to online.

So I think the bail-out when the CPU fails to call in is useful; I would
guess I see that "Processor x is stuck" message multiple times a year
while debugging various things.

> Removing the msleep(1) in the hotplug path here reduces the time it takes
> to online a CPU on a P9 PowerVM LPAR from about 30ms to 1ms when exercised
> via thaw_secondary_cpus().

That is a nice improvement.

Can we do something that returns quickly in the happy case and still has
a timeout when things go wrong? Seems like a busy loop with a
time_after() check would do the trick.
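A minimal userspace sketch of the busy loop with a time_after() check suggested above. Everything here is hypothetical: `callin_flag` stands in for `cpu_callin_map[cpu]`, `monotonic_ms()` (built on CLOCK_MONOTONIC) stands in for jiffies, `ticks_after()` reimplements the wraparound-safe comparison the kernel's time_after(a, b) performs, and returning -ETIMEDOUT is an assumption rather than what __cpu_up() actually returns.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() / CLOCK_MONOTONIC */
#include <assert.h>             /* used by the accompanying checks */
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

static atomic_bool callin_flag; /* hypothetical stand-in for cpu_callin_map[cpu] */

/* Millisecond counter standing in for jiffies. */
static unsigned long monotonic_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (unsigned long)ts.tv_sec * 1000UL +
	       (unsigned long)(ts.tv_nsec / 1000000L);
}

/* True if a is later than b, even if the counter has wrapped (as time_after()). */
static bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Returns 0 once the CPU calls in, -ETIMEDOUT if the deadline passes first. */
static int wait_for_callin(unsigned int cpu, unsigned long timeout_ms)
{
	unsigned long deadline = monotonic_ms() + timeout_ms;

	/* Happy path: busy-poll without sleeping, so a prompt CPU costs ~nothing. */
	while (!atomic_load_explicit(&callin_flag, memory_order_acquire)) {
		if (ticks_after(monotonic_ms(), deadline))
			break;
		/* A kernel version would call cpu_relax() here. */
	}

	/* Re-check: the CPU may have called in just as the deadline passed. */
	if (!atomic_load_explicit(&callin_flag, memory_order_acquire)) {
		fprintf(stderr, "Processor %u is stuck.\n", cpu);
		return -ETIMEDOUT;
	}
	return 0;
}
```

Because the loop never sleeps, the happy path returns as soon as the flag is observed, while a CPU that never calls in still produces the "stuck" message after a bounded wait. The final re-check avoids reporting a CPU as stuck when it calls in right at the deadline. The timeout value is left to the caller here and is not taken from the patch.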

cheers
