linux-kernel - Re: [PATCH v9 0/8] Parallel CPU bringup for x86

Message-ID: <eb6717dfc4ceb99803c0396f950db7c3231c75ef.camel@infradead.org>
Date:   Thu, 23 Feb 2023 15:12:00 +0000
From:   David Woodhouse <dwmw2@...radead.org>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Usama Arif <usama.arif@...edance.com>, kim.phillips@....com
Cc:     arjan@...ux.intel.com, mingo@...hat.com, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, x86@...nel.org,
        pbonzini@...hat.com, paulmck@...nel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        rcu@...r.kernel.org, mimoja@...oja.de, hewenliang4@...wei.com,
        thomas.lendacky@....com, seanjc@...gle.com, pmenzel@...gen.mpg.de,
        fam.zheng@...edance.com, punit.agrawal@...edance.com,
        simon.evans@...edance.com, liangma@...ngbit.com,
        Ashok Raj <ashok.raj@...el.com>
Subject: Re: [PATCH v9 0/8] Parallel CPU bringup for x86_64

On Thu, 2023-02-23 at 15:37 +0100, Thomas Gleixner wrote:
> David!
> 
> On Thu, Feb 23 2023 at 11:07, David Woodhouse wrote:
> > On Wed, 2023-02-22 at 17:42 +0100, Thomas Gleixner wrote:
> > > The low hanging fruit which brings most is the identification/topology
> > > muck and the microcode loading. That needs to be addressed first anyway.
> > 
> > Agreed, thanks.
> 
> So the problem with microcode loading is that we must ensure that a HT
> sibling is not executing anything else than a trivial loop waiting for
> the update to complete. So something like this should work:
> 
>    1) Kick all CPUs into life and let them run up to cpu_init() and
>       retrieve only the topology information.
>
>    2) Wait for all CPUs to reach this point
>
>    3) Release all primary HT threads so they can load microcode in
>       parallel. The secondary HT threads stay in the wait loop and are
>       released once the primary thread has finished the microcode
>       update.
> 
>    4) Let the CPUs do the full CPUID readout and let them synchronize
>       with the control CPU again.
> 
>    5) Complete bringup one by one

Can we move the microcode loading to happen earlier, during the
x86-specific CPUHP_BP_PARALLEL_DYN stage(s), while the CPUs are running
in parallel?

In the existing set of patches, we send INIT/SIPI/SIPI to each CPU in
parallel and they run through the first part of start_secondary(), up
to the point where it calls cpu_init_secondary() and sets their bit in
cpu_initialized_mask; they then spin, waiting for their bit in
cpu_callout_mask.

My "part 2" test patch does another round in parallel, setting each
CPU's bit in cpu_callout_mask and letting them run a bit further
through start_secondary() until they get to the end of smp_callin(),
where they set their bit in cpu_callin_mask and (in my patch) wait for
their bit in a new cpu_finishup_mask to be set, which is what releases
them to proceed to completion in the final native_cpu_up() bringup.
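
To make the sequence of gates concrete, here is a rough single-threaded
model of the handshake described above. It is only a sketch: plain
32-bit masks stand in for the kernel's cpumasks, CPU 0 plays the BSP,
and struct bringup_masks, ap_step(), simulate_bringup() and the
"online" mask are names made up for illustration, not anything in the
patches.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8u                  /* model size; CPU 0 plays the BSP */
#define ALL_APS (((UINT32_C(1) << NR_CPUS) - 1u) & ~UINT32_C(1))

/* Plain bitmasks standing in for the kernel's cpumasks (bit n == CPU n). */
struct bringup_masks {
    uint32_t initialized;           /* AP -> BSP: got through cpu_init_secondary() */
    uint32_t callout;               /* BSP -> AP: carry on into smp_callin() */
    uint32_t callin;                /* AP -> BSP: reached the end of smp_callin() */
    uint32_t finishup;              /* BSP -> AP: run to completion ("part 2") */
    uint32_t online;                /* hypothetical: AP has finished bringup */
};

enum ap_stage { AP_STARTED, AP_WAIT_CALLOUT, AP_WAIT_FINISHUP, AP_DONE };

/* Let one AP pass at most one gate; it stays put if its bit is not set yet. */
enum ap_stage ap_step(struct bringup_masks *m, unsigned cpu, enum ap_stage s)
{
    uint32_t bit = UINT32_C(1) << cpu;

    switch (s) {
    case AP_STARTED:                /* woken by INIT/SIPI/SIPI */
        m->initialized |= bit;
        return AP_WAIT_CALLOUT;
    case AP_WAIT_CALLOUT:           /* spinning on cpu_callout_mask */
        if (!(m->callout & bit))
            return s;
        m->callin |= bit;
        return AP_WAIT_FINISHUP;
    case AP_WAIT_FINISHUP:          /* spinning on cpu_finishup_mask */
        if (!(m->finishup & bit))
            return s;
        m->online |= bit;
        return AP_DONE;
    default:
        return s;
    }
}

/* Two parallel rounds, then a serial finish. Returns the mask of finished APs. */
uint32_t simulate_bringup(struct bringup_masks *m)
{
    enum ap_stage stage[NR_CPUS];
    unsigned cpu;

    for (cpu = 1; cpu < NR_CPUS; cpu++)     /* round 1: kick every AP */
        stage[cpu] = ap_step(m, cpu, AP_STARTED);
    assert((m->initialized & ALL_APS) == ALL_APS);

    m->callout |= ALL_APS;                  /* round 2: release them all at once */
    for (cpu = 1; cpu < NR_CPUS; cpu++)
        stage[cpu] = ap_step(m, cpu, stage[cpu]);

    for (cpu = 1; cpu < NR_CPUS; cpu++) {   /* final stage: one CPU at a time */
        assert(m->callin & (UINT32_C(1) << cpu));
        m->finishup |= UINT32_C(1) << cpu;
        stage[cpu] = ap_step(m, cpu, stage[cpu]);
        assert(stage[cpu] == AP_DONE);
    }
    return m->online;
}
```

Calling simulate_bringup() on a zeroed struct bringup_masks walks all
seven APs through both parallel rounds and then finishes them one by
one. The real code spins on the masks from each AP concurrently; the
model only shows which bit gates which step.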

So perhaps the BSP doesn't need to coordinate anything here, if we can
let the siblings work it out between themselves in the (now-)parallel
stage at the end of smp_callin()? And only set their bit in
cpu_callin_mask when the microcode update is done?

Hm, maybe it's as simple as the first¹ thread on a core waiting for all
its siblings' bits in cpu_callin_mask to be set, and *then* doing the
update before setting its own bit?

¹ As long as we define "first" as the one with the lowest CPU#, which
means that the BSP won't release any of the siblings before it releases
the "first".

Then the siblings are just spinning on cpu_callin_mask anyway; they
don't need to do anything *more*.
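
As a rough illustration of that sibling rule, the sketch below models
the end of smp_callin() with plain bitmasks. It assumes each CPU knows
the mask of threads on its own core; struct smt_model, first_thread()
and try_callin() are hypothetical names, and the microcode update is
reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8u

struct smt_model {
    uint32_t siblings[NR_CPUS];   /* threads on the same core, itself included */
    uint32_t callin;              /* stand-in for cpu_callin_mask */
    unsigned ucode_loads;         /* how many times the stubbed update ran */
};

/* Lowest-numbered CPU in a non-empty mask: the footnote's idea of "first". */
unsigned first_thread(uint32_t mask)
{
    unsigned cpu = 0;

    assert(mask != 0);
    while (!(mask & 1u)) {
        mask >>= 1;
        cpu++;
    }
    return cpu;
}

/*
 * One attempt by 'cpu' to finish smp_callin(). Returns true once its bit
 * is set in the callin mask, false if it has to keep waiting.
 */
bool try_callin(struct smt_model *m, unsigned cpu)
{
    uint32_t bit = UINT32_C(1) << cpu;
    uint32_t sibs = m->siblings[cpu];
    uint32_t others = sibs & ~bit;

    if (first_thread(sibs) != cpu) {
        /* Not the first thread: announce ourselves, then just spin. */
        m->callin |= bit;
        return true;
    }

    /* First thread: wait until every sibling has parked itself. */
    if ((m->callin & others) != others)
        return false;

    m->ucode_loads++;             /* the microcode update would happen here */
    m->callin |= bit;             /* only now report in */
    return true;
}
```

With CPUs 1 and 3 as siblings, try_callin() for CPU 1 keeps returning
false until CPU 3 has set its bit, after which CPU 1 does the single
update and reports in. A core with one thread has no other siblings to
wait for and updates immediately.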

Probably worth knocking it up and seeing how badly it explodes?
