Message-ID: <eb6717dfc4ceb99803c0396f950db7c3231c75ef.camel@infradead.org>
Date: Thu, 23 Feb 2023 15:12:00 +0000
From: David Woodhouse <dwmw2@...radead.org>
To: Thomas Gleixner <tglx@...utronix.de>,
Usama Arif <usama.arif@...edance.com>, kim.phillips@....com
Cc: arjan@...ux.intel.com, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, x86@...nel.org,
pbonzini@...hat.com, paulmck@...nel.org,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
rcu@...r.kernel.org, mimoja@...oja.de, hewenliang4@...wei.com,
thomas.lendacky@....com, seanjc@...gle.com, pmenzel@...gen.mpg.de,
fam.zheng@...edance.com, punit.agrawal@...edance.com,
simon.evans@...edance.com, liangma@...ngbit.com,
Ashok Raj <ashok.raj@...el.com>
Subject: Re: [PATCH v9 0/8] Parallel CPU bringup for x86_64
On Thu, 2023-02-23 at 15:37 +0100, Thomas Gleixner wrote:
> David!
>
> On Thu, Feb 23 2023 at 11:07, David Woodhouse wrote:
> > On Wed, 2023-02-22 at 17:42 +0100, Thomas Gleixner wrote:
> > > The low hanging fruit which brings most is the identification/topology
> > > muck and the microcode loading. That needs to be addressed first anyway.
> >
> > Agreed, thanks.
>
> So the problem with microcode loading is that we must ensure that a HT
> sibling is not executing anything else than a trivial loop waiting for
> the update to complete. So something like this should work:
>
> 1) Kick all CPUs into life and let them run up to cpu_init() and
> retrieve only the topology information.
>
> 2) Wait for all CPUs to reach this point
>
> 3) Release all primary HT threads so they can load microcode in
> parallel. The secondary HT threads stay in the wait loop and are
> released once the primary thread has finished the microcode
> update.
>
> 4) Let the CPUs do the full CPUID readout and let them synchronize
> with the control CPU again.
>
> 5) Complete bringup one by one
Can we move the microcode loading to happen earlier, during the
x86-specific CPUHP_BP_PARALLEL_DYN stage(s), while the CPUs are running
in parallel?
In the existing set of patches, we send INIT/SIPI/SIPI to each CPU in
parallel and they run the first part of start_secondary(), up to the
point where each one calls cpu_init_secondary() and sets its bit in
cpu_initialized_mask, then spins waiting for its bit in cpu_callout_mask.
My "part 2" test patch does another round in parallel, setting each
CPU's bit in cpu_callout_mask and letting them run a bit further
through start_secondary() until they get to the end of smp_callin(),
where they set their bit in cpu_callin_mask and (in my patch) wait for
their bit in a new cpu_finishup_mask to be set, which is what releases
them to proceed to completion in the final native_cpu_up() bringup.
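To make the staged handshake above concrete, here is a rough single-threaded model of it. This is not kernel code: the struct, the function names and the 8-CPU layout are all made up for illustration, and each "AP" is just a state machine that is polled in turn rather than a real CPU spinning on a mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS 8                                  /* hypothetical; CPU 0 plays the BSP */
#define AP_MASK (((1ULL << NR_CPUS) - 1) & ~1ULL)  /* every CPU except the BSP */

enum ap_state { AP_WAIT_CALLOUT = 0, AP_WAIT_FINISHUP, AP_ONLINE };

/* Illustrative stand-ins for the masks discussed above. */
struct bringup {
    uint64_t callout_mask;   /* BSP -> AP: you may run on into smp_callin() */
    uint64_t callin_mask;    /* AP -> BSP: reached the end of smp_callin() */
    uint64_t finishup_mask;  /* BSP -> AP: you may complete bringup */
    uint64_t online_mask;    /* AP -> BSP: bringup complete */
    enum ap_state state[NR_CPUS];
};

/* One iteration of an AP's wait loop. Returns true if the AP advanced. */
static bool ap_poll(struct bringup *b, int cpu)
{
    uint64_t bit = 1ULL << cpu;

    switch (b->state[cpu]) {
    case AP_WAIT_CALLOUT:
        if (!(b->callout_mask & bit))
            return false;
        b->callin_mask |= bit;           /* end of smp_callin() */
        b->state[cpu] = AP_WAIT_FINISHUP;
        return true;
    case AP_WAIT_FINISHUP:
        if (!(b->finishup_mask & bit))
            return false;
        b->online_mask |= bit;           /* final, serialised part of bringup */
        b->state[cpu] = AP_ONLINE;
        return true;
    default:
        return false;
    }
}

static int count_bits(uint64_t v)
{
    int n = 0;

    for (; v; v &= v - 1)
        n++;
    return n;
}

/*
 * Run the whole sequence and return how many APs were parked at the end
 * of smp_callin() before the first one was allowed to finish.
 */
static int simulate_bringup(struct bringup *b)
{
    int parked;

    memset(b, 0, sizeof(*b));            /* every AP starts in AP_WAIT_CALLOUT */

    /* Parallel stage: release all APs at once and let each take a step. */
    b->callout_mask = AP_MASK;
    for (int cpu = 1; cpu < NR_CPUS; cpu++)
        ap_poll(b, cpu);
    parked = count_bits(b->callin_mask);

    /* Serial stage: finish the APs one by one, lowest CPU number first. */
    for (int cpu = 1; cpu < NR_CPUS; cpu++) {
        b->finishup_mask |= 1ULL << cpu;
        ap_poll(b, cpu);
        assert(b->online_mask & (1ULL << cpu));
    }
    return parked;
}
```

Calling simulate_bringup() on a struct bringup returns NR_CPUS - 1 in this model: every AP is parked at the end of smp_callin() before the serial stage begins, which is the window in which any per-core work could be done in parallel.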
So perhaps the BSP doesn't need to coordinate anything here, if we can
let the siblings work it out between themselves in the (now-)parallel
stage at the end of smp_callin()? And only set their bit in
cpu_callin_mask when the microcode update is done?
Hm, maybe it's as simple as the first¹ thread on a core waiting for all
its siblings' bits in cpu_callin_mask to be set, and *then* doing the
update before setting its own bit?
¹ As long as we define "first" as the one with the lowest CPU#, which
means that the BSP won't release any of the siblings before it releases
the "first".
Then the siblings are just spinning on cpu_callin_mask anyway; they
don't need to do anything *more*.
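As a sanity check on the ordering, the sibling rule can be modelled in a few lines of plain C. Again this is only a sketch under assumptions that are not from the patches: two threads per core, siblings numbered consecutively so the "first" thread is the even one, every modelled CPU treated as an AP (the boot CPU's own core is ignored), and a counter standing in for the actual microcode load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8           /* hypothetical */
#define THREADS_PER_CORE 2  /* assumed topology: siblings are consecutive CPU numbers */

struct smt_model {
    uint64_t callin_mask;   /* stand-in for cpu_callin_mask */
    int updates;            /* number of (pretend) microcode loads performed */
    int log[NR_CPUS];       /* CPU numbers in the order their callin bit was set */
    int nlogged;
};

/* Lowest-numbered thread of the core that `cpu` belongs to. */
static int first_thread(int cpu)
{
    return cpu - cpu % THREADS_PER_CORE;
}

/* Mask of every thread on cpu's core except cpu itself. */
static uint64_t sibling_mask(int cpu)
{
    uint64_t core = ((1ULL << THREADS_PER_CORE) - 1) << first_thread(cpu);

    return core & ~(1ULL << cpu);
}

/* One poll by `cpu` at the end of smp_callin(); true once its bit is set. */
static bool callin_step(struct smt_model *m, int cpu)
{
    uint64_t bit = 1ULL << cpu;

    if (m->callin_mask & bit)
        return true;

    if (cpu == first_thread(cpu)) {
        uint64_t sibs = sibling_mask(cpu);

        if ((m->callin_mask & sibs) != sibs)
            return false;   /* a sibling has not parked yet: keep spinning */
        m->updates++;       /* siblings are all parked: do the update now */
    }

    m->callin_mask |= bit;  /* non-first threads set their bit straight away */
    m->log[m->nlogged++] = cpu;
    return true;
}

/* Poll the CPUs repeatedly in the given order until all have called in. */
static int run_callin(struct smt_model *m, const int *order)
{
    const uint64_t all = (1ULL << NR_CPUS) - 1;

    m->callin_mask = 0;
    m->updates = 0;
    m->nlogged = 0;

    while (m->callin_mask != all)
        for (int i = 0; i < NR_CPUS; i++)
            callin_step(m, order[i]);

    return m->updates;
}
```

Whatever order the CPUs are polled in, the first thread's bit only appears after all of its siblings' bits, and there is exactly one update per core. The model does not cover the release side; that part relies on the BSP walking the mask in CPU-number order, as in the footnote.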
Probably worth knocking it up and seeing how badly it explodes?