Date:   Wed, 22 Feb 2023 10:11:05 +0000
From:   David Woodhouse <dwmw2@...radead.org>
To:     Usama Arif <usama.arif@...edance.com>, tglx@...utronix.de,
        kim.phillips@....com
Cc:     arjan@...ux.intel.com, mingo@...hat.com, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, x86@...nel.org,
        pbonzini@...hat.com, paulmck@...nel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        rcu@...r.kernel.org, mimoja@...oja.de, hewenliang4@...wei.com,
        thomas.lendacky@....com, seanjc@...gle.com, pmenzel@...gen.mpg.de,
        fam.zheng@...edance.com, punit.agrawal@...edance.com,
        simon.evans@...edance.com, liangma@...ngbit.com
Subject: Re: [PATCH v9 0/8] Parallel CPU bringup for x86_64

On Wed, 2023-02-15 at 14:54 +0000, Usama Arif wrote:
> The main change over v8 is dropping the patch to avoid repeated saves of MTRR
> at boot time. It didn't make a difference to smpboot time and is independent
> of parallel CPU bringup, so if needed can be explored in a separate patchset.
> 
> The patches have also been rebased to v6.2-rc8 and retested and the
> improvement in boot time is the same as v8.

Thanks for picking this up, Usama.

So the next thing that might be worth looking at is allowing all the
APs to run their hotplug threads simultaneously, bringing themselves
from CPUHP_BRINGUP_CPU to CPUHP_AP_ONLINE. This series eats
the initial INIT/SIPI/SIPI latency, but if there's any significant time
in the AP hotplug thread, that could be worth parallelising.

There may be further wins in the INIT/SIPI/SIPI too. Currently we
process one CPU at a time, sending INIT, then SIPI, waiting 10µs and
sending another SIPI.

What if we sent the first INIT+SIPI to all CPUs, then did another pass
sending another SIPI only to those which hadn't already started running
and set their bit in cpu_initialized_mask?
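
To make the two-pass idea concrete, here is a rough sketch of just the
bookkeeping. It is plain userspace C, not kernel code: a uint64_t stands
in for cpu_initialized_mask, and the names second_sipi_targets(),
plan_two_pass() and struct sipi_plan are made up for illustration. It
only counts which CPUs would get which IPIs; it sends nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: how many IPIs each pass would send. */
struct sipi_plan {
	unsigned first_pass_ipis;   /* INIT + first SIPI, one of each per target */
	unsigned second_pass_sipis; /* second SIPI, only for CPUs not yet started */
};

/* Count set bits; a uint64_t models a cpumask of up to 64 CPUs. */
static unsigned popcount64(uint64_t v)
{
	unsigned n = 0;

	while (v) {
		v &= v - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * CPUs that were targeted in the first pass but have not yet set their
 * bit in the (modelled) cpu_initialized_mask.
 */
static uint64_t second_sipi_targets(uint64_t targets, uint64_t initialized)
{
	return targets & ~initialized;
}

/*
 * Pass 1: INIT + SIPI to every target.
 * Pass 2: after the wait, a second SIPI only to the stragglers.
 * 'initialized_after_wait' is the mask as sampled after the delay.
 */
static struct sipi_plan plan_two_pass(uint64_t targets,
				      uint64_t initialized_after_wait)
{
	struct sipi_plan p;

	/* Only targeted CPUs can legitimately have set their bit. */
	assert((initialized_after_wait & ~targets) == 0);

	p.first_pass_ipis = 2 * popcount64(targets);
	p.second_pass_sipis =
		popcount64(second_sipi_targets(targets, initialized_after_wait));
	return p;
}
```

With APs 1-7 targeted and only CPUs 1 and 2 having set their bit by the
time of the second pass, the second SIPI would go to CPUs 3-7 only.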

Might not be worth it, and there's an added complexity that they all
have to wait for each other (on the real mode trampoline lock) before
they can take their turn and get as far as setting their bit in
cpu_initialized_mask. So we'd probably end up sending the second SIPI
to most of them *anyway*.
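
A back-of-the-envelope model of that serialisation, again as userspace
C rather than anything from the series. The assumption, which is mine
and not measured, is that each AP holds the real mode trampoline lock
for a fixed hold_us microseconds and sets its bit as soon as it
releases it, so at most wait_us / hold_us APs can have set their bit
when the second pass starts. aps_needing_second_sipi() is a
hypothetical name.

```c
#include <assert.h>

/*
 * Estimate how many APs would still need the second SIPI.
 *
 * naps:    number of APs that received INIT + first SIPI
 * wait_us: delay between the first and second pass (microseconds)
 * hold_us: ASSUMED time each AP holds the trampoline lock before it can
 *          set its bit in cpu_initialized_mask (illustrative only)
 */
static unsigned aps_needing_second_sipi(unsigned naps, unsigned wait_us,
					unsigned hold_us)
{
	unsigned done;

	assert(hold_us > 0);

	/* APs take the lock one at a time, so progress is serial. */
	done = wait_us / hold_us;
	if (done > naps)
		done = naps;

	return naps - done;
}
```

For example, with 95 APs, a 10µs wait and a made-up 5µs hold time, only
two APs get through the trampoline before the second pass, leaving 93
to receive the second SIPI anyway. Real hold times would need to be
measured before drawing any conclusion from this.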
