linux-kernel - Re: [PATCH] use x86 cpu park to speedup smp

Message-ID: <5039f6178715dc4725a8c7f071dfd9ef5d70ae43.camel@infradead.org>
Date:   Tue, 16 Feb 2021 15:10:21 +0000
From:   David Woodhouse <dwmw2@...radead.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Andy Lutomirski <luto@...nel.org>,
        "shenkai (D)" <shenkai8@...wei.com>,
        "Schander, Johanna 'Mimoja' Amelie" <mimoja@...zon.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
        hewenliang4@...wei.com, hushiyuan@...wei.com,
        luolongjun@...wei.com, hejingxian <hejingxian@...wei.com>
Subject: Re: [PATCH] use x86 cpu park to speedup smp_init in kexec situation

On Tue, 2021-02-16 at 13:53 +0000, David Woodhouse wrote:
> I threw it into my tree at
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel
>
> It seems to work fairly nicely. The parallel SIPI seems to save about
> a third of the bringup time on my 28-thread Haswell box. This is at the
> penultimate commit of the above branch:
> 
> [    0.307590] smp: Bringing up secondary CPUs ...
> [    0.307826] x86: Booting SMP configuration:
> [    0.307830] .... node  #0, CPUs:        #1  #2  #3  #4  #5  #6  #7  #8  #9 #10 #11 #12 #13 #14
> [    0.376677] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
> [    0.377177]  #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27
> [    0.402323] Brought CPUs online in 246691584 cycles
> [    0.402323] smp: Brought up 1 node, 28 CPUs
> 
> ... and this is the tip of the branch:
> 
> [    0.308332] smp: Bringing up secondary CPUs ...
> [    0.308569] x86: Booting SMP configuration:
> [    0.308572] .... node  #0, CPUs:        #1  #2  #3  #4  #5  #6  #7  #8  #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27
> [    0.321120] Brought 28 CPUs to x86/cpu:kick in 34828752 cycles
> [    0.366663] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
> [    0.368749] Brought CPUs online in 124913032 cycles
> [    0.368749] smp: Brought up 1 node, 28 CPUs
> [    0.368749] smpboot: Max logical packages: 1
> [    0.368749] smpboot: Total of 28 processors activated (145259.85 BogoMIPS)
> 
> There's more to be gained here if we can fix up the next stage. Right
> now if I set every CPU's bit in cpu_initialized_mask to allow them to
> proceed from wait_for_master_cpu() through to the end of cpu_init() and
> onwards through start_secondary(), they all end up hitting
> check_tsc_sync_target() in parallel and it goes horridly wrong.

Actually it breaks before that, in rcu_cpu_starting(). A spinlock
around that, an atomic_t to let the APs do their TSC sync one at a time
(both in the above tree now), and I have a 75% saving on CPU bringup
time for my 28-thread Haswell:

[    0.307341] smp: Bringing up secondary CPUs ...
[    0.307576] x86: Booting SMP configuration:
[    0.307579] .... node  #0, CPUs:        #1  #2  #3  #4  #5  #6  #7  #8  #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27
[    0.320100] Brought 28 CPUs to x86/cpu:kick in 34645984 cycles
[    0.325032] Brought 28 CPUs to x86/cpu:wait-init in 12865752 cycles
[    0.326902] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[    0.328739] Brought CPUs online in 11702224 cycles
[    0.328739] smp: Brought up 1 node, 28 CPUs
[    0.328739] smpboot: Max logical packages: 1
[    0.328739] smpboot: Total of 28 processors activated (145261.81 BogoMIPS)
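
As a rough cross-check of the savings quoted in this thread, the cycle counts in the three logs can be compared directly. This is only a sketch: it assumes the per-stage counts ("x86/cpu:kick", "x86/cpu:wait-init", "Brought CPUs online") are additive and comparable with the single figure from the penultimate commit, which the logs suggest but do not state. The function names are made up for illustration.

```c
#include <assert.h>

/* Cycle counts copied from the dmesg output in this thread. */
static const unsigned long long BASELINE_CYCLES = 246691584ULL; /* penultimate commit */

/* Tip of the branch with parallel SIPI: kick stage + online stage. */
unsigned long long parallel_sipi_total(void)
{
    return 34828752ULL + 124913032ULL;
}

/* With the spinlock and serialized TSC sync: kick + wait-init + online. */
unsigned long long serialized_sync_total(void)
{
    return 34645984ULL + 12865752ULL + 11702224ULL;
}

/* Percentage of the baseline cycles that is no longer spent. */
double saving_pct(unsigned long long baseline, unsigned long long now)
{
    assert(baseline > 0);
    return 100.0 * (1.0 - (double)now / (double)baseline);
}
```

Under that additivity assumption, saving_pct(BASELINE_CYCLES, parallel_sipi_total()) comes out a little over a third, and saving_pct(BASELINE_CYCLES, serialized_sync_total()) lands close to the 75% mentioned above.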

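For anyone who wants to play with the idea outside the kernel, here is a userspace analogy of the two fixes described above: a lock around a non-reentrant startup step, and an atomic counter that lets workers through a sync step one at a time. It is not the code in the tree. It uses pthreads and C11 atomics instead of kernel primitives, a mutex stands in for the spinlock, and every name in it is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_WORKERS 64

/* Hypothetical names; none of these are kernel symbols. */
static pthread_mutex_t start_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for the spinlock */
static int started;               /* protected by start_lock */
static atomic_int sync_turn;      /* id of the worker allowed to sync next */
static atomic_int in_sync;        /* workers currently inside the sync step */
static atomic_int max_in_sync;    /* highest concurrency observed in that step */

static void *worker(void *arg)
{
    int id = *(int *)arg;

    /* Step 1: startup work that is not safe to run concurrently. */
    pthread_mutex_lock(&start_lock);
    started++;
    pthread_mutex_unlock(&start_lock);

    /* Step 2: wait until it is this worker's turn to sync. */
    while (atomic_load(&sync_turn) != id)
        sched_yield();

    int now = atomic_fetch_add(&in_sync, 1) + 1;
    if (now > atomic_load(&max_in_sync))
        atomic_store(&max_in_sync, now);
    /* The pairwise sync with the boot CPU would happen here. */
    atomic_fetch_sub(&in_sync, 1);

    /* Hand the turn to the next worker. */
    atomic_fetch_add(&sync_turn, 1);
    return NULL;
}

/* Starts n workers in parallel; returns the peak concurrency seen in the sync step. */
int bring_up(int n)
{
    pthread_t tid[MAX_WORKERS];
    int ids[MAX_WORKERS];

    assert(n > 0 && n <= MAX_WORKERS);
    started = 0;
    atomic_store(&sync_turn, 0);
    atomic_store(&in_sync, 0);
    atomic_store(&max_in_sync, 0);

    for (int i = 0; i < n; i++) {
        ids[i] = i;
        int rc = pthread_create(&tid[i], NULL, worker, &ids[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < n; i++)
        pthread_join(tid[i], NULL);

    assert(started == n);
    return atomic_load(&max_in_sync);
}
```

Calling bring_up(27) should return 1, meaning the workers were all started in parallel but never overlapped in the sync step. The turn counter forces a fixed order, which is the simplest choice for a sketch; the real patch may well let the APs take their turn in any order.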
