linux-kernel - Re: [PATCH] use x86 cpu park to speedup smp

Message-ID: <f2a4d172-fa17-9f98-ad8f-d69f84ad0df5@huawei.com>
Date:   Wed, 16 Dec 2020 16:45:34 +0800
From:   "shenkai (D)" <shenkai8@...wei.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Andy Lutomirski <luto@...nel.org>
CC:     LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
        <hewenliang4@...wei.com>, <hushiyuan@...wei.com>,
        <luolongjun@...wei.com>, <hejingxian@...wei.com>
Subject: Re: [PATCH] use x86 cpu park to speedup smp_init in kexec situation

On 2020/12/16 5:20, Thomas Gleixner wrote:
> On Tue, Dec 15 2020 at 08:31, Andy Lutomirski wrote:
>> On Tue, Dec 15, 2020 at 6:46 AM shenkai (D) <shenkai8@...wei.com> wrote:
>>> From: shenkai <shenkai8@...wei.com>
>>> Date: Tue, 15 Dec 2020 01:58:06 +0000
>>> Subject: [PATCH] use x86 cpu park to speedup smp_init in kexec situation
>>>
>>> In kexec reboot on an x86 machine, APs are halted and then woken up
>>> by the APIC INIT and SIPI interrupts. Here we can let APs spin instead
>>> of being halted and boot APs by writing to a specific address. In this way
>>> we can accelerate the smp_init procedure, since we don't need to pull APs up
>>> from a deep C-state.
>>>
>>> This is meaningful in many situations where users are sensitive to reboot
>>> time cost.
>> I like the concept.
> No. This is the wrong thing to do. We are not optimizing for _one_
> special case.
>
> We can optimize it for all operations where all the non boot CPUs have
> to be brought up, be it cold boot, hibernation resume or kexec.
>
> Aside from that, this is not a magic X86 special problem. Pretty much all
> architectures have the same issue and it can be solved very simply,
> which has been discussed before and I outlined the solution years ago,
> but nobody sat down and actually made it work.
>
> Since the rewrite of the CPU hotplug infrastructure to a state machine
> it's pretty obvious that the bringup of APs can be changed from the fully
> serialized:
>
>       for_each_present_cpu(cpu) {
>               if (!cpu_online(cpu))
>                       cpu_up(cpu, CPUHP_ONLINE);
>       }
>
> to
>
>       for_each_present_cpu(cpu) {
>               if (!cpu_online(cpu))
>                       cpu_up(cpu, CPUHP_KICK_CPU);
>       }
>
>       for_each_present_cpu(cpu) {
>               if (!cpu_active(cpu))
>                       cpu_up(cpu, CPUHP_ONLINE);
>       }
>
> The CPUHP_KICK_CPU state does not exist today, but it's just the logical
> consequence of the state machine. It's basically splitting __cpu_up()
> into:
>
> __cpu_kick()
> {
>      prepare();
>      arch_kick_remote_cpu();     -> Send IPI/NMI, Firmware call .....
> }
>      
> __cpu_wait_online()
> {
>      wait_until_cpu_online();
>      do_further_stuff();
> }
>
> There is some more to it than just blindly splitting it up at the
> architecture level.
>
> All __cpu_up() implementations across arch/ have a lot of needlessly
> duplicated and pointlessly differently implemented code which can move
> completely into the core.
>
> So actually we want to split this further up:
>
>     CPUHP_PREPARE_CPU_UP:	Generic preparation step where all
>                                  the magic cruft which is duplicated
>                                  across architectures goes to
>
>     CPUHP_KICK_CPU:		Architecture specific prepare and kick
>
>     CPUHP_WAIT_ONLINE:           Generic wait function for CPU coming
>                                  online: wait_for_completion_timeout()
>                                  which releases the upcoming CPU and
>                                  invokes an optional arch_sync_cpu_up()
>                                  function which finalizes the bringup.
> and on the AP side:
>
>     CPU comes up, does all the low level setup, sets online, calls
>     complete() and then spinwaits for release.
>
> Once the control CPU comes out of the completion it releases the
> spinwait.
>
> That works for all bringup situations and not only for kexec and the
> simple trick is that by the time the last CPU has been kicked in the
> first step, the first kicked CPU is already spinwaiting for release.
>
> By the time the first kicked CPU has completed the process, i.e. reached
> the active state, then the next CPU is spinwaiting and so on.
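
To make the overlap argument above concrete, here is a minimal timing model in C. It is only a sketch under stated assumptions, not kernel code: the function names and the three cost parameters (send_us, wake_us, finish_us) are hypothetical and do not correspond to any existing kernel interface. It compares the fully serialized loop with the two-pass "kick all, then wait for each" scheme, assuming all kicked APs wake up in parallel and the control CPU finishes them one at a time.

```c
#include <assert.h>

/*
 * Hypothetical timing model (all times in microseconds, not kernel code):
 *   send_us   - control-CPU cost to issue one kick (IPI, firmware call, ...)
 *   wake_us   - time a kicked AP needs to reach its spinwait
 *   finish_us - control-CPU work per AP after that AP reported online
 */

/* Fully serialized bringup: kick, wait and finish one CPU at a time. */
long serial_bringup_us(int ncpus, long send_us, long wake_us, long finish_us)
{
	assert(ncpus >= 0);
	return (long)ncpus * (send_us + wake_us + finish_us);
}

/* Two-pass bringup: kick every AP first, then wait for and finish each. */
long split_bringup_us(int ncpus, long send_us, long wake_us, long finish_us)
{
	long now;
	int cpu;

	assert(ncpus >= 0);

	/* Pass 1: the control CPU only pays the send cost per AP. */
	now = (long)ncpus * send_us;

	/* Pass 2: wait only if the AP has not reached its spinwait yet. */
	for (cpu = 0; cpu < ncpus; cpu++) {
		long ready = (long)(cpu + 1) * send_us + wake_us;

		if (now < ready)
			now = ready;
		now += finish_us;
	}
	return now;
}
```

In this model the wake-up latency is paid roughly once instead of once per CPU: with 4 APs, send_us = 0, wake_us = 2 and finish_us = 1, the serialized loop takes 12 time units and the split loop takes 6.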
>
> If you look at the provided time saving:
>
>     Mainline:		210ms
>     Patched:		 80ms
> -----------------------------
>     Delta                130ms
>
> i.e. it takes ~ 1.8ms to kick and wait for the AP to come up and ~ 1.1ms
> per CPU for the whole bringup. It does not completely add up, but it has
> a clear benefit for everything.
>
> Also the changelog says that the delay is related to CPUs in deep
> C-states. If CPUs are brought down for kexec then it's trivial enough to
> limit the C-states or just not use mwait() at all.
>
> It would be interesting to see the numbers just with play_dead() using
> hlt() or mwait(eax=0, 0) for the kexec case and no other change at all.
>
> Thanks,
>
>          tglx
>
Thanks to you and Andy for the valuable comments. I will try to rework this
patch to make it cleaner and more generic.

Thanks again

Kai