lists.openwall.net | Open Source and information security mailing list archives
Date: Wed, 16 Dec 2020 16:31:53 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: "shenkai \(D\)" <shenkai8@...wei.com>, Andy Lutomirski <luto@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>, hewenliang4@...wei.com, hushiyuan@...wei.com, luolongjun@...wei.com, hejingxian@...wei.com
Subject: Re: [PATCH] use x86 cpu park to speedup smp_init in kexec situation

Kai,

On Wed, Dec 16 2020 at 22:18, shenkai wrote:
> After some tests, the conclusion that the time cost comes from deep
> C-states turns out to be wrong.
>
> Sorry for that.

No problem.

> In the kexec case, I first let the APs spinwait as I did in that patch,
> but woke them up by sending APIC INIT and SIPI interrupts as in the
> normal procedure, instead of writing to some address, and there is no
> acceleration (the time cost is still 210ms).

Ok.

> So can we say that the main time cost comes from the APIC INIT and SIPI
> interrupts and their handling, rather than from deep C-states?

That's a fair conclusion.

> I didn't test with play_dead() because in the kexec case a new kernel
> will be started, and the APs can't be woken up by normal interrupts as
> in the hibernate case, since the IRQ vectors are gone with the old kernel.
>
> Or maybe I didn't get the point correctly?

Not exactly, but your experiment answered the question already.

My point was that the regular kexec unplugs the APs, which then end up
in play_dead() and try to use the deepest C-state via mwait().

So if the overhead were related to getting them out of a deep C-state,
then forcing play_dead() to use the HLT instruction or the shallowest
C-state with mwait() would have brought an improvement, right?

But obviously the C-state in which the APs are waiting is not really
relevant, as you demonstrated that the cost is due to INIT/SIPI even
with spinwait, which is what I suspected.

OTOH, the advantage of INIT/SIPI is that the AP comes up in a well
known state.

Thanks,

        tglx