linux-kernel - Re: [PATCH] use x86 cpu park to speedup smp


Message-ID: <87im91sr6e.fsf@nanos.tec.linutronix.de>
Date:   Wed, 16 Dec 2020 16:31:53 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     "shenkai \(D\)" <shenkai8@...wei.com>,
        Andy Lutomirski <luto@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
        hewenliang4@...wei.com, hushiyuan@...wei.com,
        luolongjun@...wei.com, hejingxian@...wei.com
Subject: Re: [PATCH] use x86 cpu park to speedup smp_init in kexec situation

Kai,

On Wed, Dec 16 2020 at 22:18, shenkai wrote:
> After some tests, the conclusion that the time cost comes from deep
> C-states turned out to be wrong.
>
> Sorry for that.

No problem.

> In the kexec case, I first let the APs spinwait as I did in that patch,
> but woke them up by sending APIC INIT and SIPI interrupts as in the
> normal procedure instead of writing to some address. There is no
> acceleration (the time cost is still 210ms).

Ok.
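
(Archive note: the "spinwait and wake by writing to some address" scheme
mentioned in the quoted text can be modeled in userspace roughly as below.
This is only a sketch under stated assumptions and not the code from the
patch: the names mailbox, parked_ap and park_and_wake are hypothetical, a
pthread stands in for an AP, and a nonzero integer stands in for the entry
address. It shows why such a wakeup costs a single store on the boot CPU,
in contrast to the INIT/SIPI sequence measured in the experiment.)

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mailbox: 0 means "stay parked"; any other value is the
 * (made-up) entry address the parked CPU should continue at. */
static _Atomic uintptr_t mailbox;

/* Value the parked thread observed; only read after pthread_join(). */
static uintptr_t ap_saw;

/* Stand-in for a parked AP: spin until the boot CPU writes the mailbox. */
static void *parked_ap(void *unused)
{
    uintptr_t entry;

    (void)unused;
    while ((entry = atomic_load_explicit(&mailbox,
                                         memory_order_acquire)) == 0)
        ;   /* a real x86 spin loop would execute PAUSE here */

    ap_saw = entry;
    return NULL;
}

/* Stand-in for the boot CPU: start one parked "AP", release it with a
 * single store, and return the entry value the AP observed.
 * entry must be nonzero, otherwise the AP would never leave the loop. */
uintptr_t park_and_wake(uintptr_t entry)
{
    pthread_t ap;

    atomic_store_explicit(&mailbox, 0, memory_order_relaxed);
    if (pthread_create(&ap, NULL, parked_ap, NULL) != 0)
        return 0;

    /* The whole "wakeup": one release store to the polled address. */
    atomic_store_explicit(&mailbox, entry, memory_order_release);

    pthread_join(ap, NULL);
    return ap_saw;
}
```

(Calling park_and_wake(0x1000) returns the value the parked thread picked
up. Unlike INIT/SIPI, nothing in this scheme resets the woken CPU, so it
resumes in whatever state it was parked in.)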

> So can we say that the main time cost comes from the APIC INIT and SIPI
> interrupts and the handling of them, rather than from deep C-states?

That's a fair conclusion.

> I didn't test with play_dead() because in the kexec case a new kernel
> is started, and the APs can't be woken up by normal interrupts as in the
> hibernate case, since the IRQ vectors are gone with the old kernel.
>
> Or maybe I didn't get the point correctly?

Not exactly, but your experiment answered the question already.

My point was that the regular kexec unplugs the APs, which then end up in
play_dead() and try to use the deepest C-state via mwait(). So if the
overhead were related to getting them out of a deep C-state, then
forcing play_dead() to use the HLT instruction or the most shallow
C-state with mwait() would have brought an improvement, right?

But obviously the C-state in which the APs are waiting is not really
relevant, as you demonstrated that the cost is due to INIT/SIPI even
with spinwait, which is what I suspected.

OTOH, the advantage of INIT/SIPI is that the AP comes up in a well known
state.

Thanks,

        tglx