linux-kernel - Re: kexec reboot fails with extra wbinvd introduced for AME SME

Message-ID: <d60aa430-e9ce-d9f1-d66c-977a3559e338@amd.com>
Date:   Wed, 17 Jan 2018 16:53:32 -0600
From:   Tom Lendacky <thomas.lendacky@....com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Dave Young <dyoung@...hat.com>
Cc:     Yu Chen <yu.c.chen@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Juergen Gross <jgross@...e.com>,
        Tony Luck <tony.luck@...el.com>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Borislav Petkov <bp@...en8.de>,
        Rui Zhang <rui.zhang@...el.com>,
        Arjan van de Ven <arjan@...ux.intel.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Ingo Molnar <mingo@...nel.org>,
        Kexec Mailing List <kexec@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        ebiederm@...hat.com, Baoquan He <bhe@...hat.com>
Subject: Re: kexec reboot fails with extra wbinvd introduced for AME SME

On 1/17/2018 2:01 PM, Tom Lendacky wrote:
> On 1/17/2018 1:42 PM, Linus Torvalds wrote:
>> On Tue, Jan 16, 2018 at 11:22 PM, Dave Young <dyoung@...hat.com> wrote:
>>>
>>> For the kexec reboot hang, if I remove the wbinvd in stop_this_cpu()
>>> then kexec works fine, like this:
>>
>> Honestly, I think we should apply that patch regardless.
>>
>> Using 'wbinvd' should not be some "just because of random reasons".
>> There are CPUs with errata on wbinvd, and the thing in general is
>> slow and nasty.
>>
>> Doing the wbinvd in a loop sounds even stranger.
>>
>> If we're only doing it because of some SME issue, why isn't it
>> dependent on SME? And why is it inside that loop at all?
> 
> My original patches did check for X86_FEATURE_SME and only do the
> wbinvd if SME was supported (although still in the loop).  The general
> consensus was to just do the wbinvd no matter what and so it is as it is
> today.
> 
> It can probably be outside of the loop.  The issue I was seeing was
> memory corruption from the stack when using halt() with paravirt ops
> enabled.  So a native_halt() should be used.
> 
>>
>> Anyway, does it work for you if you just do the wbinvd() once, outside
>> the loop? Admittedly the loop shouldn't actually loop (hlt with
>> interrupts disabled), but who the hell knows.. Some of the errata
>> around SME have been about machine check exceptions or something.
> 
> I think that should work as long as it's a native_wbinvd() call and it
> can also be conditional on boot_cpu_has(X86_FEATURE_SME).
> 
> I'll do some testing.

Looks like everything is good with the suggested changes.  Patch to follow
shortly.

Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>
>> See commit a68e5c94f7d3 ("x86, hotplug: Move WBINVD back outside the
>> play_dead loop") for another example where wbinvd was inside a loop
>> and apparently caused some odd issues.
>>
>>               Linus
>>
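The shape the thread converged on (flush once, outside the halt loop, only when SME is supported, and halt with native_halt() rather than the paravirt halt()) can be modeled in userspace. This is a hedged sketch, not the patch Tom posted: every `model_*` name is hypothetical and stands in for `boot_cpu_has(X86_FEATURE_SME)`, `native_wbinvd()` and `native_halt()`, which only exist inside the kernel. The model also lets "hlt" return a few times, to show that a spurious wakeup no longer repeats the flush.

```c
#include <assert.h> /* used by the usage checks that exercise this model */
#include <stdbool.h>

/*
 * Userspace model of the stop_this_cpu() change discussed in the thread.
 * All model_* names are hypothetical stand-ins for kernel-only primitives.
 */
static int model_wbinvd_count;  /* how many times the cache flush ran */
static int model_halt_count;    /* how many times the CPU "halted" */
static bool model_cpu_has_sme;  /* stands in for boot_cpu_has(X86_FEATURE_SME) */
static int model_max_wakeups;   /* hypothetical: spurious returns from "hlt" */

/* Stand-in for native_wbinvd(): just count the flush. */
static void model_native_wbinvd(void)
{
	model_wbinvd_count++;
}

/*
 * Stand-in for native_halt(). Returns true if the CPU "woke up" (which
 * should not happen with interrupts disabled) and false once the model
 * treats the CPU as permanently halted.
 */
static bool model_native_halt(void)
{
	model_halt_count++;
	return model_halt_count <= model_max_wakeups;
}

/* Suggested shape: conditional flush once, then a loop that only halts. */
static void model_stop_this_cpu(void)
{
	if (model_cpu_has_sme)
		model_native_wbinvd();

	/* In the kernel this loop never exits; the model stops after the wakeups. */
	while (model_native_halt())
		;
}

/* Reset the model state before each scenario. */
static void model_reset(bool has_sme, int wakeups)
{
	model_wbinvd_count = 0;
	model_halt_count = 0;
	model_cpu_has_sme = has_sme;
	model_max_wakeups = wakeups;
}
```

Calling `model_reset(true, 3)` and then `model_stop_this_cpu()` leaves `model_wbinvd_count` at 1 and `model_halt_count` at 4: the flush is not repeated even though the model wakes up three times. After `model_reset(false, 0)`, no flush is issued at all. Keeping the halt path free of paravirt calls matters because, per the discussion above, using halt() through paravirt ops after the flush led to memory corruption from the stack.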