linux-kernel - Re: [PATCH v4] x86/power: Fix 'nosmt' vs. hibernation triple fault during resume


Message-Id: <B7AC83ED-3F11-42B9-8506-C842A5937B50@amacapital.net>
Date:   Fri, 31 May 2019 07:46:44 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Jiri Kosina <jikos@...nel.org>
Cc:     Andy Lutomirski <luto@...nel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        the arch/x86 maintainers <x86@...nel.org>,
        Pavel Machek <pavel@....cz>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4] x86/power: Fix 'nosmt' vs. hibernation triple fault during resume

> On May 31, 2019, at 7:31 AM, Jiri Kosina <jikos@...nel.org> wrote:
> 
>> On Fri, 31 May 2019, Andy Lutomirski wrote:
>> 
>> 2. Put the CPU all the way to sleep by sending it an INIT IPI.
>> 
>> Version 2 seems very simple and robust.  Is there a reason we can't do
>> it?  We obviously don't want to do it for normal offline because it
>> might be a high-power state, but a cpu in the wait-for-SIPI state is
>> not going to exit that state all by itself.
>> 
>> The patch to implement #2 should be short and sweet as long as we are
>> careful to only put genuine APs to sleep like this.  The only downside
> >> I can see is that a new kernel resuming an old kernel that was
>> booted with nosmt is going to waste power, but I don't think that's a
>> showstopper.
> 
> > Well, if *that* is not an issue, then the original 3-liner that just 
> forces them to 'hlt' [1] would be good enough as well.
> 
> 

Seems okay to me as long as we’re confident we won’t get a spurious interrupt.

In general, I don’t think we’re ever supposed to rely on mwait *staying* asleep.  As I understand it, mwait can wake up whenever it wants, and the only real guarantee we have is that the CPU makes some effort to stay asleep until an interrupt is received or the monitor address is poked.

As a trivial example, if we are in a VM and we get scheduled out at any point between MONITOR and the eventual intentional wakeup, we’re toast. Same if we get an SMI due to bad luck or due to a thermal event happening shortly after pushing the power button to resume from hibernate.
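
To make the mwait point concrete, here is a minimal user-space sketch of the usual "re-check after every wakeup" pattern. It is a model, not kernel code: wait_once_fn is a hypothetical stand-in for one MONITOR/MWAIT pair, and sim_wait_once fakes two spurious wakeups before the real one. None of these names come from the patch under discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one MONITOR/MWAIT pair. It may return at any
 * time, whether or not the monitored word was actually written. */
typedef void (*wait_once_fn)(volatile uint32_t *flag);

/* Wait until *flag becomes nonzero, treating every return from wait_once()
 * as a hint only. Returns how many wakeups it took. */
static unsigned wait_for_flag(volatile uint32_t *flag, wait_once_fn wait_once)
{
	unsigned wakeups = 0;

	while (*flag == 0) {
		wait_once(flag);	/* may wake spuriously */
		wakeups++;		/* loop condition re-checks the flag */
	}
	return wakeups;
}

/* Simulated waiter (assumption for illustration): the first two returns
 * are spurious, the third one corresponds to the flag being set. */
static unsigned sim_calls;

static void sim_wait_once(volatile uint32_t *flag)
{
	if (++sim_calls == 3)
		*flag = 1;
}

/* Runs the simulation and returns the number of wakeups observed. */
static unsigned demo_spurious_wakeups(void)
{
	volatile uint32_t flag = 0;
	unsigned wakeups;

	sim_calls = 0;
	wakeups = wait_for_flag(&flag, sim_wait_once);
	assert(flag == 1);
	return wakeups;
}
```

The catch is that this loop only helps while the loop's own text and the flag are still valid memory. If a spurious wakeup lands after that memory has been overwritten, the re-check itself is what blows up.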

For that matter, what actually happens if we get an SMI while halted?  Does RSM go directly to sleep or does it re-fetch the HLT?

It seems to me that we should just avoid the scenario where the IP points to a bogus address and we just cross our fingers and hope the CPU doesn’t do anything.

I think that, as a short-term fix, we should use HLT and, as a long-term fix, we should either keep the CPU state fully valid or hard-offline the CPU using documented mechanisms, e.g. the wait-for-SIPI state.