Message-Id: <B7AC83ED-3F11-42B9-8506-C842A5937B50@amacapital.net>
Date:   Fri, 31 May 2019 07:46:44 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Jiri Kosina <jikos@...nel.org>
Cc:     Andy Lutomirski <luto@...nel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        the arch/x86 maintainers <x86@...nel.org>,
        Pavel Machek <pavel@....cz>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4] x86/power: Fix 'nosmt' vs. hibernation triple fault during resume



> On May 31, 2019, at 7:31 AM, Jiri Kosina <jikos@...nel.org> wrote:
> 
>> On Fri, 31 May 2019, Andy Lutomirski wrote:
>> 
>> 2. Put the CPU all the way to sleep by sending it an INIT IPI.
>> 
>> Version 2 seems very simple and robust.  Is there a reason we can't do
>> it?  We obviously don't want to do it for normal offline because it
>> might be a high-power state, but a cpu in the wait-for-SIPI state is
>> not going to exit that state all by itself.
>> 
>> The patch to implement #2 should be short and sweet as long as we are
>> careful to only put genuine APs to sleep like this.  The only downside
>> I can see is that a new kernel resuming an old kernel that was
>> booted with nosmt is going to waste power, but I don't think that's a
>> showstopper.
> 
> Well, if *that* is not an issue, then the original 3-liner that just 
> forces them to 'hlt' [1] would be good enough as well.
> 
> 

Seems okay to me as long as we’re confident we won’t get a spurious interrupt.

In general, I don’t think we’re ever supposed to rely on mwait *staying* asleep.  As I understand it, mwait can wake up whenever it wants, and the only real guarantee we have is that the CPU makes some effort to stay asleep until an interrupt is received or the monitor address is poked.

As a trivial example, if we are in a VM and we get scheduled out at any point between MONITOR and the eventual intentional wakeup, we’re toast. Same if we get an SMI due to bad luck or due to a thermal event happening shortly after pushing the power button to resume from hibernate.
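
To illustrate the point about spurious wakeups, here is a minimal userspace sketch of the usual "re-check and re-arm" wait loop. It is not the kernel's actual play-dead code. MONITOR and MWAIT are privileged, so fake_monitor() and fake_mwait() are hypothetical stubs that only model "the wait may return at any time", and struct parked_cpu and park_until_woken() are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a parked CPU; not a kernel structure. */
struct parked_cpu {
    volatile int wake_flag;      /* the monitored location */
    unsigned int spurious_left;  /* spurious wakeups still to simulate */
    unsigned int wakeups_seen;   /* how often the wait returned */
};

/* Stand-in for MONITOR: arming the monitor has no effect in this model. */
static void fake_monitor(const volatile int *addr)
{
    (void)addr;
}

/* Stand-in for MWAIT: it may return without the flag having changed. */
static void fake_mwait(struct parked_cpu *cpu)
{
    cpu->wakeups_seen++;
    if (cpu->spurious_left > 0) {
        cpu->spurious_left--;    /* spurious wakeup, flag untouched */
        return;
    }
    cpu->wake_flag = 1;          /* models the intentional wakeup write */
}

/*
 * Park until the flag is set. Every wakeup lands back in this loop, so
 * the loop itself has to stay valid, mapped code for as long as the CPU
 * is parked. Returns the number of wakeups observed.
 */
static unsigned int park_until_woken(struct parked_cpu *cpu)
{
    assert(cpu != NULL);
    while (!cpu->wake_flag) {
        fake_monitor(&cpu->wake_flag);
        if (cpu->wake_flag)      /* re-check after arming to close the race */
            break;
        fake_mwait(cpu);
    }
    return cpu->wakeups_seen;
}
```

The loop only tolerates early wakeups because the instructions after the wait are still there to be executed. That is the property that is lost once the resuming kernel has replaced the memory the parked CPU's IP points into.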

For that matter, what actually happens if we get an SMI while halted?  Does RSM go directly to sleep or does it re-fetch the HLT?

It seems to me that we should just avoid the scenario where we have IP pointed to a bogus address and we just cross our fingers and hope the CPU doesn’t do anything.

I think that, as a short term fix, we should use HLT and, as a long term fix, we should either keep the CPU state fully valid or we should hard-offline the CPU using documented mechanisms, e.g. the WAIT-for-SIPI state.
