linux-kernel - Re: [PATCH v4] x86/power: Fix 'nosmt' vs. hibernation triple fault during resume

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALCETrUQzZTRnvmOS09UvRM9UCGEDvSdbJtkeeEa2foMf+hF2w@mail.gmail.com>
Date:   Fri, 31 May 2019 09:23:41 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Jiri Kosina <jikos@...nel.org>
Cc:     Andy Lutomirski <luto@...nel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Pavel Machek <pavel@....cz>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4] x86/power: Fix 'nosmt' vs. hibernation triple fault
 during resume

On Fri, May 31, 2019 at 7:54 AM Jiri Kosina <jikos@...nel.org> wrote:
>
> On Fri, 31 May 2019, Andy Lutomirski wrote:
>
> > For that matter, what actually happens if we get an SMI while halted?
> > Does RSM go directly to sleep or does it re-fetch the HLT?
>
> Our mails just crossed, I replied to Josh's mwait() proposal patch a
> minute ago.
>
> HLT is guaranteed to be re-entered if SMM interrupted it, while MWAIT is
> not.
>
> So as a short-term fix for 5.2, I still believe in v4 of my patch that
> does the mwait->hlt->mwait transition across hibernate/resume, and for 5.3
> I can look into forcing it to wait-for-SIPI proper.
>

How sure are you?  From http://www.rcollins.org/ddj/Mar97/Mar97.html I see:

In general, the AutoHALT field directs the microprocessor whether or
not to restart the HLT instruction upon exit of SMM. This is
accomplished by decrementing EIP and executing whatever instruction
resides at that position. AutoHALT restart behavior is consistent,
regardless of whether or not EIP-1 contains a HLT instruction. If the
SMM handler set Auto HALT[bit0]=1 when the interrupted instruction was
not a HLT instruction(AutoHALT[bit0]= 0 upon entrance), they would run
the risk of resuming execution at an undesired location. The RSM
microcode doesn’t know the length of the interrupted instruction.
Therefore when AutoHALT[bit0]=1 upon exit, the RSM microcode blindly
decrements the EIP register by 1 and resumes execution. This explains
why Intel warns that unpredictable behavior may result from setting
this field to restart a HLT instruction when the microprocessor wasn’t
in a HALT state upon entrance. Listing One presents an algorithm that
describes the AutoHALT Restart feature.

The AMD APM says:

If the return from SMM takes the processor back to the halt state, the
HLT instruction is not re-
executed. However, the halt special bus-cycle is driven on the
processor bus after the RSM instruction
executes.

The Intel SDM Vol 3 34.10 says:

If the HLT instruction is restarted, the processor will generate a
memory access to fetch the HLT instruction (if it is
not in the internal cache), and execute a HLT bus transaction. This
behavior results in multiple HLT bus transactions
for the same HLT instruction.

And the SDM is very clear that one should *not* do RSM with auto-halt
set if the instruction that was interrupted wasn't HLT.

It sounds to me like Intel does not architecturally guarantee that
it's okay to do HLT from one CPU and then remove the HLT instruction
out from under it on another CPU.