linux-kernel - Re: [PATCH v4] x86/power: Fix 'nosmt' vs. hibernation triple fault during resume

Message-ID: <nycvar.YFH.7.76.1905312251350.1962@cbobk.fhfr.pm>
Date:   Fri, 31 May 2019 23:05:15 +0200 (CEST)
From:   Jiri Kosina <jikos@...nel.org>
To:     Andy Lutomirski <luto@...nel.org>
cc:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        the arch/x86 maintainers <x86@...nel.org>,
        Pavel Machek <pavel@....cz>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4] x86/power: Fix 'nosmt' vs. hibernation triple fault
 during resume

On Fri, 31 May 2019, Andy Lutomirski wrote:

> The Intel SDM Vol 3 34.10 says:
> 
> If the HLT instruction is restarted, the processor will generate a
> memory access to fetch the HLT instruction (if it is
> not in the internal cache), and execute a HLT bus transaction. This
> behavior results in multiple HLT bus transactions
> for the same HLT instruction.

Which basically means that both hibernation and kexec have been broken in
this respect for gazillions of years, and it seems no one noticed. Makes
one wonder what the reason for that might be.

Either the SDM is not precise and the refetch never actually happens in
practice (or is perhaps always satisfied from the I$ in these cases?),
or ... ?

So my patch basically puts things back where they have been for ages
(while mwait is obviously much worse, as it gets woken up by the write
to the monitored address, which inevitably does happen during resume), but
it seems the SDM is suggesting that we've been in a grey zone wrt RSM for
all those ages at least.

So perhaps we really should ditch resume_play_dead() altogether
eventually, and replace it with sending INIT IPI around instead (and then
waking the CPUs properly via INIT INIT START). I'd still like to do that
for 5.3 though, as that'd be slightly bigger surgery, and conservatively
put things basically back to the state they have been in up to now for 5.2.

Thanks,

-- 
Jiri Kosina
SUSE Labs