linux-kernel - Re: [PATCH v4] x86/power: Fix 'nosmt' vs. hibernation triple fault during resume


Message-ID: <20190531181130.afwizqcwibm5dmml@treble>
Date:   Fri, 31 May 2019 13:11:30 -0500
From:   Josh Poimboeuf <jpoimboe@...hat.com>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Jiri Kosina <jikos@...nel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        the arch/x86 maintainers <x86@...nel.org>,
        Pavel Machek <pavel@....cz>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4] x86/power: Fix 'nosmt' vs. hibernation triple fault
 during resume

On Fri, May 31, 2019 at 09:51:09AM -0700, Andy Lutomirski wrote:
> Just to clarify what I was thinking, it seems like soft-offlining a
> CPU and resuming a kernel have fundamentally different requirements.
> To soft-offline a CPU, we want to get power consumption as low as
> possible and make sure that MCE won't kill the system.  It's okay for
> the CPU to occasionally execute some code.  For resume, what we're
> really doing is trying to hand control of all CPUs from kernel A to
> kernel B.  There are two basic ways to hand off control of a given
> CPU: we can jump (with JMP, RET, horrible self-modifying code, etc)
> from one kernel to the other, or we can attempt to make a given CPU
> stop executing code from either kernel at all and then forcibly wrench
> control of it in kernel B.  Either approach seems okay, but the latter
> approach depends on getting the CPU to reliably stop executing code.
> We don't care about power consumption for resume, and I'm not even
> convinced that we need to be able to survive an MCE that happens while
> we're resuming, although surviving MCE would be nice.

I'd thought you were proposing a global improvement: we get rid of
mwait_play_dead() everywhere, i.e. all the time, not just for the resume
path.

Instead it sounds like you were proposing a local improvement to the
resume path, to continue doing what
hibernate_resume_nonboot_cpu_disable() is already doing, but use an INIT
IPI instead of HLT to make sure the CPU is completely dead.

That may be a theoretical improvement, but we'd still need to do the
whole "wake and play dead" dance that Jiri's patch is doing for offline
CPUs.  So Jiri's patch looks OK to me.

-- 
Josh
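
For readers following the thread, the "wake and play dead" dance mentioned
above can be pictured with a toy user-space model. This is only a sketch
under assumptions: the enum, the function names, and the idea that every
non-boot CPU must end up parked in HLT before control is handed to the
resumed kernel are illustrative and are not kernel APIs or the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: none of these names exist in the kernel. */
enum cpu_state {
    CPU_ONLINE,        /* running kernel code */
    CPU_PARKED_MWAIT,  /* soft-offlined, idling in MWAIT */
    CPU_PARKED_HLT     /* offlined again, idling in HLT */
};

#define NR_CPUS 4

/* Step 1 ("wake"): bring a CPU that is parked in MWAIT back online. */
void wake_cpu(enum cpu_state *cpu)
{
    if (*cpu == CPU_PARKED_MWAIT)
        *cpu = CPU_ONLINE;
}

/* Step 2 ("play dead"): offline the CPU again, this time parking it in HLT. */
void park_in_hlt(enum cpu_state *cpu)
{
    *cpu = CPU_PARKED_HLT;
}

/* Assumed invariant: CPU 0 is online, every other CPU sits in HLT. */
bool safe_to_resume(const enum cpu_state cpus[], int n)
{
    if (cpus[0] != CPU_ONLINE)
        return false;
    for (int i = 1; i < n; i++)
        if (cpus[i] != CPU_PARKED_HLT)
            return false;
    return true;
}

/* Run the two-step dance on every non-boot CPU (index 0 is the boot CPU). */
void prepare_for_resume(enum cpu_state cpus[], int n)
{
    for (int i = 1; i < n; i++)
        wake_cpu(&cpus[i]);
    for (int i = 1; i < n; i++)
        park_in_hlt(&cpus[i]);
    assert(safe_to_resume(cpus, n));
}
```

Starting from a state such as { CPU_ONLINE, CPU_ONLINE, CPU_PARKED_MWAIT,
CPU_PARKED_MWAIT }, safe_to_resume() is false until prepare_for_resume() has
run. The model says nothing about INIT IPIs or MCE handling; it only shows
why CPUs that are already offline still have to be woken before they can be
parked in a different way.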