Message-ID: <20190531181130.afwizqcwibm5dmml@treble>
Date: Fri, 31 May 2019 13:11:30 -0500
From: Josh Poimboeuf <jpoimboe@...hat.com>
To: Andy Lutomirski <luto@...nel.org>
Cc: Jiri Kosina <jikos@...nel.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
the arch/x86 maintainers <x86@...nel.org>,
Pavel Machek <pavel@....cz>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>,
Linux PM <linux-pm@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4] x86/power: Fix 'nosmt' vs. hibernation triple fault during resume

On Fri, May 31, 2019 at 09:51:09AM -0700, Andy Lutomirski wrote:
> Just to clarify what I was thinking, it seems like soft-offlining a
> CPU and resuming a kernel have fundamentally different requirements.
> To soft-offline a CPU, we want to get power consumption as low as
> possible and make sure that MCE won't kill the system. It's okay for
> the CPU to occasionally execute some code. For resume, what we're
> really doing is trying to hand control of all CPUs from kernel A to
> kernel B. There are two basic ways to hand off control of a given
> CPU: we can jump (with JMP, RET, horrible self-modifying code, etc)
> from one kernel to the other, or we can attempt to make a given CPU
> stop executing code from either kernel at all and then forcibly wrench
> control of it in kernel B. Either approach seems okay, but the latter
> approach depends on getting the CPU to reliably stop executing code.
> We don't care about power consumption for resume, and I'm not even
> convinced that we need to be able to survive an MCE that happens while
> we're resuming, although surviving MCE would be nice.

I'd thought you were proposing a global improvement: we get rid of
mwait_play_dead() everywhere, i.e. all the time, not just for the resume
path.

Instead it sounds like you were proposing a local improvement to the
resume path, to continue doing what
hibernate_resume_nonboot_cpu_disable() is already doing, but use an INIT
IPI instead of HLT to make sure the CPU is completely dead.

That may be a theoretical improvement but we'd still need to do the
whole "wake and play dead" dance which Jiri's patch is doing for offline
CPUs. So Jiri's patch looks ok to me.

--
Josh