Message-Id: <772ACE2A-FD8B-492F-960E-981ECC72E283@amacapital.net>
Date: Wed, 19 Feb 2020 14:48:34 -0800
From: Andy Lutomirski <luto@...capital.net>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Andy Lutomirski <luto@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Borislav Petkov <bp@...en8.de>,
LKML <linux-kernel@...r.kernel.org>,
linux-arch <linux-arch@...r.kernel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Ingo Molnar <mingo@...nel.org>,
Joel Fernandes <joel@...lfernandes.org>,
Greg KH <gregkh@...uxfoundation.org>,
"gustavo@...eddedor.com" <gustavo@...eddedor.com>,
Thomas Gleixner <tglx@...utronix.de>,
"paulmck@...nel.org" <paulmck@...nel.org>,
Josh Triplett <josh@...htriplett.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Frederic Weisbecker <frederic@...nel.org>,
Dan Carpenter <dan.carpenter@...cle.com>,
Masami Hiramatsu <mhiramat@...nel.org>
Subject: Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()

> On Feb 19, 2020, at 2:33 PM, Luck, Tony <tony.luck@...el.com> wrote:
>
>
>>
>> One big question here: are memory failure #MC exceptions synchronous
>> or can they be delayed? If we get a memory failure, is it possible
>> that the #MC hits some random context and not the actual context where
>> the error occurred?
>
> There are a few cases:
> 1) SRAO (Software recoverable action optional) [Patrol scrub or L3 cache eviction]
> These aren't synchronous with any core execution. Using machine check to signal
> was probably a mistake - compounded by it being broadcast :-( Could pick any CPU
> to handle (actually choose the first to arrive in do_machine_check()). That guy should
> arrange to soft offline the affected page. Every CPU can return to what they were doing
> before.
You could handle this by sending an IPI-to-self and dealing with it in the interrupt handler, or even by waking a high-priority kthread or workqueue. irq_work may help. Relying on task_work or the non_atomic stuff seems silly - you can’t rely on anything about the interrupted context, and the context is more or less irrelevant anyway.
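
As a rough illustration of the SRAO idea above (the first CPU to arrive claims the event, records what needs doing, and the real work happens later in an ordinary context), here is a minimal userspace model in C11. It is not kernel code: struct deferred_offline, srao_claim() and srao_worker() are hypothetical names invented for this sketch, and the "worker" merely stands in for whatever irq_work, kthread or workqueue would actually run.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_OWNER (-1)

/* Hypothetical model of one deferred soft-offline request. */
struct deferred_offline {
    atomic_int  owner;    /* NO_OWNER until one CPU claims the event */
    uint64_t    pfn;      /* page frame to soft-offline later */
    atomic_bool pending;  /* set in "#MC context", cleared by the worker */
};

static void deferred_init(struct deferred_offline *d)
{
    atomic_init(&d->owner, NO_OWNER);
    d->pfn = 0;
    atomic_init(&d->pending, false);
}

/*
 * Called by every CPU that sees the broadcast. Only the first caller
 * wins the compare-and-swap and records the work; everyone else
 * returns false and can go back to what it was doing.
 */
static bool srao_claim(struct deferred_offline *d, int cpu, uint64_t pfn)
{
    int expected = NO_OWNER;

    assert(cpu >= 0);
    if (!atomic_compare_exchange_strong(&d->owner, &expected, cpu))
        return false;

    d->pfn = pfn;                     /* nothing here depends on the */
    atomic_store(&d->pending, true);  /* interrupted context          */
    return true;
}

/*
 * Runs later, outside the exception path. Returns true and reports the
 * page frame if there was pending work, false otherwise.
 */
static bool srao_worker(struct deferred_offline *d, uint64_t *pfn_out)
{
    if (!atomic_exchange(&d->pending, false))
        return false;

    *pfn_out = d->pfn;
    return true;
}
```

The point of the model is only that the claim step records a page frame number and touches no state of the interrupted task, which is why task_work is not needed for this case.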
>
> 2) SRAR (Software recoverable action required)
> These are synchronous. Starting with Skylake they may be signaled just to the thread
> that hit the poison. Earlier generations broadcast.
Here’s where dealing with one that came from kernel code is just nasty, right?

I would argue that, if IF=0, killing the machine is reasonable. If IF=1, we should be okay. Actually making this work sanely is gross, and arguably the goal should be minimizing grossness.

Perhaps, if we came from kernel mode, we should IPI-to-self and use a special vector that is idtentry, not apicinterrupt. Or maybe even do this for entries from usermode just to keep everything consistent.
> 2a) Hit in ring3 code ... we want to offline the page and SIGBUS the task(s)
> 2b) memcpy_mcsafe() ... kernel has a recovery path. "Return" to the recovery code instead of to the original RIP.
> 2c) copy_from_user ... not implemented yet. We are in kernel, but would like to treat this like case 2a
>
> 3) Fatal
> Always broadcast. Some bank has MCi_STATUS.PCC==1. System must be shut down.
Easy :)
It would be really, really nice if NMI was masked in MCE context.
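
To make the case analysis in this thread concrete, here is a small decision-table sketch in plain C. It is a userspace model, not kernel code, and every name in it (struct mce_ctx, mce_decide(), the enum values) is hypothetical. The kernel-mode branches encode the suggestion above (IF=0 means give up, IF=1 means defer the recovery, for example via an IPI-to-self). Checking the fixup path before the IF test is an assumption of this sketch, not something the thread settles.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical severity classes, following the three cases quoted above. */
enum mce_severity { MCE_SRAO, MCE_SRAR, MCE_FATAL };

/* Hypothetical outcomes of the decision. */
enum mce_action {
    ACT_DEFER_OFFLINE,  /* SRAO: soft-offline the page later */
    ACT_SIGBUS_TASK,    /* SRAR from ring 3: offline page, SIGBUS task */
    ACT_FIXUP,          /* SRAR in a recoverable copy: go to recovery code */
    ACT_DEFER_RECOVER,  /* SRAR in kernel with IF=1: handle after return */
    ACT_PANIC           /* fatal, or nothing sane can be done */
};

struct mce_ctx {
    enum mce_severity sev;
    bool from_user;     /* exception interrupted ring 3 */
    bool irqs_enabled;  /* IF flag of the interrupted context */
    bool has_fixup;     /* a recovery path exists for the faulting RIP */
};

static enum mce_action mce_decide(const struct mce_ctx *c)
{
    switch (c->sev) {
    case MCE_FATAL:
        return ACT_PANIC;
    case MCE_SRAO:
        /* Asynchronous: the interrupted context is irrelevant. */
        return ACT_DEFER_OFFLINE;
    case MCE_SRAR:
        if (c->from_user)
            return ACT_SIGBUS_TASK;
        if (c->has_fixup)            /* assumption: fixup does not need IF=1 */
            return ACT_FIXUP;
        if (!c->irqs_enabled)
            return ACT_PANIC;        /* IF=0: killing the machine */
        return ACT_DEFER_RECOVER;    /* IF=1: deferred handling can run */
    }
    assert(0 && "unknown severity");
    return ACT_PANIC;
}
```

For example, an SRAR hit in kernel code with no fixup and IF=0 maps to ACT_PANIC, while the same event with IF=1 maps to ACT_DEFER_RECOVER.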
>
> -Tony