Message-ID: <1492810703.2738.27.camel@intel.com>
Date: Fri, 21 Apr 2017 21:39:45 +0000
From: "Verma, Vishal L" <vishal.l.verma@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>, "bp@...e.de" <bp@...e.de>
CC: "tglx@...utronix.de" <tglx@...utronix.de>,
"Williams, Dan J" <dan.j.williams@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"ross.zwisler@...ux.intel.com" <ross.zwisler@...ux.intel.com>,
"x86@...nel.org" <x86@...nel.org>,
"linux-nvdimm@...1.01.org" <linux-nvdimm@...1.01.org>
Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

On Thu, 2017-04-13 at 13:31 +0200, Borislav Petkov wrote:
> On Thu, Apr 13, 2017 at 12:29:25AM +0200, Borislav Petkov wrote:
> > On Wed, Apr 12, 2017 at 03:26:19PM -0700, Luck, Tony wrote:
> > > We can futz with that and have them specify which chain (or both)
> > > that they want to be added to.
> >
> > Well, I didn't want the atomic chain to be a notifier because we can
> > keep it simple and non-blocking. Only the process context one will
> > be.
> >
> > So the question is, do we even have a use case for outside consumers
> > hanging on the atomic chain? Because if not, we're good to go.
>
> Ok, new day, new patch.
>
> Below is what we could do: we don't call the notifier at all on the
> atomic path but only print the MCEs. We do log them and if the machine
> survives, we process them accordingly. This is only a fix for upstream
> so that the current issue at hand is addressed.
>
> For later, we'd need to split the paths in:
>
> critical_print_mce()
>
> or somesuch which immediately dumps the MCE to dmesg, and
>
> mce_log()
>
> which does the slow path of logging MCEs and calling the blocking
> notifier.
>
> Now, I'd want to have decoding of the MCE on the critical path too so
> I have to think about how to do that nicely. Maybe move the decoding
> bits which are the same between Intel and AMD in mce.c and have some
> vendor-specific, fast calls. We'll see. Btw, this is something Ingo has
> been mentioning for a while.
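
A rough sketch of that split (illustrative only -- the two names come from the
paragraph above; the bodies and call sites are assumptions, not code from any
posted patch):

/* #MC / atomic context: dump the record to dmesg immediately so it is
 * not lost if the machine dies; no notifiers, nothing that can sleep. */
static void critical_print_mce(struct mce *m)
{
	__print_mce(m);
}

/* process context (e.g. the genpool work): record the MCE and run the
 * blocking notifier chain, whose callbacks are allowed to sleep */
void mce_log(struct mce *m)
{
	mce_gen_pool_add(m);
	blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
}
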
>
> Anyway, here's just the urgent fix for now.
>
> Thanks.
>
> ---
> From: Vishal Verma <vishal.l.verma@...el.com>
> Date: Tue, 11 Apr 2017 16:44:57 -0600
> Subject: [PATCH] x86/mce: Make the MCE notifier a blocking one
>
> The NFIT MCE handler callback (for handling media errors on NVDIMMs)
> takes a mutex to add the location of a memory error to a list. But since
> the notifier call chain for machine checks (x86_mce_decoder_chain) is
> atomic, we get a lockdep splat like:
>
> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
> in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
> [..]
> Call Trace:
> dump_stack
> ___might_sleep
> __might_sleep
> mutex_lock_nested
> ? __lock_acquire
> nfit_handle_mce
> notifier_call_chain
> atomic_notifier_call_chain
> ? atomic_notifier_call_chain
> mce_gen_pool_process
>
> Convert the notifier to a blocking one which gets to run only in process
> context.
>
> Boris: remove the notifier call in atomic context in print_mce(). For
> now, let's print the MCE on the atomic path so that we can make sure it
> goes out. We still log it for process context later.
>
> Reported-by: Ross Zwisler <ross.zwisler@...ux.intel.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@...el.com>
> Cc: Tony Luck <tony.luck@...el.com>
> Cc: Dan Williams <dan.j.williams@...el.com>
> Cc: linux-edac <linux-edac@...r.kernel.org>
> Cc: x86-ml <x86@...nel.org>
> Cc: <stable@...r.kernel.org>
> Link: http://lkml.kernel.org/r/20170411224457.24777-1-vishal.l.verma@intel.com
> Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
> Signed-off-by: Borislav Petkov <bp@...e.de>
> ---
>  arch/x86/kernel/cpu/mcheck/mce-genpool.c  |  2 +-
>  arch/x86/kernel/cpu/mcheck/mce-internal.h |  2 +-
>  arch/x86/kernel/cpu/mcheck/mce.c          | 18 ++++--------------
>  3 files changed, 6 insertions(+), 16 deletions(-)
>
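
Since only the diffstat is quoted above: the gist of the conversion is to
switch the decoder chain and its users from the atomic to the blocking
notifier API, and to stop invoking the chain from atomic context -- roughly
(not the exact hunks):

-ATOMIC_NOTIFIER_HEAD(x86_mce_decoder_chain);
+BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);

 void mce_register_decode_chain(struct notifier_block *nb)
 {
-	atomic_notifier_chain_register(&x86_mce_decoder_chain, nb);
+	blocking_notifier_chain_register(&x86_mce_decoder_chain, nb);
 }

/* likewise for mce_unregister_decode_chain(), and for the call in
 * mce_gen_pool_process(), which runs from a workqueue: */
-	atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
+	blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);

/* print_mce() no longer calls the chain in atomic context; it only
 * dumps the record via __print_mce() */
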
I noticed this patch was picked up in tip, in ras/urgent, but didn't see
a pull request for 4.11 - was this the intention? Or will it just be
added for 4.12?
-Vishal