Message-ID: <20160918183905.GB331@nazgul.tnic>
Date: Sun, 18 Sep 2016 20:39:05 +0200
From: Borislav Petkov <bp@...en8.de>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Yinghai Lu <yinghai@...nel.org>,
the arch/x86 maintainers <x86@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
Yinghai Lu <yinghai.lu@...cle.com>
Subject: Re: [RFC PATCH] x86: Do not panic if mce=2 is passed
On Fri, Sep 16, 2016 at 08:28:44PM +0000, Luck, Tony wrote:
> > For UE recovery support, current we need mce=2 in command line
> > and also disable panic_on_oops with sysctl.
>
> Please explain. I've never given mce=2 on command line, and have
> had my kernel recover from thousands of (injected) UE memory errors.
So frankly, that panic_on_oops check doesn't make a whole lotta sense to
me. It promotes MCEs with severity MCE_UC_SEVERITY and higher to a
panic.
So let's look at those:
  MCE_UC_SEVERITY    - we don't do anything special in the kernel for
                       those, so just as well.
  MCE_AR_SEVERITY    - those end up in the memory failure code if
                       they're memory errors
  MCE_PANIC_SEVERITY - causes panic
so if anything, panic_on_oops shouldn't control the panicking behavior
as tolerant does that already:
 * Tolerant levels:
 *   0: always panic on uncorrected errors, log corrected errors
 *   1: panic or SIGBUS on uncorrected errors, log corrected errors
 *   2: SIGBUS or log uncorrected errors (if possible), log corr. errors
 *   3: never panic or SIGBUS, log all errors (for testing only)
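
To make the levels concrete, here is a rough userspace sketch of how I read
that comment. It is a toy model, not the mce.c code: the enum, the function
name and the "containable" flag (meaning the damage can be limited to the
current process) are all made up for illustration, and the "if possible"
caveat at level 2 is flattened into that one flag.

```c
#include <stdbool.h>

/* Hypothetical action names for this sketch; not kernel identifiers. */
enum action { ACT_LOG, ACT_SIGBUS, ACT_PANIC };

/*
 * Toy model of the tolerant levels quoted above.
 *
 * "containable" is an assumption of this sketch: true when the damage
 * can be limited to the current process, so a SIGBUS is an option.
 */
static enum action tolerant_action(int tolerant, bool uncorrected,
				   bool containable)
{
	if (!uncorrected)
		return ACT_LOG;	/* corrected errors are only logged */

	switch (tolerant) {
	case 0:
		/* always panic on uncorrected errors */
		return ACT_PANIC;
	case 1:
		/* panic or SIGBUS */
		return containable ? ACT_SIGBUS : ACT_PANIC;
	case 2:
		/* SIGBUS or log */
		return containable ? ACT_SIGBUS : ACT_LOG;
	default:
		/* 3 and up: never panic or SIGBUS, log only */
		return ACT_LOG;
	}
}
```

Note that panic_on_oops appears nowhere in this model: the tolerant level
alone decides between panic, SIGBUS and logging.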
IOW, I think that patch makes sense but please doublecheck my logic
above first.
Thanks.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.