Message-ID: <20110509172935.GD1963@in.ibm.com>
Date: Mon, 9 May 2011 22:59:35 +0530
From: "K.Prasad" <prasad@...ux.vnet.ibm.com>
To: Andi Kleen <andi@...stfloor.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"Luck, Tony" <tony.luck@...el.com>,
Vivek Goyal <vgoyal@...hat.com>, kexec@...ts.infradead.org,
Srivatsa Vaddagiri <vatsa@...ibm.com>,
Ananth N Mavinakayanahalli <ananth@...ibm.com>
Subject: Re: [RFC] Kdump and memory error handling
On Wed, May 04, 2011 at 10:39:14PM +0200, Andi Kleen wrote:
> > Any thoughts/suggestions?
>
> My old attempts to solve this are
>
> Don't dump on MCE:
>
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/xpanic
>
The problem we see in avoiding the panic->crash_kexec->[coredump capture]
path is that the user may have no way to learn the reason for the crash
unless a serial console is connected to capture and store the panic string.
Alternatively, a 'slim' kdump (as described here:
https://lkml.org/lkml/2011/5/4/396) would not contain meaningless data from
the old memory, but would still inform the user about the cause of the
crash. I intend to post some patches with a quick implementation of it soon.
> Handle dumps of corrupted memory regions:
>
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/crashdump
>
> IMHO these patches are still the right solutions for this.
>
As Vatsa pointed out, the processor's behaviour upon reading (or performing
any I/O operation on) the faulty memory location isn't clearly defined (to
the extent I read through System Programming Guide Part 1, Volume 3A,
Chapter 15). In such a scenario, disabling MCE for the kdump kernel (which
can potentially read the faulty memory) makes things hazy.
Thanks,
K.Prasad