Message-ID: <20090616205449.GA4858@sgi.com>
Date: Tue, 16 Jun 2009 15:54:49 -0500
From: Russ Anderson <rja@....com>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Andi Kleen <andi@...stfloor.org>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Hugh Dickins <hugh.dickins@...cali.co.uk>,
Wu Fengguang <fengguang.wu@...el.com>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
Mel Gorman <mel@....ul.ie>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Nick Piggin <npiggin@...e.de>,
"riel@...hat.com" <riel@...hat.com>,
"chris.mason@...cle.com" <chris.mason@...cle.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>, rja@....com
Subject: Re: [PATCH 00/22] HWPOISON: Intro (v5)
On Tue, Jun 16, 2009 at 01:28:54PM -0700, H. Peter Anvin wrote:
> Russ Anderson wrote:
> > On Mon, Jun 15, 2009 at 03:29:34PM +0200, Andi Kleen wrote:
> >> I think you're wrong about killing processes decreasing
> >> reliability. Traditionally we always tried to keep things running if possible
> >> instead of panicking.
> >
> > Customers love the ia64 feature of killing a user process instead of
> > panicking the system when a user process hits an uncorrectable memory
> > error. Avoiding a system panic is a very good thing.
>
> Sometimes (sometimes it's a very bad thing.)
>
> However, the more fundamental thing is that it is always trivial to
> promote an error to a higher severity; the opposite is not true. As
> such, it becomes an administrator-set policy, which is what it needs to be.
Good point. On ia64 the recovery code is implemented as a loadable
kernel module. Installing the module turns on the feature.
That is handy for customer demos. Install the module, inject a
memory error, have an application read the bad data and get killed.
Repeat a few times. Then uninstall the module, inject a
memory error, have an application read the bad data and watch
the system panic.
Then it is the customer's choice to have it on or off.
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@....com