Message-ID: <20110413022402.GA31652@sgi.com>
Date: Tue, 12 Apr 2011 21:24:03 -0500
From: Russ Anderson <rja@....com>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Borislav Petkov <bp@...en8.de>,
Prarit Bhargava <prarit@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"dzickus@...hat.com" <dzickus@...hat.com>,
"mstowe@...hat.com" <mstowe@...hat.com>,
"dnelson@...hat.com" <dnelson@...hat.com>, rja@...ricas.sgi.com
Subject: Re: [PATCH]: mce: don't print "human readable" message for corrected errors

On Tue, Apr 12, 2011 at 01:02:21PM -0700, Luck, Tony wrote:
> > Why not? This way you turn reporting of _ALL_ correctable MCEs
> > completely off and some users would actually like to run them through
> > mcelog on Intel.
>
> pr_emerg() is rather overkill for a corrected error - on large systems
> corrected errors are going to be a routine occurrence (my personal estimation
> is "one soft error per gigabyte per month" ... which is pretty much the
> same as "one per terabyte per hour" for the people with the really cool
> toys).

Good point.

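As a rough check of that equivalence, here is a small C sketch of the
unit conversion. The decimal GB-per-TB factor and the 730-hour average
month are my assumptions, not figures from Tony's message, and the
function names are made up for illustration.

```c
/* Assumed constants: decimal units and an average month of 730 hours. */
#define GB_PER_TB        1000.0
#define HOURS_PER_MONTH  730.0   /* 365 days * 24 hours / 12 months */

/* Convert a soft-error rate from errors/GB/month to errors/TB/hour. */
double errors_per_tb_hour(double errors_per_gb_month)
{
    return errors_per_gb_month * GB_PER_TB / HOURS_PER_MONTH;
}

/* Expected corrected errors per day on a machine with mem_tb terabytes. */
double errors_per_day(double errors_per_gb_month, double mem_tb)
{
    return errors_per_tb_hour(errors_per_gb_month) * mem_tb * 24.0;
}
```

Under those assumptions, one error per gigabyte per month works out to
about 1.4 per terabyte per hour, and errors_per_day(1.0, 16.0) comes to
roughly 526 corrected errors a day on a 16 TB machine.
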
> We are also setting TAINT_MACHINE_CHECK for corrected errors - perhaps
> this made sense when systems were small and machine checks were rare and
> scary. But I think we need to start working with the reality that
> corrected errors are normal events.

I agree. Corrected errors - by definition - have hardware-corrected data.
There is no corruption, so there is no reason for kernel taint. It would
be like setting taint when one hard drive in a RAID array goes bad.

It's worth noting that Linux does not set taint when it recovers from
_uncorrected_ memory errors on IA64 (by killing the application
that consumed the bad data and discarding the bad page). Modern hardware
has enough error detection/correction code to avoid undetected data
corruption from memory errors.
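
The taint policy argued for above can be sketched in user-space C. This
is only an illustration of the decision, not kernel code: the enum, its
values, and should_taint() are hypothetical names, and the third case
(an uncorrected error that could not be contained) is my assumption
about where taint would still make sense.

```c
#include <stdbool.h>

/* Hypothetical outcome classes; these are not kernel identifiers. */
enum mem_err_outcome {
    ERR_CORRECTED,               /* hardware delivered corrected data */
    ERR_UNCORRECTED_RECOVERED,   /* consumer killed, bad page discarded */
    ERR_UNCORRECTED_UNCONTAINED  /* bad data may have been consumed */
};

/* Taint only when possibly corrupt data may still be in use. */
bool should_taint(enum mem_err_outcome outcome)
{
    return outcome == ERR_UNCORRECTED_UNCONTAINED;
}
```

With this policy neither a corrected error nor a recovered uncorrected
error sets taint, which matches the IA64 behavior described above.
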
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@....com