Message-ID: <3908561D78D1C84285E8C5FCA982C28F31D45B22@ORSMSX106.amr.corp.intel.com>
Date: Mon, 21 Oct 2013 17:14:05 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: "Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>,
"bp@...en8.de" <bp@...en8.de>, "joe@...ches.com" <joe@...ches.com>,
"m.chehab@...sung.com" <m.chehab@...sung.com>,
"arozansk@...hat.com" <arozansk@...hat.com>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Chen Gong <gong.chen@...ux.intel.com>
Subject: RE: [PATCH v3 8/9] ACPI, APEI, CPER: Cleanup CPER memory error output format
>>>> + if (severity != CPER_SEV_FATAL)
>>>
>>> Shouldn't this just be (severity == CPER_SEV_CORRECTED)?
>> IMO, only fatal errors can't be handled gracefully by the current
>> kernel plus H/W. Once an error can be recovered by H/W and OS, we
>> can call it recovered.
> Sure, but we don't recover in all scenarios. So, calling it corrected
> seems incorrect to me.
Even if we recovered from a UC error (which is by no means a sure
thing) ... I don't think the "requires no further action" message applies.
Soft single bit errors are common (well, common-ish ... they should still
be somewhat rare by most objective standards). Double bit errors are
much rarer ... and are very unlikely to be the result of two single bit errors
happening to be inside the same cache line. I'd recommend further investigation
of the source of a UC error (even one that is "recovered" in software).
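
To make the difference between the two checks concrete, here is a minimal
standalone sketch, not the patch itself. It assumes the four CPER severity
names used by the kernel (only the names matter here, not their numeric
values); the two helper functions are hypothetical and exist only to compare
the condition as posted with the one suggested in the review.

```c
#include <stdbool.h>

/* Assumption: these names mirror the CPER severity constants. */
enum cper_sev {
	CPER_SEV_RECOVERABLE,
	CPER_SEV_FATAL,
	CPER_SEV_CORRECTED,
	CPER_SEV_INFORMATIONAL,
};

/* Hypothetical helper: the condition as posted in the patch. */
bool no_action_as_posted(int severity)
{
	return severity != CPER_SEV_FATAL;
}

/* Hypothetical helper: the condition suggested in review. */
bool no_action_as_suggested(int severity)
{
	return severity == CPER_SEV_CORRECTED;
}
```

With the posted check, a recoverable (UC) error is told that it "requires no
further action"; with the suggested check, only corrected errors get that
message. Note that the suggested check also drops the message for
informational records.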
-Tony