Date:	Mon, 8 Aug 2016 15:39:44 +0000
From:	york sun <york.sun@....com>
To:	Borislav Petkov <bp@...en8.de>
CC:	"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
	"morbidrsa@...il.com" <morbidrsa@...il.com>,
	"oss@...error.net" <oss@...error.net>,
	Stuart Yoder <stuart.yoder@....com>,
	Doug Thompson <dougthompson@...ssion.com>,
	"mchehab@...nel.org" <mchehab@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [Patch v3 03/11] driver/edac/mpc85xx_edac: Drop setting/clearing
 RFXE bit in HID1

On 08/08/2016 12:11 AM, Borislav Petkov wrote:
> On Thu, Aug 04, 2016 at 03:58:28PM -0700, York Sun wrote:
>> On e500v1, read fault exception enable (RFXE) controls whether
>> assertion of core_fault_in causes a machine check interrupt.
>> Assertion of core_fault_in can result from uncorrectable data
>> error, such as  an L2 multibit ECC error. It can also occur from
>> a system error if logic on the integrated device signals a fault
>> for nonfatal errors. RFXE bit is cleared out of reset, and should
>> be left clear for normal operation. Assertion of core_fault_in does
>> not cause a machine check.
>>
>> RFXE is set specifically for RIO (Rapid IO) and PCI for book E to
>> catch the errors by machine check. With this bit set, EDAC driver
>> can't get the interrupt in case of uncorrectable error. So this
>> bit is cleared in favor of EDAC. However, the benefit of catching
>> such uncorrectable error doesn't outweight the other errors which
>> may hang the system. Beside, e500v2 has different errors maksed
>> by RFXE, and e500mc doesn't support this bit. It is more reasonable
>> to leave RFXE as is in EDAC driver, and leave the uncorrectable
>> errors triggering machine check for e500v1.
>
> Very nice, thanks for expanding it!
>
> Two final remarks:
>
> - please use a spell checker
>
> - now, what happens if you leave RFXE clear and mpc85xx_edac gets the
> error? Is it going to do proper error handling of the uncorrectable
> error or are we better off handling the error in the #MC interrupt
> handler?
>
> IOW, is mpc85xx_edac well equipped to handle those multibit errors or
> should we leave the current setting as is?
>

RFXE is cleared by default, so for most SoCs this is not a concern at
all. But for e500v1, when RIO or PCI is used, this bit is set
specifically to catch an error by machine check (see commit 4e0e3435).
That error is not the uncorrectable error from DDR. We are better off
letting this error raise the machine check.

York
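
For readers following the thread, the behaviour discussed above can be sketched as a small model. This is only an illustration of the e500v1 case as described in the patch text, not the driver code: the helper names are hypothetical, and the HID1_RFXE bit value is an assumption modeled on the kernel's register definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position for "read fault exception enable" in HID1. */
#define HID1_RFXE 0x00020000u

/*
 * Hypothetical model of e500v1: with RFXE set, assertion of
 * core_fault_in causes a machine check; with RFXE clear it does not,
 * and the error is left for the EDAC interrupt path.
 */
static inline bool core_fault_raises_machine_check(uint32_t hid1)
{
	return (hid1 & HID1_RFXE) != 0;
}

/* Hypothetical: behaviour before the patch, where EDAC cleared RFXE. */
static inline uint32_t hid1_after_old_probe(uint32_t hid1)
{
	return hid1 & ~HID1_RFXE;
}

/* Hypothetical: behaviour after the patch, where EDAC leaves HID1 alone. */
static inline uint32_t hid1_after_new_probe(uint32_t hid1)
{
	return hid1;
}
```

In this model, a platform that set RFXE for RIO or PCI keeps machine check delivery after the new probe, while the old probe would have turned it off in favor of the EDAC interrupt. A platform that never set RFXE sees no difference between the two.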
