Message-ID: <f962eb83-da13-a5de-9f06-b1b987f1e621@arm.com>
Date:   Tue, 25 Aug 2020 14:18:09 +0100
From:   James Morse <james.morse@....com>
To:     Alison Wang <alison.wang@....com>, "bp@...en8.de" <bp@...en8.de>,
        "tony.luck@...el.com" <tony.luck@...el.com>
Cc:     "mchehab@...nel.org" <mchehab@...nel.org>,
        "rrichter@...vell.com" <rrichter@...vell.com>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [EXT] Re: [PATCH] edac: nxp: Add L1 and L2 error detection for
 A53 and A72 cores

Hi Alison,

On 25/08/2020 03:31, Alison Wang wrote:
>> On 09/07/2020 09:22, Alison Wang wrote:
>>> Add error detection for A53 and A72 cores. Hardware error injection is
>>> supported on A53. Software error injection is supported on both.
>>
> <snip>
>>
>> As we can't safely write to these registers from linux, so I think this means all
>> the error injection and maybe SMC stuff can disappear.

> I agreed with your opinion that CPUACTLR_EL1 and L2ACTLR can't be written in Linux.

Well, we can't do what the TRM tells us we must before writing to that register.


> So the error injection can't be done in Linux. Do you mean the error injection can
> only be done in firmware before Linux boots up? If so, the system is running with error
> injection enabled all the time, it may be not a good idea too. Any suggestion?

These registers are expected to have one value, forever. The errata document sometimes
tells us to set or clear one of these bits to work around an issue. Because they can
only be written to when the system is idle, typically during boot, this is firmware's
responsibility.

I expect firmware to set the bits in ACTLR_EL3, to prevent lower exception levels from
touching any of these registers.


I don't know how the error injection on A53 or A72 works, so I don't know if you can leave
it enabled all the time. The bit you are setting is described as RES0 by the A53 and A72
TRMs. I suspect I had the wrong TRM open, as my 'L1DEIEN' comment seems to be what your
CPUACTLR_EL1[6] is called on A35. (35, 53? Guess how that happened!)

A35's error injection says:
| While this bit is set, double-bit errors are injected on all writes to the L1 D-cache
| data RAMs for the first word of each 32-byte region.

You certainly can't leave this sort of thing enabled! And you can't change it at runtime,
so we can't use it.
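
To put a rough number on the A35 text quoted above, here is a minimal sketch. It
assumes a "word" is 4 bytes and that the 32-byte regions are naturally aligned;
neither is spelled out in the quote. The helper names are made up for illustration
and come from neither the TRM nor the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the quoted A35 behaviour: a write is hit by the
 * injection if it targets the first word of a 32-byte region.
 * Assumption: word == 4 bytes, regions are 32-byte aligned.
 */
static bool write_hits_injection(uint64_t addr)
{
	return (addr & 31) < 4;
}

/* Count how many word-sized writes in [0, bytes) would be hit. */
static unsigned int count_injected_words(uint64_t bytes)
{
	unsigned int hits = 0;

	for (uint64_t addr = 0; addr + 4 <= bytes; addr += 4) {
		if (write_hits_injection(addr))
			hits++;
	}

	return hits;
}
```

Under those assumptions, one word-sized write in eight picks up a double-bit error
for as long as the bit stays set.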


I think features like this are intended to be used to check the integration, not to test
the software.


After I sent the original comments on this, I found Sascha's version, which has these
issues resolved:
https://lore.kernel.org/linux-arm-kernel/20200813075721.27981-1-s.hauer@pengutronix.de/

I think this version should work on your platform too.


Thanks,

James
