linux-kernel - Re: [EXT] Re: [PATCH] edac: nxp: Add L1 and L2 error detection for A53 and A72 cores

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <f962eb83-da13-a5de-9f06-b1b987f1e621@arm.com>
Date:   Tue, 25 Aug 2020 14:18:09 +0100
From:   James Morse <james.morse@....com>
To:     Alison Wang <alison.wang@....com>, "bp@...en8.de" <bp@...en8.de>,
        "tony.luck@...el.com" <tony.luck@...el.com>
Cc:     "mchehab@...nel.org" <mchehab@...nel.org>,
        "rrichter@...vell.com" <rrichter@...vell.com>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [EXT] Re: [PATCH] edac: nxp: Add L1 and L2 error detection for
 A53 and A72 cores

Hi Alison,

On 25/08/2020 03:31, Alison Wang wrote:
>> On 09/07/2020 09:22, Alison Wang wrote:
>>> Add error detection for A53 and A72 cores. Hardware error injection is
>>> supported on A53. Software error injection is supported on both.
>>
> <snip>
>>
>> As we can't safely write to these registers from linux, so I think this means all
>> the error injection and maybe SMC stuff can disappear.

> I agreed with your opinion that CPUACTLR_EL1 and L2ACTLR can't be written in Linux.

Well, we can't do what the TRM tells us we must before writing to that register.

> So the error injection can't be done in Linux. Do you mean the error injection can
> only be done in firmware before Linux boots up? If so, the system is running with error
> injection enabled all the time, it may be not a good idea too. Any suggestion?

These registers are expected to have one value, forever. The errata document sometimes
tells us to to set or clear one of these bits to workaround an issue. Because they can
only be written to when the system is idle, typically during boot, this is firmware's
responsibility.

I expect firmware to set the bits in ACTLR_EL3, to prevent lower exception levels from
touching any of these registers.

I don't know how the error injection on A53 or A72 works, so I don't know if you can leave
it enabled all the time. The bit you are setting is described as RES0 by the A53 and A72
TRMs. I suspect I had the wrong TRM open, as my 'L1DEIEN' comment seems to be what your
CPUACTLR_EL1[6] is called on A35. (35, 53? Guess how that happened!)

A35's error injection says:
| While this bit is set, double-bit errors are injected on all writes to the L1 D-cache
| data RAMs for the first word of each 32-byte region.

You certainly can't leave this sort of thing enabled! And you can't change it at runtime,
so we can't use it.

I think features like this are intended to be used to check the integration, not to test
the software.

After I sent the original comments on this, I found Sascha's version, which has these
issues resolved:
https://lore.kernel.org/linux-arm-kernel/20200813075721.27981-1-s.hauer@pengutronix.de/

I think this version should work on your platform too.

Thanks,

James