Message-ID: <57c09d48-2c09-f1e1-0f70-c8249bc8329f@arm.com>
Date: Wed, 30 Jan 2019 14:56:52 +0000
From: James Morse <james.morse@....com>
To: Catalin Marinas <catalin.marinas@....com>,
"Zhang, Lei" <zhang.lei@...fujitsu.com>
Cc: "'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>,
'Mark Rutland' <mark.rutland@....com>,
"'linux-arm-kernel@...ts.infradead.org'"
<linux-arm-kernel@...ts.infradead.org>,
"'will.deacon@....com'" <will.deacon@....com>
Subject: Re: [PATCH v3 0/1] arm64: Add workaround for Fujitsu A64FX erratum 010001

Hi guys,

On 01/29/2019 06:10 PM, Catalin Marinas wrote:
> Could you please copy the whole description from the cover letter to the
> actual patch and only send one email (full description as in here
> together with the patch)? If we commit this to the kernel, it would be
> useful to have the information in the log for reference later on.
>
> More comments below:
>
> On Tue, Jan 29, 2019 at 12:29:58PM +0000, Zhang, Lei wrote:
>> On some variants of the Fujitsu-A64FX cores ver(1.0, 1.1),
>> memory accesses may cause undefined fault (Data abort, DFSC=0b111111).
>> This problem will be fixed by next version of Fujitsu-A64FX.
>>
>> This fault occurs under a specific hardware condition
>> when a load/store instruction perform an address translation using:
>> case-1 TTBR0_EL1 with TCR_EL1.NFD0 == 1.
>> case-2 TTBR0_EL2 with TCR_EL2.NFD0 == 1.
>> case-3 TTBR1_EL1 with TCR_EL1.NFD1 == 1.
>> case-4 TTBR1_EL2 with TCR_EL2.NFD1 == 1.
>> And this fault occurs completely spurious.
>
> So this looks like new information on the hardware behaviour since the
> v2 of the patch. Can this fault occur for any type of instruction
> accessing the memory or only for SVE instructions?
>
>> Since TCR_ELx.NFD1 is set to '1' at the kernel in versions
>> past 4.17, the case-3 or case-4 may happen.
>>
>> This fault can be taken only at stage-1,
>> so this fault is taken from EL0 to EL1/EL2, from EL1 to EL1,
>> or from EL2 to EL2.
>>
>> I would like to post a workaround to avoid this problem on
>> existing Fujitsu-A64FX version.
>
> How likely is it to trigger this erratum? In other words, aren't we
> better off with a spurious fault that we ignore rather than toggling the
> TCR_ELx.NFD1 bit?

It sounds like the spurious fault can occur as a result of a load/store
('there is no load/store instruction between'...).

If this can happen in kernel_enter, it will overwrite the exception
registers and we lose the original ELR.

If a load/store can trigger it, I don't think we can ignore it.

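To make the concern concrete, here is a toy C model (not kernel code) of
what happens if a second exception is taken before the entry path has
saved ELR. Everything named toy_* or TOY_* is hypothetical and exists
only for illustration; the only values taken from the architecture and
from the erratum description are the EC field position (ESR bits
[31:26]), EC 0x25 for a data abort taken without a change in exception
level, and DFSC=0b111111.

```c
#include <stdint.h>

/* Toy model of the EL1 exception registers. Hypothetical, for illustration. */
struct toy_cpu {
	uint64_t pc;
	uint64_t elr_el1;	/* return address of the most recent exception */
	uint64_t esr_el1;	/* syndrome of the most recent exception */
};

#define TOY_INTERRUPTED_PC	0x400000ULL	/* made-up address of the interrupted code */
#define TOY_VECTOR_PC		0x1000ULL	/* made-up address of the entry code */
#define TOY_EC_SHIFT		26		/* ESR_ELx.EC is bits [31:26] */
#define TOY_EC_DABT_CUR		0x25ULL		/* data abort, same exception level */
#define TOY_DFSC_SPURIOUS	0x3fULL		/* DFSC=0b111111 from the erratum text */

/* Taking an exception overwrites ELR and ESR unconditionally. */
static void toy_take_exception(struct toy_cpu *cpu, uint64_t esr)
{
	cpu->elr_el1 = cpu->pc;
	cpu->esr_el1 = esr;
	cpu->pc = TOY_VECTOR_PC;
}

/*
 * Model an exception followed by the entry path. If fault_in_entry is
 * non-zero, a store in the entry path takes the spurious data abort
 * before ELR has been read and saved. Returns the ELR value the entry
 * path ends up saving.
 */
static uint64_t toy_saved_elr(int fault_in_entry)
{
	struct toy_cpu cpu = { .pc = TOY_INTERRUPTED_PC };

	/* First (legitimate) exception: ELR now holds the interrupted PC. */
	toy_take_exception(&cpu, 0);

	if (fault_in_entry) {
		/* Spurious abort on an entry-path store: ELR is overwritten. */
		toy_take_exception(&cpu,
				   (TOY_EC_DABT_CUR << TOY_EC_SHIFT) |
				   TOY_DFSC_SPURIOUS);
	}

	/* Only now does the entry path read ELR to save it. */
	return cpu.elr_el1;
}
```

In this model toy_saved_elr(0) returns the interrupted PC, while
toy_saved_elr(1) returns an address inside the entry code, so the
original return address cannot be recovered by ignoring the fault
afterwards.
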
Thanks,
James