linux-kernel - Re: [PATCH v3 0/1] arm64: Add workaround for Fujitsu A64FX erratum 010001

Message-ID: <57c09d48-2c09-f1e1-0f70-c8249bc8329f@arm.com>
Date:   Wed, 30 Jan 2019 14:56:52 +0000
From:   James Morse <james.morse@....com>
To:     Catalin Marinas <catalin.marinas@....com>,
        "Zhang, Lei" <zhang.lei@...fujitsu.com>
Cc:     "'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>,
        'Mark Rutland' <mark.rutland@....com>,
        "'linux-arm-kernel@...ts.infradead.org'" 
        <linux-arm-kernel@...ts.infradead.org>,
        "'will.deacon@....com'" <will.deacon@....com>
Subject: Re: [PATCH v3 0/1] arm64: Add workaround for Fujitsu A64FX erratum
 010001

Hi guys,

On 01/29/2019 06:10 PM, Catalin Marinas wrote:
> Could you please copy the whole description from the cover letter to the
> actual patch and only send one email (full description as in here
> together with the patch)? If we commit this to the kernel, it would be
> useful to have the information in the log for reference later on.
> 
> More comments below:
> 
> On Tue, Jan 29, 2019 at 12:29:58PM +0000, Zhang, Lei wrote:
>> On some variants of the Fujitsu-A64FX cores (versions 1.0 and 1.1),
>> memory accesses may cause an undefined fault (Data abort, DFSC=0b111111).
>> This problem will be fixed in the next version of Fujitsu-A64FX.
>>
>> This fault occurs under a specific hardware condition
>> when a load/store instruction performs an address translation using:
>>    case-1  TTBR0_EL1 with TCR_EL1.NFD0 == 1.
>>    case-2  TTBR0_EL2 with TCR_EL2.NFD0 == 1.
>>    case-3  TTBR1_EL1 with TCR_EL1.NFD1 == 1.
>>    case-4  TTBR1_EL2 with TCR_EL2.NFD1 == 1.
>> And this fault is completely spurious.
> 
> So this looks like new information on the hardware behaviour since the
> v2 of the patch. Can this fault occur for any type of instruction
> accessing the memory or only for SVE instructions?
> 
>> Since the kernel sets TCR_ELx.NFD1 to '1' in versions
>> past 4.17, case-3 or case-4 may happen.
>>
>> This fault can be taken only at stage-1,
>> so this fault is taken from EL0 to EL1/EL2, from EL1 to EL1,
>> or from EL2 to EL2.
>>
>> I would like to post a workaround to avoid this problem on
>> existing Fujitsu-A64FX versions.
> 
> How likely is it to trigger this erratum? In other words, aren't we
> better off with a spurious fault that we ignore rather than toggling the
> TCR_ELx.NFD1 bit?

It sounds like the spurious fault can occur as a result of a load/store
('there is no load/store instruction between'...).

If this can happen in kernel_entry, it will overwrite the exception
registers, and we lose the original ELR.
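
To illustrate the concern, here is a toy user-space model, not kernel code.
It assumes a simplified CPU where every exception taken to EL1 overwrites
ELR/SPSR/ESR, and an entry path that copies ELR to the stack at some point.
The struct, the function names and the addresses are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the banked EL1 exception registers. */
struct el1_regs {
	uint64_t elr;	/* return address of the interrupted context */
	uint64_t spsr;	/* saved PSTATE of the interrupted context */
	uint64_t esr;	/* syndrome of the most recent exception */
};

/* Taking an exception to EL1 unconditionally overwrites all three. */
static void take_exception(struct el1_regs *r, uint64_t pc,
			   uint64_t pstate, uint64_t esr)
{
	r->elr = pc;
	r->spsr = pstate;
	r->esr = esr;
}

/*
 * Returns 1 if the return address saved by the entry path is still the
 * original one, 0 if a nested fault clobbered it first.
 */
static int original_elr_survives(int fault_before_save)
{
	const uint64_t user_pc = 0x400000;			/* made-up EL0 PC */
	const uint64_t entry_pc = 0xffff000010081000ULL;	/* made-up entry PC */
	struct el1_regs hw = { 0, 0, 0 };
	uint64_t saved_elr;

	/* First exception: an SVC from EL0 (EC=0x15, IL=1). */
	take_exception(&hw, user_pc, 0x0, 0x56000000);

	/* Spurious data abort (DFSC=0b111111) before ELR has been saved. */
	if (fault_before_save)
		take_exception(&hw, entry_pc, 0x5, 0x9600003f);

	/* The entry path stores ELR to the stack only at this point. */
	saved_elr = hw.elr;

	return saved_elr == user_pc;
}
```

In this model original_elr_survives(1) returns 0: the nested fault has
replaced the EL0 return address with an address inside the entry code, and
nothing else holds the old value.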

If a load/store can trigger it, I don't think we can ignore it.
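
For reference, the signature quoted above (Data abort, DFSC=0b111111) could
be recognised from the syndrome value roughly as below. This is a
stand-alone sketch, not the posted patch: the macro and function names are
made up for illustration, and only the ESR_ELx field layout (EC in bits
[31:26], DFSC in bits [5:0]) is taken from the architecture. Matching the
syndrome only tells us a fault looks like the erratum. It does not help if
the exception registers were already overwritten.

```c
#include <stdbool.h>
#include <stdint.h>

/* ESR_ELx field layout (architectural). Names here are illustrative. */
#define EX_ESR_EC_SHIFT		26
#define EX_ESR_EC_MASK		0x3fu
#define EX_ESR_EC_DABT_LOW	0x24u	/* data abort from a lower EL */
#define EX_ESR_EC_DABT_CUR	0x25u	/* data abort from the current EL */
#define EX_ESR_DFSC_MASK	0x3fu

/* DFSC value reported for the spurious fault in the erratum description. */
#define EX_A64FX_DFSC		0x3fu

/*
 * Returns true if the syndrome matches the pattern described for the
 * erratum: a data abort whose DFSC field is 0b111111.
 */
static bool esr_matches_a64fx_erratum(uint32_t esr)
{
	uint32_t ec = (esr >> EX_ESR_EC_SHIFT) & EX_ESR_EC_MASK;
	uint32_t dfsc = esr & EX_ESR_DFSC_MASK;
	bool is_dabt = (ec == EX_ESR_EC_DABT_LOW) || (ec == EX_ESR_EC_DABT_CUR);

	return is_dabt && dfsc == EX_A64FX_DFSC;
}
```

A real handler would also need to check the CPU's MIDR so that other
implementations are not affected; that part is left out here.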

Thanks,

James