Message-ID: <8898674D84E3B24BA3A2D289B872026A6A30C360@G01JPEXMBKW03>
Date: Tue, 5 Feb 2019 12:49:28 +0000
From: "Zhang, Lei" <zhang.lei@...fujitsu.com>
To: 'Catalin Marinas' <catalin.marinas@....com>
CC: "'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>,
"'Mark Rutland'" <mark.rutland@....com>,
"'linux-arm-kernel@...ts.infradead.org'"
<linux-arm-kernel@...ts.infradead.org>,
"'will.deacon@....com'" <will.deacon@....com>,
"'james.morse@....com'" <james.morse@....com>
Subject: RE: [PATCH v3 0/1] arm64: Add workaround for Fujitsu A64FX erratum 010001
Hi Catalin,
> -----Original Message-----
> From: Catalin Marinas [mailto:catalin.marinas@....com]
> Sent: Wednesday, January 30, 2019 3:11 AM
> To: Zhang, Lei
> Cc: 'linux-kernel@...r.kernel.org'; 'Mark Rutland';
> 'linux-arm-kernel@...ts.infradead.org'; 'will.deacon@....com';
> 'james.morse@....com'
> Subject: Re: [PATCH v3 0/1] arm64: Add workaround for Fujitsu A64FX
> erratum 010001
>
> Could you please copy the whole description from the cover letter to the
> actual patch and only send one email (full description as in here
> together with the patch)? If we commit this to the kernel, it would be
> useful to have the information in the log for reference later on.
Thank you for your suggestion. I will send a single email that contains the
whole description together with the patch.
> So this looks like new information on the hardware behaviour since the
> v2 of the patch. Can this fault occur for any type of instruction
> accessing the memory or only for SVE instructions?
With this erratum, any load/store instruction, whether Armv8 or SVE,
may take a spurious fault. The only exception is non-fault accesses.
> How likely is it to trigger this erratum? In other words, aren't we
> better off with a spurious fault that we ignore rather than toggling the
> TCR_ELx.NFD1 bit?
Although the erratum occurs only very rarely, the workaround is still required
to handle the issue pointed out by James and Mark in:
https://lkml.org/lkml/2019/1/22/533,
https://lkml.org/lkml/2019/1/22/642.
As James and Mark pointed out, if the erratum occurs at EL1/EL2 before
the system registers ELR and SPSR have been backed up, these registers
will be overwritten and we will lose that information.
So we keep TCR_ELx.NFD1=0 while running at EL1/EL2.
Please see the supplemental explanation at the end of this mail.
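To illustrate the ELR/SPSR concern, here is a small user-space model in C.
It is only a sketch and is not taken from the patch: the struct names, the
helper functions and the address/PSTATE constants are all hypothetical. The
only architectural fact it relies on is that taking an exception to EL1
overwrites ELR_EL1 and SPSR_EL1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; none of them come from the patch. */
#define USER_PC     0x0000000000400000ULL  /* interrupted EL0 instruction */
#define USER_PSTATE 0x0000000000000000ULL  /* EL0t */
#define ENTRY_PC    0xffff000000081000ULL  /* somewhere in the entry code */
#define EL1_PSTATE  0x00000000000003c5ULL  /* EL1h, DAIF masked */

/* Hypothetical model of the EL1 exception-return registers. */
struct el1_regs {
    uint64_t elr;   /* return address of the most recent exception */
    uint64_t spsr;  /* saved PSTATE of the most recent exception */
};

/* Hypothetical stand-in for the in-memory save area. */
struct saved_ctx {
    uint64_t pc;
    uint64_t pstate;
    bool valid;
};

/* Taking an exception to EL1 overwrites ELR_EL1 and SPSR_EL1. */
static void take_exception(struct el1_regs *r, uint64_t pc, uint64_t pstate)
{
    r->elr = pc;
    r->spsr = pstate;
}

/* The entry code copies ELR/SPSR to memory so that they survive nesting. */
static void save_context(const struct el1_regs *r, struct saved_ctx *s)
{
    s->pc = r->elr;
    s->pstate = r->spsr;
    s->valid = true;
}

/*
 * Returns true if the EL0 return state is still recoverable from the
 * save area. fault_before_save selects whether the (modelled) spurious
 * fault hits before or after ELR/SPSR have been backed up.
 */
static bool original_state_recoverable(bool fault_before_save)
{
    struct el1_regs regs = {0};
    struct saved_ctx ctx = {0};

    take_exception(&regs, USER_PC, USER_PSTATE);      /* EL0 -> EL1 */
    if (fault_before_save)
        take_exception(&regs, ENTRY_PC, EL1_PSTATE);  /* clobbers ELR/SPSR */
    save_context(&regs, &ctx);
    if (!fault_before_save)
        take_exception(&regs, ENTRY_PC, EL1_PSTATE);  /* copy is already safe */

    return ctx.valid && ctx.pc == USER_PC && ctx.pstate == USER_PSTATE;
}

/* Small self-check showing both orderings. */
void elr_spsr_self_check(void)
{
    assert(original_state_recoverable(false));
    assert(!original_state_recoverable(true));
}
```

In this model the EL0 state is lost only when the nested fault arrives before
the save, which is the window the workaround is meant to close.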
> The problem is that this bit may be cached in the TLB (I haven't checked
> the ARM ARM but that's usually the case with the TCR_ELx bits). If
> that's the case, you can't guarantee a change unless you also perform a
> TLBI VMALL. Arguably, if Fujitsu's microarchitecture doesn't cache the
> NFD bits in the TLB, we could apply the workaround but I'd rather have
> the spurious trap if it's not too often.
On the A64FX microarchitecture, it is not necessary to perform a TLBI VMALL
to guarantee that a change to TCR_ELx.{NFD0,NFD1} takes effect.
> Could speculative loads also trigger this? Another option would be to
> toggle it during kernel_neon_begin/end (with the caveat of TLBI as
> mentioned above).
No, a speculative load does not trigger this erratum.
Here is the supplemental explanation:
Since this erratum occurs only when TCR_ELx.NFD1=1,
we keep TCR_ELx.NFD1=0 while running at EL1/EL2.
By doing so, the erratum can occur only at EL0, and the
spurious trap can be handled by the fault handler.
To keep TCR_ELx.NFD1=0 at EL1/EL2, there are two critical
sections that must be covered for the implementation to be complete.
One is the transition from EL0 to EL1/EL2 and the other
is the transition from EL1/EL2 to EL0.
For the former case, I set TCR_ELx.NFD1=0 in tramp_map_kernel.
No load/store instruction is executed at EL1/EL2 before
TCR_ELx.NFD1 is set to 0, so the undefined fault cannot occur.
For the latter case, I set TCR_ELx.NFD1=1 in tramp_unmap_kernel.
No load/store instruction is executed at EL1/EL2 after
TCR_ELx.NFD1 is set to 1, so the undefined fault cannot occur.
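The two transitions can be sketched in plain C as operations on a TCR value.
This is only a model under stated assumptions: the real change lives in the
assembly trampoline and writes the system register directly, and the function
names below are hypothetical. The bit position (NFD1 is bit 54 of TCR_EL1) is
architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TCR_NFD1 (UINT64_C(1) << 54)  /* TCR_EL1.NFD1 */

/* EL0 -> EL1/EL2 (models tramp_map_kernel): clear NFD1, keep other bits. */
static uint64_t tcr_on_kernel_entry(uint64_t tcr)
{
    return tcr & ~TCR_NFD1;
}

/* EL1/EL2 -> EL0 (models tramp_unmap_kernel): set NFD1, keep other bits. */
static uint64_t tcr_on_kernel_exit(uint64_t tcr)
{
    return tcr | TCR_NFD1;
}

/* The erratum can occur only while NFD1 is set. */
static bool erratum_possible(uint64_t tcr)
{
    return (tcr & TCR_NFD1) != 0;
}

/* Small self-check: one round trip through the kernel. */
void nfd1_self_check(void)
{
    uint64_t tcr = tcr_on_kernel_exit(UINT64_C(0x19));  /* running at EL0 */

    assert(erratum_possible(tcr));
    tcr = tcr_on_kernel_entry(tcr);                     /* now at EL1/EL2 */
    assert(!erratum_possible(tcr));
    assert(tcr == UINT64_C(0x19));                      /* other bits intact */
}
```

The point of the model is only the invariant: between entry and exit NFD1 is
clear, so the erratum cannot occur while the kernel is running.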
To handle the spurious fault at EL0,
I replace the fault handler for Data abort DFSC=0b111111 with
a new fault handler that ignores the spurious fault caused by the erratum.
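As a rough sketch of the check such a handler could make (not the code from
the patch), the following C decodes an ESR value and recognises a data abort
from EL0 with DFSC=0b111111. The field positions (EC in bits [31:26], DFSC in
bits [5:0]) and the EC value 0x24 for a data abort from a lower Exception
level are architectural. The function names, the return convention and the
choice to also test EC are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESR_EC_SHIFT     26
#define ESR_EC_MASK      UINT32_C(0x3f)
#define ESR_EC_DABT_LOW  UINT32_C(0x24)  /* Data Abort from a lower EL */
#define ESR_DFSC_MASK    UINT32_C(0x3f)
#define DFSC_A64FX_SPUR  UINT32_C(0x3f)  /* DFSC = 0b111111 */

/* True if esr describes the spurious EL0 data abort from the erratum. */
static bool is_spurious_el0_fault(uint32_t esr)
{
    uint32_t ec = (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
    uint32_t dfsc = esr & ESR_DFSC_MASK;

    return ec == ESR_EC_DABT_LOW && dfsc == DFSC_A64FX_SPUR;
}

/*
 * Hypothetical handler: returns 0 when the fault is ignored (the faulting
 * instruction is simply retried on return to EL0) and 1 when the fault
 * must be passed on to the normal handling path.
 */
static int handle_data_abort(uint32_t esr)
{
    return is_spurious_el0_fault(esr) ? 0 : 1;
}

/* Small self-check with three ESR values. */
void dfsc_self_check(void)
{
    /* EL0 data abort with DFSC=0b111111: ignored. */
    assert(handle_data_abort((UINT32_C(0x24) << 26) | UINT32_C(0x3f)) == 0);
    /* EL0 data abort with another DFSC (level 3 translation fault). */
    assert(handle_data_abort((UINT32_C(0x24) << 26) | UINT32_C(0x07)) == 1);
    /* Same DFSC but taken from the current EL: not treated as spurious. */
    assert(handle_data_abort((UINT32_C(0x25) << 26) | UINT32_C(0x3f)) == 1);
}
```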
Thanks,
Zhang Lei