Date:   Tue, 29 Jan 2019 12:29:58 +0000
From:   "Zhang, Lei" <zhang.lei@...fujitsu.com>
To:     'Catalin Marinas' <catalin.marinas@....com>,
        "'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>,
        "'Mark Rutland'" <mark.rutland@....com>,
        "'linux-arm-kernel@...ts.infradead.org'" 
        <linux-arm-kernel@...ts.infradead.org>,
        "'will.deacon@....com'" <will.deacon@....com>,
        "'james.morse@....com'" <james.morse@....com>
Subject: [PATCH v3 0/1] arm64: Add workaround for Fujitsu A64FX erratum
 010001

On some variants of the Fujitsu-A64FX cores (versions 1.0 and 1.1),
memory accesses may cause an undefined fault (Data abort, DFSC=0b111111).
This problem will be fixed in the next version of Fujitsu-A64FX.

This fault occurs under a specific hardware condition
when a load/store instruction performs an address translation using:
  case-1  TTBR0_EL1 with TCR_EL1.NFD0 == 1.
  case-2  TTBR0_EL2 with TCR_EL2.NFD0 == 1.
  case-3  TTBR1_EL1 with TCR_EL1.NFD1 == 1.
  case-4  TTBR1_EL2 with TCR_EL2.NFD1 == 1.
The fault is completely spurious.
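
As an illustration only (this is not code from the patch), the four cases
can be written as a small predicate in plain C. The helper name
a64fx_erratum_may_hit and the enum ttbr_sel type are hypothetical. The bit
positions (NFD0 at bit 53, NFD1 at bit 54) are assumed from the ARMv8
TCR_EL1 layout, and TCR_EL2 is assumed to use the same positions when
HCR_EL2.E2H is 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed from the ARMv8 TCR_EL1 layout. */
#define TCR_NFD0 (UINT64_C(1) << 53)
#define TCR_NFD1 (UINT64_C(1) << 54)

/* Hypothetical selector: which translation table base a walk would use. */
enum ttbr_sel { TTBR0, TTBR1 };

/*
 * Returns true when the configuration matches one of case-1..case-4:
 * the walk uses TTBRn_ELx and the matching TCR_ELx.NFDn bit is 1.
 * This only models the register condition; the additional hardware
 * condition mentioned in the text is not modelled.
 */
static bool a64fx_erratum_may_hit(uint64_t tcr, enum ttbr_sel sel)
{
	uint64_t nfd = (sel == TTBR0) ? TCR_NFD0 : TCR_NFD1;

	return (tcr & nfd) != 0;
}
```

With only NFD1 set, the predicate is true for TTBR1 walks (case-3 and
case-4) and false for TTBR0 walks.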

Since the kernel sets TCR_ELx.NFD1 to '1' in versions
past 4.17, case-3 or case-4 may occur.

This fault can be taken only at stage-1, 
so this fault is taken from EL0 to EL1/EL2, from EL1 to EL1, 
or from EL2 to EL2.

I would like to post a workaround to avoid this problem on
existing Fujitsu-A64FX versions.

There are two points in this workaround.
Point1: trap from EL1 to EL1, EL2 to EL2
Set TCR_ELx.NFD1 to '0' at kernel entry,
and set it to '1' at kernel exit.

From the viewpoint of the ARM specification, there is no problem in
resetting TCR_ELx.{NFD0,NFD1} while in EL1/EL2, because
TCR_ELx.{NFD0,NFD1} control whether a translation table walk
is performed in response to an access from EL0.

I confirmed that:
・There is no load/store instruction between 
  tramp_ventry and setting TCR_ELx.NFD1 to '0'.
・There is no load/store instruction between 
  setting TCR_ELx.NFD1 to '1' and tramp_exit.
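
The entry/exit handling in Point1 can be modelled as two pure functions on
a TCR_ELx value. This is a sketch under assumptions, not the patch itself:
the function names are hypothetical, the NFD1 bit position (54) is assumed
from the ARMv8 TCR_EL1 layout, and real kernel code would read and write
the system register rather than operate on a plain integer.

```c
#include <stdint.h>

/* Bit position assumed from the ARMv8 TCR_EL1 layout. */
#define TCR_NFD1 (UINT64_C(1) << 54)

/* Value to program on kernel entry: NFD1 cleared, other bits untouched. */
static uint64_t tcr_on_kernel_entry(uint64_t tcr)
{
	return tcr & ~TCR_NFD1;
}

/* Value to program on kernel exit: NFD1 set again, other bits untouched. */
static uint64_t tcr_on_kernel_exit(uint64_t tcr)
{
	return tcr | TCR_NFD1;
}
```

Because only one bit changes, an entry followed by an exit restores a value
that already had NFD1 set.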

Point2: trap from EL0 to EL1/EL2
Since this fault also occurs in EL0,
replace the fault handler for Data abort
DFSC=0b111111 with a new one that ignores this undefined fault.
I guarantee that ignoring this undefined fault stops this fault code
from being delivered to the thread.

The hardware condition that causes this fault is reset at exception entry;
therefore a single retry guarantees execution of
at least one instruction.
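
The decision described in Point2 can be sketched as a check on the syndrome
value. This is illustrative only and is not the handler from the patch: the
names below are hypothetical, and the ESR_ELx layout (EC in bits [31:26],
DFSC in bits [5:0], EC 0x24/0x25 for Data aborts from a lower/the current
exception level) is assumed from the ARMv8 architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* ESR_ELx fields, assumed from the ARMv8 architecture. */
#define ESR_EC_SHIFT     26
#define ESR_EC_MASK      UINT32_C(0x3f)
#define ESR_EC_DABT_LOW  UINT32_C(0x24) /* Data abort from a lower EL */
#define ESR_EC_DABT_CUR  UINT32_C(0x25) /* Data abort from the current EL */
#define ESR_DFSC_MASK    UINT32_C(0x3f)
#define DFSC_UNDEFINED   UINT32_C(0x3f) /* 0b111111 */

/* Hypothetical result type: retry the instruction or report the fault. */
enum fault_action { FAULT_RETRY, FAULT_REPORT };

/*
 * On an affected CPU, a Data abort with DFSC=0b111111 is treated as
 * spurious and the faulting instruction is simply retried. Everything
 * else goes down the normal reporting path.
 */
static enum fault_action classify_abort(uint32_t esr, bool cpu_has_erratum)
{
	uint32_t ec = (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
	bool is_dabt = (ec == ESR_EC_DABT_LOW) || (ec == ESR_EC_DABT_CUR);

	if (cpu_has_erratum && is_dabt &&
	    (esr & ESR_DFSC_MASK) == DFSC_UNDEFINED)
		return FAULT_RETRY;

	return FAULT_REPORT;
}
```

Gating on cpu_has_erratum keeps the behaviour unchanged on CPUs that are
not affected, where the same fault code is still reported.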


This workaround is based on linux-5.0-rc2,
in which TCR_ELx.NFD1 is set to '1'
only once, during the boot sequence,
and TCR_ELx.NFD0 is not set by the kernel.
I will update my patch if a newer kernel changes
how TCR_ELx.{NFD0,NFD1} are handled.

Changes since [v1]
Following Mark's review:

 * Adopted errata framework.

Changes since [v2]
Following Mark's and James' review:
 
 * Added framework to change TCR_ELx.NFD1.
  - Change TCR_ELx.NFD1 to 0 on kernel entry.
  - Change TCR_ELx.NFD1 to 1 on kernel exit.

I would greatly appreciate it if someone could test this patch on different
chips to verify that it has no harmful effect on them.

If there is no problem on other chips, please merge this patch.

The patch is based on linux-5.0-rc2.

Zhang Lei (1):
  Arm64: Add workaround for Fujitsu A64FX erratum 010001

 Documentation/arm64/silicon-errata.txt |  1 +
 arch/arm64/Kconfig                     | 22 ++++++++++++++++++++++
 arch/arm64/include/asm/cpucaps.h       |  3 ++-
 arch/arm64/include/asm/cputype.h       |  4 ++++
 arch/arm64/kernel/cpu_errata.c         |  8 ++++++++
 arch/arm64/kernel/entry.S              | 16 ++++++++++++++++
 arch/arm64/mm/fault.c                  | 16 +++++++++++++++-
 arch/arm64/mm/proc.S                   | 20 ++++++++++++++++++++
 8 files changed, 88 insertions(+), 2 deletions(-)

-- 
1.8.3.1
