linux-kernel - [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20201106051436.2384842-3-adrian.ratiu@collabora.com>
Date:   Fri,  6 Nov 2020 07:14:36 +0200
From:   Adrian Ratiu <adrian.ratiu@...labora.com>
To:     linux-arm-kernel@...ts.infradead.org
Cc:     Nathan Chancellor <natechancellor@...il.com>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Arnd Bergmann <arnd@...db.de>,
        clang-built-linux@...glegroups.com,
        Russell King <linux@...linux.org.uk>,
        linux-kernel@...r.kernel.org, kernel@...labora.com
Subject: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization

Due to a Clang bug [1] neon autoloop vectorization does not happen or
happens badly with no gains and considering previous GCC experiences
which generated unoptimized code which was worse than the default asm
implementation, it is safer to default clang builds to the known good
generic implementation.

The kernel currently supports a minimum Clang version of v10.0.1, see
commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1").

When the bug gets eventually fixed, this commit could be reverted or,
if the minimum clang version bump takes a long time, a warning could
be added for users to upgrade their compilers like was done for GCC.

[1] https://bugs.llvm.org/show_bug.cgi?id=40976

Signed-off-by: Adrian Ratiu <adrian.ratiu@...labora.com>
---
 arch/arm/include/asm/xor.h | 3 ++-
 arch/arm/lib/Makefile      | 3 +++
 arch/arm/lib/xor-neon.c    | 4 ++++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
index aefddec79286..49937dafaa71 100644
--- a/arch/arm/include/asm/xor.h
+++ b/arch/arm/include/asm/xor.h
@@ -141,7 +141,8 @@ static struct xor_block_template xor_block_arm4regs = {
 		NEON_TEMPLATES;			\
 	} while (0)
 
-#ifdef CONFIG_KERNEL_MODE_NEON
+/* disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976 */
+#if defined(CONFIG_KERNEL_MODE_NEON) && !defined(CONFIG_CC_IS_CLANG)
 
 extern struct xor_block_template const xor_block_neon_inner;
 
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 6d2ba454f25b..53f9e7dd9714 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -43,8 +43,11 @@ endif
 $(obj)/csumpartialcopy.o:	$(obj)/csumpartialcopygeneric.S
 $(obj)/csumpartialcopyuser.o:	$(obj)/csumpartialcopygeneric.S
 
+# disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976
+ifndef CONFIG_CC_IS_CLANG
 ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
   NEON_FLAGS			:= -march=armv7-a -mfloat-abi=softfp -mfpu=neon
   CFLAGS_xor-neon.o		+= $(NEON_FLAGS)
   obj-$(CONFIG_XOR_BLOCKS)	+= xor-neon.o
 endif
+endif
diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
index e1e76186ec23..84c91c48dfa2 100644
--- a/arch/arm/lib/xor-neon.c
+++ b/arch/arm/lib/xor-neon.c
@@ -18,6 +18,10 @@ MODULE_LICENSE("GPL");
  * Pull in the reference implementations while instructing GCC (through
  * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
  * NEON instructions.
+
+ * On Clang the loop vectorizer is enabled by default, but due to a bug
+ * (https://bugs.llvm.org/show_bug.cgi?id=40976) vectorization is broke
+ * so xor-neon is disabled in favor of the default reg implementations.
  */
 #ifdef CONFIG_CC_IS_GCC
 #pragma GCC optimize "tree-vectorize"
-- 
2.29.0