Message-ID: <20201106101419.GB3811063@ubuntu-m3-large-x86>
Date: Fri, 6 Nov 2020 03:14:19 -0700
From: Nathan Chancellor <natechancellor@...il.com>
To: Adrian Ratiu <adrian.ratiu@...labora.com>
Cc: linux-arm-kernel@...ts.infradead.org,
Nick Desaulniers <ndesaulniers@...gle.com>,
Arnd Bergmann <arnd@...db.de>,
clang-built-linux@...glegroups.com,
Russell King <linux@...linux.org.uk>,
linux-kernel@...r.kernel.org, kernel@...labora.com,
Ard Biesheuvel <ardb@...nel.org>
Subject: Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization

+ Ard, who wrote this code.

On Fri, Nov 06, 2020 at 07:14:36AM +0200, Adrian Ratiu wrote:
> Due to a Clang bug [1] neon autoloop vectorization does not happen or
> happens badly with no gains and considering previous GCC experiences
> which generated unoptimized code which was worse than the default asm
> implementation, it is safer to default clang builds to the known good
> generic implementation.
>
> The kernel currently supports a minimum Clang version of v10.0.1, see
> commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1").
>
> When the bug gets eventually fixed, this commit could be reverted or,
> if the minimum clang version bump takes a long time, a warning could
> be added for users to upgrade their compilers like was done for GCC.
>
> [1] https://bugs.llvm.org/show_bug.cgi?id=40976
>
> Signed-off-by: Adrian Ratiu <adrian.ratiu@...labora.com>

Thank you for the patch! We are also tracking this here:

https://github.com/ClangBuiltLinux/linux/issues/496

It was on my TODO list to revisit getting the warning eliminated, which
likely would have involved a patch like this as well.

I am curious whether it is worth revisiting or dusting off Arnd's patch in
the LLVM bug tracker first. I have not tried it personally. If that is not
a worthwhile option, I am fine with this for now. It would be nice to try
and get a fix pinned down on the LLVM side at some point but alas,
finite amount of resources and people :(

Should no other options come to fruition from further discussions, you
can carry my tag forward:

Acked-by: Nathan Chancellor <natechancellor@...il.com>

Hopefully others can comment soon.

> ---
> arch/arm/include/asm/xor.h | 3 ++-
> arch/arm/lib/Makefile | 3 +++
> arch/arm/lib/xor-neon.c | 4 ++++
> 3 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
> index aefddec79286..49937dafaa71 100644
> --- a/arch/arm/include/asm/xor.h
> +++ b/arch/arm/include/asm/xor.h
> @@ -141,7 +141,8 @@ static struct xor_block_template xor_block_arm4regs = {
> NEON_TEMPLATES; \
> } while (0)
>
> -#ifdef CONFIG_KERNEL_MODE_NEON
> +/* disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976 */
> +#if defined(CONFIG_KERNEL_MODE_NEON) && !defined(CONFIG_CC_IS_CLANG)
>
> extern struct xor_block_template const xor_block_neon_inner;
>
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index 6d2ba454f25b..53f9e7dd9714 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -43,8 +43,11 @@ endif
> $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
> $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
>
> +# disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976
> +ifndef CONFIG_CC_IS_CLANG
> ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
> NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
> CFLAGS_xor-neon.o += $(NEON_FLAGS)
> obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
> endif
> +endif
> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> index e1e76186ec23..84c91c48dfa2 100644
> --- a/arch/arm/lib/xor-neon.c
> +++ b/arch/arm/lib/xor-neon.c
> @@ -18,6 +18,10 @@ MODULE_LICENSE("GPL");
> * Pull in the reference implementations while instructing GCC (through
> * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> * NEON instructions.
> +
> + * On Clang the loop vectorizer is enabled by default, but due to a bug
> + * (https://bugs.llvm.org/show_bug.cgi?id=40976) vectorization is broke
> + * so xor-neon is disabled in favor of the default reg implementations.
> */
> #ifdef CONFIG_CC_IS_GCC
> #pragma GCC optimize "tree-vectorize"
> --
> 2.29.0
>
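
For anyone following along without the source open, here is a minimal
user-space sketch of the kind of plain C loop that xor-neon.c leaves to the
compiler's auto-vectorizer. It is not the kernel's reference implementation:
the names xor_words_2 and xor_selftest are hypothetical, and the Clang loop
hint is only the standard Clang pragma, not something proposed in this
thread. Whether either compiler actually emits NEON for it depends on the
target flags and compiler version.

```c
#include <assert.h>
#include <stddef.h>

/*
 * GCC needs to be asked to vectorize here, which is what the pragma in
 * xor-neon.c does. Clang enables its loop vectorizer by default at -O2.
 */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC optimize "tree-vectorize"
#endif

/* Hypothetical stand-in for a two-source XOR routine: p1[i] ^= p2[i]. */
static void xor_words_2(size_t bytes, unsigned long *p1,
			const unsigned long *p2)
{
	size_t n = bytes / sizeof(unsigned long);
	size_t i;

	/* Standard Clang loop hint; an assumption, not part of the patch. */
#if defined(__clang__)
#pragma clang loop vectorize(enable)
#endif
	for (i = 0; i < n; i++)
		p1[i] ^= p2[i];
}

/* Tiny self-check: XOR a known pattern and verify every word. */
static int xor_selftest(void)
{
	unsigned long a[8], b[8];
	size_t i;

	for (i = 0; i < 8; i++) {
		a[i] = i;
		b[i] = 0xffUL;
	}

	xor_words_2(sizeof(a), a, b);

	for (i = 0; i < 8; i++)
		assert(a[i] == (i ^ 0xffUL));

	return 0;
}
```

Building this for 32-bit ARM with the NEON_FLAGS from the Makefile hunk
above and inspecting the disassembly is one way to see whether a given
compiler vectorizes the loop at all.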