[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAK8P3a3hbts4k+rrfnE8Z78ezCaME0UVgwqkdLW5NOps2YHUQQ@mail.gmail.com>
Date: Tue, 18 May 2021 17:41:00 +0200
From: Arnd Bergmann <arnd@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Eric Biggers <ebiggers@...nel.org>,
linux-arch <linux-arch@...r.kernel.org>,
Vineet Gupta <vgupta@...opsys.com>,
Russell King <linux@...linux.org.uk>,
Herbert Xu <herbert@...dor.apana.org.au>,
"David S. Miller" <davem@...emloft.net>,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
Linux ARM <linux-arm-kernel@...ts.infradead.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"open list:HARDWARE RANDOM NUMBER GENERATOR CORE"
<linux-crypto@...r.kernel.org>,
"open list:BROADCOM NVRAM DRIVER" <linux-mips@...r.kernel.org>
Subject: Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
On Tue, May 18, 2021 at 4:56 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Tue, May 18, 2021 at 12:27 AM Arnd Bergmann <arnd@...nel.org> wrote:
> > >
> > > I wonder if the kernel should do the same, or whether there are still cases
> > > where memcpy() isn't compiled optimally. armv6/7 used to be one such case, but
> > > it was fixed in gcc 6.
> >
> > It would have to be memmove(), not memcpy() in this case, right?
>
> No, it would simply be something like
>
> #define __get_unaligned_t(type, ptr) \
> ({ type __val; memcpy(&__val, ptr, sizeof(type)); __val; })
>
> #define get_unaligned(ptr) \
> __get_unaligned_t(typeof(*(ptr)), ptr)
>
> but honestly, the likelihood that the compiler generates something
> horrible (possibly because of KASAN etc) is uncomfortably high.
>
> I'd prefer the __packed thing. We don't actually use -O3, and it's
> considered a bad idea, and the gcc bug is as such less likely than
> just the above generating unacceptable code (we have several cases
> where "bad code generation" ends up being an actual bug, since we
> depend on inlining and depend on some code sequences not generating
> calls etc).
I think the important question is whether we know that the bug that Eric
pointed to can only happen with -O3, or whether it is something in
gcc-10.1 that got triggered by -O3 -msse on x86-64 but could equally
well get triggered on some other architecture without -O3 or
vector instructions enabled.
>From the gcc fix at
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9fa5b473b5b8e289b
it looks like this code path is entered when compiling with
-ftree-loop-vectorize, which is documented as
'-ftree-loop-vectorize'
Perform loop vectorization on trees. This flag is enabled by
default at '-O3' and by '-ftree-vectorize', '-fprofile-use', and
'-fauto-profile'.
-ftree-vectorize is set in arch/arm/lib/xor-neon.c
-O3 is set for the lz4 and zstd compression helpers and for wireguard.
To be on the safe side, we could pass -fno-tree-loop-vectorize along
with -O3 on the affected gcc versions, or use a bigger hammer
(not use -O3 at all, always set -fno-tree-loop-vectorize, ...).
Arnd
Powered by blists - more mailing lists