lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 5 May 2020 15:37:31 -0700
From:   George Burgess IV <george.burgess.iv@...il.com>
To:     Nathan Chancellor <natechancellor@...il.com>
Cc:     Nick Desaulniers <ndesaulniers@...gle.com>,
        "Jason A. Donenfeld" <Jason@...c4.com>,
        "open list:HARDWARE RANDOM NUMBER GENERATOR CORE" 
        <linux-crypto@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        clang-built-linux <clang-built-linux@...glegroups.com>,
        Arnd Bergmann <arnd@...db.de>,
        Kees Cook <keescook@...omium.org>,
        George Burgess <gbiv@...gle.com>
Subject: Re: [PATCH] Kbuild: disable FORTIFY_SOURCE on clang-10

This code generated by Clang here is the unfortunate side-effect of a
bug introduced during Clang-10's development phase. From discussion
with Kees on the links Nick mentioned, I surmise that FORTIFY in the
kernel never worked as well for Clang as it does for GCC today. In
many cases, it'd compile into nothing, but with the aforementioned
Clang bug, it would turn into very suboptimal code.

Kees sounded interested in getting a FORTIFY that plays more nicely
with Clang into the kernel. Until that happens, we'll be in a world
where an unpatched Clang-10 generates suboptimal code, and where a
patched Clang-10 only FORTIFYs a subset of the kernel's `mem*`/`str*`
functions. (I haven't checked assembly, but I assume that not every
FORTIFY'ed function gets compiled into 'nothingness').

I don't have sufficient context to be opinionated on whether it's
"better" to prefer a subset of opportune checks vs better codegen on
unpatched versions of clang.

If we do turn it off, it'd be nice to have some idea of when it can be
turned back on (do we need a modified implementation as mentioned
earlier? N months after clang's next point release is released,
provided the fixes land in it?)

> I can file an upstream PR for Tom to take a look out later tonight.

Thank you for the bisection and for handling the merge :)





On Tue, May 5, 2020 at 3:25 PM Nathan Chancellor
<natechancellor@...il.com> wrote:
>
> On Tue, May 05, 2020 at 03:02:16PM -0700, 'Nick Desaulniers' via Clang Built Linux wrote:
> > On Tue, May 5, 2020 at 2:55 PM Jason A. Donenfeld <Jason@...c4.com> wrote:
> > >
> > > clang-10 has a broken optimization stage that doesn't enable the
> > > compiler to prove at compile time that certain memcpys are within
> > > bounds, and thus the outline memcpy is always called, resulting in
> > > horrific performance, and in some cases, excessive stack frame growth.
> > > Here's a simple reproducer:
> > >
> > >     typedef unsigned long size_t;
> > >     void *c(void *dest, const void *src, size_t n) __asm__("memcpy");
> > >     extern inline __attribute__((gnu_inline)) void *memcpy(void *dest, const void *src, size_t n) { return c(dest, src, n); }
> > >     void blah(char *a)
> > >     {
> > >       unsigned long long b[10], c[10];
> > >       int i;
> > >
> > >       memcpy(b, a, sizeof(b));
> > >       for (i = 0; i < 10; ++i)
> > >         c[i] = b[i] ^ b[9 - i];
> > >       for (i = 0; i < 10; ++i)
> > >         b[i] = c[i] ^ a[i];
> > >       memcpy(a, b, sizeof(b));
> > >     }
> > >
> > > Compile this with clang-9 and clang-10 and observe:
> > >
> > > zx2c4@...nkpad /tmp/curve25519-hacl64-stack-frame-size-test $ clang-10 -Wframe-larger-than=0 -O3 -c b.c -o c10.o
> > > b.c:5:6: warning: stack frame size of 104 bytes in function 'blah' [-Wframe-larger-than=]
> > > void blah(char *a)
> > >      ^
> > > 1 warning generated.
> > > zx2c4@...nkpad /tmp/curve25519-hacl64-stack-frame-size-test $ clang-9 -Wframe-larger-than=0 -O3 -c b.c -o c9.o
> > >
> > > Looking at the disassembly of c10.o and c9.o, one can see that c9.o is
> > > properly optimized in the obvious way one would expect, while c10.o has
> > > blown up and includes extern calls to memcpy.
> > >
> > > This is present on the most trivial bits of code. Thus, for clang-10, we
> > > just set __NO_FORTIFY globally, so that this issue won't be incurred.
> > >
> > > Cc: Arnd Bergmann <arnd@...db.de>
> > > Cc: LKML <linux-kernel@...r.kernel.org>
> > > Cc: clang-built-linux <clang-built-linux@...glegroups.com>
> > > Cc: Kees Cook <keescook@...omium.org>
> > > Cc: George Burgess <gbiv@...gle.com>
> > > Cc: Nick Desaulniers <ndesaulniers@...gle.com>
> > > Link: https://bugs.llvm.org/show_bug.cgi?id=45802
> > > Signed-off-by: Jason A. Donenfeld <Jason@...c4.com>
> >
> > I'm going to request this not be merged until careful comment from
> > George and Kees. My hands are quite full at the moment with other
> > regressions.  I suspect these threads may be relevant, though I
> > haven't had time to read through them myself.
> > https://github.com/ClangBuiltLinux/linux/issues/979
> > https://github.com/ClangBuiltLinux/linux/pull/980
>
> I believe these issues are one in the same. I did a reverse bisect with
> Arnd's test case and converged on George's first patch:
>
> https://github.com/llvm/llvm-project/commit/2dd17ff08165e6118e70f00e22b2c36d2d4e0a9a
>
> I think that in lieu of this patch, we should have that patch and its
> follow-up fix merged into 10.0.1.
>
> I can file an upstream PR for Tom to take a look out later tonight.
>
> Cheers,
> Nathan
>
> > > ---
> > >  Makefile | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/Makefile b/Makefile
> > > index 49b2709ff44e..f022f077591d 100644
> > > --- a/Makefile
> > > +++ b/Makefile
> > > @@ -768,6 +768,13 @@ KBUILD_CFLAGS += -Wno-gnu
> > >  # source of a reference will be _MergedGlobals and not on of the whitelisted names.
> > >  # See modpost pattern 2
> > >  KBUILD_CFLAGS += -mno-global-merge
> > > +
> > > +# clang-10 has a broken optimization stage that causes memcpy to always be
> > > +# outline, resulting in excessive stack frame growth and poor performance.
> > > +ifeq ($(shell test $(CONFIG_CLANG_VERSION) -ge 100000 && test $(CONFIG_CLANG_VERSION) -lt 110000; echo $$?),0)
> > > +KBUILD_CFLAGS += -D__NO_FORTIFY
> > > +endif
> > > +
> > >  else
> > >
> > >  # These warnings generated too much noise in a regular build.
> > > --
> > > 2.26.2
> > >
> >
> >
> > --
> > Thanks,
> > ~Nick Desaulniers
> >
> > --
> > You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@...glegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/CAKwvOdk32cDowvrqRPKDRpf2ZiXh%3DjVnBTmhM-NWD%3DOwnq9v3w%40mail.gmail.com.
>
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@...glegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20200505222540.GA230458%40ubuntu-s3-xlarge-x86.

Powered by blists - more mailing lists