Date:   Wed, 13 May 2020 17:51:00 -0700
From:   Nick Desaulniers
To:     Linus Torvalds,
        Borislav Petkov
Cc:     Arnd Bergmann,
        Arvind Sankar,
        Kalle Valo,
        linux-wireless,
        "the arch/x86 maintainers",
        Kees Cook,
        Thomas Gleixner
Subject: Re: gcc-10: kernel stack is corrupted and fails to boot

On Wed, May 13, 2020 at 5:11 PM Linus Torvalds wrote:
> On Wed, May 13, 2020 at 4:36 PM Borislav Petkov wrote:
> >
> >
> > Looking at them, they do have an mb() too so how about this then
> > instead?
> >
> > #define prevent_tail_call_optimization()        mb()
> Yeah, I think a full mb() is likely safe, because that's pretty much
> always going to be a real instruction with real semantics, and no
> amount of link-time optimizations can move it around a call
> instruction.

Are you sure LTO treats empty asm statements differently from full
memory barriers with regard to preventing tail calls?  (I'll take your
word for it, I don't actually know, but seeing an example of real code
run through a production compiler is much much more convincing).
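
A minimal sketch of the kind of test case that question asks for.  All
function names below are hypothetical and chosen only for illustration;
nothing here is taken from the kernel tree, and the comments make no
claim about what any particular compiler version will emit.

```c
/* Hypothetical test case: every name here is made up for illustration. */

/* noinline keeps the call visible in the generated assembly. */
__attribute__((noinline)) int demo_callee(int x)
{
	return x * 2 + 1;
}

/*
 * The call is the last thing this function does, so an optimizing
 * compiler is free to emit it as a tail call (a jump, not a call).
 */
int demo_caller_plain(int x)
{
	return demo_callee(x);
}

/*
 * Same call, followed by an empty asm statement with a "memory"
 * clobber.  Whether this really stops the tail call is the open
 * question: inspect the generated assembly instead of trusting
 * this comment.
 */
int demo_caller_barrier(int x)
{
	int ret = demo_callee(x);

	__asm__ __volatile__("" ::: "memory");
	return ret;
}
```

Building this with optimization enabled and assembly output (for
example `-O2 -S`, then again with `-flto`) and comparing how each
caller reaches demo_callee is one way to see what a given compiler
actually does.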

The TL;DR of the very long thread is that is a proper fix, on
the GCC side.  Adding arbitrary empty asm statements to work around
it? Hacks.  Full memory barriers? Hacks.

I'm happy that GCC does an optimization that Clang does not.  At the
same time, it sucks to pay a penalty for a bug we don't trigger.  This
is the same reason why `asm_volatile_goto` expands differently between
GCC and Clang (and why I tried to undo that like a year ago).

If Clang realizes the same optimization GCC is doing here (related to
tailcalls) tomorrow, well we already support
__attribute__((no_stack_protector)) which can be added to the callers
that must not get a canary check in this case (i.e. allowing tail
calls).  I should send a patch adding that to
include/linux/compiler_attributes.h and annotate the functions in
question, before we forget about this.
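
A rough sketch of what such an annotation could look like.  The macro
name demo_no_stack_protector and all demo_* functions are hypothetical;
this is not the patch mentioned above, and placing the attribute on the
function that changes the canary is an assumption made for this
illustration.

```c
/* Hypothetical wrapper macro; compilers lacking the attribute get nothing. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

#if __has_attribute(no_stack_protector)
# define demo_no_stack_protector __attribute__((no_stack_protector))
#else
# define demo_no_stack_protector
#endif

/* Hypothetical stand-in for the stack canary value. */
static unsigned long demo_canary = 0x1234UL;

/* Hypothetical: changes the canary, as an early boot init routine might. */
static void demo_init_canary(void)
{
	demo_canary ^= 0xdeadbeefUL;
}

/* Hypothetical stand-in for the final call made by the annotated function. */
static int demo_idle_loop(int rounds)
{
	int done = 0;

	while (done < rounds)
		done++;
	return done;
}

/*
 * With the attribute applied there is no canary check in this
 * function, so it does not matter whether the final call is emitted
 * as a call or as a tail call after the canary has changed.
 */
demo_no_stack_protector
int demo_start_secondary(int rounds)
{
	char buf[64];

	buf[0] = (char)rounds;	/* local array: the usual trigger for a canary */
	demo_init_canary();
	return demo_idle_loop(rounds + (buf[0] - (char)rounds));
}
```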

Sprinkling empty asm statements or full memory barriers should be
treated with the same hesitancy as adding sleep()s to "work around"
concurrency bugs.  Red flag.

And LTO is fun; we've been shipping it in Android for years (and need
to attempt upstreaming again).  Just today we found an ODR violation
in one of the most important symbols in the kernel.  Will be sending a
patch for that tomorrow.

> I could imagine some completely UP in-order CPU that doesn't need to
> serialize with anything at all, and even "mb()" might be empty.  I
> think you can compile old ARM kernels for that. But realistically I
> think we can ignore them at least for now - I'm not sure the link-time
> optimization will even do things like that tailcall conversion, and
> I'm not convinced that old pre-ARMv7 systems will be relevant by the
> time (if) it ever does.
>                    Linus

~Nick Desaulniers
