Message-ID: <CAFULd4bcnEf6MCa1L=hYHHMOP=jB4Bc0Uugugg8xgNogQAU+YA@mail.gmail.com>
Date: Sat, 6 Apr 2024 14:30:38 +0200
From: Uros Bizjak <ubizjak@...il.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>, 
	Nick Desaulniers <ndesaulniers@...gle.com>, Nathan Chancellor <nathan@...nel.org>, 
	Thomas Gleixner <tglx@...utronix.de>, Peter Anvin <hpa@...or.com>, 
	"the arch/x86 maintainers" <x86@...nel.org>, Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: More annoying code generation by clang

On Sat, Apr 6, 2024 at 12:56 PM Ingo Molnar <mingo@...nel.org> wrote:
>
>
> [ I've Cc:-ed a few more people who might be interested in this. ]
> [ Copy of Linus's email below. ]
>
> * Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>
> > So this doesn't really matter in any real life situation, but it
> > really grated on me.
> >
> > Clang has this nasty habit of taking our nice asm constraints, and
> > turning them into worst-case garbage. It's been reported a couple of
> > times where we use "g" to tell the compiler that pretty much any
> > source to the asm works, and then clang takes that to mean "I will
> > take that to use 'memory'" even when that makes no sense whatsoever.
> >
> > See for example
> >
> >     https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@mail.gmail.com/
> >
> > where I was ranting about clang just doing pointlessly stupid things.
> >
> > However, I found a case where yes, clang does pointlessly stupid
> > things, but it's at least _partly_ our fault, and gcc can't generate
> > optimal code either.
> >
> > We have this fairly critical code in __fget_files_rcu() to look up a
> > 'struct file *' from an fd, and it does this:
> >
> >                 /* Mask is a 0 for invalid fd's, ~0 for valid ones */
> >                 nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
> >
> > and clang makes a *horrid* mess of it, generating this code:
> >
> >         movl    %edi, %r14d
> >         movq    32(%rbx), %rdx
> >         movl    (%rdx), %eax
> >         movq    %rax, 8(%rsp)
> >         cmpq    8(%rsp), %r14
> >         sbbq    %rcx, %rcx
> >
> > which is just crazy. Notice how it does that "move rax to stack, then
> > do the compare against the stack", instead of just using %rax.
> >
> > In fact, that function shouldn't have a stack frame at all, and the
> > only reason it is generated is because of this whole oddity.
> >
> > All clang's fault, right?
> >
> > Yeah, mostly. But it turns out that what really messes with clang's
> > little head is that the x86 array_index_mask_nospec() function is
> > being a bit annoying.
> >
> > This is what we do:
> >
> >   static __always_inline unsigned long
> > array_index_mask_nospec(unsigned long index,
> >                 unsigned long size)
> >   {
> >         unsigned long mask;
> >
> >         asm volatile ("cmp %1,%2; sbb %0,%0;"
> >                         :"=r" (mask)
> >                         :"g"(size),"r" (index)
> >                         :"cc");
> >         return mask;
> >   }
> >
> > and look at the use again:
> >
> >         nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
> >
> > here all the values are actually 'unsigned int'. So what happens is
> > that clang can't just use the fdt->max_fds value *directly* from
> > memory, because it needs to be expanded from 32-bit to 64-bit because
> > we've made our array_index_mask_nospec() function only work on 64-bit
> > 'unsigned long' values.
> >
> > So it turns out that by massaging this a bit, and making it just be a
> > macro - so that the asm can decide that "I can do this in 32-bit" - I
> > can get clang to generate much better code.
> >
> > Clang still absolutely hates the "g" constraint, so to get clang to
> > really get this right I have to use "ir" instead of "g". Which is
> > wrong. Because gcc does this right, and could use the memory op
> > directly. But even gcc cannot do that with our *current* function,
> > because of that "the memory value is 32-bit, we require a 64-bit
> > value".
> >
> > Anyway, I can get gcc to generate the right code:
> >
> >         movq    32(%r13), %rdx
> >         cmp (%rdx),%ebx
> >         sbb %esi,%esi
> >
> > which is basically the right code for the six crazy instructions clang
> > generates. And if I make the "g" be "ir", I can get clang to generate
> >
> >         movq    32(%rdi), %rcx
> >         movl    (%rcx), %eax
> >         cmpl    %eax, %esi
> >         sbbl    %esi, %esi
> >
> > which is the same thing, but with that (pointless) load to a register.
> >
> > And now clang doesn't generate that stack frame at all.
> >
> > Anyway, this was a long email to explain the odd attached patch.
> >
> > Comments? Note that this patch is *entirely* untested, I have done
> > this purely by looking at the code generation in fs/file.c.

FYI, please note that gcc-12 is able to synthesize carry-flag compares
on its own:

--cut here--
unsigned long foo (unsigned long index, unsigned long size)
{
    return (index <= size) ? 0 : -1;
}

extern unsigned int size;

unsigned long bar (unsigned int index)
{
    return (index <= size) ? 0 : -1;
}
--cut here--

gcc-12 -O2:

foo:
        cmpq    %rdi, %rsi
        sbbq    %rax, %rax
        ret
bar:
        cmpl    %edi, size(%rip)
        sbbq    %rax, %rax
        ret

Uros.

> >
> >                 Linus
>
> >  arch/x86/include/asm/barrier.h | 23 +++++++++--------------
> >  1 file changed, 9 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> > index 66e57c010392..6159d2cbbfde 100644
> > --- a/arch/x86/include/asm/barrier.h
> > +++ b/arch/x86/include/asm/barrier.h
> > @@ -33,20 +33,15 @@
> >   * Returns:
> >   *     0 - (index < size)
> >   */
> > -static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
> > -             unsigned long size)
> > -{
> > -     unsigned long mask;
> > -
> > -     asm volatile ("cmp %1,%2; sbb %0,%0;"
> > -                     :"=r" (mask)
> > -                     :"g"(size),"r" (index)
> > -                     :"cc");
> > -     return mask;
> > -}
> > -
> > -/* Override the default implementation from linux/nospec.h. */
> > -#define array_index_mask_nospec array_index_mask_nospec
> > +#define array_index_mask_nospec(idx,sz) ({   \
> > +     typeof((idx)+(sz)) __idx = (idx);       \
> > +     typeof(__idx) __sz = (sz);              \
> > +     typeof(__idx) __mask;                   \
> > +     asm volatile ("cmp %1,%2; sbb %0,%0"    \
> > +                     :"=r" (__mask)          \
> > +                     :"ir"(__sz),"r" (__idx) \
> > +                     :"cc");                 \
> > +     __mask; })
> >
> >  /* Prevent speculative execution past this barrier. */
> >  #define barrier_nospec() asm volatile("lfence":::"memory")
>
