lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Thu, 4 Apr 2024 15:53:49 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Ingo Molnar <mingo@...nel.org>, Thomas Gleixner <tglx@...utronix.de>, Peter Anvin <hpa@...or.com>
Cc: "the arch/x86 maintainers" <x86@...nel.org>, 
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: More annoying code generation by clang

So this doesn't really matter in any real life situation, but it
really grated on me.

Clang has this nasty habit of taking our nice asm constraints, and
turning them into worst-case garbage. It's been reported a couple of
times where we use "g" to tell the compiler that pretty much any
source to the asm works, and then clang takes that to mean "I will
take that to use 'memory'" even when that makes no sense what-so-ever.

See for example

    https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@mail.gmail.com/

where I was ranting about clang just doing pointlessly stupid things.

However, I found a case where yes, clang does pointlessly stupid
things, but it's at least _partly_ our fault, and gcc can't generate
optimal code either.

We have this fairly critical code in __fget_files_rcu() to look up a
'struct file *' from an fd, and it does this:

                /* Mask is a 0 for invalid fd's, ~0 for valid ones */
                nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);

and clang makes a *horrid* mess of it, generating this code:

        movl    %edi, %r14d
        movq    32(%rbx), %rdx
        movl    (%rdx), %eax
        movq    %rax, 8(%rsp)
        cmpq    8(%rsp), %r14
        sbbq    %rcx, %rcx

which is just crazy. Notice how it does that "move rax to stack, then
do the compare against the stack", instead of just using %rax.

In fact, that function shouldn't have a stack frame at all, and the
only reason it is generated is because of this whole oddity.

All clang's fault, right?

Yeah, mostly. But it turns out that what really messes with clangs
little head is that the x86 array_index_mask_nospec() function is
being a bit annoying.

This is what we do:

  static __always_inline unsigned long
array_index_mask_nospec(unsigned long index,
                unsigned long size)
  {
        unsigned long mask;

        asm volatile ("cmp %1,%2; sbb %0,%0;"
                        :"=r" (mask)
                        :"g"(size),"r" (index)
                        :"cc");
        return mask;
  }

and look at the use again:

        nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);

here all the values are actually 'unsigned int'. So what happens is
that clang can't just use the fdt->max_fds value *directly* from
memory, because it needs to be expanded from 32-bit to 64-bit because
we've made our array_index_mask_nospec() function only work on 64-bit
'unsigned long' values.

So it turns out that by massaging this a bit, and making it just be a
macro - so that the asm can decide that "I can do this in 32-bit" - I
can get clang to generate much better code.

Clang still absolutely hates the "g" constraint, so to get clang to
really get this right I have to use "ir" instead of "g". Which is
wrong. Because gcc does this right, and could use the memory op
directly. But even gcc cannot do that with our *current* function,
because of that "the memory value is 32-bit, we require a 64-bit
value"

Anyway, I can get gcc to generate the right code:

        movq    32(%r13), %rdx
        cmp (%rdx),%ebx
        sbb %esi,%esi

which is basically the right code for the six crazy instructions clang
generates. And if I make the "g" be "ir", I can get clang to generate

        movq    32(%rdi), %rcx
        movl    (%rcx), %eax
        cmpl    %eax, %esi
        sbbl    %esi, %esi

which is the same thing, but with that (pointless) load to a register.

And now clang doesn't generate that stack frame at all.

Anyway, this was a long email to explain the odd attached patch.

Comments? Note that this patch is *entirely* untested, I have done
this purely by looking at the code generation in fs/file.c.

                Linus

View attachment "patch.diff" of type "text/x-patch" (1122 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ