linux-kernel - Re: [GIT PULL] x86 cleanups for v5.7

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=whCw-WFbHhq6uYZcXrMEoi4y_FhZk48adf4JePxBzmFsg@mail.gmail.com>
Date:   Thu, 2 Apr 2020 10:00:26 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Andy Lutomirski <luto@...capital.net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michael Matz <matz@...e.de>
Subject: Re: [GIT PULL] x86 cleanups for v5.7

On Thu, Apr 2, 2020 at 6:40 AM Borislav Petkov <bp@...en8.de> wrote:
>
> Btw, looking at this:
>
> https://reviews.llvm.org/rG50cac248773
>
> and talking to a gcc guy (CCed), it should be also relatively easy to do
> the fallthrough variant in gcc too so you could open a feature request
> for that in the gcc bugzilla.

I absolutely want to have that in gcc too (I've mentioned it a couple
of times to people like rth), but I wanted to have more than just an
abstract test-case for it.

Sure, I could attach my current patch to the bug report, but honestly,
if I was a compiler guy, I'd care a lot less about a "hey, you have to
apply this patch that won't even _compile_ for you right now to a tree
you don't really care about that much" kind of bug-report than a "hey.
look, upstream Linux already does this, and with clang I get this
code, and with gcc it's much worse".

And to be honest, while I love the asm goto output, it's not *that*
noticeable. Particularly since we don't have a lot. This is the inner
loop of "strncpy_from_user" with the feature (and clang, of course):

  90: mov    (%r15,%r13,1),%rdx
  94: mov    %rdx,(%r12,%r13,1)
  98: mov    %rdx,%rsi
  9b: not    %rsi
  9e: and    %rax,%rsi
  a1: add    %rcx,%rdx
  a4: and    %rsi,%rdx
  a7: jne    ec <strncpy_from_user+0xec>
  a9: add    $0x8,%r13
  ad: add    $0xfffffffffffffff8,%rbx
  b1: cmp    $0x7,%rbx
  b5: ja     90 <strncpy_from_user+0x90>

And all the jumps are just for testing if there was a zero in there
(the jne in the middle) or testing for the length of the remaining
space (that "cmp $7;ja" at the end).

This is the same thing with gcc (and without asm goto, of course):

  91: lea    (%rcx,%rdi,1),%rdx
  95: mov    %rcx,0x0(%r13,%rax,1)
  9a: not    %rcx
  9d: and    %rcx,%rdx
  a0: mov    %rdx,%rcx
  a3: and    %rsi,%rcx
  a6: jne    108 <strncpy_from_user+0x108>
  a8: sub    $0x8,%rbx
  ac: add    $0x8,%rax
  b0: cmp    $0x7,%rbx
  b4: jbe    12d <strncpy_from_user+0x12d>
  b6: mov    %r8d,%edx
  b9: mov    0x0(%rbp,%rax,1),%rcx
  be: test   %edx,%edx
  c0: je     91 <strncpy_from_user+0x91>

and the only real difference in that inner loop from the asm goto is
that the gcc code has three extra instructions (don't ask me why gcc
decided to cache the value 0 in %r8, that just looks stupid):

  b6: mov    %r8d,%edx
...
  be: test   %edx,%edx
  c0: je     91 <strncpy_from_user+0x91>

(Ok, the code sequence looks completely different because the two
compilers also end up generating the function differently: gcc does
the user space load at the end of the loop while clang does it at the
top. That's probably related to the fact that gcc has to generate that
extra jump anyway, and decided to make that the loop finishing jump).

So realistically, it doesn't make a huge difference. It's a bit more
noticeable when you have the "multiple unsafe_get_user()s in a row"
pattern, but we don't really have that (we have lots of "multiple
unsafe_put_user() in a row").

Of course, one reason we don't have that pattern is that it generates
nasty code with gcc (exactly because of the extra test for each
access).

But while I love looking at small things like this, and I'd like to
have all compilers support it, I have to admit that it's not likely to
really _matter_ much.

                Linus