lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKwvOd=fcF=y-mBtPZ9QcVe++__jo11St4=+roPKrGh5D6FH_g@mail.gmail.com>
Date:   Tue, 17 Jan 2023 11:17:15 -0800
From:   Nick Desaulniers <ndesaulniers@...gle.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>, x86@...nel.org,
        Kostya Serebryany <kcc@...gle.com>,
        Andrey Ryabinin <ryabinin.a.a@...il.com>,
        Andrey Konovalov <andreyknvl@...il.com>,
        Alexander Potapenko <glider@...gle.com>,
        Taras Madan <tarasmadan@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        "H . J . Lu" <hjl.tools@...il.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Rick Edgecombe <rick.p.edgecombe@...el.com>,
        Bharata B Rao <bharata@....com>,
        Jacob Pan <jacob.jun.pan@...ux.intel.com>,
        Ashok Raj <ashok.raj@...el.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Sami Tolvanen <samitolvanen@...gle.com>,
        joao@...rdrivepizza.com
Subject: Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until
 the first LAM user

On Tue, Jan 17, 2023 at 10:34 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Tue, Jan 17, 2023 at 10:26 AM Nick Desaulniers
> <ndesaulniers@...gle.com> wrote:
> >
> > On Tue, Jan 17, 2023 at 9:29 AM Linus Torvalds
> > <torvalds@...ux-foundation.org> wrote:
> > >
> > > Side note: that's not something new or unusual. It's been the case
> > > since I started testing clang - we have several code-paths where we
> > > use "unlikely()" to try to get very unlikely cases to be out-of-line,
> > > and clang just mostly ignores it, or treats it as a very weak hint. I
> > > think the only way to get clang to treat it as a *strong* hint is to
> > > use PGO.
> >
> > I'd be surprised if that were intentional or by design.
> >
> > Do you guys have a bug report we could look at?
>
> Heh. I actually sent you an example long ago. Let me go fish it out of
> my mail archives and quote some of it below so that you can find it in
> yours..
>
>               Linus
>
> [ Time passes. Found this email to you and Bill Wendling from Feb 16,
> 2020, Message-ID
> CAHk-=wigVshsByCMjkUiZyQSR5N5zi2aAeQc+VJCzQV=nm8E7g@...l.gmail.com ]:
>
>   Anyway, I'm looking at clang code generation, and comparing it with
>   gcc on one of my "this has been optimized to hell and back" functions:
>   __d_lookup_rcu().
>
>   It looks like clang does frame pointers, and ignores our
>   likely/unlikely annotations.
>
>   So this code:
>
>                 if (unlikely(parent->d_flags & DCACHE_OP_COMPARE)) {
>                         int tlen;
>                         const char *tname;
>                         ......
>
>   doesn't actually jump out of line, but instead generates the unlikely
>   case as the fallthrough:
>
>         testb   $2, (%r12)
>         je      .LBB50_9
>         ... unlikely code goes here...


Perhaps that was compiler version or config specific?

$ make LLVM=1 -j128 defconfig fs/dcache.o
$ llvm-objdump -d  --no-show-raw-insn
--disassemble-symbols=__d_lookup_rcu fs/dcache.o
0000000000003210 <__d_lookup_rcu>:
    3210:      endbr64
    3214:      pushq %rbp
    3215:      pushq %r15
    3217:      pushq %r14
    3219:      pushq %r12
    321b:      pushq %rbx
    321c:      testb $0x2, (%rdi)
    321f:      jne 0x32d7 <__d_lookup_rcu+0xc7>
...
    32d7:      popq %rbx
    32d8:      popq %r12
    32da:      popq %r14
    32dc:      popq %r15
    32de:      popq %rbp
    32df:      jmp 0x3300 <__d_lookup_rcu_op_compare>

That looks like what you want, yeah?

Your original report was from nearly 3 years ago; could have fixed a
few instances of branch weights not getting propagated since then.

What's going on in this case in this thread? I know we don't support
hot/cold attributes on labels yet, but if static_branch_likely (or
friends) is being used, we assign the indirect branches a 0%
likeliness/branch-weight.

>
>   and then the likely case ends up having unfortunate reloads inside the
>   hot loop. Possibly because it has one fewer free registers than gcc
>   because of the frame pointer.
>
>   I didn't look into _why_ clang generates frame pointers but gcc
>   doesn't. It may be just a compiler default, I think we don't end up
>   explicitly asking either way.
>
> [ And then Bill replied with this ]
>
>   It's not a no-op. We add branch probabilities to the IR, whether
>   they're honored or not depends on the transformation. But they
>   *should* be honored when available. I've seen in the past that instead
>   of moving unlikely blocks out of the way (like gcc, which moves them
>   below the function's "ret" instruction), LLVM will do something like
>   this:
>
>     <normal code>
>     <jmp to loop conditional test>
>         <unlikely code>
>         <more likely code>
>     <loop conditional test>
>     <...>
>
>   I.e. the loop is rotated and the unlikely code is first and the hotter
>   code is closer together but between the unlikely and conditional test.
>   Could this be what's going on? Otherwise, maybe clang decided that
>   it's not beneficial to move the code out-of-line because the benefit
>   was minimal? (I'm guessing here.)



-- 
Thanks,
~Nick Desaulniers

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ