[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKwvOd=fcF=y-mBtPZ9QcVe++__jo11St4=+roPKrGh5D6FH_g@mail.gmail.com>
Date: Tue, 17 Jan 2023 11:17:15 -0800
From: Nick Desaulniers <ndesaulniers@...gle.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Andy Lutomirski <luto@...nel.org>, x86@...nel.org,
Kostya Serebryany <kcc@...gle.com>,
Andrey Ryabinin <ryabinin.a.a@...il.com>,
Andrey Konovalov <andreyknvl@...il.com>,
Alexander Potapenko <glider@...gle.com>,
Taras Madan <tarasmadan@...gle.com>,
Dmitry Vyukov <dvyukov@...gle.com>,
"H . J . Lu" <hjl.tools@...il.com>,
Andi Kleen <ak@...ux.intel.com>,
Rick Edgecombe <rick.p.edgecombe@...el.com>,
Bharata B Rao <bharata@....com>,
Jacob Pan <jacob.jun.pan@...ux.intel.com>,
Ashok Raj <ashok.raj@...el.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Sami Tolvanen <samitolvanen@...gle.com>,
joao@...rdrivepizza.com
Subject: Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until
the first LAM user
On Tue, Jan 17, 2023 at 10:34 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Tue, Jan 17, 2023 at 10:26 AM Nick Desaulniers
> <ndesaulniers@...gle.com> wrote:
> >
> > On Tue, Jan 17, 2023 at 9:29 AM Linus Torvalds
> > <torvalds@...ux-foundation.org> wrote:
> > >
> > > Side note: that's not something new or unusual. It's been the case
> > > since I started testing clang - we have several code-paths where we
> > > use "unlikely()" to try to get very unlikely cases to be out-of-line,
> > > and clang just mostly ignores it, or treats it as a very weak hint. I
> > > think the only way to get clang to treat it as a *strong* hint is to
> > > use PGO.
> >
> > I'd be surprised if that were intentional or by design.
> >
> > Do you guys have a bug report we could look at?
>
> Heh. I actually sent you an example long ago. Let me go fish it out of
> my mail archives and quote some of it below so that you can find it in
> yours..
>
> Linus
>
> [ Time passes. Found this email to you and Bill Wendling from Feb 16,
> 2020, Message-ID
> CAHk-=wigVshsByCMjkUiZyQSR5N5zi2aAeQc+VJCzQV=nm8E7g@...l.gmail.com ]:
>
> Anyway, I'm looking at clang code generation, and comparing it with
> gcc on one of my "this has been optimized to hell and back" functions:
> __d_lookup_rcu().
>
> It looks like clang does frame pointers, and ignores our
> likely/unlikely annotations.
>
> So this code:
>
> if (unlikely(parent->d_flags & DCACHE_OP_COMPARE)) {
> int tlen;
> const char *tname;
> ......
>
> doesn't actually jump out of line, but instead generates the unlikely
> case as the fallthrough:
>
> testb $2, (%r12)
> je .LBB50_9
> ... unlikely code goes here...
Perhaps that was compiler version or config specific?
$ make LLVM=1 -j128 defconfig fs/dcache.o
$ llvm-objdump -d --no-show-raw-insn
--disassemble-symbols=__d_lookup_rcu fs/dcache.o
0000000000003210 <__d_lookup_rcu>:
3210: endbr64
3214: pushq %rbp
3215: pushq %r15
3217: pushq %r14
3219: pushq %r12
321b: pushq %rbx
321c: testb $0x2, (%rdi)
321f: jne 0x32d7 <__d_lookup_rcu+0xc7>
...
32d7: popq %rbx
32d8: popq %r12
32da: popq %r14
32dc: popq %r15
32de: popq %rbp
32df: jmp 0x3300 <__d_lookup_rcu_op_compare>
That looks like what you want, yeah?
Your original report was from nearly 3 years ago; could have fixed a
few instances of branch weights not getting propagated since then.
What's going on in this case in this thread? I know we don't support
hot/cold attributes on labels yet, but if static_branch_likely (or
friends) is being used, we assign the indirect branches a 0%
likeliness/branch-weight.
>
> and then the likely case ends up having unfortunate reloads inside the
> hot loop. Possibly because it has one fewer free registers than gcc
> because of the frame pointer.
>
> I didn't look into _why_ clang generates frame pointers but gcc
> doesn't. It may be just a compiler default, I think we don't end up
> explicitly asking either way.
>
> [ And then Bill replied with this ]
>
> It's not a no-op. We add branch probabilities to the IR, whether
> they're honored or not depends on the transformation. But they
> *should* be honored when available. I've seen in the past that instead
> of moving unlikely blocks out of the way (like gcc, which moves them
> below the function's "ret" instruction), LLVM will do something like
> this:
>
> <normal code>
> <jmp to loop conditional test>
> <unlikely code>
> <more likely code>
> <loop conditional test>
> <...>
>
> I.e. the loop is rotated and the unlikely code is first and the hotter
> code is closer together but between the unlikely and conditional test.
> Could this be what's going on? Otherwise, maybe clang decided that
> it's not beneficial to move the code out-of-line because the benefit
> was minimal? (I'm guessing here.)
--
Thanks,
~Nick Desaulniers
Powered by blists - more mailing lists