lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wggKAKq1k0Z+EdWR0W9ULM7x9MhEbd_LmU2XPLM9WcjkQ@mail.gmail.com>
Date:   Tue, 17 Jan 2023 12:10:03 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Nick Desaulniers <ndesaulniers@...gle.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>, x86@...nel.org,
        Kostya Serebryany <kcc@...gle.com>,
        Andrey Ryabinin <ryabinin.a.a@...il.com>,
        Andrey Konovalov <andreyknvl@...il.com>,
        Alexander Potapenko <glider@...gle.com>,
        Taras Madan <tarasmadan@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        "H . J . Lu" <hjl.tools@...il.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Rick Edgecombe <rick.p.edgecombe@...el.com>,
        Bharata B Rao <bharata@....com>,
        Jacob Pan <jacob.jun.pan@...ux.intel.com>,
        Ashok Raj <ashok.raj@...el.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Sami Tolvanen <samitolvanen@...gle.com>,
        joao@...rdrivepizza.com
Subject: Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until
 the first LAM user

On Tue, Jan 17, 2023 at 11:17 AM Nick Desaulniers
<ndesaulniers@...gle.com> wrote:
>
> Perhaps that was compiler version or config specific?

Possible, but...

The clang code generation annoyed me enough that I actually ended up
rewriting the unlikely test to be outside the loop in commit
ae2a823643d7 ("dcache: move the DCACHE_OP_COMPARE case out of the
__d_lookup_rcu loop").

I think that then made clang no longer have the whole "rotate loop
with unlikely case in the middle" issue.

And then because clang *still* messed up by trying to be too clever (see

    https://lore.kernel.org/all/CAHk-=wjyOB66pofW0mfzDN7SO8zS1EMRZuR-_2aHeO+7kuSrAg@mail.gmail.com/

for details), I also ended up doing commit c4e34dd99f2e ("x86:
simplify load_unaligned_zeropad() implementation").

The end result is that now the compiler almost *cannot* mess up any more.

So the reason clang now does a good job on __d_lookup_rcu() is largely
that I took away all the places where it did badly ;)

That said, clang still generates more register pressure than gcc,
causing the function prologue and epilogue to be rather bigger
(pushing and popping six registers, as opposed to gcc that only needs
three)

Gcc is also better able to schedule the prologue and epilogue together
with the work of the function, which clang seems to always do it as a
"push all" and "pop all" sequence.

That scheduling doesn't matter in that particular place (although it
does make the unlikely case of calling __d_lookup_rcu_op_compare
pointlessly push all regs only to then pop them), but I've seen a few
other cases where it ends up meaning that it always does that full
function prologue even when the *likely* case then returns early and
doesn't actually need any of that work because it didn't use any of
those registers.

But yeah, the RCU pathname lookup looks fine these days. And I don't
actually think it was due to clang changes ;)

                Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ