lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wh1xHi-WeytuAK1-iSsR0wi=6e4-WgFq6ZPt8Z1mvqoNA@mail.gmail.com>
Date:   Sun, 14 Aug 2022 14:14:08 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Al Viro <viro@...iv.linux.org.uk>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     Nathan Chancellor <nathan@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Jeff Layton <jlayton@...nel.org>,
        Ilya Dryomov <idryomov@...il.com>, ceph-devel@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Matthew Wilcox <willy@...radead.org>,
        clang-built-linux <llvm@...ts.linux.dev>
Subject: Simplify load_unaligned_zeropad() (was Re: [GIT PULL] Ceph updates
 for 5.20-rc1)

On Sun, Aug 14, 2022 at 1:29 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> I dug into it some more, and it is really "load_unaligned_zeropad()"
> that makes clang really uncomfortable.
>
> The problem ends up being that clang sees that it's inside that inner
> loop, and tries very hard to optimize the shift-and-mask that happens
> if the exception happens.
>
> The fact that the exception *never* happens unless DEBUG_PAGEALLOC is
> enabled - and very very seldom even then - is not something we can
> really explain to clang.

Hmm.

The solution to that is actually to move the 'zeropad' part into the
exception handler.

That makes both gcc and clang generate quite nice code for this all.
But perhaps equally importantly, it actually simplifies the code
noticeably:

 arch/x86/include/asm/extable_fixup_types.h |  2 ++
 arch/x86/include/asm/word-at-a-time.h      | 50 +++---------------------------
 arch/x86/mm/extable.c                      | 30 ++++++++++++++++++
 3 files changed, 37 insertions(+), 45 deletions(-)

and that's with 11 of those 37 new lines being a new block comment.

It's mainly because now there's no need to worry about
CONFIG_CC_HAS_ASM_GOTO_OUTPUT in load_unaligned_zeropad(), because the
asm becomes a simple "address in, data out" thing.

And to make the fixup code simple and straightforward, I just made
"load_unaligned_zeropad()" use fixed registers for address and output,
which doesn't bother the compilers at all, they'll happily adjust
their register allocation. The code generation ends up much better
than with the goto and the subtle address games that never trigger
anyway.

PeterZ - you've touched both the load_unaligned_zeropad() and the
exception code last, so let's run this past you, but this really does
seem to not only fix the code generation issue in fs/dcache.s, it just
looks simpler too. Comments?

             Linus

View attachment "patch.diff" of type "text/x-patch" (4076 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ