Message-ID: <CAHk-=wgga5hrg5qgm4UwuOCjgBrobhZcdKTr1AFU7WSWgVKKZQ@mail.gmail.com>
Date:   Thu, 28 Jul 2022 14:49:24 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Nick Desaulniers <ndesaulniers@...gle.com>,
        Nathan Chancellor <nathan@...nel.org>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/5] lib/find: optimize find_bit() functions

On Thu, Jul 28, 2022 at 11:49 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> It builds for me and seems to generate reasonable code, although I
> notice that clang messes up the "__ffs()" inline asm and forces the
> source into memory.

I have created an LLVM issue for this at

        https://github.com/llvm/llvm-project/issues/56789

and although I noticed this while looking at the rather odd code
generation for the bit-finding functions, it seems to be a general
issue with clang inline asm.

It looks like any instruction that takes a mod/rm input (so either a
register or memory) will always have that input forced into memory.
That is pointless in itself, but it also causes some functions to
have a stack frame that they wouldn't otherwise need or want, so it
has secondary downsides too.
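
A minimal sketch of the pattern described above, modeled on the shape
of the kernel's x86 __ffs() helper. The names ffs_rm() and ffs_r() are
hypothetical, and the non-x86 fallback is only there so the example
builds elsewhere. Comparing the code generated for the two variants
(for example with -O2 -S) is one way to check whether the "rm" input
gets spilled to the stack; no particular compiler output is claimed
here.

```c
/* Hypothetical helpers for illustration; not the kernel's actual code. */
#if defined(__x86_64__) || defined(__i386__)

/* "rm": the compiler is free to pass the input in a register OR memory. */
static inline unsigned long ffs_rm(unsigned long word)
{
	unsigned long ret;

	/* Result is only meaningful for word != 0. */
	__asm__("rep; bsf %1,%0" : "=r" (ret) : "rm" (word));
	return ret;
}

/* "r": the input must be in a register, so no memory operand is possible. */
static inline unsigned long ffs_r(unsigned long word)
{
	unsigned long ret;

	/* Result is only meaningful for word != 0. */
	__asm__("rep; bsf %1,%0" : "=r" (ret) : "r" (word));
	return ret;
}

#else

/* Fallback for other architectures (assumption: GCC/clang builtins exist). */
static inline unsigned long ffs_rm(unsigned long word)
{
	return (unsigned long)__builtin_ctzl(word);
}

static inline unsigned long ffs_r(unsigned long word)
{
	return (unsigned long)__builtin_ctzl(word);
}

#endif
```

Both variants compute the same value for nonzero inputs; the only
difference is how much freedom the constraint gives the compiler when
it places the operand.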

And yes, that particular case could be solved with __builtin_ctzl(),
which seems to DTRT. But that uses plain bsf, and we seem to really
want tzcnt ("rep bsf") here, although I didn't check why (the comment
explicitly says "Undefined if no bit exists", which is the main
difference between bsf and tzcnt).

I _think_ it's because tzcnt is faster when it exists exactly because
it always writes the destination, so 'bsf' is actually the inferior
op, and clang shouldn't generate it.
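
A small sketch of the builtin alternative and of the zero-input
difference mentioned above. __builtin_ctzl() is a real GCC/clang
builtin whose result is undefined for a zero argument, much like the
"Undefined if no bit exists" comment. The names ffs_builtin() and
ctz_defined(), and the macro SKETCH_BITS_PER_LONG, are hypothetical;
ctz_defined() just spells out the tzcnt-style convention of returning
the operand width for a zero input.

```c
#include <limits.h>

/* Hypothetical name; width of unsigned long in bits. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Same contract as the asm version: result is undefined for word == 0. */
static inline unsigned long ffs_builtin(unsigned long word)
{
	return (unsigned long)__builtin_ctzl(word);
}

/*
 * tzcnt-style contract: a zero input yields the operand width instead
 * of an undefined result. The explicit check keeps this well defined
 * regardless of which instruction the compiler picks.
 */
static inline unsigned long ctz_defined(unsigned long word)
{
	return word ? (unsigned long)__builtin_ctzl(word)
		    : (unsigned long)SKETCH_BITS_PER_LONG;
}
```

Whether the builtin ends up as bsf or tzcnt depends on the compiler
and the target flags, so this only illustrates the semantics, not the
instruction selection.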

But the "rm" thing exists elsewhere too, and I just checked: the
same issue seems to happen with "g" too (i.e. "any general integer
input").
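
For reference, a minimal sketch of what a "g" input looks like. The
"g" constraint allows a general register, memory, or an immediate
integer operand. The function add_g() is a hypothetical example, not
something taken from the kernel, and the non-x86 fallback is only
there so the example builds elsewhere.

```c
/* Hypothetical helper for illustration only. */
#if defined(__x86_64__) || defined(__i386__)

static inline unsigned int add_g(unsigned int a, unsigned int b)
{
	/*
	 * "+r": a is read and written, and must live in a register.
	 * "g":  b may be a register, a memory operand, or an immediate.
	 */
	__asm__("addl %1,%0" : "+r" (a) : "g" (b) : "cc");
	return a;
}

#else

/* Fallback for other architectures: plain C addition. */
static inline unsigned int add_g(unsigned int a, unsigned int b)
{
	return a + b;
}

#endif
```

As with the "rm" case, looking at the generated assembly is the way
to see which operand form the compiler actually chose.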

                      Linus
