lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 7 Nov 2022 21:19:17 +0900
From:   Vincent MAILHOL <mailhol.vincent@...adoo.fr>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     x86@...nel.org, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org, Yury Norov <yury.norov@...il.com>,
        llvm@...ts.linux.dev, Borislav Petkov <bp@...e.de>
Subject: Re: [PATCH v1 1/2] x86/asm/bitops: Replace __fls() by its generic
 builtin implementation

On Mon. 7 Nov. 2022 at 18:38, Peter Zijlstra <peterz@...radead.org> wrote:
> On Sun, Nov 06, 2022 at 06:51:05PM +0900, Vincent Mailhol wrote:
> > The builtin implementation is better for two reasons:
> >
> >   1/ it saves two instructions on clang (a push and a stack pointer
> >      decrement) because of a useless tentative to save rax.
>
> I'm thinking this is the same old clang-sucks-at-"rm" constraints and
> *really* should not be a reason to change things. Clang should get fixed
> already.
>
> >   2/ when used on constant expressions, the compiler is only able to
> >      fold the builtin version (c.f. [2]).
> >
> > For those two reasons, replace the assembly implementation by its
> > builtin counterpart.
> >
> > [1] https://elixir.bootlin.com/linux/v6.0/source/include/asm-generic/bitops/builtin-__fls.h
> >
> > [2] commit 146034fed6ee ("x86/asm/bitops: Use __builtin_ffs() to evaluate constant expressions")
>
> I would much prefer consistently with 146034fed6ee.

There is one big difference between 146034fed6ee and this patch. For
the ffs() functions, the x86 asm produces *better* code so there is a
reason to keep the x86 asm.
The clang missed optimization is not the core reason for me to send
this patch. The purpose of the x86 asm code is to be more performant
than the generic implementation, isn't it? Let's imagine for a moment
that the x86 asm and the builtin produced exactly the same output,
then what would be the reason for keeping the x86 asm version?

My point is not that x86 asm is worse, but that x86 asm isn't better.
The clang missed optimization is one additional reason for this patch,
not the main one.

Yours sincerely,
Vincent Mailhol

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ