[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMZ6Rq+Fo=-D5So=KQCvbfaD0XmcFooi9aUV=uhVAe-UfDa9aQ@mail.gmail.com>
Date: Mon, 7 Nov 2022 21:19:17 +0900
From: Vincent MAILHOL <mailhol.vincent@...adoo.fr>
To: Peter Zijlstra <peterz@...radead.org>
Cc: x86@...nel.org, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, Yury Norov <yury.norov@...il.com>,
llvm@...ts.linux.dev, Borislav Petkov <bp@...e.de>
Subject: Re: [PATCH v1 1/2] x86/asm/bitops: Replace __fls() by its generic
builtin implementation
On Mon. 7 Nov. 2022 at 18:38, Peter Zijlstra <peterz@...radead.org> wrote:
> On Sun, Nov 06, 2022 at 06:51:05PM +0900, Vincent Mailhol wrote:
> > The builtin implementation is better for two reasons:
> >
> > 1/ it saves two instructions on clang (a push and a stack pointer
> > decrement) because of a useless tentative to save rax.
>
> I'm thinking this is the same old clang-sucks-at-"rm" constraints and
> *really* should not be a reason to change things. Clang should get fixed
> already.
>
> > 2/ when used on constant expressions, the compiler is only able to
> > fold the builtin version (c.f. [2]).
> >
> > For those two reasons, replace the assembly implementation by its
> > builtin counterpart.
> >
> > [1] https://elixir.bootlin.com/linux/v6.0/source/include/asm-generic/bitops/builtin-__fls.h
> >
> > [2] commit 146034fed6ee ("x86/asm/bitops: Use __builtin_ffs() to evaluate constant expressions")
>
> I would much prefer consistently with 146034fed6ee.
There is one big difference between 146034fed6ee and this patch. For
the ffs() functions, the x86 asm produces *better* code so there is a
reason to keep the x86 asm.
The clang missed optimization is not the core reason for me to send
this patch. The purpose of the x86 asm code is to be more performant
than the generic implementation, isn't it? Let's imagine for a moment
that the x86 asm and the builtin produced exactly the same output,
then what would be the reason for keeping the x86 asm version?
My point is not that x86 asm is worse, but that x86 asm isn't better.
The clang missed optimization is one additional reason for this patch,
not the main one.
Yours sincerely,
Vincent Mailhol
Powered by blists - more mailing lists