Message-ID: <CAMuHMdXR0Nu+RENB8rFnJFiW=T0P7Kq_XAG7t1MF=fdyD6pUGw@mail.gmail.com>
Date: Tue, 7 Jun 2022 14:45:41 +0200
From: Geert Uytterhoeven <geert@...ux-m68k.org>
To: Alexander Lobakin <alexandr.lobakin@...el.com>
Cc: Arnd Bergmann <arnd@...db.de>, Yury Norov <yury.norov@...il.com>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Richard Henderson <rth@...ddle.net>,
Matt Turner <mattst88@...il.com>,
Brian Cain <bcain@...cinc.com>,
Yoshinori Sato <ysato@...rs.sourceforge.jp>,
Rich Felker <dalias@...c.org>,
"David S. Miller" <davem@...emloft.net>,
Kees Cook <keescook@...omium.org>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Marco Elver <elver@...gle.com>, Borislav Petkov <bp@...e.de>,
Tony Luck <tony.luck@...el.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
alpha <linux-alpha@...r.kernel.org>,
"open list:QUALCOMM HEXAGON..." <linux-hexagon@...r.kernel.org>,
"linux-ia64@...r.kernel.org" <linux-ia64@...r.kernel.org>,
linux-m68k <linux-m68k@...ts.linux-m68k.org>,
Linux-sh list <linux-sh@...r.kernel.org>,
sparclinux <sparclinux@...r.kernel.org>,
Linux-Arch <linux-arch@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on
compile-time constants

Hi Alexander,

On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin
<alexandr.lobakin@...el.com> wrote:
> While I was working on converting some structure fields from a fixed
> type to a bitmap, I started observing code size increase not only in
> places where the code works with the converted structure fields, but
> also where the converted vars were on the stack. That said, the
> following code:
>
> DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> unsigned long bar = BIT(BAR_BIT);
> unsigned long baz = 0;
>
> __set_bit(FOO_BIT, foo);
> baz |= BIT(BAZ_BIT);
>
> BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo)));
> BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
>
> triggers the first assertion on x86_64, which means that the
> compiler is unable to evaluate it to a compile-time initializer
> when the architecture-specific bitop is used even if it's obvious.
> I found that this is because many architecture-specific
> non-atomic bitop implementations use inline asm or other hacks which
> are faster or more robust when working with "real" variables (i.e.
> fields from the structures etc.), but the compilers have no clue how
> to optimize them out when called on compile-time constants.
>
> So, in order to let the compiler optimize out such cases, expand the
> test_bit() and __*_bit() definitions with a compile-time condition
> check, so that they will pick the generic C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well and the compiler will produce more efficient and
> simple code in 100% of cases (no changes when there's at least one
> non-compile-time-constant argument).
> The condition itself:
>
> if (
> __builtin_constant_p(nr) && /* <- bit position is constant */
> __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
> always either NULL or not */
> addr && /* <- bitmap addr is not NULL */
> __builtin_constant_p(*addr) /* <- compiler knows the value of
> the target bitmap */
> )
> /* then pick the generic C variant */
> else
> /* old code path, arch-specific */
>
> I also tried __is_constexpr() as suggested by Andy, but it was
> always returning 0 ('not a constant') for the 2nd, 3rd and 4th
> conditions.
>
> The savings on x86_64 with LLVM are insane (.text):
>
> $ scripts/bloat-o-meter -c vmlinux.{base,test}
> add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)
>
> $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
>
> $ scripts/bloat-o-meter -c vmlinux.{base,all}
> add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)

Thank you!

I gave it a try on m68k, and am a bit disappointed to see an increase
in code size:

add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092)

This is atari_defconfig on a tree based on v5.19-rc1, built with
m68k-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0 and GNU ld (GNU
Binutils for Ubuntu) 2.34.

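For readers following along outside the kernel tree, the quoted
condition can be approximated in user space. This is only a minimal
sketch, assuming GCC or Clang (for __builtin_constant_p()). The names
generic_test_bit(), arch_test_bit(), sketch_test_bit() and
sketch_demo() are hypothetical and are not the kernel's, the volatile
read merely stands in for an inline-asm implementation that the
compiler cannot fold, and the uintptr_t casts are an addition here to
avoid "address is always true" warnings when an array is passed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Generic C variant: plain shift and mask, foldable by the compiler. */
static inline bool generic_test_bit(unsigned long nr,
				    const unsigned long *addr)
{
	return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/*
 * Hypothetical stand-in for an arch-specific variant: the volatile
 * access forces a real load, so the result is never a compile-time
 * constant (an assumption made for illustration only).
 */
static inline bool arch_test_bit(unsigned long nr,
				 const unsigned long *addr)
{
	const volatile unsigned long *p = addr + nr / BITS_PER_LONG;

	return (*p >> (nr % BITS_PER_LONG)) & 1UL;
}

/*
 * Pick the generic variant only when the bit number, the non-NULL-ness
 * of the bitmap address and the bitmap word itself are all known at
 * compile time; otherwise fall back to the "arch" variant.
 */
#define sketch_test_bit(nr, addr)					\
	((__builtin_constant_p(nr) &&					\
	  __builtin_constant_p((uintptr_t)(addr) != (uintptr_t)NULL) &&	\
	  (uintptr_t)(addr) != (uintptr_t)NULL &&			\
	  __builtin_constant_p(*(const unsigned long *)(addr))) ?	\
	 generic_test_bit(nr, addr) : arch_test_bit(nr, addr))

/* Returns 1 when bit 5 reads back as set and bit 6 as clear. */
static inline int sketch_demo(void)
{
	unsigned long foo[1] = { 0 };

	foo[0] |= 1UL << 5;	/* plain C equivalent of __set_bit(5, foo) */
	assert(generic_test_bit(5, foo) == arch_test_bit(5, foo));
	return sketch_test_bit(5, foo) && !sketch_test_bit(6, foo);
}
```

Both branches return the same value, so the selection only affects
what code the compiler emits. Whether the constant path is actually
taken depends on the compiler and optimization level, which may be
part of why the results differ between architectures and toolchains.
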
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@...ux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds