[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZpGVLl7NQZScd8aH@google.com>
Date: Fri, 12 Jul 2024 13:42:22 -0700
From: Brian Norris <briannorris@...omium.org>
To: Yury Norov <yury.norov@...il.com>
Cc: Nathan Chancellor <nathan@...nel.org>, llvm@...ts.linux.dev,
	linux-kernel@...r.kernel.org, Bill Wendling <morbo@...gle.com>,
	Justin Stitt <justinstitt@...gle.com>,
	Nick Desaulniers <ndesaulniers@...gle.com>,
	Rasmus Villemoes <linux@...musvillemoes.dk>,
	Kees Cook <kees@...nel.org>
Subject: Re: [PATCH v2] bitmap: Switch from inline to __always_inline
Hi Yury,
On Fri, Jul 12, 2024 at 10:02:55AM -0700, Yury Norov wrote:
> On Thu, Jul 11, 2024 at 09:37:11AM -0700, Brian Norris wrote:
> > Based on recent discussions, it seems like a good idea to inline the
> > bitmap functions which make up cpumask*() implementations, as well as a
> > few other bitmap users, to ensure the inlining doesn't break in the
> > future and produce the same problems, as well as to give the best chance
> > that the intended small_const_nbits() optimizations carry through.
> 
> small_const_nbits() optimization always carries through. As far as I
> understand this things, the problem is that inliner may make a (wrong)
> no-go decision before unwinding the small_const_nbits() part.
> 
> small_const_nbits() always leads to eliminating either inline or outline
> code, but inliner seemingly doesn't understand it. This leads to having
> those tiny functions uninlined.
I think we're more or less saying the same thing -- if for whatever
reason the compiler doesn't inline, then these will never be "small
consts", and therefore the machinery effectively "doesn't carry
through".
Per previous discussions, there are many reasons a compiler may ignore
the 'inline' hint.
> But I'm not sure about that and don't know how to check what happens
> under the compilers' hood. Can compiler gurus please clarify?
> 
> > This change has been previously proposed in the past as:
> > 
> >   Subject: [PATCH 1/3] bitmap: switch from inline to __always_inline
> >   https://lore.kernel.org/all/20221027043810.350460-2-yury.norov@gmail.com/
> > 
> > In the end, this looks like a rebase of Yury's work, although
> > technically it's a rewrite.
> 
> I like it more, when it's split onto several patches. We already have
> my bitmap.h rework and your cpumask.h one with their own reasonable
> motivations. Can you keep that patch structure please? Then, we can add
I'm not quite sure what you mean. I imitated the patch structure of your
patch 1 that I linked, which had the following file-diff list:
 include/linux/bitmap.h   |  46 ++++++-------
 include/linux/cpumask.h  | 144 +++++++++++++++++++--------------------
 include/linux/find.h     |  40 +++++------
 include/linux/nodemask.h |  86 +++++++++++------------
Are you suggesting I should split that into 2 (one for cpumask and one
for the rest)?
> separate patches for find.h and nodemask.h. If rebasing the old bitmap.h
> patch isn't clean and takes some effort, feel free to add your
> co-developed-by.
I don't really care who gets authorship, but I did manually rewrite it,
since there were enough conflicts, and it's basically scriptable. Feel
free to suggest which Author/S-o-b/Co-developed combo makes sense for
that, and I can adopt it in v3 :)
> > According to bloat-o-meter, my vmlinux decreased in size by a total of
> > 1538 bytes (-0.01%) with this change.
> > 
> > [0] CONFIG_HOTPLUG_PARALLEL=y ('select'ed for x86 as of [1]) and
> >     CONFIG_GCOV_PROFILE_ALL.
> > 
> > [1] commit 0c7ffa32dbd6 ("x86/smpboot/64: Implement
> >     arch_cpuhp_init_parallel_bringup() and enable it")
> > 
> > Cc: Yury Norov <yury.norov@...il.com>
> > Signed-off-by: Brian Norris <briannorris@...omium.org>
> 
> So... This whole optimization is only worth the reviewer's time if we
> can prove it's sustainable across different compilers and configurations.
> 
> Just saying that under some particular config it works, brings little
> value for those who are not interested in that config. Moreover, if
> one enables GCOV, he likely doesn't care much about the .text size.
> And for those using release configs, it only brings uncertainty.
I think I partially disagree. If it's neutral for one compiler but
helpful for the other, that's still a win. But otherwise, I agree we
should have some accounting of diversity -- not just a single test.
> Let's test it broader? We've got 2 main compilers - gcc and llvm, and
> at least two configurations that may be relevant: defconfig and your
> HOTPLUG_PARALLEL + GCOV_PROFILE_ALL. And it would be nice to prove
> that the optimization is sustainable across compiler versions.
> 
> I have the following table in mind:
> 
>            defconfig   your conf  
> GCC    9 |  -1900    |           |
> CLANG 13 |           |           |
> GCC   13 |           |           |
> CLANG 18 |  -100     |  -1538    |
> 
> The defconfig nubmers above are from my past experiments. And you can
> add more configs/compilers, of course.
Perhaps I wasn't too clear, but I was actually testing a few things:
* my ChromeOS-y build, with gcov and clang -- to ensure the section
  mismatch warning/error disappears
* a pure mainline / semi-defaults build (which happened to use my host
  distro's gcc 13), to do size comparisons. I also happened to enable
  gcov in the currently-posted experiments.
But agreed, I can get a larger matrix with a bit more specifity in my
commit log. TBD on the ease of getting a Clang 13 or GCC 9 toolchain,
but I think I can convince my distro to provide them.
> > --- a/include/linux/bitmap.h
> > +++ b/include/linux/bitmap.h
> > @@ -203,7 +203,7 @@ unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
> >   * the bit offset of all zero areas this function finds is multiples of that
> >   * power of 2. A @align_mask of 0 means no alignment is required.
> >   */
> > -static inline unsigned long
> > +static __always_inline unsigned long
> >  bitmap_find_next_zero_area(unsigned long *map,
> >  			   unsigned long size,
> >  			   unsigned long start,
> 
> Let's split them such that return type would be at the same line as
> the function name. It would help to distinguish function declaration
> from invocations in grep output:
Ack. I thought I was being consistent with existing cpumask.h style, but
I got this set wrong. Will fix.
Brian
Powered by blists - more mailing lists
 
