lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZpFhv5VSYZ6jnsd4@yury-ThinkPad>
Date: Fri, 12 Jul 2024 10:02:55 -0700
From: Yury Norov <yury.norov@...il.com>
To: Brian Norris <briannorris@...omium.org>
Cc: Nathan Chancellor <nathan@...nel.org>, llvm@...ts.linux.dev,
	linux-kernel@...r.kernel.org, Bill Wendling <morbo@...gle.com>,
	Justin Stitt <justinstitt@...gle.com>,
	Nick Desaulniers <ndesaulniers@...gle.com>,
	Rasmus Villemoes <linux@...musvillemoes.dk>,
	Kees Cook <kees@...nel.org>
Subject: Re: [PATCH v2] bitmap: Switch from inline to __always_inline

Hi Brian,

Thanks for taking over this!

+ Kees Cook for GCC

On Thu, Jul 11, 2024 at 09:37:11AM -0700, Brian Norris wrote:
> On recent (v6.6+) builds with Clang (based on Clang 18.0.0) and certain
> configurations [0], I'm finding that (lack of) inlining decisions may
> lead to section mismatch warnings like the following:
> 
>   WARNING: modpost: vmlinux.o: section mismatch in reference:
>   cpumask_andnot (section: .text) ->
>   cpuhp_bringup_cpus_parallel.tmp_mask (section: .init.data) ERROR:
>   modpost: Section mismatches detected.
> 
> or more confusingly:
> 
>   WARNING: modpost: vmlinux: section mismatch in reference:
>   cpumask_andnot+0x5f (section: .text) -> efi_systab_phys (section:
>   .init.data)
> 
> The first warning makes a little sense, because
> cpuhp_bringup_cpus_parallel() (an __init function) calls
> cpumask_andnot() on tmp_mask (an __initdata symbol). If the compiler
> doesn't inline cpumask_andnot(), this may appear like a mismatch.
> 
> The second warning makes less sense, but might be because efi_systab_phys
> and cpuhp_bringup_cpus_parallel.tmp_mask are laid out near each other,
> and the latter isn't a proper C symbol definition.
> 
> In any case, it seems a reasonable solution to suggest more strongly to
> the compiler that these cpumask macros *must* be inlined, as 'inline' is
> just a recommendation.
> 
> Based on recent discussions, it seems like a good idea to inline the
> bitmap functions which make up cpumask*() implementations, as well as a
> few other bitmap users, to ensure the inlining doesn't break in the
> future and produce the same problems, as well as to give the best chance
> that the intended small_const_nbits() optimizations carry through.

small_const_nbits() optimization always carries through. As far as I
understand this things, the problem is that inliner may make a (wrong)
no-go decision before unwinding the small_const_nbits() part.

small_const_nbits() always leads to eliminating either inline or outline
code, but inliner seemingly doesn't understand it. This leads to having
those tiny functions uninlined.

But I'm not sure about that and don't know how to check what happens
under the compilers' hood. Can compiler gurus please clarify?

> This change has been previously proposed in the past as:
> 
>   Subject: [PATCH 1/3] bitmap: switch from inline to __always_inline
>   https://lore.kernel.org/all/20221027043810.350460-2-yury.norov@gmail.com/
> 
> In the end, this looks like a rebase of Yury's work, although
> technically it's a rewrite.

I like it more, when it's split onto several patches. We already have
my bitmap.h rework and your cpumask.h one with their own reasonable
motivations. Can you keep that patch structure please? Then, we can add
separate patches for find.h and nodemask.h. If rebasing the old bitmap.h
patch isn't clean and takes some effort, feel free to add your
co-developed-by.
 
> According to bloat-o-meter, my vmlinux decreased in size by a total of
> 1538 bytes (-0.01%) with this change.
> 
> [0] CONFIG_HOTPLUG_PARALLEL=y ('select'ed for x86 as of [1]) and
>     CONFIG_GCOV_PROFILE_ALL.
> 
> [1] commit 0c7ffa32dbd6 ("x86/smpboot/64: Implement
>     arch_cpuhp_init_parallel_bringup() and enable it")
> 
> Cc: Yury Norov <yury.norov@...il.com>
> Signed-off-by: Brian Norris <briannorris@...omium.org>

So... This whole optimization is only worth the reviewer's time if we
can prove it's sustainable across different compilers and configurations.

Just saying that under some particular config it works, brings little
value for those who are not interested in that config. Moreover, if
one enables GCOV, he likely doesn't care much about the .text size.
And for those using release configs, it only brings uncertainty.

Let's test it broader? We've got 2 main compilers - gcc and llvm, and
at least two configurations that may be relevant: defconfig and your
HOTPLUG_PARALLEL + GCOV_PROFILE_ALL. And it would be nice to prove
that the optimization is sustainable across compiler versions.

I have the following table in mind:

           defconfig   your conf  
GCC    9 |  -1900    |           |
CLANG 13 |           |           |
GCC   13 |           |           |
CLANG 18 |  -100     |  -1538    |

The defconfig nubmers above are from my past experiments. And you can
add more configs/compilers, of course.

> ---
> 
> Changes in v2:
>  - rebase, update cpumask.h based on recent additions and removals
>  - also convert bitmap.h, find.h, nodemask.h
> 
>  include/linux/bitmap.h   | 122 ++++++++++++----------
>  include/linux/cpumask.h  | 216 +++++++++++++++++++++------------------
>  include/linux/find.h     |  50 ++++-----
>  include/linux/nodemask.h |  86 ++++++++--------
>  4 files changed, 249 insertions(+), 225 deletions(-)
> 
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 8c4768c44a01..7f44b4428d77 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -203,7 +203,7 @@ unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
>   * the bit offset of all zero areas this function finds is multiples of that
>   * power of 2. A @align_mask of 0 means no alignment is required.
>   */
> -static inline unsigned long
> +static __always_inline unsigned long
>  bitmap_find_next_zero_area(unsigned long *map,
>  			   unsigned long size,
>  			   unsigned long start,

Let's split them such that return type would be at the same line as
the function name. It would help to distinguish function declaration
from invocations in grep output:

   static __always_inline
   unsigned long bitmap_find_next_zero_area(unsigned long *map,
   			   unsigned long size,
   			   unsigned long start,

Thanks,
Yury

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ