lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <zjmiiapcr2zvie2iw4s72g27bzwadrzxkoawv2klqxiq6kofsx@352hdmgvcozt>
Date: Tue, 8 Apr 2025 09:46:52 -0700
From: Josh Poimboeuf <jpoimboe@...nel.org>
To: Uros Bizjak <ubizjak@...il.com>
Cc: Ingo Molnar <mingo@...nel.org>, x86@...nel.org, 
	linux-kernel@...r.kernel.org, Linus Torvalds <torvalds@...ux-foundation.org>, 
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH RFC 3/5] noinstr: Use asm_inline() in
 instrumentation_{begin,end}()

On Tue, Apr 08, 2025 at 01:10:14PM +0200, Uros Bizjak wrote:
> On Tue, Apr 8, 2025 at 10:30 AM Ingo Molnar <mingo@...nel.org> wrote:
> > * Josh Poimboeuf <jpoimboe@...nel.org> wrote:
> >
> > > Use asm_inline() to prevent the compiler from making poor inlining
> > > decisions based on the length of the objtool annotations.
> > >
> > > For a defconfig kernel built with GCC 14.2.1, bloat-o-meter reports a
> > > 0.18% text size increase:
> > >
> > >   add/remove: 88/433 grow/shrink: 967/487 up/down: 87579/-52630 (34949)
> > >   Total: Before=19448407, After=19483356, chg +0.18%
> > >
> > > Presumably the text growth is due to increased inlining.  A net total of
> > > 345 functions were removed.
> >
> > Since +0.18% puts this into the 'significant' category of .text size
> > increases, it would be nice to see a bit more details about the nature
> > of these function calls removed: were they really explicit calls to
> > __instrumentation_begin()/end(), or somehow tail-call optimized out, or
> > something else?

The instrumentation macros are used by WARN*() and lockdep, so this can
affect a lot of functions.

BTW, without the objtool annotations, each of those macros resolves to a
single NOP.  So using inline_asm() seems obviously correct here as it
accurately communicates the code size to the compiler.  I'm not sure if
that was clear from the description.

> > Also, I'm wondering where the 34,949 bytes bloat comes from: with 345
> > functions removed that's 100 bytes per function? Doesn't sound right.
> 
> Please note that removed functions can be inlined at several places. E.g.:
> 
> $ grep "<encode_string>"  objdump.old
> 
> 00000000004506e0 <encode_string>:
>  45113c:       e8 9f f5 ff ff          call   4506e0 <encode_string>
>  452bcb:       e9 10 db ff ff          jmp    4506e0 <encode_string>
>  453d33:       e8 a8 c9 ff ff          call   4506e0 <encode_string>
>  453ef7:       e8 e4 c7 ff ff          call   4506e0 <encode_string>
>  45549f:       e8 3c b2 ff ff          call   4506e0 <encode_string>
>  455843:       e8 98 ae ff ff          call   4506e0 <encode_string>
>  455b37:       e8 a4 ab ff ff          call   4506e0 <encode_string>
>  455b47:       e8 94 ab ff ff          call   4506e0 <encode_string>
>  4564fa:       e8 e1 a1 ff ff          call   4506e0 <encode_string>
>  456669:       e8 72 a0 ff ff          call   4506e0 <encode_string>
>  456691:       e8 4a a0 ff ff          call   4506e0 <encode_string>
>  4566a0:       e8 3b a0 ff ff          call   4506e0 <encode_string>
>  4569aa:       e8 31 9d ff ff          call   4506e0 <encode_string>
>  456e79:       e9 62 98 ff ff          jmp    4506e0 <encode_string>
>  456efe:       e9 dd 97 ff ff          jmp    4506e0 <encode_string>
> 
> all these calls now inline:
> 
> encode_string                                 58       -     -58
> 
> where for example encode_putfh() grows by:
> 
> encode_putfh                                  70     118     +48

Thanks for looking!  That makes sense: encode_string() uses
WARN_ON_ONCE() which uses instrumentation_begin()/end().
encode_string() is getting inlined now that GCC has more accurate
information about its size.

> > Also, is the bloat-o-meter output limited to the .text section, or does
> > it include growth in out-of-line sections too?
>
> bloat-o-meter by default looks at all sybol types ("tTdDbBrRvVwW" as
> returned by nm). Adding "-c" option will categorize the output by
> symbol type (this is x86_64 defconfig change with gcc-4.2.1):
> 
> add/remove: 72/350 grow/shrink: 918/348 up/down: 80532/-46857 (33675)
> Function                                     old     new   delta
> ...
> Total: Before=16685068, After=16718743, chg +0.20%
> add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
> Data                                         old     new   delta
> Total: Before=4471889, After=4471889, chg +0.00%
> add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-12 (-12)
> RO Data                                      old     new   delta
> icl_voltage_level_max_cdclk                   12       -     -12
> Total: Before=1783310, After=1783298, chg -0.00%

Right.  That means that bloat-o-meter's results include any
text.unlikely growth/shrinkage, because that code is contained by
symbols (typically .cold subfunctons).

I suppose it would be more helpful if .text.unlikely were excluded or
separated out.  .text.unlikely code growth is always a good thing, as
opposed to .text for which growth can be good or bad.

-- 
Josh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ