Message-ID: <CAFULd4bo0NGzZGLEs+pYoOJrDVLyKt2=Piug-LtU-WhFGwYTzQ@mail.gmail.com>
Date: Tue, 22 Apr 2025 15:22:06 +0200
From: Uros Bizjak <ubizjak@...il.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: Josh Poimboeuf <jpoimboe@...nel.org>, x86@...nel.org, linux-kernel@...r.kernel.org, 
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v2] noinstr: Use asm_inline() in instrumentation_{begin,end}()

On Tue, Apr 22, 2025 at 2:02 PM Ingo Molnar <mingo@...nel.org> wrote:
>
>
> * Josh Poimboeuf <jpoimboe@...nel.org> wrote:
>
> > Use asm_inline() in the instrumentation begin/end macros to prevent the
> > compiler from making poor inlining decisions based on the length of the
> > objtool annotations.
> >
> > Without the objtool annotations, each macro resolves to a single NOP.
> > Using inline_asm() seems obviously correct here as it accurately
> > communicates the actual code size to the compiler.
>
> s/inline_asm
>  /asm_inline
>
> >
> > These macros are used by WARN() and lockdep, so this change can affect a
> > lot of functions.
> >
> > For a defconfig kernel built with GCC 14.2.1, bloat-o-meter reports a
> > 0.17% increase in text size:
> >
> >   add/remove: 74/352 grow/shrink: 914/353 up/down: 80747/-47120 (33627)
> >   Total: Before=19460272, After=19493899, chg +0.17%
> >
> > The text growth is presumably due to increased inlining.  A net total of
> > 278 functions were removed (+74 / -352).  Each of the removed functions
> > is likely inlined at multiple sites which explains the somewhat
> > significant code growth.
>
> So:
>
>  - 353 functions shrank by 47120 bytes, that's -133 bytes per function
>    affected.
>
>  - 914 functions grew by 80747 bytes, that's +88 bytes per function,
>    but there's 3x of them.
>
> That's a lot of net text growth, isn't it? It's certainly not just a
> single instruction or two per inlining, as asm_inline() would suggest.
>
> > One example from Uros:
> >
> >     $ grep "<encode_string>"  objdump.old
> >
> >     00000000004506e0 <encode_string>:
> >      45113c:       e8 9f f5 ff ff          call   4506e0 <encode_string>
> >      452bcb:       e9 10 db ff ff          jmp    4506e0 <encode_string>
> >      453d33:       e8 a8 c9 ff ff          call   4506e0 <encode_string>
> >      453ef7:       e8 e4 c7 ff ff          call   4506e0 <encode_string>
> >      45549f:       e8 3c b2 ff ff          call   4506e0 <encode_string>
> >      455843:       e8 98 ae ff ff          call   4506e0 <encode_string>
> >      455b37:       e8 a4 ab ff ff          call   4506e0 <encode_string>
> >      455b47:       e8 94 ab ff ff          call   4506e0 <encode_string>
> >      4564fa:       e8 e1 a1 ff ff          call   4506e0 <encode_string>
> >      456669:       e8 72 a0 ff ff          call   4506e0 <encode_string>
> >      456691:       e8 4a a0 ff ff          call   4506e0 <encode_string>
> >      4566a0:       e8 3b a0 ff ff          call   4506e0 <encode_string>
> >      4569aa:       e8 31 9d ff ff          call   4506e0 <encode_string>
> >      456e79:       e9 62 98 ff ff          jmp    4506e0 <encode_string>
> >      456efe:       e9 dd 97 ff ff          jmp    4506e0 <encode_string>
> >
> >     All these are calls now inline:
> >
> >     encode_string                                 58       -     -58
> >
> >     ... where for example encode_putfh() grows by:
> >
> >     encode_putfh                                  70     118     +48
>
> That still doesn't make it clear where the apparently ~10 instructions
> per inlining come from, right?
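
For reference, the per-function averages quoted above can be re-derived
from the bloat-o-meter summary line. The following is only a small
sketch: the struct, the variable, and the function names are made up for
illustration, and the figures are copied from the quoted report. It also
shows that the growing functions outnumber the shrinking ones by roughly
2.6 to 1 (914 / 353).

```c
#include <assert.h>
#include <stdio.h>

/* Figures copied from the bloat-o-meter summary quoted in this thread. */
struct bloat_summary {
	long grown_funcs;	/* grow:   914 */
	long shrunk_funcs;	/* shrink: 353 */
	long bytes_up;		/* up:     80747 */
	long bytes_down;	/* down:   47120 (reported as -47120) */
	long text_before;	/* Before= 19460272 */
};

const struct bloat_summary defconfig_gcc14 = {
	.grown_funcs = 914,
	.shrunk_funcs = 353,
	.bytes_up = 80747,
	.bytes_down = 47120,
	.text_before = 19460272,
};

/* Net text change in bytes (up minus down). */
long net_bytes(const struct bloat_summary *s)
{
	return s->bytes_up - s->bytes_down;
}

/* Average growth per grown function, truncated to whole bytes. */
long avg_growth(const struct bloat_summary *s)
{
	assert(s->grown_funcs > 0);
	return s->bytes_up / s->grown_funcs;
}

/* Average shrinkage per shrunk function, truncated to whole bytes. */
long avg_shrink(const struct bloat_summary *s)
{
	assert(s->shrunk_funcs > 0);
	return s->bytes_down / s->shrunk_funcs;
}

/* Net change as a percentage of the original text size. */
double pct_change(const struct bloat_summary *s)
{
	assert(s->text_before > 0);
	return 100.0 * (double)net_bytes(s) / (double)s->text_before;
}

/* Print the derived figures in one line. */
void print_summary(const struct bloat_summary *s)
{
	printf("net %+ld bytes (%+.2f%%), +%ld/func grown, -%ld/func shrunk\n",
	       net_bytes(s), pct_change(s), avg_growth(s), avg_shrink(s));
}
```

Calling print_summary(&defconfig_gcc14) from a main() prints the derived
numbers next to each other for comparison with the quoted ones.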

The growth comes from different inlining decisions, and those cover not
only the inlining of small functions but also other code-layout choices
(hot vs. cold code blocks, tail duplication, etc.). The compiler uses
size thresholds to estimate the inlining gain (the thresholds differ
between -Os and -O2). A function that is artificially bloated by a plain
asm() is estimated to exceed this threshold (IOW, inlining it would
appear to increase size too much), so it is not inlined. Likewise, code
blocks that enclose plain asm() statements are treated differently from
blocks that use asm_inline(). Once asm_inline() is introduced, the size
of the function (and consequently the inlining gain) is estimated more
accurately; the estimated size is lower, so more inlining happens.
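
To make the size-estimation point concrete, here is a minimal sketch. It
is not the kernel's instrumentation_begin()/instrumentation_end() code.
DEMO_ANNOTATION, helper_plain(), helper_hinted(), caller() and
demo_selfcheck() are hypothetical names, and the fallback branch of the
asm_inline definition is an assumption of this sketch. GCC estimates the
size of an asm statement by counting the lines in its template, while
"asm inline" asks for the smallest possible estimate. The blank template
lines stand in for annotation directives that emit no instructions.

```c
#include <assert.h>

/*
 * asm_inline is named after the kernel macro.  The fallback to plain asm
 * (including on clang, to keep the version check simple) is this
 * sketch's own choice, not the kernel's.
 */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 9
# define asm_inline __asm__ __inline__
#else
# define asm_inline __asm__
#endif

/*
 * Hypothetical stand-in for an objtool annotation: several template
 * lines that generate no machine code.  The blank lines play the role
 * of the section directives used by the real macros.
 */
#define DEMO_ANNOTATION "\n\t\n\t\n\t\n\t\n\t\n\t"

/* Plain asm: the size estimate grows with the number of template lines. */
static int helper_plain(int x)
{
	__asm__ __volatile__(DEMO_ANNOTATION);
	return x + 1;
}

/* asm inline: the statement is estimated at the minimum size. */
static int helper_hinted(int x)
{
	asm_inline __volatile__(DEMO_ANNOTATION);
	return x + 1;
}

/* Both helpers behave identically; only the size estimate differs. */
int caller(int x)
{
	return helper_plain(x) + helper_hinted(x);
}

/* Behavior check: the asm statements must not change the result. */
void demo_selfcheck(void)
{
	assert(caller(1) == 4);
}
```

In a toy this small, both helpers will most likely be inlined either
way. The difference only matters once a function sits near the
compiler's inlining threshold, as in the encode_string() case.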

I'd again note that code size is not the right metric when compiling
with -O2.

Uros.
