linux-kernel - Re: [PATCH v2] noinstr: Use asm_inline() in instrumentation

Message-ID: <CAFULd4bo0NGzZGLEs+pYoOJrDVLyKt2=Piug-LtU-WhFGwYTzQ@mail.gmail.com>
Date: Tue, 22 Apr 2025 15:22:06 +0200
From: Uros Bizjak <ubizjak@...il.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: Josh Poimboeuf <jpoimboe@...nel.org>, x86@...nel.org, linux-kernel@...r.kernel.org, 
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v2] noinstr: Use asm_inline() in instrumentation_{begin,end}()

On Tue, Apr 22, 2025 at 2:02 PM Ingo Molnar <mingo@...nel.org> wrote:
>
>
> * Josh Poimboeuf <jpoimboe@...nel.org> wrote:
>
> > Use asm_inline() in the instrumentation begin/end macros to prevent the
> > compiler from making poor inlining decisions based on the length of the
> > objtool annotations.
> >
> > Without the objtool annotations, each macro resolves to a single NOP.
> > Using inline_asm() seems obviously correct here as it accurately
> > communicates the actual code size to the compiler.
>
> s/inline_asm
>  /asm_inline
>
> >
> > These macros are used by WARN() and lockdep, so this change can affect a
> > lot of functions.
> >
> > For a defconfig kernel built with GCC 14.2.1, bloat-o-meter reports a
> > 0.17% increase in text size:
> >
> >   add/remove: 74/352 grow/shrink: 914/353 up/down: 80747/-47120 (33627)
> >   Total: Before=19460272, After=19493899, chg +0.17%
> >
> > The text growth is presumably due to increased inlining.  A net total of
> > 278 functions were removed (+74 / -352).  Each of the removed functions
> > is likely inlined at multiple sites which explains the somewhat
> > significant code growth.
>
> So:
>
>  - 353 functions shrunk by 47120 bytes, that's -133 bytes per function
>    affected.
>
>  - 914 functions grew by 80747 bytes, that's +88 bytes per function,
>    but there's 3x of them.
>
> That's a lot of net text growth, isn't it? It's certainly not just a
> single instruction or two per inlining, as asm_inline() would suggest.
>
> > One example from Uros:
> >
> >     $ grep "<encode_string>"  objdump.old
> >
> >     00000000004506e0 <encode_string>:
> >      45113c:       e8 9f f5 ff ff          call   4506e0 <encode_string>
> >      452bcb:       e9 10 db ff ff          jmp    4506e0 <encode_string>
> >      453d33:       e8 a8 c9 ff ff          call   4506e0 <encode_string>
> >      453ef7:       e8 e4 c7 ff ff          call   4506e0 <encode_string>
> >      45549f:       e8 3c b2 ff ff          call   4506e0 <encode_string>
> >      455843:       e8 98 ae ff ff          call   4506e0 <encode_string>
> >      455b37:       e8 a4 ab ff ff          call   4506e0 <encode_string>
> >      455b47:       e8 94 ab ff ff          call   4506e0 <encode_string>
> >      4564fa:       e8 e1 a1 ff ff          call   4506e0 <encode_string>
> >      456669:       e8 72 a0 ff ff          call   4506e0 <encode_string>
> >      456691:       e8 4a a0 ff ff          call   4506e0 <encode_string>
> >      4566a0:       e8 3b a0 ff ff          call   4506e0 <encode_string>
> >      4569aa:       e8 31 9d ff ff          call   4506e0 <encode_string>
> >      456e79:       e9 62 98 ff ff          jmp    4506e0 <encode_string>
> >      456efe:       e9 dd 97 ff ff          jmp    4506e0 <encode_string>
> >
> >     All these are calls now inline:
> >
> >     encode_string                                 58       -     -58
> >
> >     ... where for example encode_putfh() grows by:
> >
> >     encode_putfh                                  70     118     +48
>
> That still doesn't make it clear where the apparently ~10 instructions
> per inlining come from, right?
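
For reference, the bloat-o-meter figures quoted above are internally
consistent. A minimal C sketch that rechecks the arithmetic (the helper
names are made up for illustration, they are not part of bloat-o-meter):

```c
#include <stdio.h>

/* Hypothetical helpers, only for rechecking the numbers quoted in this thread. */

/* Net byte change from bloat-o-meter's "up/down" pair (down is negative). */
static long net_change(long up, long down)
{
	return up + down;
}

/* Relative change in percent between two text sizes. */
static double percent_change(long before, long after)
{
	return 100.0 * (double)(after - before) / (double)before;
}

/* Average byte delta per affected function. */
static double avg_per_function(long bytes, long nfuncs)
{
	return (double)bytes / (double)nfuncs;
}

/* Print the same summary that the quoted discussion walks through. */
static void print_summary(void)
{
	printf("net:    %+ld bytes\n", net_change(80747, -47120));
	printf("total:  %+.2f%%\n", percent_change(19460272, 19493899));
	printf("shrink: %.1f bytes/function\n", avg_per_function(-47120, 353));
	printf("grow:   %+.1f bytes/function\n", avg_per_function(80747, 914));
}
```

Calling print_summary() reproduces the +33627 byte net change and the
per-function averages of roughly -133 and +88 bytes discussed above.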

The growth actually comes from different inlining decisions, which
cover not only the inlining of small functions but also other code
blocks (hot vs. cold, tail duplication, etc.). The compiler uses
certain thresholds to estimate inlining gain (the thresholds are
different for -Os and -O2). Functions that are artificially bloated
because they use asm() rather than asm_inline() fail this threshold
check (IOW, the inlining would appear to increase size too much), so
they are not inlined; code blocks that enclose unconverted asm()
statements are likewise treated differently than when they use
asm_inline(). When asm_inline() is introduced, the size of the function
(and consequently the inlining gain) is estimated more accurately; the
estimated size is lower, so more inlining happens.
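
To illustrate the size estimate, here is a minimal user-space sketch,
not the kernel's actual macros. It assumes GCC or Clang targeting ELF
on an architecture where "nop" is a valid mnemonic; the section name
.discard.demo and the function names are made up. GCC estimates the
size of an asm statement from the number of lines in its template, so
the annotation lines make the plain version look larger than the
single NOP it emits into .text:

```c
/*
 * Sketch only. asm_inline mimics the kernel macro of the same name;
 * the guard below is an assumption that only enables the qualifier
 * for GCC >= 9 and falls back to a plain asm otherwise.
 */
#if !defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 9
# define asm_inline __asm__ __inline__
#else
# define asm_inline __asm__		/* no size hint available */
#endif

/*
 * One real instruction plus assembler directives that emit nothing
 * into .text. The template still spans four lines, and GCC counts
 * each line as one instruction of maximum length.
 */
#define ANNOTATED_NOP				\
	"nop\n\t"				\
	".pushsection .discard.demo\n\t"	\
	".long 0\n\t"				\
	".popsection"

/* Plain asm: the estimate is based on all four template lines. */
static inline int bump_plain(int x)
{
	__asm__ volatile(ANNOTATED_NOP);
	return x + 1;
}

/* asm inline: for inlining purposes the asm counts as minimum size. */
static inline int bump_hinted(int x)
{
	asm_inline volatile(ANNOTATED_NOP);
	return x + 1;
}
```

Both functions generate the same code when inlined; only the size
that the inliner assumes for them differs, which is what shifts the
inlining decisions in their callers.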

I'd again remark that code size is not the right metric when
compiling with -O2.

Uros.