Message-ID: <aAeFYB7E2QiRNeoM@gmail.com>
Date: Tue, 22 Apr 2025 14:02:40 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Josh Poimboeuf <jpoimboe@...nel.org>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Uros Bizjak <ubizjak@...il.com>
Subject: Re: [PATCH v2] noinstr: Use asm_inline() in
instrumentation_{begin,end}()
* Josh Poimboeuf <jpoimboe@...nel.org> wrote:
> Use asm_inline() in the instrumentation begin/end macros to prevent the
> compiler from making poor inlining decisions based on the length of the
> objtool annotations.
>
> Without the objtool annotations, each macro resolves to a single NOP.
> Using inline_asm() seems obviously correct here as it accurately
> communicates the actual code size to the compiler.
s/inline_asm/asm_inline/
>
> These macros are used by WARN() and lockdep, so this change can affect a
> lot of functions.
>
> For a defconfig kernel built with GCC 14.2.1, bloat-o-meter reports a
> 0.17% increase in text size:
>
> add/remove: 74/352 grow/shrink: 914/353 up/down: 80747/-47120 (33627)
> Total: Before=19460272, After=19493899, chg +0.17%
>
> The text growth is presumably due to increased inlining. A net total of
> 278 functions were removed (+74 / -352). Each of the removed functions
> is likely inlined at multiple sites which explains the somewhat
> significant code growth.
So:
- 353 functions shrank by 47120 bytes, that's -133 bytes per affected
function.
- 914 functions grew by 80747 bytes, that's +88 bytes per function,
but there are about 2.6x as many of them.
That's a lot of net text growth, isn't it? It's certainly not just a
single instruction or two per inlining, as asm_inline() would suggest.
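The per-function averages above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the numbers are copied from the quoted bloat-o-meter summary, and the variable and function names are made up for illustration.

```python
# Figures copied from the bloat-o-meter summary quoted above.
grown, shrunk = 914, 353               # "grow/shrink: 914/353"
up_bytes, down_bytes = 80747, -47120   # "up/down: 80747/-47120"
before, after = 19460272, 19493899     # "Total: Before=..., After=..."


def per_function(total_bytes, count):
    """Average byte delta per affected function (hypothetical helper)."""
    return total_bytes / count


net = up_bytes + down_bytes
avg_grow = per_function(up_bytes, grown)
avg_shrink = per_function(down_bytes, shrunk)
pct = 100.0 * (after - before) / before

print(f"net change:        {net:+d} bytes ({pct:+.2f}%)")
print(f"avg per grower:    {avg_grow:+.0f} bytes")
print(f"avg per shrinker:  {avg_shrink:+.0f} bytes")
print(f"growers/shrinkers: {grown / shrunk:.1f}x")
```

These are rough averages. As far as I recall, bloat-o-meter's up/down totals also count the bytes of added and removed symbols, so the real per-function deltas are probably somewhat smaller.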
> One example from Uros:
>
> $ grep "<encode_string>" objdump.old
>
> 00000000004506e0 <encode_string>:
> 45113c: e8 9f f5 ff ff call 4506e0 <encode_string>
> 452bcb: e9 10 db ff ff jmp 4506e0 <encode_string>
> 453d33: e8 a8 c9 ff ff call 4506e0 <encode_string>
> 453ef7: e8 e4 c7 ff ff call 4506e0 <encode_string>
> 45549f: e8 3c b2 ff ff call 4506e0 <encode_string>
> 455843: e8 98 ae ff ff call 4506e0 <encode_string>
> 455b37: e8 a4 ab ff ff call 4506e0 <encode_string>
> 455b47: e8 94 ab ff ff call 4506e0 <encode_string>
> 4564fa: e8 e1 a1 ff ff call 4506e0 <encode_string>
> 456669: e8 72 a0 ff ff call 4506e0 <encode_string>
> 456691: e8 4a a0 ff ff call 4506e0 <encode_string>
> 4566a0: e8 3b a0 ff ff call 4506e0 <encode_string>
> 4569aa: e8 31 9d ff ff call 4506e0 <encode_string>
> 456e79: e9 62 98 ff ff jmp 4506e0 <encode_string>
> 456efe: e9 dd 97 ff ff jmp 4506e0 <encode_string>
>
> All these calls are now inlined:
>
> encode_string 58 - -58
>
> ... where for example encode_putfh() grows by:
>
> encode_putfh 70 118 +48
That still doesn't make it clear where the apparently ~10 instructions
per inlining come from, right?
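To put a number on that question, here is a rough sketch using only the figures quoted above: 15 call/jmp sites in the objdump listing, the 58-byte encode_string() body, and the encode_putfh() growth from 70 to 118 bytes. The projection assumes every site grows as much as encode_putfh(), which the data shown does not establish, and all names are illustrative.

```python
# Figures taken from the quoted objdump listing and bloat-o-meter lines.
call_sites = 15            # call/jmp references to <encode_string>
removed_body = 58          # bytes saved by dropping encode_string itself
putfh_growth = 118 - 70    # bytes added to encode_putfh

# Growth per site at which inlining would be size-neutral overall.
break_even = removed_body / call_sites

# ASSUMPTION: all 15 sites grow like encode_putfh; the thread only
# shows this one example, so treat the result as an upper-ish guess.
projected_net = call_sites * putfh_growth - removed_body

print(f"break-even growth per site: {break_even:.1f} bytes")
print(f"observed at encode_putfh:   {putfh_growth} bytes")
print(f"projected net if uniform:   {projected_net:+d} bytes")
```

The gap between the break-even figure and the observed growth is what the asm_inline() size estimate alone doesn't seem to explain.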
Thanks,
Ingo