Message-ID: <87wncauslw.ffs@tglx>
Date: Mon, 18 Jul 2022 21:29:47 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: LKML <linux-kernel@...r.kernel.org>
Cc: x86@...nel.org, Linus Torvalds <torvalds@...ux-foundation.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Josh Poimboeuf <jpoimboe@...nel.org>,
Andrew Cooper <Andrew.Cooper3@...rix.com>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Johannes Wikner <kwikner@...z.ch>,
Alyssa Milburn <alyssa.milburn@...ux.intel.com>,
Jann Horn <jannh@...gle.com>, "H.J. Lu" <hjl.tools@...il.com>,
Joao Moreira <joao.moreira@...el.com>,
Joseph Nuzman <joseph.nuzman@...el.com>,
Steven Rostedt <rostedt@...dmis.org>,
Juergen Gross <jgross@...e.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
On Sun, Jul 17 2022 at 01:17, Thomas Gleixner wrote:
> The function alignment option does not work for that because it only
> guarantees that the next function entry is aligned. The padding size
> depends on the position of the last instruction of the previous function,
> which can obviously be anything between 0 and padsize-1. That is not a
> good starting point for reliably placing 10 bytes of accounting code.
>
> I hacked up GCC to emit such padding, and first experiments show that it
> recovers quite a bit of performance.
>
>                       IBRS      stuff     stuff(pad)
> sockperf 14 bytes:    -23.76%   -19.26%   -14.31%
> sockperf 1472 bytes:  -22.51%   -18.40%   -12.25%
> microbench:           +37.20%   +18.46%   +15.47%
> hackbench:            +21.24%   +10.94%   +10.12%
>
> For FIO I don't have numbers yet, but I expect FIO to get a significant
> gain too.
>
> From a quick survey it seems to have no impact for the case where the
> thunks are not used. But that really needs some deep investigation and
> there is a potential conflict with the clang CFI efforts.
>
> The kernel text size increases with a Debian config from 9.9M to 10.4M, so
> about 5%. If the thunk is not 16 byte aligned, the text size increase is
> about 3%, but it turned out that 16 byte aligned is slightly faster.
>
> The 16 byte function alignment turned out to be beneficial in general even
> without the thunks. Not much of an improvement, but measurable. We should
> revisit this independent of these horrors.
>
> The implementation falls back to the allocated thunks when padding is not
> available. I'll send out the GCC patch and the required kernel patch as a
> reply to this series after polishing it a bit.
Here it goes. GCC hackery first.
---
Subject: gcc: Add padding in front of function entry points
From: Thomas Gleixner <tglx@...utronix.de>
Date: Fri, 15 Jul 2022 14:37:53 +0200
For testing purposes:
Add a 16 byte padding filled with int3 in front of each function entry
so the kernel can put call depth accounting into it.
Not-Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
---
 gcc/config/i386/i386.cc  | 11 +++++++++++
 gcc/config/i386/i386.h   |  7 +++++++
 gcc/config/i386/i386.opt |  4 ++++
 gcc/doc/invoke.texi      |  6 ++++++
 4 files changed, 28 insertions(+)
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -6182,6 +6182,17 @@ ix86_code_end (void)
   file_end_indicate_split_stack ();
 }
 
+void
+x86_asm_output_function_prefix (FILE *asm_out_file,
+                                const char *fnname ATTRIBUTE_UNUSED)
+{
+  if (flag_force_function_padding)
+    {
+      fprintf (asm_out_file, "\t.align 16\n");
+      fprintf (asm_out_file, "\t.skip 16,0xcc\n");
+    }
+}
+
 /* Emit code for the SET_GOT patterns.  */
 
 const char *
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2860,6 +2860,13 @@ extern enum attr_cpu ix86_schedule;
 #define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-mmx,no-sse")))
 #endif
 
+#include <stdio.h>
+extern void
+x86_asm_output_function_prefix (FILE *asm_out_file,
+                                const char *fnname ATTRIBUTE_UNUSED);
+#undef ASM_OUTPUT_FUNCTION_PREFIX
+#define ASM_OUTPUT_FUNCTION_PREFIX x86_asm_output_function_prefix
+
/*
Local variables:
version-control: t
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1064,6 +1064,10 @@ mindirect-branch=
 Target RejectNegative Joined Enum(indirect_branch) Var(ix86_indirect_branch) Init(indirect_branch_keep)
 Convert indirect call and jump to call and return thunks.
 
+mforce-function-padding
+Target Var(flag_force_function_padding) Init(0)
+Put a 16 byte padding area before each function
+
 mfunction-return=
 Target RejectNegative Joined Enum(indirect_branch) Var(ix86_function_return) Init(indirect_branch_keep)
 Convert function return to call and return thunk.
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1451,6 +1451,7 @@ See RS/6000 and PowerPC Options.
 -mindirect-branch=@var{choice} -mfunction-return=@var{choice} @gol
 -mindirect-branch-register -mharden-sls=@var{choice} @gol
 -mindirect-branch-cs-prefix -mneeded -mno-direct-extern-access}
+-mforce-function-padding @gol
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol
@@ -32849,6 +32850,11 @@ Force all calls to functions to be indir
 when using Intel Processor Trace where it generates more precise timing
 information for function calls.
 
+@item -mforce-function-padding
+@opindex mforce-function-padding
+Force a 16 byte padding area before each function, which allows run-time
+code patching to put a special prologue before the function entry.
+
 @item -mmanual-endbr
 @opindex mmanual-endbr
 Insert ENDBR instruction at function entry only via the @code{cf_check}