lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 22 Jul 2022 13:11:25 -0700
From:   Tim Chen <tim.c.chen@...ux.intel.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Josh Poimboeuf <jpoimboe@...nel.org>,
        Andrew Cooper <Andrew.Cooper3@...rix.com>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Johannes Wikner <kwikner@...z.ch>,
        Alyssa Milburn <alyssa.milburn@...ux.intel.com>,
        Jann Horn <jannh@...gle.com>, "H.J. Lu" <hjl.tools@...il.com>,
        Joao Moreira <joao.moreira@...el.com>,
        Joseph Nuzman <joseph.nuzman@...el.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Juergen Gross <jgross@...e.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation

On Mon, 2022-07-18 at 22:44 +0200, Thomas Gleixner wrote:
> On Mon, Jul 18 2022 at 12:51, Linus Torvalds wrote:
> > On Mon, Jul 18, 2022 at 12:30 PM Thomas Gleixner <tglx@...utronix.de> wrote:
> > > Let the compiler add a 16 byte padding in front of each function entry
> > > point and put the call depth accounting there. That avoids calling out
> > > into the module area and reduces ITLB pressure.
> > 
> > Ooh.
> > 
> > I actually like this a lot better.
> > 
> > Could we just say "use this instead if you have SKL and care about the issue?"
> > 
> > I don't hate your module thunk trick, but this does seem *so* much
> > simpler, and if it performs better anyway, it really does seem like
> > the better approach.
> 
> Yes, Peter and I came from avoiding a new compiler and the overhead for
> everyone when putting the padding into the code. We realized only when
> staring at the perf data that this padding in front of the function
> might be an acceptable solution. I did some more tests today on different
> machines with mitigations=off with kernels compiled with and without
> that padding. I couldn't find a single test case where the result was
> outside of the usual noise. But then my tests are definitely incomplete.
> 

Here are some performance numbers for FIO running on a SKX server with
Intel Cold Stream SSD. Padding improves performance significantly.

Tested latest depth tracking code from Thomas:
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git/log/?h=depthtracking
(SHA1 714d29e3e7e3faac27142424ae2533163ddd3a46)
latest gcc patch from Thomas is included at the end.


					Baseline	Baseline
read (kIOPs)		Mean	stdev	mitigations=off	retbleed=off	CPU util
================================================================================
mitigations=off		356.33	6.35	0.00%		7.11%		98.93%
retbleed=off		332.67	5.51	-6.64%		0.00%		99.16%
retbleed=ibrs		242.00	5.57	-32.09%		-27.25%		99.41%
retbleed=stuff (nopad)	281.67	4.62	-20.95%		-15.33%		99.35%
retbleed=stuff (pad)	310.67	0.58	-12.82%		-6.61%		99.29%
					
read/write 				Baseline 	Baseline 
70/30 (kIOPs)		Mean	stdev	mitigations=off	retbleed=off	CPU util
================================================================================
mitigations=off		340.60	8.12	0.00%		4.01%		96.80%
retbleed=off		327.47	8.03	-3.86%		0.00%		97.06%
retbleed=ibrs		239.47	0.75	-29.69%		-26.87%		98.23%
retbleed=stuff (nopad)	275.20	0.69	-19.20%		-15.96%		97.86%
retbleed=stuff (pad)	296.60	2.03	-12.92%		-9.43%		97.14%
					
					Baseline 	Baseline 
write (kIOPs)		Mean	stdev	mitigations=off	retbleed=off	CPU util
================================================================================
mitigations=off		299.33	4.04	0.00%		7.16%		93.51%
retbleed=off		279.33	7.51	-6.68%		0.00%		94.30%
retbleed=ibrs		231.33	0.58	-22.72%		-17.18%		95.84%
retbleed=stuff (nopad)	257.67	0.58	-13.92%		-7.76%		94.96%
retbleed=stuff (pad)	274.67	1.53	-8.24%		-1.67%		94.31%


Tim

gcc patch from Thomas:


---
 gcc/config/i386/i386.cc  |   13 +++++++++++++
 gcc/config/i386/i386.h   |    7 +++++++
 gcc/config/i386/i386.opt |    4 ++++
 gcc/doc/invoke.texi      |    6 ++++++
 4 files changed, 30 insertions(+)

--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -6182,6 +6182,19 @@ ix86_code_end (void)
     file_end_indicate_split_stack ();
 }
 
+void
+x86_asm_output_function_prefix (FILE *asm_out_file,
+				const char *fnname ATTRIBUTE_UNUSED)
+{
+  if (force_function_padding)
+    {
+      fprintf (asm_out_file, "\t.align %d\n",
+	       1 << force_function_padding);
+      fprintf (asm_out_file, "\t.skip %d,0xcc\n",
+	       1 << force_function_padding);
+    }
+}
+
 /* Emit code for the SET_GOT patterns.  */
 
 const char *
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2860,6 +2860,13 @@ extern enum attr_cpu ix86_schedule;
 #define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-mmx,no-sse")))
 #endif
 
+#include <stdio.h>
+extern void
+x86_asm_output_function_prefix (FILE *asm_out_file,
+				const char *fnname ATTRIBUTE_UNUSED);
+#undef ASM_OUTPUT_FUNCTION_PREFIX
+#define ASM_OUTPUT_FUNCTION_PREFIX x86_asm_output_function_prefix
+
 /*
 Local variables:
 version-control: t
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1064,6 +1064,10 @@ mindirect-branch=
 Target RejectNegative Joined Enum(indirect_branch) Var(ix86_indirect_branch) Init(indirect_branch_keep)
 Convert indirect call and jump to call and return thunks.
 
+mforce-function-padding=
+Target Joined UInteger Var(force_function_padding) Init(0) IntegerRange(0, 6)
+Put a 2^$N byte padding area before each function
+
 mfunction-return=
 Target RejectNegative Joined Enum(indirect_branch) Var(ix86_function_return) Init(indirect_branch_keep)
 Convert function return to call and return thunk.
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1451,6 +1451,7 @@ See RS/6000 and PowerPC Options.
 -mindirect-branch=@...{choice}  -mfunction-return=@...{choice} @gol
 -mindirect-branch-register -mharden-sls=@...{choice} @gol
 -mindirect-branch-cs-prefix -mneeded -mno-direct-extern-access}
+-mforce-function-padding @gol
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
@@ -32849,6 +32850,11 @@ Force all calls to functions to be indir
 when using Intel Processor Trace where it generates more precise timing
 information for function calls.
 
+@...m -mforce-function-padding
+@...ndex -mforce-function-padding
+Force a 16 byte padding are before each function which allows run-time
+code patching to put a special prologue before the function entry.
+
 @item -mmanual-endbr
 @opindex mmanual-endbr
 Insert ENDBR instruction at function entry only via the @code{cf_check}



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ