Message-ID: <1515149738.29312.104.camel@infradead.org>
Date: Fri, 05 Jan 2018 10:55:38 +0000
From: David Woodhouse <dwmw2@...radead.org>
To: Paul Turner <pjt@...gle.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andi Kleen <ak@...ux.intel.com>,
LKML <linux-kernel@...r.kernel.org>,
Greg Kroah-Hartman <gregkh@...ux-foundation.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Dave Hansen <dave.hansen@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Kees Cook <keescook@...gle.com>,
Rik van Riel <riel@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...capital.net>,
Jiri Kosina <jikos@...nel.org>,
One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>
Subject: Re: [PATCH v3 01/13] x86/retpoline: Add initial retpoline support
On Fri, 2018-01-05 at 02:28 -0800, Paul Turner wrote:
> On Thu, Jan 04, 2018 at 07:27:58PM +0000, David Woodhouse wrote:
> > On Thu, 2018-01-04 at 10:36 -0800, Alexei Starovoitov wrote:
> > >
> > > Pretty much.
> > > Paul's writeup: https://support.google.com/faqs/answer/7625886
> > > tldr: jmp *%r11 gets converted to:
> > >   call set_up_target;
> > > capture_spec:
> > >   pause;
> > >   jmp capture_spec;
> > > set_up_target:
> > >   mov %r11, (%rsp);
> > >   ret;
> > > where capture_spec part will be looping speculatively.
> >
> > That is almost identical to what's in my latest patch set, except that
> > the capture_spec loop has 'lfence' instead of 'pause'.
>
> When choosing this sequence I benchmarked several alternatives here, including
> (nothing, nops, fences, and other serializing instructions such as cpuid).
>
> The "pause; jmp" sequence proved minutely faster than "lfence;jmp" which is why
> it was chosen.
>
>   "pause; jmp"    33.231 cycles/call   9.517 ns/call
>   "lfence; jmp"   33.354 cycles/call   9.552 ns/call
>
> (Timings are for a complete retpolined indirect branch.)
Yeah, I studiously ignored you here and went with only what Intel had
*assured* me was correct and put into the GCC patches, rather than
chasing those 35 picoseconds ;)

The GCC patch set already had about four different variants over time,
with associated "oh shit, that one doesn't actually work; try this".
What we have in my patch set is precisely what GCC emits at the moment.

I'm all for optimising it further, but maybe not this week.

Other than that, is there any other development from your side that I
haven't captured in the latest (v4) series?

http://git.infradead.org/users/dwmw2/linux-retpoline.git/
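
For anyone checking the arithmetic behind "35 picoseconds": it is just the
difference between the two ns/call figures quoted above. A minimal sketch in
Python follows; the variable names are illustrative only, and the implied
clock frequency is an inference from the quoted numbers, not something stated
in this thread.

```python
# Timings as quoted above, for a complete retpolined indirect branch.
timings = {
    "pause; jmp": {"cycles": 33.231, "ns": 9.517},
    "lfence; jmp": {"cycles": 33.354, "ns": 9.552},
}

fast = timings["pause; jmp"]
slow = timings["lfence; jmp"]

# Absolute difference per call, in picoseconds and in cycles.
delta_ps = (slow["ns"] - fast["ns"]) * 1000.0
delta_cycles = slow["cycles"] - fast["cycles"]

# Relative cost of "lfence; jmp" over "pause; jmp".
relative = (slow["ns"] - fast["ns"]) / fast["ns"]

# Clock frequency implied by each row (cycles per ns is GHz).
# Assumption: both rows were measured at the same fixed clock.
implied_ghz = {name: t["cycles"] / t["ns"] for name, t in timings.items()}

print(f"difference: {delta_ps:.1f} ps/call, {delta_cycles:.3f} cycles/call")
print(f"relative:   {relative * 100:.2f} %")
for name, ghz in implied_ghz.items():
    print(f"{name!r}: about {ghz:.2f} GHz implied")
```

Both rows imply roughly the same clock, so the cycle and nanosecond columns
are consistent with each other, and the gap between the two variants is well
under one percent of a single retpolined branch.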