[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1515467124.22302.8.camel@infradead.org>
Date: Tue, 09 Jan 2018 03:05:24 +0000
From: David Woodhouse <dwmw2@...radead.org>
To: Paul Turner <pjt@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
"Van De Ven, Arjan" <arjan.van.de.ven@...el.com>,
Andi Kleen <andi@...stfloor.org>,
LKML <linux-kernel@...r.kernel.org>,
Greg Kroah-Hartman <gregkh@...ux-foundation.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Dave Hansen <dave.hansen@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Andy Lutomirski <luto@...capital.net>,
Andi Kleen <ak@...ux.intel.com>
Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on
context switch
On Mon, 2018-01-08 at 18:48 -0800, Paul Turner wrote:
> On Mon, Jan 8, 2018 at 4:48 PM, David Woodhouse <dwmw2@...radead.org> wrote:
> >
> > On Tue, 2018-01-09 at 00:44 +0000, Woodhouse, David wrote:
> > >
> > > On IRC, Arjan assures me that 'pause' here really is sufficient as a
> > > speculation trap. If we do end up returning back here as a
> > > misprediction, that 'pause' will stop the speculative execution on
> > > affected CPUs even though it isn't *architecturally* documented to do
> > > so.
> > >
> > > Arjan, can you confirm that in email please?
> >
> > That actually doesn't make sense to me. If 'pause' alone is sufficient,
> > then why in $DEITY's name would we need a '1:pause;jmp 1b' loop in the
> > retpoline itself?
> >
> > Arjan?
> On further investigation, I don't understand any of the motivation for
> the changes here:
> - It micro-benchmarks several cycles slower than the suggested
> implementation on average (38 vs 44 cycles) [likely due to lost 16-byte call
> alignment]
> - It's much larger in terms of .text size (120 bytes @ 16 calls, 218
> bytes @ 30 calls) vs (61 bytes)
> - I'm not sure it's universally correct in preventing speculation:
>
> (1) I am able to observe a small timing difference between executing
> "1: pause; jmp 1b;" and "pause" in the speculative path.
> Given that alignment is otherwise identical, this should only
> occur if execution is non-identical, which would require speculative
> execution to proceed beyond the pause.
> (2) When we proposed and reviewed the sequence. This was not cited by
> architects as a way of presenting speculation. Indeed, as David
> points out, we'd consider using this within the sequence without the
> loop.
>
>
> If the claim above is true -- which (1) actually appears to contradict
> -- it seems to bear stronger validation. Particularly since that in
> the suggested sequences we can fit the jmps within the space we get
> for free by aligning the call targets.
Some of the discrepancies are because it's been filtered through Intel
and I may not have had your latest version.
I'm going to revert to your version from
https://support.google.com/faqs/answer/7625886 — for the retpoline
(i.e. adding back the alignment) and the RSB stuffing.
If Intel have a sequence which is simpler and guaranteed to work, I'll
let them post that with an authoritative statement from the CPU
architects. At this point, we really need to get on with rolling in the
other parts on top.
Download attachment "smime.p7s" of type "application/x-pkcs7-signature" (5213 bytes)
Powered by blists - more mailing lists