[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPM31RKpPzfqzisNerKDG=Y2yLXR82m1mFb2neDK=0cKUvz17g@mail.gmail.com>
Date: Mon, 8 Jan 2018 18:48:48 -0800
From: Paul Turner <pjt@...gle.com>
To: David Woodhouse <dwmw2@...radead.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
"Van De Ven, Arjan" <arjan.van.de.ven@...el.com>,
Andi Kleen <andi@...stfloor.org>,
LKML <linux-kernel@...r.kernel.org>,
Greg Kroah-Hartman <gregkh@...ux-foundation.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Dave Hansen <dave.hansen@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Andy Lutomirski <luto@...capital.net>,
Andi Kleen <ak@...ux.intel.com>
Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
On Mon, Jan 8, 2018 at 4:48 PM, David Woodhouse <dwmw2@...radead.org> wrote:
> On Tue, 2018-01-09 at 00:44 +0000, Woodhouse, David wrote:
>> On IRC, Arjan assures me that 'pause' here really is sufficient as a
>> speculation trap. If we do end up returning back here as a
>> misprediction, that 'pause' will stop the speculative execution on
>> affected CPUs even though it isn't *architecturally* documented to do
>> so.
>>
>> Arjan, can you confirm that in email please?
>
>
> That actually doesn't make sense to me. If 'pause' alone is sufficient,
> then why in $DEITY's name would we need a '1:pause;jmp 1b' loop in the
> retpoline itself?
>
> Arjan?
On further investigation, I don't understand any of the motivation for
the changes here:
- It micro-benchmarks several cycles slower than the suggested
implementation on average (38 vs 44 cycles) [likely due to lost 16-byte call
alignment]
- It's much larger in terms of .text size (120 bytes @ 16 calls, 218
bytes @ 30 calls) vs (61 bytes)
- I'm not sure it's universally correct in preventing speculation:
(1) I am able to observe a small timing difference between executing
"1: pause; jmp 1b;" and "pause" in the speculative path.
Given that alignment is otherwise identical, this should only
occur if execution is non-identical, which would require speculative
execution to proceed beyond the pause.
(2) When we proposed and reviewed the sequence. This was not cited by
architects as a way of presenting speculation. Indeed, as David
points out, we'd consider using this within the sequence without the
loop.
If the claim above is true -- which (1) actually appears to contradict
-- it seems to bear stronger validation. Particularly since that in
the suggested sequences we can fit the jmps within the space we get
for free by aligning the call targets.
Powered by blists - more mailing lists