linux-kernel - Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAPM31RKpPzfqzisNerKDG=Y2yLXR82m1mFb2neDK=0cKUvz17g@mail.gmail.com>
Date:   Mon, 8 Jan 2018 18:48:48 -0800
From:   Paul Turner <pjt@...gle.com>
To:     David Woodhouse <dwmw2@...radead.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        "Van De Ven, Arjan" <arjan.van.de.ven@...el.com>,
        Andi Kleen <andi@...stfloor.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Greg Kroah-Hartman <gregkh@...ux-foundation.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andy Lutomirski <luto@...capital.net>,
        Andi Kleen <ak@...ux.intel.com>
Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

On Mon, Jan 8, 2018 at 4:48 PM, David Woodhouse <dwmw2@...radead.org> wrote:
> On Tue, 2018-01-09 at 00:44 +0000, Woodhouse, David wrote:
>> On IRC, Arjan assures me that 'pause' here really is sufficient as a
>> speculation trap. If we do end up returning back here as a
>> misprediction, that 'pause' will stop the speculative execution on
>> affected CPUs even though it isn't *architecturally* documented to do
>> so.
>>
>> Arjan, can you confirm that in email please?
>
>
> That actually doesn't make sense to me. If 'pause' alone is sufficient,
> then why in $DEITY's name would we need a '1:pause;jmp 1b' loop in the
> retpoline itself?
>
> Arjan?

On further investigation, I don't understand any of the motivation for
the changes here:
- It micro-benchmarks several cycles slower than the suggested
implementation on average (38 vs 44 cycles) [likely due to lost 16-byte call
alignment]
- It's much larger in terms of .text size (120 bytes @ 16 calls, 218
bytes @ 30 calls) vs (61 bytes)
- I'm not sure it's universally correct in preventing speculation:

(1) I am able to observe a small timing difference between executing
"1: pause; jmp 1b;" and "pause" in the speculative path.
     Given that alignment is otherwise identical, this should only
occur if execution is non-identical, which would require speculative
execution to proceed beyond the pause.
(2) When we proposed and reviewed the sequence.  This was not cited by
architects as a way of presenting speculation.  Indeed, as David
points out, we'd consider using this within the sequence without the
loop.


If the claim above is true -- which (1) actually appears to contradict
-- it seems to bear stronger validation.  Particularly since that in
the suggested sequences we can fit the jmps within the space we get
for free by aligning the call targets.