Message-ID: <CAPM31RKpPzfqzisNerKDG=Y2yLXR82m1mFb2neDK=0cKUvz17g@mail.gmail.com>
Date:   Mon, 8 Jan 2018 18:48:48 -0800
From:   Paul Turner <pjt@...gle.com>
To:     David Woodhouse <dwmw2@...radead.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        "Van De Ven, Arjan" <arjan.van.de.ven@...el.com>,
        Andi Kleen <andi@...stfloor.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Greg Kroah-Hartman <gregkh@...ux-foundation.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andy Lutomirski <luto@...capital.net>,
        Andi Kleen <ak@...ux.intel.com>
Subject: Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

On Mon, Jan 8, 2018 at 4:48 PM, David Woodhouse <dwmw2@...radead.org> wrote:
> On Tue, 2018-01-09 at 00:44 +0000, Woodhouse, David wrote:
>> On IRC, Arjan assures me that 'pause' here really is sufficient as a
>> speculation trap. If we do end up returning back here as a
>> misprediction, that 'pause' will stop the speculative execution on
>> affected CPUs even though it isn't *architecturally* documented to do
>> so.
>>
>> Arjan, can you confirm that in email please?
>
>
> That actually doesn't make sense to me. If 'pause' alone is sufficient,
> then why in $DEITY's name would we need a '1:pause;jmp 1b' loop in the
> retpoline itself?
>
> Arjan?

On further investigation, I don't understand any of the motivation for
the changes here:
- It micro-benchmarks several cycles slower than the suggested
implementation on average (44 vs 38 cycles) [likely due to lost 16-byte call
alignment]
- It's much larger in terms of .text size (120 bytes @ 16 calls, 218
bytes @ 30 calls) vs (61 bytes)
- I'm not sure it's universally correct in preventing speculation:

(1) I am able to observe a small timing difference between executing
"1: pause; jmp 1b;" and "pause" in the speculative path.
     Given that alignment is otherwise identical, this should only
occur if execution is non-identical, which would require speculative
execution to proceed beyond the pause.
(2) When we proposed and reviewed the sequence, this was not cited by
architects as a way of preventing speculation.  Indeed, as David
points out, we'd have considered using this within the sequence without
the loop.


If the claim above is true -- which (1) actually appears to contradict
-- it seems to warrant stronger validation.  Particularly since, in
the suggested sequences, we can fit the jmps within the space we get
for free by aligning the call targets.
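
For a rough sense of the size and cycle figures quoted above, here is a
minimal sketch that derives the per-call growth from the two data points
(120 bytes @ 16 calls, 218 bytes @ 30 calls).  It assumes .text size grows
linearly with the number of calls, which is an assumption and not something
measured here; linear_fit is a hypothetical helper name.

```python
def linear_fit(p1, p2):
    """Return (bytes_per_call, fixed_bytes) for the line through two
    (calls, bytes) points. Linear growth is an assumption."""
    (n1, s1), (n2, s2) = p1, p2
    per_call = (s2 - s1) / (n2 - n1)
    fixed = s1 - per_call * n1
    return per_call, fixed


# Figures quoted in the message above.
SUGGESTED_BYTES = 61
SUGGESTED_CYCLES = 38
PATCH_CYCLES = 44

per_call, fixed = linear_fit((16, 120), (30, 218))
print(per_call, fixed)  # 7.0 8.0

# Size relative to the suggested 61-byte sequence.
for calls, size in ((16, 120), (30, 218)):
    print(calls, "calls:", round(size / SUGGESTED_BYTES, 2), "x")

# Relative cycle overhead of the slower variant.
print(f"{(PATCH_CYCLES - SUGGESTED_CYCLES) / SUGGESTED_CYCLES:.1%}")
```

Under that assumption, each additional call costs about 7 bytes on top of
roughly 8 bytes of fixed overhead.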
