linux-kernel - Re: [PATCH v8 03/12] x86/retpoline: Add initial retpoline support


Message-ID: <79715cdf-3aef-33b2-772f-985b24fcd1ff@amd.com>
Date:   Fri, 12 Jan 2018 08:02:39 -0600
From:   Tom Lendacky <thomas.lendacky@....com>
To:     David Woodhouse <dwmw2@...radead.org>,
        Andi Kleen <ak@...ux.intel.com>
Cc:     Paul Turner <pjt@...gle.com>, LKML <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Greg Kroah-Hartman <gregkh@...ux-foundation.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Dave Hansen <dave.hansen@...el.com>, tglx@...utronix.de,
        Kees Cook <keescook@...gle.com>,
        Rik van Riel <riel@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...capital.net>,
        Jiri Kosina <jikos@...nel.org>, gnomes@...rguk.ukuu.org.uk,
        x86@...nel.org, Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH v8 03/12] x86/retpoline: Add initial retpoline support

On 1/12/2018 4:28 AM, David Woodhouse wrote:
> On Thu, 2018-01-11 at 17:58 -0600, Tom Lendacky wrote:
>>
>>> + * These are the bare retpoline primitives for indirect jmp and call.
>>> + * Do not use these directly; they only exist to make the ALTERNATIVE
>>> + * invocation below less ugly.
>>> + */
>>> +.macro RETPOLINE_JMP reg:req
>>> +     call    .Ldo_rop_\@
>>> +.Lspec_trap_\@:
>>> +     pause
> 
> Note that we never use that one on AMD. You just get 'lfence; jmp *reg'
> instead because you promised us that would work.... while Intel said it
> would work for a month or two and then said "er, oops, no it doesn't in
> all cases." — so we're half-waiting for you lot to do the same thing :)

In theory we never get that one on AMD.  But when running under a
hypervisor we might not be able to verify that lfence was made
serializing, and in that case we would fall back to the generic
retpoline.

> 
> You *do* get the RSB-stuffing one though, which is the same. So...

Right.

> 
>> Talked with our engineers some more on using pause vs. lfence.  Pause is
>> not serializing on AMD, so the pause/jmp loop will use power as it is
>> speculated over while waiting for the return to mispredict to the
>> correct target.
>> Can this be changed back to lfence?  It looked like a very small
>> difference in cycles/time.
> 
> That seems reasonable, although at this stage I'm also tempted to
> suggest we can do that kind of fine-tuning in a followup patch. Like
> the bikeshedding about numbers vs. readable labels. We really need the
> IBRS and IBPB patches to be landing on top of this as soon as possible.

Yup, I understand.

Thanks,
Tom

> 
> Paul, the lfence→pause change was only a tiny micro-optimisation on
> Intel, wasn't it? Are you happy with changing the implementations of
> the RSB stuffing code to use lfence again (or what about 'hlt')?
> 
> It currently looks like this... the capture loop is using 'jmp' to
> match the retpoline instead of 'call' as in your examples:
> 
> 
> #define __FILL_RETURN_BUFFER(reg, nr, sp, uniq)	\
> 	mov	$(nr/2), reg;			\
> .Ldo_call1_ ## uniq:				\
> 	call	.Ldo_call2_ ## uniq;		\
> .Ltrap1_ ## uniq:				\
> 	pause;					\
> 	jmp	.Ltrap1_ ## uniq;		\
> .Ldo_call2_ ## uniq:				\
> 	call	.Ldo_loop_ ## uniq;		\
> .Ltrap2_ ## uniq:				\
> 	pause;					\
> 	jmp	.Ltrap2_ ## uniq;		\
> .Ldo_loop_ ## uniq:				\
> 	dec	reg;				\
> 	jnz	.Ldo_call1_ ## uniq;		\
> 	add	$(BITS_PER_LONG/8) * nr, sp;
>
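
As a reading aid for the quoted __FILL_RETURN_BUFFER macro, here is a
rough Python model of its bookkeeping only: how many return addresses the
loop pushes and why the final 'add' restores the stack pointer. The
function name fill_return_buffer and its parameters are hypothetical and
not from the patch. The model assumes each 'call' pushes one word of
BITS_PER_LONG/8 bytes, treats the RSB as an unbounded list (real hardware
has a fixed depth), and assumes nr is even and at least 2. It says
nothing about speculation or about pause vs. lfence in the trap loops.

```python
def fill_return_buffer(nr, bits_per_long=64, sp=0x1000):
    """Model the stack/RSB bookkeeping of the quoted macro (sketch only).

    Returns (rsb_entries_pushed, net_stack_pointer_change_in_bytes).
    """
    if nr < 2 or nr % 2:
        # Assumption of this sketch: the macro is only used with even nr >= 2.
        raise ValueError("nr must be an even number >= 2")

    word = bits_per_long // 8   # bytes pushed by each 'call'
    start_sp = sp
    rsb = []                    # modelled return stack buffer, newest last
    reg = nr // 2               # mov $(nr/2), reg

    while True:
        sp -= word              # call .Ldo_call2: pushes address of .Ltrap1
        rsb.append("trap1")
        sp -= word              # call .Ldo_loop: pushes address of .Ltrap2
        rsb.append("trap2")
        reg -= 1                # dec reg
        if reg == 0:            # jnz .Ldo_call1 falls through at zero
            break

    sp += word * nr             # add $(BITS_PER_LONG/8) * nr, sp
    return len(rsb), sp - start_sp
```

Each loop iteration performs two calls, so nr/2 iterations push nr return
addresses, all of which point at a pause/jmp capture loop. None of those
calls ever returns architecturally, which is why the macro has to discard
the nr pushed words with the final 'add' to sp.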