Date:   Fri, 12 Jan 2018 10:28:18 +0000
From:   David Woodhouse <dwmw2@...radead.org>
To:     Tom Lendacky <thomas.lendacky@....com>,
        Andi Kleen <ak@...ux.intel.com>
Cc:     Paul Turner <pjt@...gle.com>, LKML <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Greg Kroah-Hartman <gregkh@...ux-foundation.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Dave Hansen <dave.hansen@...el.com>, tglx@...utronix.de,
        Kees Cook <keescook@...gle.com>,
        Rik van Riel <riel@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...capital.net>,
        Jiri Kosina <jikos@...nel.org>, gnomes@...rguk.ukuu.org.uk,
        x86@...nel.org, Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH v8 03/12] x86/retpoline: Add initial retpoline support

On Thu, 2018-01-11 at 17:58 -0600, Tom Lendacky wrote:
> 
> > + * These are the bare retpoline primitives for indirect jmp and call.
> > + * Do not use these directly; they only exist to make the ALTERNATIVE
> > + * invocation below less ugly.
> > + */
> > +.macro RETPOLINE_JMP reg:req
> > +     call    .Ldo_rop_\@
> > +.Lspec_trap_\@:
> > +     pause

Note that we never use that one on AMD. You just get 'lfence; jmp *reg'
instead, because you promised us that would work. Intel also said it
would work, for a month or two, and then said "er, oops, no it doesn't in
all cases", so we're half-waiting for you lot to do the same thing :)

You *do* get the RSB-stuffing one though, which uses the same pause/jmp
speculation trap. So...

> Talked with our engineers some more on using pause vs. lfence.  Pause is
> not serializing on AMD, so the pause/jmp loop will use power as it is
> speculated over waiting for return to mispredict to the correct target.
> Can this be changed back to lfence?  It looked like a very small
> difference in cycles/time.

That seems reasonable, although at this stage I'm also tempted to
suggest we can do that kind of fine-tuning in a followup patch. Like
the bikeshedding about numbers vs. readable labels. We really need the
IBRS and IBPB patches to be landing on top of this as soon as possible.

Paul, the lfence→pause change was only a tiny micro-optimisation on
Intel, wasn't it? Are you happy with changing the implementations of
the RSB stuffing code to use lfence again (or what about 'hlt')?

It currently looks like this... the capture loop is using 'jmp' to
match the retpoline instead of 'call' as in your examples:


#define __FILL_RETURN_BUFFER(reg, nr, sp, uniq)	\
	mov	$(nr/2), reg;			\
.Ldo_call1_ ## uniq:				\
	call	.Ldo_call2_ ## uniq;		\
.Ltrap1_ ## uniq:				\
	pause;					\
	jmp	.Ltrap1_ ## uniq;		\
.Ldo_call2_ ## uniq:				\
	call	.Ldo_loop_ ## uniq;		\
.Ltrap2_ ## uniq:				\
	pause;					\
	jmp	.Ltrap2_ ## uniq;		\
.Ldo_loop_ ## uniq:				\
	dec	reg;				\
	jnz	.Ldo_call1_ ## uniq;		\
	add	$(BITS_PER_LONG/8) * nr, sp;
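
For anyone checking the arithmetic in the macro above, here is a rough C
model of its bookkeeping. It is only a sketch and not part of the patch:
the names fill_model, model_fill_return_buffer and WORD_BYTES are made up
for illustration, sizeof(long) is assumed to equal BITS_PER_LONG/8, and
nr is assumed to be even and non-zero. It models only the architectural
path (the traps at .Ltrap1/.Ltrap2 are reached speculatively at most) and
shows that nr/2 loop iterations of two calls each push nr return
addresses, which the final 'add' then takes back off the stack.

```c
#include <assert.h>

/* Assumption: stands in for BITS_PER_LONG/8 (bytes pushed per call). */
#define WORD_BYTES ((long)sizeof(long))

/* Hypothetical bookkeeping for one run of the macro. */
struct fill_model {
	unsigned int calls;	/* call instructions executed */
	long sp_delta;		/* net stack pointer change, in bytes */
};

/* Models the architectural control flow of __FILL_RETURN_BUFFER. */
static struct fill_model model_fill_return_buffer(unsigned int nr)
{
	struct fill_model m = { 0, 0 };
	unsigned int reg = nr / 2;		/* mov $(nr/2), reg */

	/* The macro only balances the stack for even, non-zero nr. */
	assert(nr >= 2 && nr % 2 == 0);

	do {
		m.calls++;			/* call .Ldo_call2 */
		m.sp_delta -= WORD_BYTES;	/* pushes a return address */
		m.calls++;			/* call .Ldo_loop */
		m.sp_delta -= WORD_BYTES;	/* pushes a return address */
	} while (--reg != 0);			/* dec reg; jnz .Ldo_call1 */

	m.sp_delta += WORD_BYTES * (long)nr;	/* add $(BITS_PER_LONG/8) * nr, sp */
	return m;
}
```

With nr = 32 the model reports 32 calls and a net stack pointer change of
zero, i.e. one return address per stuffed RSB entry and a balanced stack
on exit.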

