Date:   Mon, 22 Mar 2021 10:32:52 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     David Laight <David.Laight@...lab.com>
Cc:     "x86@...nel.org" <x86@...nel.org>,
        "jpoimboe@...hat.com" <jpoimboe@...hat.com>,
        "jgross@...e.com" <jgross@...e.com>,
        "mbenes@...e.com" <mbenes@...e.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 03/14] x86/retpoline: Simplify retpolines

On Fri, Mar 19, 2021 at 05:18:14PM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 18 March 2021 17:11
> > 
> > Due to commit c9c324dc22aa ("objtool: Support stack layout changes
> > in alternatives"), it is possible to simplify the retpolines.
> > 
> ...
> > Notice that since the longest alternative sequence is now:
> > 
> >    0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
> >    5:   f3 90                   pause
> >    7:   0f ae e8                lfence
> >    a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
> >    c:   48 89 04 24             mov    %rax,(%rsp)
> >   10:   c3                      retq
> > 
> > 17 bytes, we have 15 bytes NOP at the end of our 32 byte slot. (IOW,
> > if we can shrink the retpoline by 1 byte we can pack it more dense)
> 
> I'm intrigued about the lfence after the pause.
> Clearly this is for very warped cpu behaviour.
> To get to the pause you have to be speculating past an
> unconditional call.

Please read up on retpoline... That's the speculation trap. The warped
CPU behaviour is called Spectre-v2.

For others, the obvious alternative is the below; and possibly we could
then also remove the loop.

The original retpoline, as per Paul's article, has: 1: pause; jmp 1b;
That is, it lacks the LFENCE we have, and it would also fit in 16 bytes.
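
To make the size argument concrete, here is a rough sketch (not part of the patch) that adds up the instruction lengths for the three trap variants. The lengths are read off the disassembly quoted above; INT3 is the single-byte 0xcc opcode. The helper names are made up for illustration, and the mov length assumes the 4-byte encoding shown for %rax.

```python
# Instruction lengths in bytes, taken from the quoted disassembly.
# "int3" is the one-byte 0xcc opcode; "mov" assumes the 4-byte
# encoding shown for %rax (an assumption for other registers).
SIZES = {"call": 5, "pause": 2, "lfence": 3, "jmp": 2,
         "mov": 4, "ret": 1, "int3": 1}


def seq_len(trap):
    """Length of: call, speculation trap, jmp back to trap, mov, ret."""
    body = ["call", *trap, "jmp", "mov", "ret"]
    return sum(SIZES[insn] for insn in body)


def slot(n):
    """Smallest power-of-two slot (alignment) that holds n bytes."""
    size = 1
    while size < n:
        size *= 2
    return size


variants = {
    "pause; lfence (current)": ["pause", "lfence"],
    "pause (original article)": ["pause"],
    "int3 (patch below)": ["int3"],
}

for name, trap in variants.items():
    n = seq_len(trap)
    print(f"{name}: {n} bytes, slot {slot(n)}, padding {slot(n) - n}")
```

Under those assumptions the current sequence comes to 17 bytes (hence the 32 byte slot with 15 bytes of NOP), while both the pause-only and the int3 variants stay under 16.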



---
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -15,8 +15,7 @@
 	call    .Ldo_rop_\@
 .Lspec_trap_\@:
 	UNWIND_HINT_EMPTY
-	pause
-	lfence
+	int3
 	jmp .Lspec_trap_\@
 .Ldo_rop_\@:
 	mov     %\reg, (%_ASM_SP)
@@ -27,7 +26,7 @@
 .macro THUNK reg
 	.section .text.__x86.indirect_thunk
 
-	.align 32
+	.align 16
 SYM_FUNC_START(__x86_indirect_thunk_\reg)
 
 	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \

