Message-ID: <27229f2320a446bf8342233c2555ea8d@AcuMS.aculab.com>
Date: Tue, 6 Apr 2021 08:56:50 +0000
From: David Laight <David.Laight@...LAB.COM>
To: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-tip-commits@...r.kernel.org"
<linux-tip-commits@...r.kernel.org>
CC: "Peter Zijlstra (Intel)" <peterz@...radead.org>,
Borislav Petkov <bp@...e.de>, Ingo Molnar <mingo@...nel.org>,
"x86@...nel.org" <x86@...nel.org>
Subject: RE: [tip: x86/core] x86/retpoline: Simplify retpolines
> From: tip-bot2@...utronix.de
> Sent: 03 April 2021 12:11
...
> Notice that since the longest alternative sequence is now:
>
> 0: e8 07 00 00 00 callq c <.altinstr_replacement+0xc>
> 5: f3 90 pause
> 7: 0f ae e8 lfence
> a: eb f9 jmp 5 <.altinstr_replacement+0x5>
> c: 48 89 04 24 mov %rax,(%rsp)
> 10: c3 retq
>
> 17 bytes, we have 15 bytes NOP at the end of our 32 byte slot. (IOW, if
> we can shrink the retpoline by 1 byte we can pack it more densely).
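Every time I see this I can't help feeling that doing something
(aka anything) to get the 'mov' and 'retq' into the same 16-byte
code fetch/decode block would be advantageous. In the listing above
the 'mov' occupies offsets 0xc-0xf and the 'retq' sits at 0x10, so
(assuming the slot is 16-byte aligned) they land in different blocks.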
Even something like:
call 1f
pause
jmp 2f
1: mov %rax,(%rsp)
retq
2: pause
lfence
jmp 2b
might meet all the requirements for the retpoline while
allowing the 'mov' and 'retq' to be decoded in the same clock.
David