Message-ID: <61BCF405-8000-43EB-A6B1-2BF5677E4ADE@zytor.com>
Date: Mon, 27 Apr 2015 12:59:11 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Borislav Petkov <bp@...en8.de>,
Linus Torvalds <torvalds@...ux-foundation.org>
CC: Andy Lutomirski <luto@...capital.net>,
Andy Lutomirski <luto@...nel.org>, X86 ML <x86@...nel.org>,
Denys Vlasenko <vda.linux@...glemail.com>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
Ingo Molnar <mingo@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Oleg Nesterov <oleg@...hat.com>,
Frederic Weisbecker <fweisbec@...il.com>,
Alexei Starovoitov <ast@...mgrid.com>,
Will Drewry <wad@...omium.org>,
Kees Cook <keescook@...omium.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
It really comes down to this: it seems older cores from both Intel and AMD perform better with 66 66 66 90, whereas the 0F 1F series is better on newer cores.
When I measured it, the differences were sometimes dramatic.
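
For reference, the two encodings being compared can be written out as byte tables. This is only a userspace sketch for lengths 1 to 4, not the kernel's own tables; the names prefix_nops, long_nops, nop_family and emit_nop are made up for illustration. The 4-byte forms are 66 66 66 90 (operand-size prefixes in front of the one-byte NOP) and 0F 1F 40 00 (the multi-byte NOP with a ModRM byte and an 8-bit displacement).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace tables; these names are illustrative, not kernel symbols. */

/* Prefix-style NOPs: 0x66 operand-size prefixes in front of 0x90. */
static const unsigned char prefix_nops[5][4] = {
    {0},                        /* index 0 unused */
    {0x90},
    {0x66, 0x90},
    {0x66, 0x66, 0x90},
    {0x66, 0x66, 0x66, 0x90},
};

/* 0F 1F NOPs: the multi-byte "NOP r/m" encodings for 3 and 4 bytes. */
static const unsigned char long_nops[5][4] = {
    {0},                        /* index 0 unused */
    {0x90},
    {0x66, 0x90},
    {0x0f, 0x1f, 0x00},
    {0x0f, 0x1f, 0x40, 0x00},
};

enum nop_family { NOP_PREFIX, NOP_0F1F };

/* Copy one NOP of the given length (1..4) from the chosen family into buf. */
static void emit_nop(unsigned char *buf, size_t len, enum nop_family fam)
{
    assert(len >= 1 && len <= 4);
    memcpy(buf, fam == NOP_PREFIX ? prefix_nops[len] : long_nops[len], len);
}
```

Which family is faster on a given core is a measurement question; the sketch only shows what the bytes look like.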
On April 27, 2015 11:53:44 AM PDT, Borislav Petkov <bp@...en8.de> wrote:
>On Mon, Apr 27, 2015 at 11:47:30AM -0700, Linus Torvalds wrote:
>> On Mon, Apr 27, 2015 at 11:38 AM, Borislav Petkov <bp@...en8.de> wrote:
>> >
>> > So our current NOP-infrastructure does ASM_NOP_MAX NOPs of 8 bytes so
>> > without more invasive changes, our longest NOPs are 8 bytes long and
>> > then we have to repeat.
>>
>> Btw (and I'm too lazy to check) do we take alignment into account?
>>
>> Because if you have to split, and use multiple nops, it is *probably*
>> a good idea to try to avoid 16-byte boundaries, since that can be
>> the I$ fetch granularity from L1 (although I guess 32B is getting more
>> common).
>
>Yeah, on F16h you have 32B fetch but the paths later in the machine
>get narrower, so to speak.
>
>> So the exact split might depend on the alignment of the nop replacement..
>
>Yeah, no. Our add_nops() is trivial:
>
>/* Use this to add nops to a buffer, then text_poke the whole buffer. */
>static void __init_or_module add_nops(void *insns, unsigned int len)
>{
>	while (len > 0) {
>		unsigned int noplen = len;
>		if (noplen > ASM_NOP_MAX)
>			noplen = ASM_NOP_MAX;
>		memcpy(insns, ideal_nops[noplen], noplen);
>		insns += noplen;
>		len -= noplen;
>	}
>}
>
>> Can we perhaps get rid of the distinction entirely, and just use one
>> set of 64-bit nops for both Intel/AMD?
>
>I *think* hpa would have an opinion here. I'm judging by looking at
>comments like this one in the code:
>
> /*
> * Due to a decoder implementation quirk, some
> * specific Intel CPUs actually perform better with
> * the "k8_nops" than with the SDM-recommended NOPs.
> */
>
>which is a fun one in itself. :-)
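
To make the alignment question in the quoted text concrete, here is a rough userspace sketch of a split that never lets a single NOP straddle a 16-byte boundary. It is not what add_nops() does and not a proposed patch; split_nops, NOP_MAX and FETCH_ALIGN are hypothetical names, the 16-byte granularity is an assumption taken from the discussion above, and the function only computes chunk lengths rather than writing any instruction bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOP_MAX     8u   /* mirrors the 8-byte ASM_NOP_MAX mentioned above */
#define FETCH_ALIGN 16u  /* assumed fetch granularity, per the discussion */

/*
 * Split len bytes of padding starting at addr into NOP lengths such that
 * no single NOP crosses a FETCH_ALIGN boundary. Chunk lengths are stored
 * in lens[]; the return value is the number of chunks written. If
 * max_chunks is too small, the split is silently truncated.
 */
static size_t split_nops(uintptr_t addr, unsigned int len,
                         unsigned int *lens, size_t max_chunks)
{
    size_t n = 0;

    assert(lens != NULL);
    while (len > 0 && n < max_chunks) {
        unsigned int noplen = len > NOP_MAX ? NOP_MAX : len;
        /* Bytes left before the next boundary (16 if already aligned). */
        unsigned int to_boundary =
            FETCH_ALIGN - (unsigned int)(addr % FETCH_ALIGN);

        if (noplen > to_boundary)
            noplen = to_boundary;
        lens[n++] = noplen;
        addr += noplen;
        len -= noplen;
    }
    return n;
}
```

For 20 bytes of padding starting at an address ending in 0x5, the trivial loop gives 8, 8, 4 with the second NOP crossing a boundary, whereas this sketch gives 8, 3, 8, 1. Whether the extra NOP is worth it is exactly the kind of thing that would need measuring.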
--
Sent from my mobile phone. Please pardon brevity and lack of formatting.