linux-kernel - Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue

Message-ID: <61BCF405-8000-43EB-A6B1-2BF5677E4ADE@zytor.com>
Date:	Mon, 27 Apr 2015 12:59:11 -0700
From:	"H. Peter Anvin" <hpa@...or.com>
To:	Borislav Petkov <bp@...en8.de>,
	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Andy Lutomirski <luto@...capital.net>,
	Andy Lutomirski <luto@...nel.org>, X86 ML <x86@...nel.org>,
	Denys Vlasenko <vda.linux@...glemail.com>,
	Brian Gerst <brgerst@...il.com>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Ingo Molnar <mingo@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Alexei Starovoitov <ast@...mgrid.com>,
	Will Drewry <wad@...omium.org>,
	Kees Cook <keescook@...omium.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue

It really comes down to this: it seems older cores from both Intel and AMD perform better with 66 66 66 90, whereas the 0F 1F series is better on newer cores.

When I measured it, the differences were sometimes dramatic.
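
For reference, the two 4-byte encodings being compared can be written out as byte arrays. This is only a sketch: the array names and the emit_nop4() helper are made up for illustration, and the 0F 1F entry shown is the 4-byte member of that family (0F 1F 40 00). Which one to pick on a given core is exactly the open question here, so the selection flag is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* 66 66 66 90: three operand-size prefixes in front of the one-byte NOP. */
static const unsigned char nop4_prefixed[4] = { 0x66, 0x66, 0x66, 0x90 };

/* 0F 1F 40 00: the multi-byte NOP opcode with a ModRM byte and an 8-bit
 * displacement. */
static const unsigned char nop4_0f1f[4] = { 0x0f, 0x1f, 0x40, 0x00 };

/*
 * Hypothetical helper (not a kernel function): copy the chosen 4-byte NOP
 * into buf and return the number of bytes written. In real code the choice
 * would be made once per CPU model, not per call.
 */
static size_t emit_nop4(unsigned char *buf, int use_0f1f)
{
	memcpy(buf, use_0f1f ? nop4_0f1f : nop4_prefixed, 4);
	return 4;
}
```

Both sequences occupy the same 4 bytes of patch space; per the measurements mentioned above, the difference shows up only in how a given core decodes them.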

On April 27, 2015 11:53:44 AM PDT, Borislav Petkov <bp@...en8.de> wrote:
>On Mon, Apr 27, 2015 at 11:47:30AM -0700, Linus Torvalds wrote:
>> On Mon, Apr 27, 2015 at 11:38 AM, Borislav Petkov <bp@...en8.de> wrote:
>> >
>> > So our current NOP infrastructure caps ASM_NOP_MAX at 8 bytes, so
>> > without more invasive changes our longest NOPs are 8 bytes long and
>> > then we have to repeat.
>> 
>> Btw (and I'm too lazy to check) do we take alignment into account?
>> 
>> Because if you have to split, and use multiple nops, it is *probably*
>> a good idea to try to avoid 16-byte boundaries, since that can be
>> the I$ fetch granularity from L1 (although I guess 32B is getting
>> more common).
>
>Yeah, on F16h you have 32B fetch but the paths later in the machine
>get narrower, so to speak.
>
>> So the exact split might depend on the alignment of the nop replacement..
>
>Yeah, no. Our add_nops() is trivial:
>
>/* Use this to add nops to a buffer, then text_poke the whole buffer. */
>static void __init_or_module add_nops(void *insns, unsigned int len)
>{
>        while (len > 0) {
>                unsigned int noplen = len;
>                if (noplen > ASM_NOP_MAX)
>                        noplen = ASM_NOP_MAX;
>                memcpy(insns, ideal_nops[noplen], noplen);
>                insns += noplen;
>                len -= noplen;
>        }
>}
>
>> Can we perhaps get rid of the distinction entirely, and just use one
>> set of 64-bit nops for both Intel/AMD?
>
>I *think* hpa would have an opinion here. I'm judging by looking at
>comments like this one in the code:
>
>        /*
>         * Due to a decoder implementation quirk, some
>         * specific Intel CPUs actually perform better with
>         * the "k8_nops" than with the SDM-recommended NOPs.
>         */
>
>which is a fun one in itself. :-)
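
To make the alignment question concrete, here is a user-space sketch of what an alignment-aware variant of the quoted add_nops() might look like. It is not kernel code: add_nops_aligned(), nop_table, NOP_MAX and FETCH_ALIGN are names invented for this example, the table holds the 0F 1F-style encodings, and the 16-byte granularity is just the figure floated above. Whether splitting this way actually helps on any given core is not established in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP_MAX     8	/* mirrors the 8-byte ASM_NOP_MAX discussed above */
#define FETCH_ALIGN 16	/* assumed fetch granularity, for illustration only */

/* 0F 1F-style NOPs; the row index is the NOP length in bytes. */
static const unsigned char nop_table[NOP_MAX + 1][NOP_MAX] = {
	{ 0 },
	{ 0x90 },
	{ 0x66, 0x90 },
	{ 0x0f, 0x1f, 0x00 },
	{ 0x0f, 0x1f, 0x40, 0x00 },
	{ 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	{ 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	{ 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 },
	{ 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
};

/*
 * Fill buf[0..len) with NOPs. addr is the address the bytes will eventually
 * be patched to. Each NOP is shortened, if needed, so that it ends at the
 * next FETCH_ALIGN boundary instead of straddling it.
 * Returns the number of NOP instructions emitted.
 */
static unsigned int add_nops_aligned(unsigned char *buf, uintptr_t addr,
				     size_t len)
{
	unsigned int count = 0;

	while (len > 0) {
		size_t noplen = len > NOP_MAX ? NOP_MAX : len;
		/* Bytes left before the next boundary; 16 if already aligned. */
		size_t to_boundary = FETCH_ALIGN - (addr % FETCH_ALIGN);

		if (noplen > to_boundary)
			noplen = to_boundary;

		memcpy(buf, nop_table[noplen], noplen);
		buf += noplen;
		addr += noplen;
		len -= noplen;
		count++;
	}
	return count;
}
```

For a 10-byte hole starting 4 bytes before a boundary, this emits a 4-byte NOP followed by a 6-byte one, where the quoted add_nops() would emit 8 + 2 with the 8-byte NOP crossing the boundary.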

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.