linux-kernel - Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue


Message-ID: <CA+55aFykHb8oyZquBq53MowOLzc5XXtNW4tad+4cTbO0YYFYNQ@mail.gmail.com>
Date:	Tue, 28 Apr 2015 09:28:52 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Borislav Petkov <bp@...en8.de>
Cc:	"H. Peter Anvin" <hpa@...or.com>,
	Andy Lutomirski <luto@...capital.net>,
	Andy Lutomirski <luto@...nel.org>, X86 ML <x86@...nel.org>,
	Denys Vlasenko <vda.linux@...glemail.com>,
	Brian Gerst <brgerst@...il.com>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Ingo Molnar <mingo@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Alexei Starovoitov <ast@...mgrid.com>,
	Will Drewry <wad@...omium.org>,
	Kees Cook <keescook@...omium.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Mel Gorman <mgorman@...e.com>
Subject: Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor
 attribute issue

On Tue, Apr 28, 2015 at 8:55 AM, Borislav Petkov <bp@...en8.de> wrote:
>
> Provided it is correct, it shows that the 0x66-prefixed 3-byte NOPs are
> better than the 0F 1F 00 suggested by the manual (Haha!):

That's which AMD CPU?

On my Intel i7-4770S, they are the same cost (I cut down your loop
numbers by an order of magnitude each because I couldn't be arsed to
wait for it, so it might be off by a cycle or two):

    Running 60 times, 1000000 loops per run.
    nop_0x90 average: 81.065681
    nop_3_byte average: 80.230101

That said, I think your benchmark tests the speed of "rdtsc" rather
than the no-ops. Putting the read_tsc inside the inner loop basically
makes it swamp everything else.

> $ taskset -c 3 ./nops
> Running 600 times, 10000000 loops per run.
> nop_0x90 average: 439.805220
> nop_3_byte average: 442.412915

I think that's in the noise, and could be explained by random
alignment of the loop too, or even random factors like "the CPU heated
up, so the later run was slightly slower". The difference between 439
and 442 doesn't strike me as all that significant.

It might be better to *not* inline, and instead make a real function
call to something that has a lot of no-ops (do some preprocessor magic
to make more no-ops in one go). At least that way the alignment is
likely the same for the two cases.
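
A minimal sketch of that idea, assuming GCC or Clang on x86. The macro
and function names (REP10, REP100, nops_0x90_block, nops_3_byte_block,
call_repeatedly) are hypothetical and not taken from the attached t.c.
On other architectures the blocks compile to empty bodies, so the sketch
still builds but measures nothing useful there.

```c
#include <assert.h>

/* Hypothetical helper macros: paste a statement 10 or 100 times. */
#define REP10(x)  x x x x x x x x x x
#define REP100(x) REP10(x) REP10(x) REP10(x) REP10(x) REP10(x) \
                  REP10(x) REP10(x) REP10(x) REP10(x) REP10(x)
#define NOPS_PER_CALL 100

#if defined(__x86_64__) || defined(__i386__)
/* One-byte NOP and the 0x66-prefixed three-byte NOP, emitted as raw bytes. */
#define NOP_0X90   __asm__ volatile(".byte 0x90");
#define NOP_3_BYTE __asm__ volatile(".byte 0x66, 0x66, 0x90");
#else
/* Assumption: on non-x86 targets just emit nothing so the file compiles. */
#define NOP_0X90   __asm__ volatile("");
#define NOP_3_BYTE __asm__ volatile("");
#endif

/* Real (non-inlined) functions, each holding NOPS_PER_CALL no-ops. */
__attribute__((noinline)) void nops_0x90_block(void)   { REP100(NOP_0X90) }
__attribute__((noinline)) void nops_3_byte_block(void) { REP100(NOP_3_BYTE) }

/* Call one block repeatedly; returns how many no-ops were executed. */
long call_repeatedly(void (*block)(void), long calls)
{
    assert(calls > 0);
    for (long i = 0; i < calls; i++)
        block();
    return calls * NOPS_PER_CALL;
}
```

Both variants are reached through the same call site in call_repeatedly(),
so the loop itself has the same alignment in either case, and the call
overhead is spread over 100 no-ops instead of one.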

Or if not that, then I think you're better off with something like

                p1 = read_tsc();
                for (i = 0; i < LOOPS; i++) {
                        nop_0x90();
                }
                p2 = read_tsc();
                r = (p2 - p1);

because while you're now measuring the loop overhead too, that's
*much* smaller than the rdtsc overhead. So I get something like

    Running 600 times, 1000000 loops per run.
    nop_0x90 average: 3.786935
    nop_3_byte average: 3.677228

and notice the difference between "~80 cycles" and "~3.7 cycles".
Yeah, that's rdtsc. I bet your 440 is about the same thing too.
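
For reference, a self-contained version of that pattern might look like
the following. This is only a sketch: read_counter() and
ticks_per_iteration() are names made up here, __rdtsc() is the GCC/Clang
intrinsic from <x86intrin.h>, and the clock_gettime() fallback for
non-x86 machines is an assumption added so the code builds elsewhere (it
then reports nanoseconds, not cycles).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the time stamp counter via the compiler intrinsic. */
static inline uint64_t read_counter(void) { return __rdtsc(); }
#else
/* Assumed fallback: monotonic clock in nanoseconds. */
static inline uint64_t read_counter(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* "nop" assembles to the single byte 0x90 on x86. */
static inline void nop_0x90(void) { __asm__ volatile("nop"); }

/* Read the counter once before and once after the whole loop, so its
 * cost is paid twice per run instead of twice per iteration. */
double ticks_per_iteration(long loops)
{
    assert(loops > 0);
    uint64_t p1 = read_counter();
    for (long i = 0; i < loops; i++)
        nop_0x90();
    uint64_t p2 = read_counter();
    return (double)(p2 - p1) / (double)loops;
}
```

The result still includes the loop overhead (increment, compare, branch),
so it is an upper bound on the cost of the no-op rather than an exact
figure.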

Btw, the whole thing about "averaging cycles" is not the right thing
to do either. You should probably take the *minimum* cycles count, not
the average, because anything non-minimal means "some perturbation"
(ie interrupt etc).
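
A small illustration of why the minimum is the more robust statistic.
The helper names min_sample() and mean_sample() are hypothetical, and
the sample values in the note below are invented for the illustration,
not measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest cycle count seen across all runs. */
uint64_t min_sample(const uint64_t *samples, size_t n)
{
    assert(n > 0);
    uint64_t best = samples[0];
    for (size_t i = 1; i < n; i++)
        if (samples[i] < best)
            best = samples[i];
    return best;
}

/* Arithmetic mean, shown only for comparison. */
double mean_sample(const uint64_t *samples, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)samples[i];
    return sum / (double)n;
}
```

With the made-up samples {100, 101, 100, 5000}, where the last run was
hit by an interrupt, min_sample() returns 100 while mean_sample()
returns 1325.25. A single perturbed run dominates the average but
leaves the minimum untouched.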

So I think something like the attached would be better. It gives an
approximate "cycles per one four-byte nop", and I get

    [torvalds@i7 ~]$ taskset -c 3 ./a.out
    Running 60 times, 1000000 loops per run.
    nop_0x90 average: 0.200479
    nop_3_byte average: 0.199694

which sounds suspiciously good to me (5 nops per cycle? uop cache and
nop compression, I guess).

                            Linus

View attachment "t.c" of type "text/x-csrc" (1893 bytes)