linux-kernel - Re: RFC: Petition Intel/AMD to add POPF

Date:   Thu, 18 Aug 2016 19:47:56 +0200
From:   Denys Vlasenko <dvlasenk@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Andy Lutomirski <luto@...capital.net>,
        Sara Sharon <sara.sharon@...el.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Christian König <christian.koenig@....com>,
        Vinod Koul <vinod.koul@...el.com>,
        Alex Deucher <alexander.deucher@....com>,
        Johannes Berg <johannes.berg@...el.com>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Andy Lutomirski <luto@...nel.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Adrian Hunter <adrian.hunter@...el.com>
Subject: Re: RFC: Petition Intel/AMD to add POPF_IF insn

On 08/18/2016 07:24 PM, Linus Torvalds wrote:
> That said, your numbers really aren't very convincing. If popf really
> is just 10 cycles on modern Intel hardware, it's already fast enough
> that I really don't think it matters.

It's 20 cycles. I was wrong in my earlier email: I forgot that the insn
count also includes the "push %rbx" insns.

Since I already made a mistake, let me double-check.

200 million iterations of this loop execute in just under 17 seconds:

   400100:	b8 00 c2 eb 0b       	mov    $0xbebc200,%eax # 200*1000*1000
   400105:	9c                   	pushfq
   400106:	5b                   	pop    %rbx
   400107:	90                   	nop
....
0000000000400140 <loop>:
   400140:	53                   	push   %rbx
   400141:	9d                   	popfq
   400142:	53                   	push   %rbx
   400143:	9d                   	popfq
   400144:	53                   	push   %rbx
   400145:	9d                   	popfq
   400146:	53                   	push   %rbx
   400147:	9d                   	popfq
   400148:	53                   	push   %rbx
   400149:	9d                   	popfq
   40014a:	53                   	push   %rbx
   40014b:	9d                   	popfq
   40014c:	53                   	push   %rbx
   40014d:	9d                   	popfq
   40014e:	53                   	push   %rbx
   40014f:	9d                   	popfq
   400150:	53                   	push   %rbx
   400151:	9d                   	popfq
   400152:	53                   	push   %rbx
   400153:	9d                   	popfq
   400154:	53                   	push   %rbx
   400155:	9d                   	popfq
   400156:	53                   	push   %rbx
   400157:	9d                   	popfq
   400158:	53                   	push   %rbx
   400159:	9d                   	popfq
   40015a:	53                   	push   %rbx
   40015b:	9d                   	popfq
   40015c:	ff c8                	dec    %eax
   40015e:	75 e0                	jne    400140 <loop>

The loop is exactly 32 bytes and starts on a 32-byte boundary.
It contains 14 POPFs. The other insns (push, dec, jne) are very fast.
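
As a sanity check on those numbers, here is a minimal sketch that rebuilds the loop's machine code from the opcode bytes in the listing and counts things. The names (PAIR, TAIL, count_popf, and so on) are made up for illustration. The byte scan is deliberately naive and is not a disassembler; it only works here because no operand byte in this loop happens to equal the popfq opcode 0x9d.

```python
# Opcode bytes copied from the disassembly above.
PAIR = bytes([0x53, 0x9D])               # push %rbx ; popfq
TAIL = bytes([0xFF, 0xC8, 0x75, 0xE0])   # dec %eax ; jne loop
LOOP_START = 0x400140                    # address of <loop> in the listing
IMM_BYTES = bytes([0x00, 0xC2, 0xEB, 0x0B])  # immediate of "mov $...,%eax"

# The loop body is 14 push/popfq pairs followed by the dec/jne tail.
loop_code = PAIR * 14 + TAIL


def count_popf(code: bytes) -> int:
    """Naive count of popfq opcodes (0x9d); assumes no operand byte is 0x9d."""
    return code.count(0x9D)


def is_aligned(address: int, boundary: int) -> bool:
    """True if address is a multiple of boundary."""
    return address % boundary == 0


# x86 immediates are stored little-endian.
iterations = int.from_bytes(IMM_BYTES, "little")

print("loop bytes:   ", len(loop_code))
print("POPFs in loop:", count_popf(loop_code))
print("32B aligned:  ", is_aligned(LOOP_START, 32))
print("iterations:   ", iterations)
```

The decoded immediate also confirms that the loop counter is 200 million, matching the iteration count used in the calculation.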

No perf, just "time taskset 1 ./test".
My CPU frequency hovers around 3500 MHz when loaded.

17 seconds at 3500 MHz is 17*3500 million cycles.
(17*3500 million cycles) / (200*14 million POPFs) = 21.25 cycles per POPF

Thus, one POPF in CPL3 is ~20 cycles on Skylake.
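
The arithmetic above can be redone in a few lines. This is only a sketch with a made-up helper name (cycles_per_popf); it plugs in the figures from this email (17 seconds, roughly 3500 MHz, 200 million iterations, 14 POPFs per iteration). It charges every cycle to POPF and ignores the push/dec/jne insns, so the result is an upper bound, and it is only as accurate as the "hovers around 3500 MHz" estimate.

```python
def cycles_per_popf(seconds: float, mhz: float,
                    iterations: int, popfs_per_iter: int) -> float:
    """Estimate cycles per POPF from wall time and an assumed clock rate.

    All elapsed cycles are attributed to POPF, so this is an upper bound.
    """
    total_cycles = seconds * mhz * 1_000_000
    total_popfs = iterations * popfs_per_iter
    return total_cycles / total_popfs


# Figures taken from the email; the clock rate is the author's rough estimate.
estimate = cycles_per_popf(seconds=17, mhz=3500,
                           iterations=200_000_000, popfs_per_iter=14)
print(f"{estimate:.2f} cycles per POPF")
```

Since the run took slightly under 17 seconds and the pushes are not free, rounding the result down to ~20 cycles is consistent with the measurement.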