Message-ID: <d49cfa3e-f5d3-07fd-6e5d-573b37b66824@redhat.com>
Date: Thu, 18 Aug 2016 19:47:56 +0200
From: Denys Vlasenko <dvlasenk@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andy Lutomirski <luto@...capital.net>,
Sara Sharon <sara.sharon@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
Christian König <christian.koenig@....com>,
Vinod Koul <vinod.koul@...el.com>,
Alex Deucher <alexander.deucher@....com>,
Johannes Berg <johannes.berg@...el.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Andy Lutomirski <luto@...nel.org>,
the arch/x86 maintainers <x86@...nel.org>,
Ingo Molnar <mingo@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Adrian Hunter <adrian.hunter@...el.com>
Subject: Re: RFC: Petition Intel/AMD to add POPF_IF insn
On 08/18/2016 07:24 PM, Linus Torvalds wrote:
> That said, your numbers really aren't very convincing. If popf really
> is just 10 cycles on modern Intel hardware, it's already fast enough
> that I really don't think it matters.
It's 20 cycles. I was wrong in my earlier email: I forgot that the insn
count also includes the "push %rbx" insns.
Since I already made a mistake, let me double-check.
200 million iterations of this loop execute in just under 17 seconds:
400100: b8 00 c2 eb 0b mov $0xbebc200,%eax # 200*1000*1000
400105: 9c pushfq
400106: 5b pop %rbx
400107: 90 nop
....
0000000000400140 <loop>:
400140: 53 push %rbx
400141: 9d popfq
400142: 53 push %rbx
400143: 9d popfq
400144: 53 push %rbx
400145: 9d popfq
400146: 53 push %rbx
400147: 9d popfq
400148: 53 push %rbx
400149: 9d popfq
40014a: 53 push %rbx
40014b: 9d popfq
40014c: 53 push %rbx
40014d: 9d popfq
40014e: 53 push %rbx
40014f: 9d popfq
400150: 53 push %rbx
400151: 9d popfq
400152: 53 push %rbx
400153: 9d popfq
400154: 53 push %rbx
400155: 9d popfq
400156: 53 push %rbx
400157: 9d popfq
400158: 53 push %rbx
400159: 9d popfq
40015a: 53 push %rbx
40015b: 9d popfq
40015c: ff c8 dec %eax
40015e: 75 e0 jne 400140 <loop>
The loop is exactly 32 bytes, aligned.
There are 14 POPFs. Other insns are very fast.
No perf, just "time taskset 1 ./test".
My CPU frequency hovers around 3500 MHz when loaded.
17 seconds is 17*3500 million cycles.
(17*3500 million cycles) / (200*14 million POPFs) = 21.25 cycles per POPF
Thus, one POPF in CPL3 is ~20 cycles on Skylake.