Message-ID: <1686aa11-152d-9416-34dd-17820de7a7b6@redhat.com>
Date: Thu, 18 Aug 2016 15:26:02 +0200
From: Denys Vlasenko <dvlasenk@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Andy Lutomirski <luto@...capital.net>
Cc: Sara Sharon <sara.sharon@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
Christian König <christian.koenig@....com>,
Vinod Koul <vinod.koul@...el.com>,
Alex Deucher <alexander.deucher@....com>,
Johannes Berg <johannes.berg@...el.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Andy Lutomirski <luto@...nel.org>,
the arch/x86 maintainers <x86@...nel.org>,
Ingo Molnar <mingo@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Adrian Hunter <adrian.hunter@...el.com>
Subject: Re: RFC: Petition Intel/AMD to add POPF_IF insn
> Of course, somebody really should do timings on modern CPU's (in cpl0,
> comparing native_fl() that enables interrupts with a popf)
I haven't done CPL0 tests yet, but I realized that CLI/STI can be tested
in userspace if we call iopl(3) first.
Surprisingly, STI is slower than CLI. A loop with 27 CLIs and one STI
converges to about 0.5 insn/cycle:
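# compile with: gcc -nostartfiles -nostdlib clisti.s
# (clisti.s is an arbitrary name; run as root, since iopl(3) needs CAP_SYS_RAWIO)
_start: .globl _start
mov $172, %eax #iopl (x86-64 syscall number)
mov $3, %edi #level 3: permits CLI/STI at CPL3
syscall
mov $200*1000*1000, %eax #iteration count
.balign 64
loop:
cli;cli;cli;cli
cli;cli;cli;cli
cli;cli;cli;cli
cli;cli;cli;cli
cli;cli;cli;cli
cli;cli;cli;cli
cli;cli;cli;sti #27 CLIs + 1 STI per iteration
dec %eax
jnz loop
mov $231, %eax #exit_group
xor %edi, %edi #exit status 0 (%edi still held 3 from the iopl call)
syscall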
perf stat:
6,015,787,968 instructions # 0.52 insn per cycle
3.355474199 seconds time elapsed
With all CLIs replaced by STIs, it drops to ~0.27 insn/cycle, roughly half:
6,030,530,328 instructions # 0.27 insn per cycle
6.547200322 seconds time elapsed
A POPF that needs to enable interrupts runs at essentially the same speed
as one that leaves EFLAGS.IF unchanged:
Loop with:
400158: fa cli
400159: 53 push %rbx #saved eflags with if=1
40015a: 9d popfq
shows:
8,908,857,324 instructions # 0.11 insn per cycle ( +- 0.00% )
Loop with:
400140: fb sti
400141: 53 push %rbx
400142: 9d popfq
shows:
8,920,243,701 instructions # 0.10 insn per cycle ( +- 0.01% )
Even a loop with neither CLI nor STI, only PUSH+POPF:
400140: 53 push %rbx
400141: 9d popfq
shows:
6,079,936,714 instructions # 0.10 insn per cycle ( +- 0.00% )
This is on a Skylake CPU.
The gist of it:
CLI is 2 cycles,
STI is 4 cycles,
POPF is 10 cycles
seemingly regardless of the prior value of EFLAGS.IF.