Message-ID: <fea16f6c-8afc-a694-cd76-c385308e1f9b@redhat.com>
Date: Wed, 31 Aug 2016 13:12:00 +0200
From: Denys Vlasenko <dvlasenk@...hat.com>
To: Paolo Bonzini <pbonzini@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andy Lutomirski <luto@...capital.net>,
Sara Sharon <sara.sharon@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
Christian König <christian.koenig@....com>,
Vinod Koul <vinod.koul@...el.com>,
Alex Deucher <alexander.deucher@....com>,
Johannes Berg <johannes.berg@...el.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Andy Lutomirski <luto@...nel.org>,
the arch/x86 maintainers <x86@...nel.org>,
Ingo Molnar <mingo@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Adrian Hunter <adrian.hunter@...el.com>
Subject: Re: RFC: Petition Intel/AMD to add POPF_IF insn
On 08/19/2016 12:54 PM, Paolo Bonzini wrote:
> On 18/08/2016 19:24, Linus Torvalds wrote:
>>>> I didn't do CPL0 tests yet. Realized that cli/sti can be tested in userspace
>>>> if we set iopl(3) first.
>> Yes, but it might not be the same. So the timings could be very
>> different from a cpl0 case.
>
> FWIW I recently measured around 20 cycles for a popf as well on
> Haswell-EP and CPL=0 (that was for commit f2485b3e0c6c, "KVM: x86: use
> guest_exit_irqoff", 2016-07-01).
Thanks for confirmation.
I revisited benchmarking of the

	if (flags & X86_EFLAGS_IF)
		native_irq_enable();

patch. In "make -j20" kernel compiles on an 8-way (HT) CPU, it shows a ~5 second
improvement over a ~16 minute compile. That's a ~0.5% speedup. It's ok, but not
something to be too excited about.
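
A minimal userspace sketch of why the conditional STI can stand in for POPF
on the restore path. It assumes the usual save/disable ... restore pairing, so
interrupts are already off when the restore runs and only IF matters. The
sim_* names and the simulated flags register are hypothetical, introduced here
so the model runs without privileges; they are not kernel code.

```c
#include <assert.h>

#define X86_EFLAGS_IF 0x200UL   /* bit 9 of EFLAGS: interrupt enable */

/* Simulated flags register (stand-in for the real EFLAGS; an assumption
 * of this sketch so that it can run in userspace). */
static unsigned long sim_eflags;

/* Models of the three instructions, reduced to their effect on the flags. */
static void sim_cli(void)  { sim_eflags &= ~X86_EFLAGS_IF; }
static void sim_sti(void)  { sim_eflags |=  X86_EFLAGS_IF; }
static void sim_popf(unsigned long flags) { sim_eflags = flags; }

/* Save the flags, then disable interrupts. */
static unsigned long sim_irq_save(void)
{
	unsigned long flags = sim_eflags;

	sim_cli();
	return flags;
}

/* Restore via POPF: rewrites the whole flags register every time. */
static void sim_irq_restore_popf(unsigned long flags)
{
	sim_popf(flags);
}

/* Restore via conditional STI, following the pattern of the patch. */
static void sim_irq_restore_sti(unsigned long flags)
{
	if (flags & X86_EFLAGS_IF)
		sim_sti();
}

/* Check that both restore paths leave IF in the same state, whether
 * interrupts were enabled or disabled before the save. */
static void sim_selfcheck(void)
{
	unsigned long start[2] = { 0x2UL, 0x2UL | X86_EFLAGS_IF };

	for (int i = 0; i < 2; i++) {
		unsigned long saved, if_popf, if_sti;

		sim_eflags = start[i];
		saved = sim_irq_save();
		sim_irq_restore_popf(saved);
		if_popf = sim_eflags & X86_EFLAGS_IF;

		sim_eflags = start[i];
		saved = sim_irq_save();
		sim_irq_restore_sti(saved);
		if_sti = sim_eflags & X86_EFLAGS_IF;

		assert(if_popf == if_sti);
	}
}
```

Calling sim_selfcheck() from a main() exercises both starting states. The
equivalence only holds because the restore is entered with IF clear; a restore
that had to turn interrupts off again would still need POPF or CLI.
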

The disassembly of vmlinux shows that gcc generates these asm patterns:

  80 e6 02                   and    $0x2,%dh
  74 01                      je     ffffffff810101ae <intel_pt_handle_vmx+0x3e>
  fb                         sti

  41 f6 86 91 00 00 00 02    testb  $0x2,0x91(%r14)
  74 01                      je     ffffffff81013ce7 <math_error+0x77>
  fb                         sti

  f6 83 91 00 00 00 02       testb  $0x2,0x91(%rbx)
  74 01                      je     ffffffff81013efa <do_int3+0xba>
  fb                         sti

  41 f7 c4 00 02 00 00       test   $0x200,%r12d
  74 01                      je     ffffffff8101615d <oops_end+0x5d>
  fb                         sti

Here we trade a 20-cycle POPF for either a 4-cycle STI, or a branch (which is either
~1 cycle if predicted, or ~20 cycles if mispredicted).
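
A rough sketch of the cycle trade-off described in this message, using only the
approximate figures quoted in this thread (20 for POPF, 4 for STI, ~1 for a
predicted branch, ~20 for a mispredicted one). The function names are
hypothetical, and the model assumes the branch and STI costs simply add, which
real pipelines need not honour.

```c
/* Cycle figures quoted in the thread; rough measurements, not specifications. */
#define POPF_CYCLES        20.0
#define STI_CYCLES          4.0
#define BRANCH_HIT_CYCLES   1.0
#define BRANCH_MISS_CYCLES 20.0

/* Expected cycles for the "test; je; sti" sequence.
 * p_if_set:     fraction of restores where the saved IF bit is set
 * p_mispredict: fraction of restores where the branch is mispredicted
 * Assumption: branch cost and STI cost are additive. */
static double cond_sti_cycles(double p_if_set, double p_mispredict)
{
	double branch = (1.0 - p_mispredict) * BRANCH_HIT_CYCLES
		      + p_mispredict * BRANCH_MISS_CYCLES;

	return branch + p_if_set * STI_CYCLES;
}

/* Misprediction rate at which the sequence costs as much as POPF,
 * i.e. the m solving 1 + m * (20 - 1) + p_if_set * 4 = 20. */
static double break_even_mispredict(double p_if_set)
{
	return (POPF_CYCLES - BRANCH_HIT_CYCLES - p_if_set * STI_CYCLES)
	       / (BRANCH_MISS_CYCLES - BRANCH_HIT_CYCLES);
}
```

Under these assumptions, a well-predicted branch with IF always set costs about
5 cycles against 20 for POPF, and the sequence only loses to POPF once the
misprediction rate exceeds 15/19 (roughly 79%).
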
I still think a dedicated instruction for a conditional STI is worth asking for.
Along the lines of "If bit 9 in the r/m argument is set, then STI, else nothing".
What do people from CPU companies say?