Date:   Wed, 31 Aug 2016 13:12:00 +0200
From:   Denys Vlasenko <dvlasenk@...hat.com>
To:     Paolo Bonzini <pbonzini@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Andy Lutomirski <luto@...capital.net>,
        Sara Sharon <sara.sharon@...el.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Christian König <christian.koenig@....com>,
        Vinod Koul <vinod.koul@...el.com>,
        Alex Deucher <alexander.deucher@....com>,
        Johannes Berg <johannes.berg@...el.com>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Andy Lutomirski <luto@...nel.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Adrian Hunter <adrian.hunter@...el.com>
Subject: Re: RFC: Petition Intel/AMD to add POPF_IF insn

On 08/19/2016 12:54 PM, Paolo Bonzini wrote:
> On 18/08/2016 19:24, Linus Torvalds wrote:
>>>> I didn't do CPL0 tests yet. Realized that cli/sti can be tested in userspace
>>>> if we set iopl(3) first.
>> Yes, but it might not be the same. So the timings could be very
>> different from a cpl0 case.
>
> FWIW I recently measured around 20 cycles for a popf as well on
> Haswell-EP and CPL=0 (that was for commit f2485b3e0c6c, "KVM: x86: use
> guest_exit_irqoff", 2016-07-01).

Thanks for confirmation.

I revisited benchmarking of the

	if (flags & X86_EFLAGS_IF)
		native_irq_enable();

patch. In "make -j20" kernel compiles on an 8-way (HT) CPU, it shows a ~5 second
improvement over a ~16 minute compile. That's a 0.5% speedup. It's ok, but not
something to be too excited about.

The disassembly of vmlinux shows that gcc generates these asm patterns:

80 e6 02                and    $0x2,%dh
74 01                   je     ffffffff810101ae <intel_pt_handle_vmx+0x3e>
fb                      sti

41 f6 86 91 00 00 00 02 testb  $0x2,0x91(%r14)
74 01                   je     ffffffff81013ce7 <math_error+0x77>
fb                      sti

f6 83 91 00 00 00 02    testb  $0x2,0x91(%rbx)
74 01                   je     ffffffff81013efa <do_int3+0xba>
fb                      sti

41 f7 c4 00 02 00 00    test   $0x200,%r12d
74 01                   je     ffffffff8101615d <oops_end+0x5d>
fb                      sti

Here we trade a 20-cycle POPF for either a 4-cycle STI, or a branch (which is
either ~1 cycle if predicted, or ~20 cycles if mispredicted).
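
To make the trade-off concrete, the cycle estimates quoted above can be combined into a rough expected cost for the test/je/sti sequence. This is only a back-of-the-envelope sketch: the additive model, the function name branch_sti_cycles() and both probability parameters are assumptions for illustration, and the constants are the approximate figures from this thread, not new measurements.

```c
/* Approximate cycle counts quoted in this thread (not measured here). */
#define POPF_CYCLES        20.0
#define STI_CYCLES          4.0
#define BRANCH_HIT_CYCLES   1.0
#define BRANCH_MISS_CYCLES 20.0

/*
 * Expected cost, in cycles, of the "test; je; sti" sequence under a
 * simple additive model (hypothetical; ignores the cost of the test
 * instruction itself and any pipeline overlap).
 *
 * p_if_set:     assumed probability that the saved flags have IF set,
 *               i.e. that STI actually executes
 * p_mispredict: assumed probability that the branch is mispredicted
 */
static double branch_sti_cycles(double p_if_set, double p_mispredict)
{
    double branch = p_mispredict * BRANCH_MISS_CYCLES
                  + (1.0 - p_mispredict) * BRANCH_HIT_CYCLES;

    return branch + p_if_set * STI_CYCLES;
}
```

Under this model, a well-predicted branch with IF set costs about 5 cycles against 20 for POPF, while a sequence that always mispredicts and then executes STI costs about 24, which is worse than the POPF it replaces.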

I still think a dedicated instruction for a conditional STI is worth asking for.

Along the lines of "If bit 9 in the r/m argument is set, then STI, else nothing".
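
As a rough illustration of those semantics, here is a small software model in C. No such instruction exists; the names cpu_model and popf_if_model() are made up for this sketch, and the only architectural fact it relies on is that IF is bit 9 (0x200) of EFLAGS.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_IF (1u << 9)   /* interrupt-enable flag, bit 9 of EFLAGS */

/* Hypothetical model of CPU state: only the interrupt flag is tracked. */
struct cpu_model {
    bool irqs_enabled;
};

/*
 * Model of the proposed instruction: if bit 9 of the operand is set,
 * behave like STI; otherwise do nothing. Unlike POPF, it never clears
 * IF and touches no other flag.
 */
static void popf_if_model(struct cpu_model *cpu, uint64_t saved_flags)
{
    if (saved_flags & X86_EFLAGS_IF)
        cpu->irqs_enabled = true;
    /* else: interrupts stay exactly as they were */
}
```

The point of doing this in one instruction would be to drop the conditional branch that the current test/je/sti sequence needs.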

What do people from CPU companies say?
