linux-kernel - Re: RFC: Petition Intel/AMD to add POPF_IF insn


Message-ID: <1686aa11-152d-9416-34dd-17820de7a7b6@redhat.com>
Date:	Thu, 18 Aug 2016 15:26:02 +0200
From:	Denys Vlasenko <dvlasenk@...hat.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andy Lutomirski <luto@...capital.net>
Cc:	Sara Sharon <sara.sharon@...el.com>,
	Dan Williams <dan.j.williams@...el.com>,
	Christian König <christian.koenig@....com>,
	Vinod Koul <vinod.koul@...el.com>,
	Alex Deucher <alexander.deucher@....com>,
	Johannes Berg <johannes.berg@...el.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
	Andy Lutomirski <luto@...nel.org>,
	the arch/x86 maintainers <x86@...nel.org>,
	Ingo Molnar <mingo@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Adrian Hunter <adrian.hunter@...el.com>
Subject: Re: RFC: Petition Intel/AMD to add POPF_IF insn

> Of course, somebody really should do timings on modern CPU's (in cpl0,
> comparing native_fl() that enables interrupts with a popf)

I haven't done CPL0 tests yet, but I realized that CLI/STI can be tested
in userspace if we call iopl(3) first.

Surprisingly, STI is slower than CLI. A loop with 27 CLIs and one STI
converges to ~0.5 insn/cycle:

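# compile with: gcc -nostartfiles -nostdlib
# run as root: iopl(3) needs CAP_SYS_RAWIO, otherwise CLI/STI fault (SIGSEGV)
_start:         .globl  _start
                 mov     $172, %eax #iopl
                 mov     $3, %edi   #IOPL 3: CLI/STI allowed at CPL3
                 syscall
                 mov     $200*1000*1000, %eax #iteration count
                 .balign 64
loop:
                 cli;cli;cli;cli
                 cli;cli;cli;cli
                 cli;cli;cli;cli
                 cli;cli;cli;cli

                 cli;cli;cli;cli
                 cli;cli;cli;cli
                 cli;cli;cli;sti
                 dec     %eax
                 jnz     loop

                 xor     %edi, %edi #exit status 0 (edi still holds 3 from iopl)
                 mov     $231, %eax #exit_group
                 syscall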

perf stat:
      6,015,787,968      instructions              #    0.52  insn per cycle
        3.355474199 seconds time elapsed

With all CLIs replaced by STIs, it's ~0.25 insn/cycle:

      6,030,530,328      instructions              #    0.27  insn per cycle
        6.547200322 seconds time elapsed


A POPF which needs to enable interrupts is not measurably different in speed
from one which does not change EFLAGS.IF:

Loop with:
   400158:	fa                   	cli
   400159:	53                   	push   %rbx  #saved eflags with if=1
   40015a:	9d                   	popfq
shows:
      8,908,857,324      instructions              #    0.11  insn per cycle           ( +-  0.00% )

Loop with:
   400140:	fb                   	sti
   400141:	53                   	push   %rbx
   400142:	9d                   	popfq
shows:
      8,920,243,701      instructions              #    0.10  insn per cycle           ( +-  0.01% )

Even a loop with neither CLI nor STI, only PUSH+POPF:
   400140:	53                   	push   %rbx
   400141:	9d                   	popfq
shows:
      6,079,936,714      instructions              #    0.10  insn per cycle           ( +-  0.00% )

This is on a Skylake CPU.


The gist of it:
CLI is 2 cycles,
STI is 4 cycles,
POPF is 10 cycles
seemingly regardless of the prior value of EFLAGS.IF.
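One way to see how the CLI/STI figures follow from the first two runs, as a rough sketch: each loop iteration is 30 instructions (28 CLI/STI plus DEC and JNZ), so cycles per iteration are about 30 / IPC. Writing c and s for the cost of one CLI and one STI, and assuming (for simplicity) that the DEC/JNZ pair is free:

```latex
\begin{aligned}
27c + s &\approx \frac{30}{0.52} \approx 57.7 \quad\text{(27 CLI + 1 STI)}\\
28s &\approx \frac{30}{0.27} \approx 111.1 \quad\text{(28 STI)}\\
\Rightarrow\; s &\approx \frac{111.1}{28} \approx 3.97,\qquad
c \approx \frac{57.7 - 3.97}{27} \approx 1.99
\end{aligned}
```

Charging the DEC/JNZ pair one cycle per iteration instead moves these only slightly (s ≈ 3.93, c ≈ 1.95), so the ~2 and ~4 cycle figures do not hinge on that assumption. The POPF loops are not decomposed this way here, since their complete loop bodies are not shown.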