Date:	Tue,  5 May 2015 19:58:31 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	linux-kernel@...r.kernel.org
Cc:	Andy Lutomirski <luto@...capital.net>,
	Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	Fenghua Yu <fenghua.yu@...el.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: [PATCH 207/208] x86/fpu: Add FPU performance measurement subsystem

Add a short FPU performance suite that runs once during bootup.

It can be enabled via CONFIG_X86_DEBUG_FPU_PERFORMANCE=y.

Example output on an Intel system:

  x86/fpu:##################################################################
  x86/fpu: Running FPU performance measurement suite (cache hot):
  x86/fpu: Cost of: null                                      :   108 cycles
  x86/fpu:########  CPU instructions:           ############################
  x86/fpu: Cost of: NOP                         insn          :     0 cycles
  x86/fpu: Cost of: RDTSC                       insn          :    12 cycles
  x86/fpu: Cost of: RDMSR                       insn          :   100 cycles
  x86/fpu: Cost of: WRMSR                       insn          :   396 cycles
  x86/fpu: Cost of: CLI                         insn  same-IF :     0 cycles
  x86/fpu: Cost of: CLI                         insn  flip-IF :     0 cycles
  x86/fpu: Cost of: STI                         insn  same-IF :     0 cycles
  x86/fpu: Cost of: STI                         insn  flip-IF :     0 cycles
  x86/fpu: Cost of: PUSHF                       insn          :     0 cycles
  x86/fpu: Cost of: POPF                        insn  same-IF :    20 cycles
  x86/fpu: Cost of: POPF                        insn  flip-IF :    28 cycles
  x86/fpu:########  IRQ save/restore APIs:      ############################
  x86/fpu: Cost of: local_irq_save()            fn            :    20 cycles
  x86/fpu: Cost of: local_irq_restore()         fn    same-IF :    24 cycles
  x86/fpu: Cost of: local_irq_restore()         fn    flip-IF :    28 cycles
  x86/fpu: Cost of: irq_save()+restore()        fn    same-IF :    48 cycles
  x86/fpu: Cost of: irq_save()+restore()        fn    flip-IF :    48 cycles
  x86/fpu:########  locking APIs:               ############################
  x86/fpu: Cost of: smp_mb()                    fn            :    40 cycles
  x86/fpu: Cost of: cpu_relax()                 fn            :     8 cycles
  x86/fpu: Cost of: spin_lock()+unlock()        fn            :    64 cycles
  x86/fpu: Cost of: read_lock()+unlock()        fn            :    76 cycles
  x86/fpu: Cost of: write_lock()+unlock()       fn            :    52 cycles
  x86/fpu: Cost of: rcu_read_lock()+unlock()    fn            :    16 cycles
  x86/fpu: Cost of: preempt_disable()+enable()  fn            :    20 cycles
  x86/fpu: Cost of: mutex_lock()+unlock()       fn            :    56 cycles
  x86/fpu:########  MM instructions:            ############################
  x86/fpu: Cost of: __flush_tlb()               fn            :   132 cycles
  x86/fpu: Cost of: __flush_tlb_global()        fn            :   920 cycles
  x86/fpu: Cost of: __flush_tlb_one()           fn            :   288 cycles
  x86/fpu: Cost of: __flush_tlb_range()         fn            :   412 cycles
  x86/fpu:########  FPU instructions:           ############################
  x86/fpu: Cost of: CR0                         read          :     4 cycles
  x86/fpu: Cost of: CR0                         write         :   208 cycles
  x86/fpu: Cost of: CR0::TS                     fault         :  1156 cycles
  x86/fpu: Cost of: FNINIT                      insn          :    76 cycles
  x86/fpu: Cost of: FWAIT                       insn          :     0 cycles
  x86/fpu: Cost of: FSAVE                       insn          :   168 cycles
  x86/fpu: Cost of: FRSTOR                      insn          :   160 cycles
  x86/fpu: Cost of: FXSAVE                      insn          :    84 cycles
  x86/fpu: Cost of: FXRSTOR                     insn          :    44 cycles
  x86/fpu: Cost of: FXRSTOR                     fault         :   688 cycles
  x86/fpu: Cost of: XSAVE                       insn          :   104 cycles
  x86/fpu: Cost of: XRSTOR                      insn          :    80 cycles
  x86/fpu: Cost of: XRSTOR                      fault         :   884 cycles
  x86/fpu:##################################################################

And on an AMD system:

  x86/fpu:##################################################################
  x86/fpu: Running FPU performance measurement suite (cache hot):
  x86/fpu: Cost of: null                                      :   144 cycles
  x86/fpu:########  CPU instructions:           ############################
  x86/fpu: Cost of: NOP                         insn          :     4 cycles
  x86/fpu: Cost of: RDTSC                       insn          :    71 cycles
  x86/fpu: Cost of: RDMSR                       insn          :    43 cycles
  x86/fpu: Cost of: WRMSR                       insn          :   148 cycles
  x86/fpu: Cost of: CLI                         insn  same-IF :     8 cycles
  x86/fpu: Cost of: CLI                         insn  flip-IF :     5 cycles
  x86/fpu: Cost of: STI                         insn  same-IF :    28 cycles
  x86/fpu: Cost of: STI                         insn  flip-IF :     0 cycles
  x86/fpu: Cost of: PUSHF                       insn          :    15 cycles
  x86/fpu: Cost of: POPF                        insn  same-IF :     8 cycles
  x86/fpu: Cost of: POPF                        insn  flip-IF :    12 cycles
  x86/fpu:########  IRQ save/restore APIs:      ############################
  x86/fpu: Cost of: local_irq_save()            fn            :     0 cycles
  x86/fpu: Cost of: local_irq_restore()         fn    same-IF :     7 cycles
  x86/fpu: Cost of: local_irq_restore()         fn    flip-IF :    20 cycles
  x86/fpu: Cost of: irq_save()+restore()        fn    same-IF :    20 cycles
  x86/fpu: Cost of: irq_save()+restore()        fn    flip-IF :    20 cycles
  x86/fpu:########  locking APIs:               ############################
  x86/fpu: Cost of: smp_mb()                    fn            :    38 cycles
  x86/fpu: Cost of: cpu_relax()                 fn            :     7 cycles
  x86/fpu: Cost of: spin_lock()+unlock()        fn            :    89 cycles
  x86/fpu: Cost of: read_lock()+unlock()        fn            :    91 cycles
  x86/fpu: Cost of: write_lock()+unlock()       fn            :    85 cycles
  x86/fpu: Cost of: rcu_read_lock()+unlock()    fn            :    30 cycles
  x86/fpu: Cost of: preempt_disable()+enable()  fn            :    38 cycles
  x86/fpu: Cost of: mutex_lock()+unlock()       fn            :    64 cycles
  x86/fpu:########  MM instructions:            ############################
  x86/fpu: Cost of: __flush_tlb()               fn            :   134 cycles
  x86/fpu: Cost of: __flush_tlb_global()        fn            :   547 cycles
  x86/fpu: Cost of: __flush_tlb_one()           fn            :   128 cycles
  x86/fpu: Cost of: __flush_tlb_range()         fn            :   539 cycles
  x86/fpu:########  FPU instructions:           ############################
  x86/fpu: Cost of: CR0                         read          :    16 cycles
  x86/fpu: Cost of: CR0                         write         :    83 cycles
  x86/fpu: Cost of: CR0::TS                     fault         :   691 cycles
  x86/fpu: Cost of: FNINIT                      insn          :   118 cycles
  x86/fpu: Cost of: FWAIT                       insn          :     4 cycles
  x86/fpu: Cost of: FSAVE                       insn          :   156 cycles
  x86/fpu: Cost of: FRSTOR                      insn          :   151 cycles
  x86/fpu: Cost of: FXSAVE                      insn          :    73 cycles
  x86/fpu: Cost of: FXRSTOR                     insn          :    86 cycles
  x86/fpu: Cost of: FXRSTOR                     fault         :   441 cycles
  x86/fpu:##################################################################

Note that there can be some jitter in the results between bootups:
the measurement keeps the shortest of 1000 runs, which is relatively
(but not completely) stable. The smallest values should therefore not
be taken literally: in the AMD results above, a NOP obviously does not
take 4 cycles. Results are expected to be relatively accurate for the
more complex operations.

Cc: Andy Lutomirski <luto@...capital.net>
Cc: Borislav Petkov <bp@...en8.de>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: Fenghua Yu <fenghua.yu@...el.com>
Cc: H. Peter Anvin <hpa@...or.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Oleg Nesterov <oleg@...hat.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Signed-off-by: Ingo Molnar <mingo@...nel.org>
---
 arch/x86/Kconfig.debug             |  15 ++
 arch/x86/include/asm/fpu/measure.h |  13 ++
 arch/x86/kernel/cpu/bugs.c         |   2 +
 arch/x86/kernel/cpu/bugs_64.c      |   2 +
 arch/x86/kernel/fpu/Makefile       |   8 +-
 arch/x86/kernel/fpu/measure.c      | 509 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 548 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 2fd3ebbb4e33..8329635101f8 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -344,4 +344,19 @@ config X86_DEBUG_FPU
 
 	  If unsure, say N.
 
+config X86_DEBUG_FPU_PERFORMANCE
+	bool "Measure x86 FPU performance"
+	depends on DEBUG_KERNEL
+	---help---
+	  If this option is enabled then the kernel will run a short
+	  FPU (Floating Point Unit) benchmarking suite during bootup,
+	  to measure the cost of various FPU hardware operations and
+	  other kernel APIs.
+
+	  The results are printed to the kernel log.
+
+	  This extra benchmarking code will be freed after bootup.
+
+	  If unsure, say N.
+
 endmenu
diff --git a/arch/x86/include/asm/fpu/measure.h b/arch/x86/include/asm/fpu/measure.h
new file mode 100644
index 000000000000..d003809491c2
--- /dev/null
+++ b/arch/x86/include/asm/fpu/measure.h
@@ -0,0 +1,13 @@
+/*
+ * x86 FPU performance measurement methods:
+ */
+#ifndef _ASM_X86_FPU_MEASURE_H
+#define _ASM_X86_FPU_MEASURE_H
+
+#ifdef CONFIG_X86_DEBUG_FPU_PERFORMANCE
+extern void fpu__measure(void);
+#else
+static inline void fpu__measure(void) { }
+#endif
+
+#endif /* _ASM_X86_FPU_MEASURE_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index bd17db15a2c1..1b947415d903 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -13,6 +13,7 @@
 #include <asm/processor.h>
 #include <asm/processor-flags.h>
 #include <asm/fpu/internal.h>
+#include <asm/fpu/measure.h>
 #include <asm/msr.h>
 #include <asm/paravirt.h>
 #include <asm/alternative.h>
@@ -37,6 +38,7 @@ void __init check_bugs(void)
 
 	init_utsname()->machine[1] =
 		'0' + (boot_cpu_data.x86 > 6 ? 6 : boot_cpu_data.x86);
+	fpu__measure();
 	alternative_instructions();
 
 	fpu__init_check_bugs();
diff --git a/arch/x86/kernel/cpu/bugs_64.c b/arch/x86/kernel/cpu/bugs_64.c
index 04f0fe5af83e..846c24aa14cf 100644
--- a/arch/x86/kernel/cpu/bugs_64.c
+++ b/arch/x86/kernel/cpu/bugs_64.c
@@ -8,6 +8,7 @@
 #include <asm/alternative.h>
 #include <asm/bugs.h>
 #include <asm/processor.h>
+#include <asm/fpu/measure.h>
 #include <asm/mtrr.h>
 #include <asm/cacheflush.h>
 
@@ -18,6 +19,7 @@ void __init check_bugs(void)
 	printk(KERN_INFO "CPU: ");
 	print_cpu_info(&boot_cpu_data);
 #endif
+	fpu__measure();
 	alternative_instructions();
 
 	/*
diff --git a/arch/x86/kernel/fpu/Makefile b/arch/x86/kernel/fpu/Makefile
index 68279efb811a..e7676c20bdde 100644
--- a/arch/x86/kernel/fpu/Makefile
+++ b/arch/x86/kernel/fpu/Makefile
@@ -2,4 +2,10 @@
 # Build rules for the FPU support code:
 #
 
-obj-y				+= init.o bugs.o core.o regset.o signal.o xstate.o
+obj-y					+= init.o bugs.o core.o regset.o signal.o xstate.o
+
+# Make the measured functions as simple as possible:
+CFLAGS_measure.o += -fomit-frame-pointer
+CFLAGS_REMOVE_measure.o = -pg
+
+obj-$(CONFIG_X86_DEBUG_FPU_PERFORMANCE) += measure.o
diff --git a/arch/x86/kernel/fpu/measure.c b/arch/x86/kernel/fpu/measure.c
new file mode 100644
index 000000000000..6232cdf240d8
--- /dev/null
+++ b/arch/x86/kernel/fpu/measure.c
@@ -0,0 +1,509 @@
+/*
+ * FPU performance measurement routines
+ */
+#include <asm/fpu/internal.h>
+#include <asm/tlbflush.h>
+
+#include <linux/kernel.h>
+
+/*
+ * Number of repeated measurements we do. We pick the fastest one:
+ */
+static int loops = 1000;
+
+/*
+ * Various small functions, whose overhead we measure:
+ */
+
+typedef void (*bench_fn_t)(void) __aligned(32);
+
+static void fn_empty(void)
+{
+}
+
+/* Basic instructions: */
+
+static void fn_nop(void)
+{
+	asm volatile ("nop");
+}
+
+static void fn_rdtsc(void)
+{
+	u32 low, high;
+
+	asm volatile ("rdtsc": "=a"(low), "=d"(high));
+}
+
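+/* Use the 'safe' MSR accessors on MSR_EFER, so that a #GP cannot crash the bootup: */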
+static void fn_rdmsr(void)
+{
+	u64 efer;
+
+	rdmsrl_safe(MSR_EFER, &efer);
+}
+
+static void fn_wrmsr(void)
+{
+	u64 efer;
+
+	if (!rdmsrl_safe(MSR_EFER, &efer))
+		wrmsrl_safe(MSR_EFER, efer);
+}
+
+static void fn_cli_same(void)
+{
+	asm volatile ("cli");
+}
+
+static void fn_cli_flip(void)
+{
+	asm volatile ("sti");
+	asm volatile ("cli");
+}
+
+static void fn_sti_same(void)
+{
+	asm volatile ("sti");
+}
+
+static void fn_sti_flip(void)
+{
+	asm volatile ("cli");
+	asm volatile ("sti");
+}
+
+static void fn_pushf(void)
+{
+	arch_local_save_flags();
+}
+
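+/*
+ * Baseline for the POPF measurements: executes the flags-save (and CLI)
+ * preamble of the POPF variants below, without the POPF itself, so that
+ * preamble overhead can be subtracted out:
+ */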
+static void fn_popf_baseline(void)
+{
+	arch_local_save_flags();
+	asm volatile ("cli");
+}
+
+static void fn_popf_flip(void)
+{
+	unsigned long flags = arch_local_save_flags();
+	asm volatile ("cli");
+
+	arch_local_irq_restore(flags);
+}
+
+static void fn_popf_same(void)
+{
+	unsigned long flags = arch_local_save_flags();
+
+	arch_local_irq_restore(flags);
+}
+
+/* Basic IRQ save/restore APIs: */
+
+static void fn_irq_save_baseline(void)
+{
+	local_irq_enable();
+}
+
+static void fn_irq_save(void)
+{
+	unsigned long flags;
+
+	local_irq_enable();
+	local_irq_save(flags);
+}
+
+static void fn_irq_restore_flip(void)
+{
+	unsigned long flags;
+
+	local_irq_enable();
+	local_irq_save(flags);
+	local_irq_restore(flags);
+}
+
+static void fn_irq_restore_same(void)
+{
+	unsigned long flags;
+
+	local_irq_disable();
+	local_irq_save(flags);
+	local_irq_restore(flags);
+}
+
+static void fn_irq_save_restore_flip(void)
+{
+	unsigned long flags;
+
+	local_irq_enable();
+
+	local_irq_save(flags);
+	local_irq_restore(flags);
+}
+
+static void fn_irq_save_restore_same(void)
+{
+	unsigned long flags;
+
+	/* Start with IRQs disabled, so the restore below keeps IF unchanged: */
+	local_irq_disable();
+
+	local_irq_save(flags);
+	local_irq_restore(flags);
+}
+
+/* Basic locking primitives: */
+
+static void fn_smp_mb(void)
+{
+	smp_mb();
+}
+
+static void fn_cpu_relax(void)
+{
+	cpu_relax();
+}
+
+static DEFINE_SPINLOCK(test_spinlock);
+
+static void fn_spin_lock_unlock(void)
+{
+	spin_lock(&test_spinlock);
+	spin_unlock(&test_spinlock);
+}
+
+static DEFINE_RWLOCK(test_rwlock);
+
+static void fn_read_lock_unlock(void)
+{
+	read_lock(&test_rwlock);
+	read_unlock(&test_rwlock);
+}
+
+static void fn_write_lock_unlock(void)
+{
+	write_lock(&test_rwlock);
+	write_unlock(&test_rwlock);
+}
+
+static void fn_rcu_read_lock_unlock(void)
+{
+	rcu_read_lock();
+	rcu_read_unlock();
+}
+
+static void fn_preempt_disable_enable(void)
+{
+	preempt_disable();
+	preempt_enable();
+}
+
+static DEFINE_MUTEX(test_mutex);
+
+static void fn_mutex_lock_unlock(void)
+{
+	local_irq_enable();
+
+	mutex_lock(&test_mutex);
+	mutex_unlock(&test_mutex);
+}
+
+/* MM instructions: */
+
+static void fn_flush_tlb(void)
+{
+	__flush_tlb();
+}
+
+static void fn_flush_tlb_global(void)
+{
+	__flush_tlb_global();
+}
+
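+/* Page-aligned target whose kernel mapping the single-page flushes below operate on: */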
+static char tlb_flush_target[PAGE_SIZE] __aligned(4096);
+
+static void fn_flush_tlb_one(void)
+{
+	unsigned long addr = (unsigned long)&tlb_flush_target;
+
+	tlb_flush_target[0]++;
+	__flush_tlb_one(addr);
+}
+
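+/*
+ * flush_tlb_mm_range() takes a shortcut when current->mm is NULL (as
+ * it is this early in the bootup), so temporarily borrow active_mm to
+ * make sure the real flush path gets measured:
+ */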
+static void fn_flush_tlb_range(void)
+{
+	unsigned long start = (unsigned long)&tlb_flush_target;
+	unsigned long end = start+PAGE_SIZE;
+	struct mm_struct *mm_saved;
+
+	tlb_flush_target[0]++;
+
+	mm_saved = current->mm;
+	current->mm = current->active_mm;
+
+	flush_tlb_mm_range(current->active_mm, start, end, 0);
+
+	current->mm = mm_saved;
+}
+
+/* FPU instructions: */
+
+static void fn_read_cr0(void)
+{
+	read_cr0();
+}
+
+static void fn_rw_cr0(void)
+{
+	write_cr0(read_cr0());
+}
+
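+/*
+ * Measure the cost of a CR0::TS fault: with TS set the FWAIT below
+ * raises #NM (device-not-available), so this times the full
+ * trap-and-handle round trip used by lazy FPU switching:
+ */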
+static void fn_cr0_fault(void)
+{
+	struct fpu *fpu = &current->thread.fpu;
+	u32 cr0 = read_cr0();
+
+	write_cr0(cr0 | X86_CR0_TS);
+
+	asm volatile("fwait");
+
+	/* Zap the FP state we created via the fault: */
+	fpu->fpregs_active = 0;
+	fpu->fpstate_active = 0;
+
+	write_cr0(cr0);
+}
+
+static void fn_fninit(void)
+{
+	asm volatile ("fninit");
+}
+
+static void fn_fwait(void)
+{
+	asm volatile("fwait");
+}
+
+static void fn_fsave(void)
+{
+	static struct fregs_state fstate __aligned(32);
+
+	copy_fregs_to_user(&fstate);
+}
+
+static void fn_frstor(void)
+{
+	static struct fregs_state fstate __aligned(32);
+
+	copy_fregs_to_user(&fstate);
+	copy_user_to_fregs(&fstate);
+}
+
+static void fn_fxsave(void)
+{
+	struct fxregs_state fxstate __aligned(32);
+
+	copy_fxregs_to_user(&fxstate);
+}
+
+static void fn_fxrstor(void)
+{
+	static struct fxregs_state fxstate __aligned(32);
+
+	copy_fxregs_to_user(&fxstate);
+	copy_user_to_fxregs(&fxstate);
+}
+
+/*
+ * Provoke #GP on invalid FXRSTOR:
+ */
+static void fn_fxrstor_fault(void)
+{
+	static struct fxregs_state fxstate __aligned(32);
+	struct fpu *fpu = &current->thread.fpu;
+
+	copy_fxregs_to_user(&fxstate);
+
+	/* Set invalid MXCSR value, this will generate a #GP: */
+	fxstate.mxcsr = -1;
+
+	copy_user_to_fxregs(&fxstate);
+
+	/* Zap any FP state we created via the fault: */
+	fpu->fpregs_active = 0;
+	fpu->fpstate_active = 0;
+}
+
+static void fn_xsave(void)
+{
+	static struct xregs_state x __aligned(32);
+
+	copy_xregs_to_kernel_booting(&x);
+}
+
+static void fn_xrstor(void)
+{
+	static struct xregs_state x __aligned(32);
+
+	copy_xregs_to_kernel_booting(&x);
+	copy_kernel_to_xregs_booting(&x, -1);
+}
+
+/*
+ * Provoke #GP on invalid XRSTOR:
+ */
+static void fn_xrstor_fault(void)
+{
+	static struct xregs_state x __aligned(32);
+
+	copy_xregs_to_kernel_booting(&x);
+
+	/* Set invalid MXCSR value, this will generate a #GP: */
+	x.i387.mxcsr = -1;
+
+	copy_kernel_to_xregs_booting(&x, -1);
+}
+
+static s64
+measure(s64 null_overhead, bench_fn_t bench_fn,
+	const char *txt_1, const char *txt_2, const char *txt_3)
+{
+	unsigned long flags;
+	u32 cr0_saved;
+	int eager_saved;
+	u64 t0, t1;
+	s64 delta, delta_min;
+	int i;
+
+	delta_min = LONG_MAX;
+
+	/* Disable eagerfpu, so that we can provoke CR0::TS faults: */
+	eager_saved = boot_cpu_has(X86_FEATURE_EAGER_FPU);
+	setup_clear_cpu_cap(X86_FEATURE_EAGER_FPU);
+
+	/* Save CR0 so that we can freely set it to any value during measurement: */
+	cr0_saved = read_cr0();
+	/* Clear TS, so that we can measure FPU ops by default: */
+	write_cr0(cr0_saved & ~X86_CR0_TS);
+
+	local_irq_save(flags);
+
+	asm volatile (".align 32\n");
+
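+	/*
+	 * Run bench_fn() 'loops' times, timing each run with RDTSC and
+	 * fencing the timestamp reads with mb() so they cannot be
+	 * reordered around the measured function; keep the fastest
+	 * (least disturbed) run:
+	 */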
+	for (i = 0; i < loops; i++) {
+		rdtscll(t0);
+		mb();
+
+		bench_fn();
+
+		mb();
+		rdtscll(t1);
+		delta = t1-t0;
+		if (delta <= 0)
+			continue;
+
+		delta_min = min(delta_min, delta);
+	}
+
+	local_irq_restore(flags);
+	write_cr0(cr0_saved);
+
+	if (eager_saved)
+		setup_force_cpu_cap(X86_FEATURE_EAGER_FPU);
+
+	delta_min = max(0LL, delta_min-null_overhead);
+
+	if (txt_1) {
+		if (!txt_2)
+			txt_2 = "";
+		if (!txt_3)
+			txt_3 = "";
+		pr_info("x86/fpu: Cost of: %-27s %-5s %-8s: %5Ld cycles\n", txt_1, txt_2, txt_3, delta_min);
+	}
+
+	return delta_min;
+}
+
+/*
+ * Measure all the above primitives:
+ */
+void __init fpu__measure(void)
+{
+	s64 cost;
+	s64 rdmsr_cost;
+	s64 cli_cost, sti_cost, popf_cost, irq_save_cost;
+	s64 cr0_read_cost, cr0_write_cost;
+	s64 save_cost;
+
+	pr_info("x86/fpu:##################################################################\n");
+	pr_info("x86/fpu: Running FPU performance measurement suite (cache hot):\n");
+
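+	/*
+	 * First measure the overhead of the measurement loop itself via
+	 * an empty function; this 'null' cost gets subtracted from all
+	 * subsequent results:
+	 */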
+	cost = measure(0, fn_empty, "null", NULL, NULL);
+
+	pr_info("x86/fpu:########  CPU instructions:           ############################\n");
+	measure(cost, fn_nop, "NOP", "insn", NULL);
+	measure(cost, fn_rdtsc, "RDTSC", "insn", NULL);
+
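+	/* fn_wrmsr() does an RDMSR first, so include its cost in the subtracted overhead: */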
+	rdmsr_cost = measure(cost, fn_rdmsr, "RDMSR", "insn", NULL);
+	measure(cost+rdmsr_cost, fn_wrmsr, "WRMSR", "insn", NULL);
+
+	cli_cost = measure(cost, fn_cli_same, "CLI", "insn", "same-IF");
+	measure(cost+cli_cost, fn_cli_flip, "CLI", "insn", "flip-IF");
+
+	sti_cost = measure(cost, fn_sti_same, "STI", "insn", "same-IF");
+	measure(cost+sti_cost, fn_sti_flip, "STI", "insn", "flip-IF");
+
+	measure(cost, fn_pushf,	"PUSHF", "insn", NULL);
+
+	popf_cost = measure(cost, fn_popf_baseline, NULL, NULL, NULL);
+	measure(cost+popf_cost, fn_popf_same, "POPF", "insn", "same-IF");
+	measure(cost+popf_cost, fn_popf_flip, "POPF", "insn", "flip-IF");
+
+	pr_info("x86/fpu:########  IRQ save/restore APIs:      ############################\n");
+	irq_save_cost = measure(cost, fn_irq_save_baseline, NULL, NULL, NULL);
+	irq_save_cost += measure(cost+irq_save_cost, fn_irq_save, "local_irq_save()", "fn", NULL);
+	measure(cost+irq_save_cost, fn_irq_restore_same, "local_irq_restore()", "fn", "same-IF");
+	measure(cost+irq_save_cost, fn_irq_restore_flip, "local_irq_restore()", "fn", "flip-IF");
+	measure(cost+cli_cost, fn_irq_save_restore_same, "irq_save()+restore()", "fn", "same-IF");
+	measure(cost+sti_cost, fn_irq_save_restore_flip, "irq_save()+restore()", "fn", "flip-IF");
+
+	pr_info("x86/fpu:########  locking APIs:               ############################\n");
+	measure(cost, fn_smp_mb, "smp_mb()", "fn", NULL);
+	measure(cost, fn_cpu_relax, "cpu_relax()", "fn", NULL);
+	measure(cost, fn_spin_lock_unlock, "spin_lock()+unlock()", "fn", NULL);
+	measure(cost, fn_read_lock_unlock, "read_lock()+unlock()", "fn", NULL);
+	measure(cost, fn_write_lock_unlock, "write_lock()+unlock()", "fn", NULL);
+	measure(cost, fn_rcu_read_lock_unlock, "rcu_read_lock()+unlock()", "fn", NULL);
+	measure(cost, fn_preempt_disable_enable, "preempt_disable()+enable()", "fn", NULL);
+	measure(cost+sti_cost, fn_mutex_lock_unlock, "mutex_lock()+unlock()", "fn", NULL);
+
+	pr_info("x86/fpu:########  MM instructions:            ############################\n");
+	measure(cost, fn_flush_tlb, "__flush_tlb()", "fn", NULL);
+	measure(cost, fn_flush_tlb_global, "__flush_tlb_global()", "fn", NULL);
+	measure(cost, fn_flush_tlb_one, "__flush_tlb_one()", "fn", NULL);
+	measure(cost, fn_flush_tlb_range, "__flush_tlb_range()", "fn", NULL);
+
+	pr_info("x86/fpu:########  FPU instructions:           ############################\n");
+	cr0_read_cost = measure(cost, fn_read_cr0, "CR0", "read", NULL);
+	cr0_write_cost = measure(cost+cr0_read_cost, fn_rw_cr0,	"CR0", "write", NULL);
+
+	measure(cost+cr0_read_cost+cr0_write_cost, fn_cr0_fault, "CR0::TS", "fault", NULL);
+
+	measure(cost, fn_fninit, "FNINIT", "insn", NULL);
+	measure(cost, fn_fwait,	"FWAIT", "insn", NULL);
+
+	save_cost = measure(cost, fn_fsave, "FSAVE", "insn", NULL);
+	measure(cost+save_cost, fn_frstor, "FRSTOR", "insn", NULL);
+
+	if (cpu_has_fxsr) {
+		save_cost = measure(cost, fn_fxsave, "FXSAVE", "insn", NULL);
+		measure(cost+save_cost, fn_fxrstor, "FXRSTOR", "insn", NULL);
+		measure(cost+save_cost, fn_fxrstor_fault, "FXRSTOR", "fault", NULL);
+	}
+	if (cpu_has_xsaveopt) {
+		save_cost = measure(cost, fn_xsave, "XSAVE", "insn", NULL);
+		measure(cost+save_cost, fn_xrstor, "XRSTOR", "insn", NULL);
+		measure(cost+save_cost, fn_xrstor_fault, "XRSTOR", "fault", NULL);
+	}
+	pr_info("x86/fpu:##################################################################\n");
+}
-- 
2.1.0
