Message-Id: <20190403164156.19645-27-bigeasy@linutronix.de>
Date:   Wed,  3 Apr 2019 18:41:55 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     linux-kernel@...r.kernel.org
Cc:     x86@...nel.org, Andy Lutomirski <luto@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        kvm@...r.kernel.org, "Jason A. Donenfeld" <Jason@...c4.com>,
        Rik van Riel <riel@...riel.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: [PATCH 26/27] x86/fpu: Restore FPU register in copy_fpstate_to_sigframe() in order to use the fastpath

If a task is scheduled out and then receives a signal, it won't be able to
take the fastpath because the registers aren't available. The slowpath is more
expensive than xrstor + xsave, which usually succeeds.

Some clock_gettime() numbers from a bigger box with AVX512 during bootup:
- __fpregs_load_activate() takes 140ns - 350ns. If the task's state was the
  most recent FPU context on the CPU, the optimisation in
  __fpregs_load_activate() skips the load (this optimisation was disabled
  during the test).

- copy_fpregs_to_sigframe() takes 200ns - 450ns if it succeeds. On a
  pagefault it takes 1.8us - 3us, usually around 2.6us.

- The slowpath takes 1.5us - 6us, usually around 2.6us.

My testcases (including lat_sig) take the fastpath without
__fpregs_load_activate(). I expect this to be the majority of cases.

Since the slowpath is in the >1us area, it makes sense to load the
registers and attempt to save them directly. The direct save may fail,
but that should only happen on the first invocation or after fork(), while
the page is still read-only.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
---
 arch/x86/kernel/fpu/signal.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index baf1588d7060c..16f700d5b3a47 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -176,19 +176,20 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 
 	fpregs_lock();
 	/*
-	 * If we do not need to load the FPU registers at return to userspace
-	 * then the CPU has the current state. Try to save it directly to
-	 * userland's stack frame if it does not cause a pagefault. If it does,
-	 * try the slowpath.
+	 * Load the FPU register if they are not valid for the current task.
+	 * With a valid FPU state we can attempt to save the state directly to
+	 * userland's stack frame which will likely succeed. If it does not, do
+	 * the slowpath.
 	 */
-	if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
-		pagefault_disable();
-		ret = copy_fpregs_to_sigframe(buf_fx);
-		pagefault_enable();
-		if (ret)
-			copy_fpregs_to_fpstate(fpu);
-		set_thread_flag(TIF_NEED_FPU_LOAD);
-	}
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		__fpregs_load_activate();
+
+	pagefault_disable();
+	ret = copy_fpregs_to_sigframe(buf_fx);
+	pagefault_enable();
+	if (ret && !test_thread_flag(TIF_NEED_FPU_LOAD))
+		copy_fpregs_to_fpstate(fpu);
+	set_thread_flag(TIF_NEED_FPU_LOAD);
 	fpregs_unlock();
 
 	if (ret) {
-- 
2.20.1
