lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 30 Nov 2017 22:29:44 -0800
From:   Andy Lutomirski <luto@...nel.org>
To:     x86@...nel.org
Cc:     linux-kernel@...r.kernel.org, Borislav Petkov <bp@...en8.de>,
        Brian Gerst <brgerst@...il.com>,
        David Laight <David.Laight@...lab.com>,
        Kees Cook <keescook@...omium.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>
Subject: [PATCH 4/6] Unsuck "x86/entry/64: Create a percpu SYSCALL entry trampoline"

This fixes a huge performance regression.

Please add to the changelog:

This patch actually seems to be a small speedup.  With this patch,
SYSCALL touches an extra cache line and an extra virtual page, but
the pipeline no longer stalls waiting for SWAPGS.  It seems that, at
least in a tight loop, the latter outweights the former.

Thanks to David Laight for an optimization tip.

[end addition to changelog]

Signed-off-by: Andy Lutomirski <luto@...nel.org>
---
 arch/x86/entry/entry_64.S | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index caf74a1bb3de..28f4e7553c26 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -180,14 +180,24 @@ ENTRY(entry_SYSCALL_64_trampoline)
 
 	/*
 	 * x86 lacks a near absolute jump, and we can't jump to the real
-	 * entry text with a relative jump, so we fake it using retq.
+	 * entry text with a relative jump.  We could push the target
+	 * address and then use retq, but this destroys the pipeline on
+	 * many CPUs (wasting over 20 cycles on Sandy Bridge).  Instead,
+	 * spill RDI and restore it in a second-stage trampoline.
 	 */
-	pushq	$entry_SYSCALL_64_after_hwframe
-	retq
+	pushq	%rdi
+	movq	$entry_SYSCALL_64_stage2, %rdi
+	jmp	*%rdi
 END(entry_SYSCALL_64_trampoline)
 
 	.popsection
 
+ENTRY(entry_SYSCALL_64_stage2)
+	UNWIND_HINT_EMPTY
+	popq	%rdi
+	jmp	entry_SYSCALL_64_after_hwframe
+END(entry_SYSCALL_64_stage2)
+
 ENTRY(entry_SYSCALL_64)
 	UNWIND_HINT_EMPTY
 	/*
-- 
2.13.6

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ