Message-ID: <1452715911-12067-5-git-send-email-mst@redhat.com>
Date: Wed, 13 Jan 2016 22:12:44 +0200
From: "Michael S. Tsirkin" <mst@...hat.com>
To: linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Davidlohr Bueso <dave@...olabs.net>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
the arch/x86 maintainers <x86@...nel.org>,
Davidlohr Bueso <dbueso@...e.de>,
"H. Peter Anvin" <hpa@...or.com>,
virtualization <virtualization@...ts.linux-foundation.org>,
Borislav Petkov <bp@...en8.de>,
Andy Lutomirski <luto@...capital.net>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...e.de>,
Arnd Bergmann <arnd@...db.de>,
Andrey Konovalov <andreyknvl@...gle.com>,
Andy Lutomirski <luto@...nel.org>
Subject: [PATCH v3 4/4] x86: drop mfence in favor of lock+addl
mfence appears to be way slower than a locked instruction - let's use
lock+addl unconditionally, as we always did on old 32-bit CPUs without SSE2.
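As a rough userspace sketch (not part of this patch - the file name,
function names and iteration count are made up, and the numbers depend
entirely on the CPU), the difference can be measured with something like
the following, built with gcc -O2 fence-bench.c (older glibc may also
need -lrt):

/* fence-bench.c: time mfence vs. lock; addl as a full barrier, x86-64 only */
#include <stdio.h>
#include <time.h>

#define ITERS 10000000UL

static void barrier_mfence(void)
{
        asm volatile("mfence" ::: "memory");
}

static void barrier_lock_addl(void)
{
        /* adds 0, so it never changes memory - it only orders accesses */
        asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
}

/* ns per barrier; called through a pointer so neither variant gets inlined */
static double bench(void (*barrier)(void))
{
        struct timespec t0, t1;
        unsigned long i;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < ITERS; i++)
                barrier();
        clock_gettime(CLOCK_MONOTONIC, &t1);

        return ((t1.tv_sec - t0.tv_sec) * 1e9 +
                (t1.tv_nsec - t0.tv_nsec)) / ITERS;
}

int main(void)
{
        printf("mfence:     %.1f ns/op\n", bench(barrier_mfence));
        printf("lock addl:  %.1f ns/op\n", bench(barrier_lock_addl));
        return 0;
}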
Just poking at SP would be the most natural, but if we then read the
value back from SP (for example the return address when the function
returns), we get a false dependency which will slow us down.
This was noted in this article:
http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
And it is easy to reproduce by sticking a barrier in a small non-inline
function.
So let's use a negative offset - which avoids this problem since we
build with the red zone disabled.
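To illustrate the point (this is my reading of the article above, not
code from this patch; the function names and the noinline attribute are
just for the sketch), a small non-inline function has its return address
sitting right at 0(%rsp), so the two offsets behave quite differently:

static __attribute__((noinline)) void barrier_at_sp(void)
{
        /* locked RMW hits 0(%rsp), which holds the return address */
        asm volatile("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
        /* the ret that follows loads 0(%rsp) and has to wait for the RMW */
}

static __attribute__((noinline)) void barrier_below_sp(void)
{
        /* locked RMW hits the unused word just below the stack pointer */
        asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
        /* the ret still loads 0(%rsp), now independent of the RMW */
}

int main(void)
{
        barrier_at_sp();
        barrier_below_sp();
        return 0;
}

With -O2 each of these should compile down to just the locked addl
followed by ret. In the kernel nothing lives in the word below SP
because the build disables the red zone, so the -4 variant cannot alias
data that is read back later.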
Update rmb/wmb on 32-bit to use the negative offset, too, for
consistency.
Suggested-by: Andy Lutomirski <luto@...capital.net>
Signed-off-by: Michael S. Tsirkin <mst@...hat.com>
---
arch/x86/include/asm/barrier.h | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index bfb28ca..9a2d257 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -11,16 +11,15 @@
  */
 
 #ifdef CONFIG_X86_32
-#define mb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "mfence", \
-                                      X86_FEATURE_XMM2) ::: "memory", "cc")
-#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "lfence", \
+#define mb() asm volatile("lock; addl $0,-4(%%esp)" ::: "memory", "cc")
+#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "lfence", \
                                       X86_FEATURE_XMM2) ::: "memory", "cc")
-#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "sfence", \
+#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "sfence", \
                                       X86_FEATURE_XMM2) ::: "memory", "cc")
 #else
-#define mb() asm volatile("mfence":::"memory")
-#define rmb() asm volatile("lfence":::"memory")
-#define wmb() asm volatile("sfence" ::: "memory")
+#define mb() asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc")
+#define rmb() asm volatile("lfence" ::: "memory")
+#define wmb() asm volatile("sfence" ::: "memory")
 #endif
 
 #ifdef CONFIG_X86_PPRO_FENCE
--
MST