[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20141113192723.12579.25343.stgit@ahduyck-server>
Date: Thu, 13 Nov 2014 11:27:23 -0800
From: Alexander Duyck <alexander.h.duyck@...hat.com>
To: linux-arch@...r.kernel.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Cc: mikey@...ling.org, tony.luck@...el.com,
mathieu.desnoyers@...ymtl.ca, donald.c.skidmore@...el.com,
peterz@...radead.org, benh@...nel.crashing.org,
heiko.carstens@...ibm.com, oleg@...hat.com, will.deacon@....com,
davem@...emloft.net, michael@...erman.id.au,
matthew.vick@...el.com, nic_swsd@...ltek.com, geert@...ux-m68k.org,
jeffrey.t.kirsher@...el.com, fweisbec@...il.com,
schwidefsky@...ibm.com, linux@....linux.org.uk,
paulmck@...ux.vnet.ibm.com, torvalds@...ux-foundation.org,
mingo@...nel.org
Subject: [PATCH 1/3] arch: Introduce load_acquire() and store_release()
It is common for device drivers to make use of acquire/release semantics
when dealing with descriptors stored in device memory. On reviewing the
documentation and code for smp_load_acquire() and smp_store_release() as
well as reviewing an IBM website that goes over the use of PowerPC barriers
at http://www.ibm.com/developerworks/systems/articles/powerpc.html it
occurred to me that the same code could likely be applied to device drivers.
As a result this patch introduces load_acquire() and store_release(). The
load_acquire() function can be used in the place of situations where a test
for ownership must be followed by a memory barrier. The below example is
from ixgbe:
if (!rx_desc->wb.upper.status_error)
break;
/* This memory barrier is needed to keep us from reading
* any other fields out of the rx_desc until we know the
* descriptor has been written back
*/
rmb();
With load_acquire() this can be changed to:
if (!load_acquire(&rx_desc->wb.upper.status_error))
break;
A similar change can be made in the release path of many drivers. For
example in the Realtek r8169 driver there are a number of flows that
consist of something like the following:
wmb();
status = opts[0] | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
txd->opts1 = cpu_to_le32(status);
tp->cur_tx += frags + 1;
wmb();
With store_release() this can be changed to the following:
status = opts[0] | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
store_release(&txd->opts1, cpu_to_le32(status));
tp->cur_tx += frags + 1;
wmb();
The resulting assembler code generated as a result can be significantly
less expensive on architectures such as x86 and s390 that support strong
ordering. On architectures that are able to use different primitives than
their rmb/wmb() such as powerpc, ia64, and arm64 we should see gains as we
are able to use less expensive barriers, and for other architectures we end
up using a mb() which may come at the same amount of overhead or more than
a rmb/wmb() as we must ensure Load/Store ordering.
Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@...il.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
Cc: Michael Ellerman <michael@...erman.id.au>
Cc: Michael Neuling <mikey@...ling.org>
Cc: Russell King <linux@....linux.org.uk>
Cc: Geert Uytterhoeven <geert@...ux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@...ibm.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@...ibm.com>
Cc: Tony Luck <tony.luck@...el.com>
Cc: Oleg Nesterov <oleg@...hat.com>
Cc: Will Deacon <will.deacon@....com>
Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...nel.org>
Cc: David Miller <davem@...emloft.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck@...hat.com>
---
arch/arm/include/asm/barrier.h | 15 +++++++++
arch/arm64/include/asm/barrier.h | 59 ++++++++++++++++++-----------------
arch/ia64/include/asm/barrier.h | 7 +++-
arch/metag/include/asm/barrier.h | 15 +++++++++
arch/mips/include/asm/barrier.h | 15 +++++++++
arch/powerpc/include/asm/barrier.h | 24 +++++++++++---
arch/s390/include/asm/barrier.h | 7 +++-
arch/sparc/include/asm/barrier_64.h | 6 ++--
arch/x86/include/asm/barrier.h | 22 ++++++++++++-
include/asm-generic/barrier.h | 15 +++++++++
10 files changed, 144 insertions(+), 41 deletions(-)
diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index c6a3e73..bbdcd34 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -59,6 +59,21 @@
#define smp_wmb() dmb(ishst)
#endif
+#define store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ mb(); \
+ ___p1; \
+})
+
#define smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 6389d60..c91571c 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -32,33 +32,7 @@
#define rmb() dsb(ld)
#define wmb() dsb(st)
-#ifndef CONFIG_SMP
-#define smp_mb() barrier()
-#define smp_rmb() barrier()
-#define smp_wmb() barrier()
-
-#define smp_store_release(p, v) \
-do { \
- compiletime_assert_atomic_type(*p); \
- barrier(); \
- ACCESS_ONCE(*p) = (v); \
-} while (0)
-
-#define smp_load_acquire(p) \
-({ \
- typeof(*p) ___p1 = ACCESS_ONCE(*p); \
- compiletime_assert_atomic_type(*p); \
- barrier(); \
- ___p1; \
-})
-
-#else
-
-#define smp_mb() dmb(ish)
-#define smp_rmb() dmb(ishld)
-#define smp_wmb() dmb(ishst)
-
-#define smp_store_release(p, v) \
+#define store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
switch (sizeof(*p)) { \
@@ -73,7 +47,7 @@ do { \
} \
} while (0)
-#define smp_load_acquire(p) \
+#define load_acquire(p) \
({ \
typeof(*p) ___p1; \
compiletime_assert_atomic_type(*p); \
@@ -90,6 +64,35 @@ do { \
___p1; \
})
+#ifndef CONFIG_SMP
+#define smp_mb() barrier()
+#define smp_rmb() barrier()
+#define smp_wmb() barrier()
+
+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ___p1; \
+})
+
+#else
+
+#define smp_mb() dmb(ish)
+#define smp_rmb() dmb(ishld)
+#define smp_wmb() dmb(ishst)
+
+#define smp_store_release(p, v) store_release(p, v)
+#define smp_load_acquire(p) load_acquire(p)
+
#endif
#define read_barrier_depends() do { } while(0)
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index a48957c..d7fe208 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -63,14 +63,14 @@
* need for asm trickery!
*/
-#define smp_store_release(p, v) \
+#define store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
barrier(); \
ACCESS_ONCE(*p) = (v); \
} while (0)
-#define smp_load_acquire(p) \
+#define load_acquire(p) \
({ \
typeof(*p) ___p1 = ACCESS_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
@@ -78,6 +78,9 @@ do { \
___p1; \
})
+#define smp_store_release(p, v) store_release(p, v)
+#define smp_load_acquire(p) load_acquire(p)
+
/*
* XXX check on this ---I suspect what Linus really wants here is
* acquire vs release semantics but we can't discuss this stuff with
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index c7591e8..9beb687 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -85,6 +85,21 @@ static inline void fence(void)
#define smp_read_barrier_depends() do { } while (0)
#define set_mb(var, value) do { var = value; smp_mb(); } while (0)
+#define store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ mb(); \
+ ___p1; \
+})
+
#define smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index d0101dd..fc7323c 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -180,6 +180,21 @@
#define nudge_writes() mb()
#endif
+#define store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ mb(); \
+ ___p1; \
+})
+
#define smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index bab79a1..f2a0d73 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -37,6 +37,23 @@
#define set_mb(var, value) do { var = value; mb(); } while (0)
+#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+
+#define store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ __lwsync(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ __lwsync(); \
+ ___p1; \
+})
+
#ifdef CONFIG_SMP
#ifdef __SUBARCH_HAS_LWSYNC
@@ -45,15 +62,12 @@
# define SMPWMB eieio
#endif
-#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
#define smp_mb() mb()
#define smp_rmb() __lwsync()
#define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
#define smp_read_barrier_depends() read_barrier_depends()
#else
-#define __lwsync() barrier()
-
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
@@ -72,7 +86,7 @@
#define smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
- __lwsync(); \
+ smp_rmb(); \
ACCESS_ONCE(*p) = (v); \
} while (0)
@@ -80,7 +94,7 @@ do { \
({ \
typeof(*p) ___p1 = ACCESS_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
- __lwsync(); \
+ smp_rmb(); \
___p1; \
})
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index b5dce65..637d7a9 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -35,14 +35,14 @@
#define set_mb(var, value) do { var = value; mb(); } while (0)
-#define smp_store_release(p, v) \
+#define store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
barrier(); \
ACCESS_ONCE(*p) = (v); \
} while (0)
-#define smp_load_acquire(p) \
+#define load_acquire(p) \
({ \
typeof(*p) ___p1 = ACCESS_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
@@ -50,4 +50,7 @@ do { \
___p1; \
})
+#define smp_store_release(p, v) store_release(p, v)
+#define smp_load_acquire(p) load_acquire(p)
+
#endif /* __ASM_BARRIER_H */
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 305dcc3..7de3c69 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -53,14 +53,14 @@ do { __asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \
#define smp_read_barrier_depends() do { } while(0)
-#define smp_store_release(p, v) \
+#define store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
barrier(); \
ACCESS_ONCE(*p) = (v); \
} while (0)
-#define smp_load_acquire(p) \
+#define load_acquire(p) \
({ \
typeof(*p) ___p1 = ACCESS_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
@@ -68,6 +68,8 @@ do { \
___p1; \
})
+#define smp_store_release(p, v) store_release(p, v)
+#define smp_load_acquire(p) load_acquire(p)
#define smp_mb__before_atomic() barrier()
#define smp_mb__after_atomic() barrier()
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 0f4460b..3d2aa18 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -103,6 +103,21 @@
* model and we should fall back to full barriers.
*/
+#define store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ mb(); \
+ ___p1; \
+})
+
#define smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
@@ -120,14 +135,14 @@ do { \
#else /* regular x86 TSO memory ordering */
-#define smp_store_release(p, v) \
+#define store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
barrier(); \
ACCESS_ONCE(*p) = (v); \
} while (0)
-#define smp_load_acquire(p) \
+#define load_acquire(p) \
({ \
typeof(*p) ___p1 = ACCESS_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
@@ -135,6 +150,9 @@ do { \
___p1; \
})
+#define smp_store_release(p, v) store_release(p, v)
+#define smp_load_acquire(p) load_acquire(p)
+
#endif
/* Atomic operations are already serializing on x86 */
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 1402fa8..c6e4b99 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -70,6 +70,21 @@
#define smp_mb__after_atomic() smp_mb()
#endif
+#define store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ mb(); \
+ ___p1; \
+})
+
#define smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists