Message-Id: <20181019115612.GT2674@linux.ibm.com>
Date: Fri, 19 Oct 2018 04:56:12 -0700
From: "Paul E. McKenney" <paulmck@...ux.ibm.com>
To: Will Deacon <will.deacon@....com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Peter Zijlstra <peterz@...radead.org>, acme@...hat.com,
yhs@...com, john.fastabend@...il.com, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next 2/3] tools, perf: use smp_{rmb,mb} barriers
instead of {rmb,mb}
On Fri, Oct 19, 2018 at 12:02:43PM +0100, Will Deacon wrote:
> On Thu, Oct 18, 2018 at 08:53:42PM -0700, Alexei Starovoitov wrote:
> > On Thu, Oct 18, 2018 at 09:00:46PM +0200, Daniel Borkmann wrote:
> > > On 10/18/2018 05:33 PM, Alexei Starovoitov wrote:
> > > > On Thu, Oct 18, 2018 at 05:04:34PM +0200, Daniel Borkmann wrote:
> > > >> #endif /* _TOOLS_LINUX_ASM_IA64_BARRIER_H */
> > > >> diff --git a/tools/arch/powerpc/include/asm/barrier.h b/tools/arch/powerpc/include/asm/barrier.h
> > > >> index a634da0..905a2c6 100644
> > > >> --- a/tools/arch/powerpc/include/asm/barrier.h
> > > >> +++ b/tools/arch/powerpc/include/asm/barrier.h
> > > >> @@ -27,4 +27,20 @@
> > > >> #define rmb() __asm__ __volatile__ ("sync" : : : "memory")
> > > >> #define wmb() __asm__ __volatile__ ("sync" : : : "memory")
> > > >>
> > > >> +#if defined(__powerpc64__)
> > > >> +#define smp_lwsync() __asm__ __volatile__ ("lwsync" : : : "memory")
> > > >> +
> > > >> +#define smp_store_release(p, v) \
> > > >> +do { \
> > > >> + smp_lwsync(); \
> > > >> + WRITE_ONCE(*p, v); \
> > > >> +} while (0)
> > > >> +
> > > >> +#define smp_load_acquire(p) \
> > > >> +({ \
> > > >> + typeof(*p) ___p1 = READ_ONCE(*p); \
> > > >> + smp_lwsync(); \
> > > >> + ___p1; \
> > > >
> > > > I don't like this proliferation of asm.
> > > > Why do we think that we can do better job than compiler?
> > > > can we please use gcc builtins instead?
> > > > https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
> > > > __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
> > > > __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
> > > > are done specifically for this use case if I'm not mistaken.
> > > > I think it pays to learn what compiler provides.
> > >
> > > But are you sure the C11 memory model matches exact same model as kernel?
> > > Seems like last time Will looked into it [0] it wasn't the case ...
> >
> > I'm only suggesting equivalence of __atomic_load_n(ptr, __ATOMIC_ACQUIRE)
> > with kernel's smp_load_acquire().
> > I've seen a bunch of user space ring buffer implementations implemented
> > with __atomic_load_n() primitives.
> > But let's ask experts who live in both worlds.
>
> One thing to be wary of is if there is an implementation choice between
> how to implement load-acquire and store-release for a given architecture.
> In these situations, it's often important that concurrent software agrees
> on the "mapping", so we'd need to be sure that (a) All userspace compilers
> that we care about have compatible mappings and (b) These mappings are
> compatible with the kernel code.

Agreed! Mixing and matching can be done, but it does require quite a
bit of care.

Thanx, Paul
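
For readers following the thread, here is a minimal sketch of the kind of
user-space ring buffer Alexei mentions, built only on the GCC builtins
__atomic_load_n() and __atomic_store_n(). The names struct ring, RING_SIZE,
ring_push() and ring_pop() are hypothetical and do not come from the patch
or from perf. The sketch assumes a single producer and a single consumer,
both compiled with the same compiler. It does not settle the question
raised above: whether the compiler's acquire/release mapping is compatible
with the kernel's smp_load_acquire()/smp_store_release() on a given
architecture still has to be checked separately.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capacity; must be a power of two so masking works. */
#define RING_SIZE 8u

/* Hypothetical single-producer/single-consumer ring, not perf's layout. */
struct ring {
	uint32_t head;			/* advanced only by the producer */
	uint32_t tail;			/* advanced only by the consumer */
	uint32_t slot[RING_SIZE];
};

/* Producer side: returns false when the ring is full. */
static bool ring_push(struct ring *r, uint32_t v)
{
	/* Only the producer writes head, so a relaxed load is enough. */
	uint32_t head = __atomic_load_n(&r->head, __ATOMIC_RELAXED);
	/* Acquire pairs with the consumer's release store to tail. */
	uint32_t tail = __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);

	if (head - tail == RING_SIZE)
		return false;

	r->slot[head & (RING_SIZE - 1)] = v;
	/* Release: the slot write is ordered before the new head value. */
	__atomic_store_n(&r->head, head + 1, __ATOMIC_RELEASE);
	return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_pop(struct ring *r, uint32_t *out)
{
	/* Only the consumer writes tail, so a relaxed load is enough. */
	uint32_t tail = __atomic_load_n(&r->tail, __ATOMIC_RELAXED);
	/* Acquire pairs with the producer's release store to head. */
	uint32_t head = __atomic_load_n(&r->head, __ATOMIC_ACQUIRE);

	if (head == tail)
		return false;

	*out = r->slot[tail & (RING_SIZE - 1)];
	/* Release: the slot read is ordered before the slot is handed back. */
	__atomic_store_n(&r->tail, tail + 1, __ATOMIC_RELEASE);
	return true;
}
```

The acquire load of head in ring_pop() plays the role that
smp_load_acquire() plays in the quoted patch, and the release store of
tail plays the role of smp_store_release(). When both sides are built by
the same compiler the pairing is consistent by construction; the care Will
and Paul describe is needed when one side is the kernel.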