[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20160822091541.GA28226@tardis.cn.ibm.com>
Date: Mon, 22 Aug 2016 17:15:41 +0800
From: Boqun Feng <boqun.feng@...il.com>
To: Manfred Spraul <manfred@...orfullife.com>
Cc: Davidlohr Bueso <dave@...olabs.net>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Michael Ellerman <mpe@...erman.id.au>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Susanne Spraul <1vier1@....de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: spin_lock implicit/explicit memory barrier
On Fri, Aug 12, 2016 at 08:43:55PM +0200, Manfred Spraul wrote:
> Hi Boqun,
>
> On 08/12/2016 04:47 AM, Boqun Feng wrote:
> > > We should not be doing an smp_mb() right after a spin_lock(), makes no sense. The
> > > spinlock machinery should guarantee us the barriers in the unorthodox locking cases,
> > > such as this.
> > >
> Do we really want to go there?
> Trying to handle all unorthodox cases will end up as an endless list of
> patches, and guaranteed to be stale architectures.
>
To be honest, the only unorthodox case here is we try to use
spin_unlock_wait() to achieve some kind of ordering with a lock
critical section, like we do in sem code and do_exit(). Spinlocks are
mostly used for exclusive lock critical sections, which are what we care
about most and what we should make as fast as possible.
> > Right.
> >
> > If you have:
> >
> > 6262db7c088b ("powerpc/spinlock: Fix spin_unlock_wait()")
> >
> > you don't need smp_mb() after spin_lock() on PPC.
> >
> > And, IIUC, if you have:
> >
> > 3a5facd09da8 ("arm64: spinlock: fix spin_unlock_wait for LSE atomics")
> > d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against
> > concurrent lockers")
> >
> > you don't need smp_mb() after spin_lock() on ARM64.
> >
> > And, IIUC, if you have:
> >
> > 2c6100227116 ("locking/qspinlock: Fix spin_unlock_wait() some more")
> >
> > you don't need smp_mb() after spin_lock() on x86 with qspinlock.
>
> I would really prefer the other approach:
> - spin_lock() is an acquire, that's it. No further guarantees, e.g. ordering
> of writing the lock.
> - spin_unlock() is a release, that's it.
No objection to these. That's how we implement and document locks now.
> - generic smp_mb__after_before_whatever(). And architectures can override
> the helpers.
> E.g. if qspinlocks on x86 can implement the smp_mb__after_spin_lock() for
> free, then the helper can be a nop.
>
If you are going to use smp_mb__after_before_whatever() to fix the
spin_unlock_wait() problem, the helper for qspinlocks on x86 can not be
free, because qspinlocks on x86 use separate READs and WRITEs in
spin_lock(), and we need a full barrier to order the WRITEs of the lock
acquisition with READs in critical sections to let spin_unlock_wait()
work.
> Right now, we start to hardcode something into the architectures - for some
> callers.
> Other callers use solutions such as smp_mb__after_unlock_lock(), i.e. arch
> dependent workarounds in arch independent code.
>
Please note that fixes above in spin_unlock_wait() and
smp_mb__after_unlock_lock() are for two different problems,
smp_mb__after_unlock_lock() is to upgrade a lock+unlock pair into a full
barrier, which is useful to RCpc archs, e.g. PowerPC.
> And: We unnecessarily add overhead.
> Both ipc/sem and netfilter do loops over many spinlocks:
> > for (i = 0; i < CONNTRACK_LOCKS; i++) {
> > spin_unlock_wait(&nf_conntrack_locks[i]);
> > }
> One memory barrier would be sufficient, but due to embedding we end up with
> CONNTRACK_LOCKS barriers.
>
We can move the smp_mb() out of the spin_unlock_wait(), if we make sure
all the users are using the proper barriers, so this overhead is easily
fixed.
> Should I create a patch?
> (i.e. documentation and generic helpers)
>
There have been some discussions in this thread:
http://marc.info/?l=linux-arm-kernel&m=144862480822027
, which may be helpful ;-)
Regards,
Boqun
> --
> Manfred
Download attachment "signature.asc" of type "application/pgp-signature" (474 bytes)
Powered by blists - more mailing lists