lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20160822091541.GA28226@tardis.cn.ibm.com>
Date:   Mon, 22 Aug 2016 17:15:41 +0800
From:   Boqun Feng <boqun.feng@...il.com>
To:     Manfred Spraul <manfred@...orfullife.com>
Cc:     Davidlohr Bueso <dave@...olabs.net>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Susanne Spraul <1vier1@....de>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: spin_lock implicit/explicit memory barrier

On Fri, Aug 12, 2016 at 08:43:55PM +0200, Manfred Spraul wrote:
> Hi Boqun,
> 
> On 08/12/2016 04:47 AM, Boqun Feng wrote:
> > > We should not be doing an smp_mb() right after a spin_lock(), makes no sense. The
> > > spinlock machinery should guarantee us the barriers in the unorthodox locking cases,
> > > such as this.
> > > 
> Do we really want to go there?
> Trying to handle all unorthodox cases will end up as an endless list of
> patches, and guaranteed to be stale architectures.
> 

To be honest, the only unorthodox case here is we try to use
spin_unlock_wait() to achieve some kind of ordering with a lock
critical section, like we do in sem code and do_exit(). Spinlocks are
mostly used for exclusive lock critical sections, which are what we care
about most and what we should make as fast as possible.

> > Right.
> > 
> > If you have:
> > 
> > 6262db7c088b ("powerpc/spinlock: Fix spin_unlock_wait()")
> > 
> > you don't need smp_mb() after spin_lock() on PPC.
> > 
> > And, IIUC, if you have:
> > 
> > 3a5facd09da8 ("arm64: spinlock: fix spin_unlock_wait for LSE atomics")
> > d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against
> > concurrent lockers")
> > 
> > you don't need smp_mb() after spin_lock() on ARM64.
> > 
> > And, IIUC, if you have:
> > 
> > 2c6100227116 ("locking/qspinlock: Fix spin_unlock_wait() some more")
> > 
> > you don't need smp_mb() after spin_lock() on x86 with qspinlock.
> 
> I would really prefer the other approach:
> - spin_lock() is an acquire, that's it. No further guarantees, e.g. ordering
> of writing the lock.
> - spin_unlock() is a release, that's it.

No objection to these. That's how we implement and document locks now.

> - generic smp_mb__after_before_whatever(). And architectures can override
> the helpers.
> E.g. if qspinlocks on x86 can implement the smp_mb__after_spin_lock() for
> free, then the helper can be a nop.
> 

If you are going to use smp_mb__after_before_whatever() to fix the
spin_unlock_wait() problem, the helper for qspinlocks on x86 can not be
free, because qspinlocks on x86 use separate READs and WRITEs in
spin_lock(), and we need a full barrier to order the WRITEs of the lock
acquisition with READs in critical sections to let spin_unlock_wait()
work.

> Right now, we start to hardcode something into the architectures - for some
> callers.
> Other callers use solutions such as smp_mb__after_unlock_lock(), i.e. arch
> dependent workarounds in arch independent code.
> 

Please note that fixes above in spin_unlock_wait() and
smp_mb__after_unlock_lock() are for two different problems,
smp_mb__after_unlock_lock() is to upgrade a lock+unlock pair into a full
barrier, which is useful to RCpc archs, e.g. PowerPC.

> And: We unnecessarily add overhead.
> Both ipc/sem and netfilter do loops over many spinlocks:
> >        for (i = 0; i < CONNTRACK_LOCKS; i++) {
> >                 spin_unlock_wait(&nf_conntrack_locks[i]);
> >         }
> One memory barrier would be sufficient, but due to embedding we end up with
> CONNTRACK_LOCKS barriers.
> 

We can move the smp_mb() out of the spin_unlock_wait(), if we make sure
all the users are using the proper barriers, so this overhead is easily
fixed.

> Should I create a patch?
> (i.e. documentation and generic helpers)
> 

There have been some discussions in this thread:

http://marc.info/?l=linux-arm-kernel&m=144862480822027

, which may be helpful ;-)

Regards,
Boqun

> --
>     Manfred

Download attachment "signature.asc" of type "application/pgp-signature" (474 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ