Message-ID: <20180409171959.GB9661@arm.com>
Date:   Mon, 9 Apr 2018 18:19:59 +0100
From:   Will Deacon <will.deacon@....com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Waiman Long <longman@...hat.com>, linux-kernel@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org, mingo@...nel.org,
        boqun.feng@...il.com, paulmck@...ux.vnet.ibm.com,
        catalin.marinas@....com
Subject: Re: [PATCH 02/10] locking/qspinlock: Remove unbounded cmpxchg loop
 from locking slowpath

On Mon, Apr 09, 2018 at 05:54:20PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 09, 2018 at 03:54:09PM +0100, Will Deacon wrote:
> > @@ -289,18 +315,26 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> >  		return;
> >  
> >  	/*
> > -	 * If we observe any contention; queue.
> > +	 * If we observe queueing, then queue ourselves.
> >  	 */
> > -	if (val & ~_Q_LOCKED_MASK)
> > +	if (val & _Q_TAIL_MASK)
> >  		goto queue;
> >  
> >  	/*
> > +	 * We didn't see any queueing, so have one more try at snatching
> > +	 * the lock in case it became available whilst we were taking the
> > +	 * slow path.
> > +	 */
> > +	if (queued_spin_trylock(lock))
> > +		return;
> > +
> > +	/*
> >  	 * trylock || pending
> >  	 *
> >  	 * 0,0,0 -> 0,0,1 ; trylock
> >  	 * 0,0,1 -> 0,1,1 ; pending
> >  	 */
> > +	val = set_pending_fetch_acquire(lock);
> >  	if (!(val & ~_Q_LOCKED_MASK)) {
> 
> So, if I remember that partial paper correctly, the atomic_read_acquire()
> can see 'arbitrary' old values for everything except the pending byte,
> which it just wrote and will forward into our load, right?
> 
> But I think coherence requires the read to not be older than the one
> observed by the trylock before (since it uses c-cas, its acquire can be
> elided).
> 
> I think this means we can miss a concurrent unlock vs the fetch_or. And
> I think that's fine: if we still see the lock set, we'll needlessly 'wait'
> for it to become unlocked.

Ah, but there is a related case that doesn't work. If the lock becomes
free just before we set pending, then another CPU can succeed on the
fastpath. We'll then set pending, but the lockword we read back may still
show the locked byte as 0, so two CPUs end up holding the lock.
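
To spell the window out (rough sketch only; the helper body below is just
my reading of the hunk and your description, and the pending/locked byte
accessors are illustrative rather than the final code):

    /* Sketch: store the pending byte, then read back the whole word. */
    static __always_inline u32 set_pending_fetch_acquire(struct qspinlock *lock)
    {
        WRITE_ONCE(lock->pending, 1);           /* assumes a u8 pending member */
        return atomic_read_acquire(&lock->val); /* full lockword, acquire */
    }

    CPU0 (slowpath)                         CPU1 (fastpath)
    ---------------                         ---------------
    reads val: locked == 1, tail == 0
                                            owner releases: locked := 0
                                            trylock cmpxchg(0 -> LOCKED)
                                            succeeds: locked := 1
    WRITE_ONCE(lock->pending, 1)
    val = atomic_read_acquire(&lock->val)
      (coherence allows locked == 0 here,
       taken from the unlock)
    sees !(val & _Q_LOCKED_MASK), so skips
    the wait for the owner and does
    clear_pending_set_locked()              still holds the lock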

I think it's worth giving this a go with the added trylock, but I can't
see a way to avoid the atomic_fetch_or at the moment.
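
For contrast, the fetch_or samples the whole word atomically with setting
pending, so a locked byte of 0 in the return value really does mean the
lock was free at that instant, and once pending is set no fastpath
cmpxchg(0 -> LOCKED) can succeed behind our back. Roughly (again a sketch
of the pending path, eliding the re-check for queueing, not lifted
verbatim from the patch):

    val = atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
    if (val & _Q_LOCKED_MASK)
        /* wait for the current owner to go away first */
        smp_cond_load_acquire(&lock->val.counter,
                              !(VAL & _Q_LOCKED_MASK));
    /* we hold pending and the lock is now free: take it */
    clear_pending_set_locked(lock);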

Will
