linux-kernel - Re: sched: softlockups in multi_cpu


Message-ID: <1425668223.2475.94.camel@j-VirtualBox>
Date:	Fri, 06 Mar 2015 10:57:03 -0800
From:	Jason Low <jason.low2@...com>
To:	Davidlohr Bueso <dave@...olabs.net>
Cc:	Ingo Molnar <mingo@...nel.org>,
	Sasha Levin <sasha.levin@...cle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Dave Jones <davej@...emonkey.org.uk>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	jason.low2@...com
Subject: Re: sched: softlockups in multi_cpu_stop

On Fri, 2015-03-06 at 09:19 -0800, Davidlohr Bueso wrote:
> On Fri, 2015-03-06 at 13:32 +0100, Ingo Molnar wrote:
> > * Sasha Levin <sasha.levin@...cle.com> wrote:
> > 
> > > I've bisected this to "locking/rwsem: Check for active lock before bailing on spinning". Relevant parties Cc'ed.
> > 
> > That would be:
> > 
> >   1a99367023f6 ("locking/rwsem: Check for active lock before bailing on spinning")
> 
> > diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> > index 1c0d11e8ce34..e4ad019e23f5 100644
> > --- a/kernel/locking/rwsem-xadd.c
> > +++ b/kernel/locking/rwsem-xadd.c
> > @@ -298,23 +298,30 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
> >  static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
> >  {
> >  	struct task_struct *owner;
> > -	bool on_cpu = false;
> > +	bool ret = true;
> >  
> >  	if (need_resched())
> >  		return false;
> >  
> >  	rcu_read_lock();
> >  	owner = ACCESS_ONCE(sem->owner);
> > -	if (owner)
> > -		on_cpu = owner->on_cpu;
> > -	rcu_read_unlock();
> > +	if (!owner) {
> > +		long count = ACCESS_ONCE(sem->count);
> > +		/*
> > +		 * If sem->owner is not set, yet we have just recently entered the
> > +		 * slowpath with the lock being active, then there is a possibility
> > +		 * reader(s) may have the lock. To be safe, bail spinning in these
> > +		 * situations.
> > +		 */
> > +		if (count & RWSEM_ACTIVE_MASK)
> > +			ret = false;
> > +		goto done;
> 
> Hmmm so the lockup would be due to this (when owner is non-nil the patch
> has no effect), telling users to spin instead of sleep -- _except_ for
> this condition. And when spinning we're always checking for need_resched
> to be safe. So even if this function was completely bogus, we'd end up
> needlessly spinning but I'm surprised about the lockup. Maybe coffee
> will make things clearer.

Right, can_spin_on_owner() was originally added to the mutex spinning
code as an optimization, mainly so that we can avoid adding the spinner
to the OSQ only to find that it doesn't need to spin. Whether this
function returns the correct value should really only affect
performance, so yes, lockups due to this seem surprising.
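To make the decision concrete, below is a hypothetical userspace model. It is a sketch under stated assumptions, not the kernel code. model_task, model_sem, MODEL_ACTIVE_MASK, model_osq_enqueues and model_try_optimistic_spin are illustrative stand-ins invented here. The mask value is only assumed to mirror RWSEM_ACTIVE_MASK on 64-bit x86 of that era. need_resched(), RCU and ACCESS_ONCE() are left out. The owner != NULL path, which the quoted hunk cuts off, is assumed to still return owner->on_cpu, since the patch is said to have no effect when owner is non-nil.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ASSUMPTION: mirrors RWSEM_ACTIVE_MASK on 64-bit x86 at the time. */
#define MODEL_ACTIVE_MASK 0xffffffffL

/* Hypothetical stand-in for the task_struct fields we care about. */
struct model_task {
	bool on_cpu;                    /* is the owner currently running? */
};

/* Hypothetical stand-in for struct rw_semaphore. */
struct model_sem {
	long count;                     /* low bits count active lockers */
	const struct model_task *owner; /* set by writers only */
};

/* Before 1a99367023f6: no recorded owner means "do not spin". */
bool can_spin_old(const struct model_sem *sem)
{
	assert(sem != NULL);
	return sem->owner ? sem->owner->on_cpu : false;
}

/*
 * After the patch: with no owner, spin unless the count shows active
 * lockers (possibly readers, which never set ->owner). With an owner,
 * the answer is unchanged: spin only if the owner is on a CPU.
 */
bool can_spin_new(const struct model_sem *sem)
{
	assert(sem != NULL);
	if (!sem->owner)
		return !(sem->count & MODEL_ACTIVE_MASK);
	return sem->owner->on_cpu;
}

/* Hypothetical counter standing in for "spinner was added to the OSQ". */
int model_osq_enqueues;

/*
 * Caller sketch: the check only decides whether to bother queuing on
 * the OSQ and spinning. A wrong answer should cost a wasted enqueue or
 * a missed spin, not correctness.
 */
bool model_try_optimistic_spin(const struct model_sem *sem)
{
	if (!can_spin_new(sem))
		return false;           /* fall back to the sleeping slowpath */
	model_osq_enqueues++;           /* stand-in for osq_lock() */
	return true;                    /* caller would now spin */
}
```

In this model the only case whose answer changes is an unowned semaphore with no active lockers: it used to say "sleep" and now says "spin". The reader-held and writer-owned cases behave as before.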
