linux-kernel - Re: [PATCH 1/2] locking/mutex: Don't hog RCU read lock while optimistically spinning

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20200914003624.GA3944@sultan-book.localdomain>
Date:   Sun, 13 Sep 2020 17:36:24 -0700
From:   Sultan Alsawaf <sultan@...neltoast.com>
To:     Will Deacon <will@...nel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] locking/mutex: Don't hog RCU read lock while
 optimistically spinning

On Mon, Sep 07, 2020 at 05:20:31PM +0100, Will Deacon wrote:
> On Fri, Aug 07, 2020 at 12:16:35PM -0700, Sultan Alsawaf wrote:
> > From: Sultan Alsawaf <sultan@...neltoast.com>
> > 
> > There's no reason to hold an RCU read lock the entire time while
> > optimistically spinning for a mutex lock. This can needlessly lengthen
> > RCU grace periods and slow down synchronize_rcu() when it doesn't brute
> > force the RCU grace period via rcupdate.rcu_expedited=1.
> 
> Would be good to demonstrate this with numbers if you can.

I could simulate the worst possible case, which would stall synchronize_rcu() by
one jiffy, which could be 10ms with CONFIG_HZ=100. The way that would happen is
when the mutex owner does a lot of non-sleeping work while the lock is held, and
while another CPU tries to acquire the lock.

This is a dummy example of the scenario I have in mind:
      CPU0                         CPU1
----------------------------------------------
mutex_lock(locky)
mdelay(100)                  mutex_lock(locky)
mutex_unlock(locky)

In this case, CPU1 could spin in mutex_lock() for up to a jiffy (until CPU0
releases locky, which won't happen for 100ms, or until CPU1's task needs to
reschedule). While the spinning occurs, the RCU read lock will be held the whole
time, and then synchronize_rcu() will be stalled.

One could argue that most mutex-locked critical sections probably wouldn't spend
so long working on something without scheduling (at least, not intentionally),
but on slower SMP CPUs I suspect that this is common.

> > Signed-off-by: Sultan Alsawaf <sultan@...neltoast.com>
> > ---
> >  kernel/locking/mutex.c | 25 +++++++++++++++++--------
> >  1 file changed, 17 insertions(+), 8 deletions(-)
> > 
> > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> > index 5352ce50a97e..cc5676712458 100644
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -552,21 +552,31 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
> >  {
> >  	bool ret = true;
> >  
> > -	rcu_read_lock();
> > -	while (__mutex_owner(lock) == owner) {
> > +	for (;;) {
> > +		unsigned int cpu;
> > +		bool same_owner;
> > +
> >  		/*
> > -		 * Ensure we emit the owner->on_cpu, dereference _after_
> > -		 * checking lock->owner still matches owner. If that fails,
> > +		 * Ensure lock->owner still matches owner. If that fails,
> >  		 * owner might point to freed memory. If it still matches,
> >  		 * the rcu_read_lock() ensures the memory stays valid.
> >  		 */
> > -		barrier();
> > +		rcu_read_lock();
> > +		same_owner = __mutex_owner(lock) == owner;
> > +		if (same_owner) {
> > +			ret = owner->on_cpu;
> > +			if (ret)
> > +				cpu = task_cpu(owner);
> > +		}
> > +		rcu_read_unlock();
> 
> Are you sure this doesn't break the ww mutex spinning? That thing also goes
> and looks at the owner, but now it's called outside of the read-side
> critical section.

Yes, it's safe because it's not dereferencing the owner pointer.

Sultan