linux-kernel - Re: [locks] 6d390e4b5d: will-it-scale.per_process

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c4ef31a663fbf7a3de349696e9f00f2f5c4ec89a.camel@kernel.org>
Date:   Tue, 10 Mar 2020 17:14:11 -0400
From:   Jeff Layton <jlayton@...nel.org>
To:     NeilBrown <neilb@...e.de>, yangerkun <yangerkun@...wei.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     kernel test robot <rong.a.chen@...el.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        Bruce Fields <bfields@...ldses.org>,
        Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6%
 regression

On Wed, 2020-03-11 at 08:01 +1100, NeilBrown wrote:
> On Tue, Mar 10 2020, Jeff Layton wrote:
> 
> > On Tue, 2020-03-10 at 08:52 -0400, Jeff Layton wrote:
> > 
> > [snip]
> > 
> > > On Tue, 2020-03-10 at 11:24 +0800, yangerkun wrote:
> > > > Something others. I think there is no need to call locks_delete_block 
> > > > for all case in function like flock_lock_inode_wait. What we should do 
> > > > as the patch '16306a61d3b7 ("fs/locks: always delete_block after 
> > > > waiting.")' describes is that we need call locks_delete_block not only 
> > > > for error equal to -ERESTARTSYS(please point out if I am wrong). And 
> > > > this patch may fix the regression too since simple lock that success or 
> > > > unlock will not try to acquire blocked_lock_lock.
> > > > 
> > > > 
> > > 
> > > Nice! This looks like it would work too, and it's a simpler fix.
> > > 
> > > I'd be inclined to add a WARN_ON_ONCE(fl->fl_blocker) after the if
> > > statements to make sure we never exit with one still queued. Also, I
> > > think we can do a similar optimization in __break_lease.
> > > 
> > > There are some other callers of locks_delete_block:
> > > 
> > > cifs_posix_lock_set: already only calls it in these cases
> > > 
> > > nlmsvc_unlink_block: I think we need to call this in most cases, and
> > > they're not going to be high-performance codepaths in general
> > > 
> > > nfsd4 callback handling: Several calls here, most need to always be
> > > called. find_blocked_lock could be reworked to take the
> > > blocked_lock_lock only once (I'll do that in a separate patch).
> > > 
> > > How about something like this (
> > > 
> > > ----------------------8<---------------------
> > > 
> > > From: yangerkun <yangerkun@...wei.com>
> > > 
> > > [PATCH] filelock: fix regression in unlock performance
> > > 
> > > '6d390e4b5d48 ("locks: fix a potential use-after-free problem when
> > > wakeup a waiter")' introduces a regression since we will acquire
> > > blocked_lock_lock every time locks_delete_block is called.
> > > 
> > > In many cases we can just avoid calling locks_delete_block at all,
> > > when we know that the wait was awoken by the condition becoming true.
> > > Change several callers of locks_delete_block to only call it when
> > > waking up due to signal or other error condition.
> > > 
> > > [ jlayton: add similar optimization to __break_lease, reword changelog,
> > > 	   add WARN_ON_ONCE calls ]
> > > 
> > > Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
> > > Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
> > > Signed-off-by: yangerkun <yangerkun@...wei.com>
> > > Signed-off-by: Jeff Layton <jlayton@...nel.org>
> > > ---
> > >  fs/locks.c | 29 ++++++++++++++++++++++-------
> > >  1 file changed, 22 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/fs/locks.c b/fs/locks.c
> > > index 426b55d333d5..b88a5b11c464 100644
> > > --- a/fs/locks.c
> > > +++ b/fs/locks.c
> > > @@ -1354,7 +1354,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
> > >  		if (error)
> > >  			break;
> > >  	}
> > > -	locks_delete_block(fl);
> > > +	if (error)
> > > +		locks_delete_block(fl);
> > > +	WARN_ON_ONCE(fl->fl_blocker);
> > > +
> > >  	return error;
> > >  }
> > >  
> > > @@ -1447,7 +1450,9 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
> > >  
> > >  		break;
> > >  	}
> > > -	locks_delete_block(&fl);
> > > +	if (error)
> > > +		locks_delete_block(&fl);
> > > +	WARN_ON_ONCE(fl.fl_blocker);
> > >  
> > >  	return error;
> > >  }
> > > @@ -1638,23 +1643,28 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
> > >  
> > >  	locks_dispose_list(&dispose);
> > >  	error = wait_event_interruptible_timeout(new_fl->fl_wait,
> > > -						!new_fl->fl_blocker, break_time);
> > > +						 !new_fl->fl_blocker,
> > > +						 break_time);
> > >  
> > >  	percpu_down_read(&file_rwsem);
> > >  	spin_lock(&ctx->flc_lock);
> > >  	trace_break_lease_unblock(inode, new_fl);
> > > -	locks_delete_block(new_fl);
> > >  	if (error >= 0) {
> > >  		/*
> > >  		 * Wait for the next conflicting lease that has not been
> > >  		 * broken yet
> > >  		 */
> > > -		if (error == 0)
> > > +		if (error == 0) {
> > > +			locks_delete_block(new_fl);
> > >  			time_out_leases(inode, &dispose);
> > > +		}
> > >  		if (any_leases_conflict(inode, new_fl))
> > >  			goto restart;
> > >  		error = 0;
> > > +	} else {
> > > +		locks_delete_block(new_fl);
> > >  	}
> > > +	WARN_ON_ONCE(fl->fl_blocker);
> > >  out:
> > >  	spin_unlock(&ctx->flc_lock);
> > >  	percpu_up_read(&file_rwsem);
> > > @@ -2126,7 +2136,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
> > >  		if (error)
> > >  			break;
> > >  	}
> > > -	locks_delete_block(fl);
> > > +	if (error)
> > > +		locks_delete_block(fl);
> > > +	WARN_ON_ONCE(fl->fl_blocker);
> > > +
> > >  	return error;
> > >  }
> > >  
> > > @@ -2403,7 +2416,9 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
> > >  		if (error)
> > >  			break;
> > >  	}
> > > -	locks_delete_block(fl);
> > > +	if (error)
> > > +		locks_delete_block(fl);
> > > +	WARN_ON_ONCE(fl->fl_blocker);
> > >  
> > >  	return error;
> > >  }
> > 
> > I've gone ahead and added the above patch to linux-next. Linus, Neil,
> > are you ok with this one? I think this is probably the simplest
> > approach.
> 
> I think this patch contains an assumption which is not justified.  It
> assumes that if a wait_event completes without error, then the wake_up()
> must have happened.  I don't think that is correct.
> 
> In the patch that caused the recent regression, the race described
> involved a signal arriving just as __locks_wake_up_blocks() was being
> called on another thread.
> So the waiting process was woken by a signal *after* ->fl_blocker was set
> to NULL, and *before* the wake_up().  If wait_event_interruptible()
> finds that the condition is true, it will report success whether there
> was a signal or not.
> 
> If you skip the locks_delete_block() after a wait, you get exactly the
> same race as the optimization - which only skipped most of
> locks_delete_block().
> 
> I have a better solution.  I did like your patch except that it changed
> too much code.  So I revised it to change less code.  See below.
> 
> NeilBrown
> 
> From: NeilBrown <neilb@...e.de>
> Date: Wed, 11 Mar 2020 07:39:04 +1100
> Subject: [PATCH] locks: restore locks_delete_lock optimization
> 
> A recent patch (see Fixes: below) removed an optimization which is
> important as it avoids taking a lock in a common case.
> 
> The comment justifying the optimisation was correct as far as it went,
> in that if the tests succeeded, then the values would remain stable and
> the test result will remain valid even without a lock.
> 
> However after the test succeeds the lock can be freed while some other
> thread might have only just set ->blocker to NULL (thus allowing the
> test to succeed) but has not yet called wake_up() on the wq in the lock.
> If the wake_up happens after the lock is freed, a use-after-free error
> occurs.
> 
> This patch restores the optimization and reorders code to avoid the
> use-after-free.  Specifically we move the list_del_init on
> fl_blocked_member to *after* the wake_up(), and add an extra test on
> fl_block_member() to locks_delete_lock() before deciding to avoid taking
> the spinlock.
> 
> As this involves breaking code out of __locks_delete_block(), we discard
> the function completely and open-code it in the two places it was
> called.
> 
> These lockless accesses do not require any memory barriers.  The failure
> mode from possible memory access reordering is that the test at the top
> of locks_delete_lock() will fail, and in that case we fall through into
> the locked region which provides sufficient memory barriers implicitly.
> 
> Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
> Signed-off-by: NeilBrown <neilb@...e.de>
> ---
>  fs/locks.c | 42 ++++++++++++++++++++++++++++--------------
>  1 file changed, 28 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..dc99ab2262ea 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -716,18 +716,6 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
>  	hash_del(&waiter->fl_link);
>  }
>  
> -/* Remove waiter from blocker's block list.
> - * When blocker ends up pointing to itself then the list is empty.
> - *
> - * Must be called with blocked_lock_lock held.
> - */
> -static void __locks_delete_block(struct file_lock *waiter)
> -{
> -	locks_delete_global_blocked(waiter);
> -	list_del_init(&waiter->fl_blocked_member);
> -	waiter->fl_blocker = NULL;
> -}
> -
>  static void __locks_wake_up_blocks(struct file_lock *blocker)
>  {
>  	while (!list_empty(&blocker->fl_blocked_requests)) {
> @@ -735,11 +723,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
>  
>  		waiter = list_first_entry(&blocker->fl_blocked_requests,
>  					  struct file_lock, fl_blocked_member);
> -		__locks_delete_block(waiter);
> +		locks_delete_global_blocked(waiter);
> +		waiter->fl_blocker = NULL;
>  		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
>  			waiter->fl_lmops->lm_notify(waiter);
>  		else
>  			wake_up(&waiter->fl_wait);
> +		list_del_init(&waiter->fl_blocked_member);

Are you sure you don't need a memory barrier here? Could the
list_del_init be hoisted just above the if condition?

>  	}
>  }
>  
> @@ -753,11 +743,35 @@ int locks_delete_block(struct file_lock *waiter)
>  {
>  	int status = -ENOENT;
>  
> +	/*
> +	 * If fl_blocker is NULL, it won't be set again as this thread
> +	 * "owns" the lock and is the only one that might try to claim
> +	 * the lock.  So it is safe to test fl_blocker locklessly.
> +	 * Also if fl_blocker is NULL, this waiter is not listed on
> +	 * fl_blocked_requests for some lock, so no other request can
> +	 * be added to the list of fl_blocked_requests for this
> +	 * request.  So if fl_blocker is NULL, it is safe to
> +	 * locklessly check if fl_blocked_requests is empty.  If both
> +	 * of these checks succeed, there is no need to take the lock.
> +	 * We also check fl_blocked_member is empty.  This is logically
> +	 * redundant with the test of fl_blocker, but it ensure that
> +	 * __locks_wake_up_blocks() has finished the wakeup and will not
> +	 * access the lock again, so it is safe to return and free.
> +	 * There is no need for any memory barriers with these lockless
> +	 * tests as is the reads happen before the corresponding writes are
> +	 * seen, we fall through to the locked code.
> +	 */
> +	if (waiter->fl_blocker == NULL &&
> +	    list_empty(&waiter->fl_blocked_member) &&
> +	    list_empty(&waiter->fl_blocked_requests))
> +		return status;
>  	spin_lock(&blocked_lock_lock);
>  	if (waiter->fl_blocker)
>  		status = 0;
>  	__locks_wake_up_blocks(waiter);
> -	__locks_delete_block(waiter);
> +	locks_delete_global_blocked(waiter);
> +	list_del_init(&waiter->fl_blocked_member);
> +	waiter->fl_blocker = NULL;
>  	spin_unlock(&blocked_lock_lock);
>  	return status;
>  }

-- 
Jeff Layton <jlayton@...nel.org>