linux-kernel - Re: [PATCH v3 06/13] locks: protect most of the file_lock handling with i

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20130618134346.79edf0a7@tlielax.poochiereds.net>
Date:	Tue, 18 Jun 2013 13:43:46 -0400
From:	Jeff Layton <jlayton@...hat.com>
To:	Jeff Layton <jlayton@...hat.com>
Cc:	viro@...iv.linux.org.uk, matthew@....cx, bfields@...ldses.org,
	dhowells@...hat.com, sage@...tank.com, smfrench@...il.com,
	swhiteho@...hat.com, Trond.Myklebust@...app.com,
	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
	linux-afs@...ts.infradead.org, ceph-devel@...r.kernel.org,
	linux-cifs@...r.kernel.org, samba-technical@...ts.samba.org,
	cluster-devel@...hat.com, linux-nfs@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, piastryyy@...il.com
Subject: Re: [PATCH v3 06/13] locks: protect most of the file_lock handling
 with i_lock

On Mon, 17 Jun 2013 11:49:38 -0400
Jeff Layton <jlayton@...hat.com> wrote:

> On Mon, 17 Jun 2013 11:46:09 -0400
> Jeff Layton <jlayton@...hat.com> wrote:
> 
> > On Mon, 17 Jun 2013 11:13:49 -0400
> > Jeff Layton <jlayton@...hat.com> wrote:
> > 
> > > Having a global lock that protects all of this code is a clear
> > > scalability problem. Instead of doing that, move most of the code to be
> > > protected by the i_lock instead. The exceptions are the global lists
> > > that the ->fl_link sits on, and the ->fl_block list.
> > > 
> > > ->fl_link is what connects these structures to the
> > > global lists, so we must ensure that we hold those locks when iterating
> > > over or updating these lists.
> > > 
> > > Furthermore, sound deadlock detection requires that we hold the
> > > blocked_list state steady while checking for loops. We also must ensure
> > > that the search and update to the list are atomic.
> > > 
> > > For the checking and insertion side of the blocked_list, push the
> > > acquisition of the global lock into __posix_lock_file and ensure that
> > > checking and update of the  blocked_list is done without dropping the
> > > lock in between.
> > > 
> > > On the removal side, when waking up blocked lock waiters, take the
> > > global lock before walking the blocked list and dequeue the waiters from
> > > the global list prior to removal from the fl_block list.
> > > 
> > > With this, deadlock detection should be race free while we minimize
> > > excessive file_lock_lock thrashing.
> > > 
> > > Finally, in order to avoid a lock inversion problem when handling
> > > /proc/locks output we must ensure that manipulations of the fl_block
> > > list are also protected by the file_lock_lock.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@...hat.com>
> > > ---
> > >  Documentation/filesystems/Locking |   21 ++++--
> > >  fs/afs/flock.c                    |    5 +-
> > >  fs/ceph/locks.c                   |    2 +-
> > >  fs/ceph/mds_client.c              |    8 +-
> > >  fs/cifs/cifsfs.c                  |    2 +-
> > >  fs/cifs/file.c                    |   13 ++--
> > >  fs/gfs2/file.c                    |    2 +-
> > >  fs/lockd/svcsubs.c                |   12 ++--
> > >  fs/locks.c                        |  151 ++++++++++++++++++++++---------------
> > >  fs/nfs/delegation.c               |   10 +-
> > >  fs/nfs/nfs4state.c                |    8 +-
> > >  fs/nfsd/nfs4state.c               |    8 +-
> > >  include/linux/fs.h                |   11 ---
> > >  13 files changed, 140 insertions(+), 113 deletions(-)
> > > 
> > 
> > [...]
> > 
> > > @@ -1231,7 +1254,7 @@ int __break_lease(struct inode *inode, unsigned int mode)
> > >  	if (IS_ERR(new_fl))
> > >  		return PTR_ERR(new_fl);
> > >  
> > > -	lock_flocks();
> > > +	spin_lock(&inode->i_lock);
> > >  
> > >  	time_out_leases(inode);
> > >  
> > > @@ -1281,11 +1304,11 @@ restart:
> > >  			break_time++;
> > >  	}
> > >  	locks_insert_block(flock, new_fl);
> > > -	unlock_flocks();
> > > +	spin_unlock(&inode->i_lock);
> > >  	error = wait_event_interruptible_timeout(new_fl->fl_wait,
> > >  						!new_fl->fl_next, break_time);
> > > -	lock_flocks();
> > > -	__locks_delete_block(new_fl);
> > > +	spin_lock(&inode->i_lock);
> > > +	locks_delete_block(new_fl);
> > 
> > Doh -- bug here. This should not have been changed to
> > locks_delete_block(). My apologies.
> > 
> > >  	if (error >= 0) {
> > >  		if (error == 0)
> > >  			time_out_leases(inode);
> > 
> > [...]
> > 
> > >  posix_unblock_lock(struct file *filp, struct file_lock *waiter)
> > >  {
> > > +	struct inode *inode = file_inode(filp);
> > >  	int status = 0;
> > >  
> > > -	lock_flocks();
> > > +	spin_lock(&inode->i_lock);
> > >  	if (waiter->fl_next)
> > > -		__locks_delete_block(waiter);
> > > +		locks_delete_block(waiter);
> > 
> > 
> > Ditto here...
> > 
> > >  	else
> > >  		status = -ENOENT;
> > > -	unlock_flocks();
> > > +	spin_unlock(&inode->i_lock);
> > >  	return status;
> > >  }
> > >  
> > 
> 
> Bah, scratch that -- this patch is actually fine. We hold the i_lock
> here and locks_delete_block takes the file_lock_lock, which is correct.
> There is a potential race in patch 7 though. I'll reply to that patch
> to point it out in a minute.
> 

...and scratch that again...

There is a bug in the posix_unblock_lock delta above. We don't need to
hold the i_lock there, but we should probably be taking the
file_lock_lock before checking fl_next.

In truth, I have a hard time seeing how that race would manifest itself
in practice, but that's how the existing code works so it's probably
best to preserve that as closely as possible.

Fixed in my git repo and the next spin of this set will have it.

Sorry again for the noise... ;)

-- 
Jeff Layton <jlayton@...hat.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/