Message-ID: <20150811132905.GB2659@quack.suse.cz>
Date: Tue, 11 Aug 2015 15:29:05 +0200
From: Jan Kara <jack@...e.cz>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Dave Chinner <david@...morbit.com>, Jan Kara <jack@...e.cz>,
Al Viro <viro@...iv.linux.org.uk>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/4] change sb_writers to use percpu_rw_semaphore
On Tue 11-08-15 15:16:26, Oleg Nesterov wrote:
> On 08/11, Dave Chinner wrote:
> >
> > On Mon, Aug 10, 2015 at 04:59:42PM +0200, Jan Kara wrote:
> > >
> > > One would like to construct the lock chain as:
> > >
> > > CPU0 (chown foo dir)          CPU1 (readdir dir)                CPU2 (page fault)
> > > process Y                     process X, thread 0               process X, thread 1
> > >
> > >                               get ILOCK for dir
> > > gets freeze protection
> > > starts transaction in xfs_setattr_nonsize
> > > waits to get ILOCK on 'dir'
> > >                                                                 get mmap_sem for X
> > >                               wait for mmap_sem for process X
> > >                                 in filldir()
> > >                                                                 wait for freeze protection in
> > >                                                                   xfs_page_mkwrite
> > >
> > > and CPU3 then being in freeze_super(), blocking CPU2 and waiting for CPU0 to
> > > finish its freeze-protected section. But this cannot happen. The reason is
> > > that we block writers level-by-level and thus while there are writers at
> > > level X, we do not block writers at level X+1. So in this particular case
> > > freeze_super() will block waiting for CPU0 to finish its freeze protected
> > > section while CPU2 is free to continue.
> > >
> > > In general we have a chain like
> > >
> > > freeze L0 -> freeze L1 -> freeze L2 -> ILOCK -> mmap_sem --\
> > >    ^                                                       |
> > >    \-------------------------------------------------------/
> > >
> > > But since ILOCK is always acquired with freeze protection at L0 and we can
> > > block at L1 only after there are no writers at L0, this loop can never
> > > happen.
> > >
> > > Note that if we use the property of freezing that a lock at level X+1 cannot
> > > block while we hold a lock at level X, we can just as well simplify the dependency
> > > graph and track in it only the lowest level of freeze lock that is
> > > currently acquired (since the levels above it cannot block and do not in
> > > any way influence blocking of other processes either and thus are
> > > irrelevant for the purpose of deadlock detection). Then the dependency
> > > graph we'd get would be:
> > >
> > > freeze L0 -> ILOCK -> mmap_sem -> freeze L1
> > >
> > > and we have a nice acyclic graph we like to see... So probably we have to
> > > hack the lockdep instrumentation some more and just not tell lockdep
> > > about freeze locks at higher levels if we already hold a lock at a lower
> > > level. Thoughts?
> >
> > The XFS directory ilock->filldir->might_fault locking path has been
> > generating false positives in quite a lot of places because of
> > things we do on one side of the mmap_sem in filesystem paths vs
> > things we do on the other side of the mmap_sem in the page fault
> > path.
>
> OK. Dave, Jan, thanks a lot.
>
> I was also confused because I didn't know that the "Chain exists of" part
> of print_circular_bug() only prints the _partial_ chain, and I have
> to admit that I do not even understand which part it actually shows...
>
> I'll drop
>
> move rwsem_release() from sb_wait_write() to freeze_super()
> change thaw_super() to re-acquire s_writers.lock_map
>
> from the previous series and resend everything. Let's change sb_writers to
> use percpu_rw_semaphore first, then try to improve the lockdep annotations.
Yeah, that sounds like a good plan.
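
For reference, the level ordering the argument above relies on is just the
order in which freeze_super() drains the levels; roughly (s_umount handling,
barriers and sync_filesystem() omitted):

	sb->s_writers.frozen = SB_FREEZE_WRITE;
	/* no new writers at the SB_FREEZE_WRITE level from now on */
	sb_wait_write(sb, SB_FREEZE_WRITE);

	sb->s_writers.frozen = SB_FREEZE_PAGEFAULT;
	/* SB_FREEZE_WRITE writers are drained, now block page faults */
	sb_wait_write(sb, SB_FREEZE_PAGEFAULT);

	sb->s_writers.frozen = SB_FREEZE_FS;
	/* finally block internal filesystem writers */
	sb_wait_write(sb, SB_FREEZE_FS);

So level X+1 is only blocked once every level-X writer has finished, which is
why the back edge in the picture above can never close into a real deadlock.
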
> See the interdiff below. With this change I have
>
> TEST_DEV=/dev/loop0 TEST_DIR=TEST SCRATCH_DEV=/dev/loop1 SCRATCH_MNT=SCRATCH \
> ./check `grep -il freeze tests/*/???`
>
> ...
>
> Ran: generic/068 generic/085 generic/280 generic/311 xfs/011 xfs/119 xfs/297
> Passed all 7 tests
>
> anything else I should test?
The diff looks good, and if these tests pass without a warning, we can be
reasonably confident things are fine.
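
Just to spell out the release/acquire pairing for anyone reading along: with
the interdiff quoted below, sb_wait_write() drops the lockdep state right
after percpu_down_write(), and sb_freeze_unlock() takes it back for all
levels before really unlocking, so after the diff it reads roughly:

	static void sb_freeze_unlock(struct super_block *sb)
	{
		int level;

		/* re-take the lockdep state dropped in sb_wait_write() */
		for (level = 0; level < SB_FREEZE_LEVELS; ++level)
			percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0, _THIS_IP_);

		/* now really unlock, highest level first */
		for (level = SB_FREEZE_LEVELS; --level >= 0; )
			percpu_up_write(sb->s_writers.rw_sem + level);
	}
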
> this needs a comment in sb_wait_write() to explain that this is not what
> we want.
Yup, would be nice.
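Something along these lines, perhaps (just a sketch, wording up to you):

	static void sb_wait_write(struct super_block *sb, int level)
	{
		percpu_down_write(sb->s_writers.rw_sem + level-1);
		/*
		 * We keep the rwsem write-locked while the filesystem is
		 * frozen, i.e. across a return to userspace, until
		 * thaw_super() finally unlocks it.  Drop the lockdep state
		 * here so that lockdep_sys_exit() does not warn about a lock
		 * held when returning to user space; sb_freeze_unlock()
		 * re-acquires it just before the real percpu_up_write().
		 */
		percpu_rwsem_release(sb->s_writers.rw_sem + level-1, 0, _THIS_IP_);
	}
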
Honza
>
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1215,27 +1215,15 @@ EXPORT_SYMBOL(__sb_start_write);
> static void sb_wait_write(struct super_block *sb, int level)
> {
> percpu_down_write(sb->s_writers.rw_sem + level-1);
> + percpu_rwsem_release(sb->s_writers.rw_sem + level-1, 0, _THIS_IP_);
> }
>
> -static void sb_freeze_release(struct super_block *sb)
> -{
> - int level;
> - /* Avoid the warning from lockdep_sys_exit() */
> - for (level = 0; level < SB_FREEZE_LEVELS; ++level)
> - percpu_rwsem_release(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
> -}
> -
> -static void sb_freeze_acquire(struct super_block *sb)
> +static void sb_freeze_unlock(struct super_block *sb)
> {
> int level;
>
> for (level = 0; level < SB_FREEZE_LEVELS; ++level)
> percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
> -}
> -
> -static void sb_freeze_unlock(struct super_block *sb)
> -{
> - int level;
>
> for (level = SB_FREEZE_LEVELS; --level >= 0; )
> percpu_up_write(sb->s_writers.rw_sem + level);
> @@ -1331,7 +1319,6 @@ int freeze_super(struct super_block *sb)
> * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
> */
> sb->s_writers.frozen = SB_FREEZE_COMPLETE;
> - sb_freeze_release(sb);
> up_write(&sb->s_umount);
> return 0;
> }
> @@ -1358,14 +1345,11 @@ int thaw_super(struct super_block *sb)
> goto out;
> }
>
> - sb_freeze_acquire(sb);
> -
> if (sb->s_op->unfreeze_fs) {
> error = sb->s_op->unfreeze_fs(sb);
> if (error) {
> printk(KERN_ERR
> "VFS:Filesystem thaw failed\n");
> - sb_freeze_release(sb);
> up_write(&sb->s_umount);
> return error;
> }
>
--
Jan Kara <jack@...e.com>
SUSE Labs, CR