linux-kernel - Re: __sb_start_write() && force

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150819031158.GO714@dastard>
Date:	Wed, 19 Aug 2015 13:11:58 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: __sb_start_write() && force_trylock hack

On Tue, Aug 18, 2015 at 04:49:00PM +0200, Oleg Nesterov wrote:
> Jan, Dave, perhaps you can take a look...
> 
> On 08/14, Oleg Nesterov wrote:
> >
> > Plus another patch which removes the "trylock"
> > hack in __sb_start_write().
> 
> I meant the patch we already discussed (attached at the end). And yes,
> previously I reported it passed the tests. However, I only ran the same
> 'grep -il freeze tests/*/???' tests. When I tried to run all tests, I
> got the new reports from lockdep.
> 
> 	[ 2098.281171]  May be due to missing lock nesting notation

<groan>

> 	[ 2098.288744] 4 locks held by fsstress/50971:
> 	[ 2098.293408]  #0:  (sb_writers#13){++++++}, at: [<ffffffff81248d32>] __sb_start_write+0x32/0x40
> 	[ 2098.303085]  #1:  (&type->i_mutex_dir_key#4/1){+.+.+.}, at: [<ffffffff8125685f>] filename_create+0x7f/0x170
> 	[ 2098.314038]  #2:  (sb_internal#2){++++++}, at: [<ffffffff81248d32>] __sb_start_write+0x32/0x40
> 	[ 2098.323711]  #3:  (&type->s_umount_key#54){++++++}, at: [<ffffffffa05a638c>] xfs_flush_inodes+0x1c/0x40 [xfs]
> 	[ 2098.334898]
> 		       stack backtrace:
> 	[ 2098.339762] CPU: 3 PID: 50971 Comm: fsstress Not tainted 4.2.0-rc6+ #27
> 	[ 2098.347143] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013
> 	[ 2098.358303]  0000000000000000 00000000e70ee864 ffff880c05a2b9c8 ffffffff817ee692
> 	[ 2098.366603]  0000000000000000 ffffffff826f8030 ffff880c05a2bab8 ffffffff810f45be
> 	[ 2098.374900]  0000000000000000 ffff880c05a2bb20 0000000000000000 0000000000000000
> 	[ 2098.383197] Call Trace:
> 	[ 2098.385930]  [<ffffffff817ee692>] dump_stack+0x45/0x57
> 	[ 2098.391667]  [<ffffffff810f45be>] __lock_acquire+0x1e9e/0x2040
> 	[ 2098.398177]  [<ffffffff810f310d>] ? __lock_acquire+0x9ed/0x2040
> 	[ 2098.404787]  [<ffffffff811d4702>] ? pagevec_lookup_entries+0x22/0x30
> 	[ 2098.411879]  [<ffffffff811d5142>] ? truncate_inode_pages_range+0x2b2/0x7e0
> 	[ 2098.419551]  [<ffffffff810f542e>] lock_acquire+0xbe/0x150
> 	[ 2098.425566]  [<ffffffff81248d32>] ? __sb_start_write+0x32/0x40
> 	[ 2098.432079]  [<ffffffff810ede91>] percpu_down_read+0x51/0xa0
> 	[ 2098.438396]  [<ffffffff81248d32>] ? __sb_start_write+0x32/0x40
> 	[ 2098.444905]  [<ffffffff81248d32>] __sb_start_write+0x32/0x40
> 	[ 2098.451244]  [<ffffffffa05a7f84>] xfs_trans_alloc+0x24/0x40 [xfs]
> 	[ 2098.458076]  [<ffffffffa059e872>] xfs_inactive_truncate+0x32/0x110 [xfs]
> 	[ 2098.465580]  [<ffffffffa059f662>] xfs_inactive+0x102/0x120 [xfs]
> 	[ 2098.472308]  [<ffffffffa05a475b>] xfs_fs_evict_inode+0x7b/0xc0 [xfs]
> 	[ 2098.479401]  [<ffffffff812642ab>] evict+0xab/0x170
> 	[ 2098.484748]  [<ffffffff81264568>] iput+0x1a8/0x230
> 	[ 2098.490100]  [<ffffffff8127701c>] sync_inodes_sb+0x14c/0x1d0
> 	[ 2098.496439]  [<ffffffffa05a6398>] xfs_flush_inodes+0x28/0x40 [xfs]
> 	[ 2098.503361]  [<ffffffffa059e213>] xfs_create+0x5c3/0x6d0 [xfs]

Another false positive.  We have a transaction handle here, but
we've failed to reserve space for it so it's not an active
transaction yet. i.e. the filesystem is operating at ENOSPC.

On an ENOSPC error, we flush out dirty data to free up reserved
delalloc space, which obviously is obviously then racing with an
unlink of an inode in another thread and so drops the final
reference to the inode, and it goes down the eviction path that runs
transactions.

But there is no deadlock here - while we have a transaction handle
allocated at xfs_create() context, it is not yet active and holds no
reservation, so it is safe for eviction here to run new transactions
as there is no active transaction in progress.

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/