Message-ID: <170fa0d20908070657x1206f4ebp5f434ed63386b073@mail.gmail.com>
Date: Fri, 7 Aug 2009 09:57:06 -0400
From: Mike Snitzer <snitzer@...il.com>
To: Neil Brown <neilb@...e.de>
Cc: "H. Peter Anvin" <hpa@...or.com>, linux-raid@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [md PATCH 8/8] md: Use revalidate_disk to effect changes in size
of device.
On Wed, Aug 5, 2009 at 11:13 PM, Neil Brown <neilb@...e.de> wrote:
> On Wednesday August 5, hpa@...or.com wrote:
>> On 08/05/2009 06:03 PM, Mike Snitzer wrote:
>> >> On Wed, Aug 5, 2009 at 6:07 PM, H. Peter Anvin <hpa@...or.com> wrote:
>> >> On 08/02/2009 02:58 PM, NeilBrown wrote:
>> >>> As revalidate_disk calls check_disk_size_change, it will cause
>> >>> any capacity change of a gendisk to be propagated to the blockdev
>> >>> inode. So use that instead of mucking about with locks and
>> >>> i_size_write.
>> >>>
>> >>> Also add a call to revalidate_disk in do_md_run and a few other places
>> >>> where the gendisk capacity is changed.
>> >>>
>> >> This patch causes my Fedora 11 system with all filesystems on RAID-1 to
>> >> not boot (it hangs in early userspace, Ctrl-Alt-Del reboots the system.)
>> >
>> > I reported similar findings, with some more detail, relative to
>> > Fedora's rawhide here:
>> > http://lkml.org/lkml/2009/8/5/275
>>
>> Sounds like the same issue, yes.
>
> Thanks for the reports guys.
>
> I managed to reproduce the lockup and I think this patch should fix
> it.
> If you could review/test I would appreciate it.
>
> Thanks,
> NeilBrown
>
>
> From cf90cb85596d05d9595bb8ee4cadb7d4091212b5 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@...e.de>
> Date: Thu, 6 Aug 2009 13:10:43 +1000
> Subject: [PATCH] Remove deadlock potential in md_open
>
> A recent commit:
> commit 449aad3e25358812c43afc60918c5ad3819488e7
>
> introduced the possibility of an A-B/B-A deadlock between
> bd_mutex and reconfig_mutex.
>
> __blkdev_get holds bd_mutex while calling md_open, which takes
> reconfig_mutex; do_md_run is always called with reconfig_mutex held,
> and it now takes bd_mutex in the call to revalidate_disk.
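>
> A rough sketch of the two conflicting lock orders (simplified
> pseudocode of the call chains, not the actual block/md code):
>
>     /* open path: bd_mutex, then reconfig_mutex */
>     __blkdev_get(...)
>         mutex_lock(&bdev->bd_mutex);
>         md_open(...)
>             mutex_lock_interruptible_nested(&mddev->reconfig_mutex, ...);
>
>     /* run path: reconfig_mutex, then bd_mutex */
>     do_md_run(mddev)            /* caller already holds reconfig_mutex */
>         revalidate_disk(disk)
>             mutex_lock(&bdev->bd_mutex);
>             check_disk_size_change(disk, bdev);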
>
> This potential deadlock was not caught by lockdep due to the
> use of mutex_lock_interruptible_nested, which was introduced by
> commit d63a5a74dee87883fda6b7d170244acaac5b05e8
> to avoid a warning of an impossible deadlock.
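>
> (mutex_lock_interruptible_nested() puts the reconfig_mutex
> acquisition in a separate lockdep subclass, so lockdep never links
> that acquisition to the plain reconfig_mutex class and the cycle
> through bd_mutex goes unreported.)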
>
> It is quite possible to split reconfig_mutex into two locks.
> One protects the array data structures while they are being
> reconfigured; the other ensures that an array is never even partially
> open while it is being deactivated.
>
> So create a new lock, open_mutex, just to ensure exclusion between
> 'open' and 'stop'.
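>
> A minimal sketch of the resulting locking (illustrative only; field
> placement and the call sites are simplified from the actual patch):
>
>     struct mddev_s {
>         ...
>         struct mutex open_mutex;      /* serializes 'open' vs 'stop' */
>         struct mutex reconfig_mutex;  /* protects reconfiguration */
>     };
>
>     md_open()        /* called under bd_mutex from __blkdev_get */
>         mutex_lock(&mddev->open_mutex);   /* instead of reconfig_mutex */
>         ...
>         mutex_unlock(&mddev->open_mutex);
>
>     do_md_stop()     /* called with reconfig_mutex held */
>         mutex_lock(&mddev->open_mutex);   /* exclude a concurrent open */
>         ...
>         mutex_unlock(&mddev->open_mutex);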
>
> This avoids the deadlock and also avoids the lockdep warning mentioned
> in commit d63a5a74d.
>
> Reported-by: "Mike Snitzer" <snitzer@...il.com>
> Reported-by: "H. Peter Anvin" <hpa@...or.com>
> Signed-off-by: NeilBrown <neilb@...e.de>
Neil,
I finally tested this with the LVM testsuite's
t-pvcreate-operation-md.sh script that I mentioned here:
http://lkml.org/lkml/2009/8/5/275
The test succeeds but unfortunately triggers the following lockdep warning:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.31-0.125.rc5.git2.snitm.x86_64 #1
-------------------------------------------------------
mdadm/1348 is trying to acquire lock:
(&bdev->bd_mutex){+.+.+.}, at: [<ffffffff811809dc>] bd_release_from_disk+0x49/0x113
but task is already holding lock:
(&new->open_mutex){+.+...}, at: [<ffffffff81418ac7>] do_md_stop+0x3f/0x529
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&new->open_mutex){+.+...}:
[<ffffffff8109ed70>] __lock_acquire+0x990/0xb29
[<ffffffff810a0298>] lock_acquire+0xd5/0x115
[<ffffffff8152e318>] __mutex_lock_common+0x5d/0x3b0
[<ffffffff8152e6b8>] mutex_lock_interruptible_nested+0x4d/0x68
[<ffffffff8141e683>] md_open+0x71/0xb3
[<ffffffff81181821>] __blkdev_get+0xe1/0x37e
[<ffffffff81181900>] __blkdev_get+0x1c0/0x37e
[<ffffffff81181ae1>] blkdev_get+0x23/0x39
[<ffffffff81181b7c>] blkdev_open+0x85/0xd1
[<ffffffff8114f909>] __dentry_open+0x1af/0x303
[<ffffffff81150da8>] nameidata_to_filp+0x55/0x7d
[<ffffffff8115ecb1>] do_filp_open+0x504/0x960
[<ffffffff8114f595>] do_sys_open+0x70/0x12e
[<ffffffff8114f6c0>] sys_open+0x33/0x49
[<ffffffff810141c2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
-> #1 (&bdev->bd_mutex/1){+.+...}:
[<ffffffff8109ed70>] __lock_acquire+0x990/0xb29
[<ffffffff810a0298>] lock_acquire+0xd5/0x115
[<ffffffff8152e318>] __mutex_lock_common+0x5d/0x3b0
[<ffffffff8152e789>] mutex_lock_nested+0x4d/0x6a
[<ffffffff811817ce>] __blkdev_get+0x8e/0x37e
[<ffffffff81181900>] __blkdev_get+0x1c0/0x37e
[<ffffffff81181ae1>] blkdev_get+0x23/0x39
[<ffffffff81181b7c>] blkdev_open+0x85/0xd1
[<ffffffff8114f909>] __dentry_open+0x1af/0x303
[<ffffffff81150da8>] nameidata_to_filp+0x55/0x7d
[<ffffffff8115ecb1>] do_filp_open+0x504/0x960
[<ffffffff8114f595>] do_sys_open+0x70/0x12e
[<ffffffff8114f6c0>] sys_open+0x33/0x49
[<ffffffff810141c2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
-> #0 (&bdev->bd_mutex){+.+.+.}:
[<ffffffff8109ec64>] __lock_acquire+0x884/0xb29
[<ffffffff810a0298>] lock_acquire+0xd5/0x115
[<ffffffff8152e318>] __mutex_lock_common+0x5d/0x3b0
[<ffffffff8152e789>] mutex_lock_nested+0x4d/0x6a
[<ffffffff811809dc>] bd_release_from_disk+0x49/0x113
[<ffffffff81415cd5>] unbind_rdev_from_array+0x5f/0x148
[<ffffffff81416f96>] kick_rdev_from_array+0x25/0x48
[<ffffffff81417011>] export_array+0x58/0xc2
[<ffffffff81418d57>] do_md_stop+0x2cf/0x529
[<ffffffff8141d7ab>] md_ioctl+0xa5e/0xffe
[<ffffffff8128141f>] __blkdev_driver_ioctl+0x4b/0xa2
[<ffffffff81281d9b>] blkdev_ioctl+0x881/0x8ee
[<ffffffff81181f54>] block_ioctl+0x50/0x68
[<ffffffff81161466>] vfs_ioctl+0x3e/0xa2
[<ffffffff81161a03>] do_vfs_ioctl+0x499/0x500
[<ffffffff81161ad5>] sys_ioctl+0x6b/0xa2
[<ffffffff810141c2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
other info that might help us debug this:
2 locks held by mdadm/1348:
#0: (&new->reconfig_mutex){+.+.+.}, at: [<ffffffff81415e82>] mddev_lock+0x2a/0x40
#1: (&new->open_mutex){+.+...}, at: [<ffffffff81418ac7>] do_md_stop+0x3f/0x529
stack backtrace:
Pid: 1348, comm: mdadm Tainted: G W 2.6.31-0.125.rc5.git2.snitm.x86_64 #1
Call Trace:
[<ffffffff8109e3c1>] print_circular_bug_tail+0x80/0x9f
[<ffffffff8109ec64>] __lock_acquire+0x884/0xb29
[<ffffffff811809dc>] ? bd_release_from_disk+0x49/0x113
[<ffffffff81014d10>] ? restore_args+0x0/0x30
[<ffffffff811809dc>] ? bd_release_from_disk+0x49/0x113
[<ffffffff810a0298>] lock_acquire+0xd5/0x115
[<ffffffff811809dc>] ? bd_release_from_disk+0x49/0x113
[<ffffffff8152e318>] __mutex_lock_common+0x5d/0x3b0
[<ffffffff811809dc>] ? bd_release_from_disk+0x49/0x113
[<ffffffff811809dc>] ? bd_release_from_disk+0x49/0x113
[<ffffffff810537e7>] ? need_resched+0x36/0x54
[<ffffffff8152fbdf>] ? _spin_unlock_irq+0x46/0x61
[<ffffffff8152e789>] mutex_lock_nested+0x4d/0x6a
[<ffffffff811809dc>] bd_release_from_disk+0x49/0x113
[<ffffffff81415cd5>] unbind_rdev_from_array+0x5f/0x148
[<ffffffff81416f96>] kick_rdev_from_array+0x25/0x48
[<ffffffff81082dd6>] ? flush_workqueue+0x0/0xd1
[<ffffffff81417011>] export_array+0x58/0xc2
[<ffffffff81418d57>] do_md_stop+0x2cf/0x529
[<ffffffff8152e6b8>] ? mutex_lock_interruptible_nested+0x4d/0x68
[<ffffffff8141d7ab>] md_ioctl+0xa5e/0xffe
[<ffffffff8109e710>] ? __lock_acquire+0x330/0xb29
[<ffffffff8103bd8f>] ? pvclock_clocksource_read+0x56/0xa6
[<ffffffff8128141f>] __blkdev_driver_ioctl+0x4b/0xa2
[<ffffffff8109bc58>] ? lock_release_holdtime+0x61/0x7c
[<ffffffff81281d9b>] blkdev_ioctl+0x881/0x8ee
[<ffffffff81246cb1>] ? __rcu_read_unlock+0x34/0x4a
[<ffffffff812472c9>] ? avc_has_perm_noaudit+0x358/0x37d
[<ffffffff8109d37a>] ? mark_lock+0x31/0x221
[<ffffffff81247bee>] ? avc_has_perm+0x60/0x86
[<ffffffff8103bd8f>] ? pvclock_clocksource_read+0x56/0xa6
[<ffffffff812492b3>] ? inode_has_perm+0x7d/0xa0
[<ffffffff815305e6>] ? _spin_unlock_irqrestore+0x5e/0x83
[<ffffffff81181f54>] block_ioctl+0x50/0x68
[<ffffffff81161466>] vfs_ioctl+0x3e/0xa2
[<ffffffff81161a03>] do_vfs_ioctl+0x499/0x500
[<ffffffff81161ad5>] sys_ioctl+0x6b/0xa2
[<ffffffff810141c2>] system_call_fastpath+0x16/0x1b
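
As I read the chains above, the new cycle is (simplified):

    md_open():     bd_mutex -> open_mutex   (chains #2 and #1)
    do_md_stop():  open_mutex -> bd_mutex   (chain #0, via
                   bd_release_from_disk() in unbind_rdev_from_array())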