Message-ID: <x49ipvhslcd.fsf@segfault.boston.devel.redhat.com>
Date: Thu, 17 Mar 2011 13:33:38 -0400
From: Jeff Moyer <jmoyer@...hat.com>
To: NeilBrown <neilb@...e.de>
Cc: James Bottomley <James.Bottomley@...e.de>,
device-mapper development <dm-devel@...hat.com>,
Jens Axboe <axboe@...nel.dk>, linux-raid@...r.kernel.org,
linux-kernel@...r.kernel.org,
Christoph Hellwig <hch@...radead.org>,
linux-fsdevel@...r.kernel.org
Subject: Re: [dm-devel] [PATCH] Fix over-zealous flush_disk when changing device size.

NeilBrown <neilb@...e.de> writes:

> On Wed, 16 Mar 2011 16:30:22 -0400 Jeff Moyer <jmoyer@...hat.com> wrote:
>
>> NeilBrown <neilb@...e.de> writes:
>>
>> >> Synchronous notification of errors. If we don't try to write everything
>> >> back immediately after the size change, we don't see dirty pages in
>> >> zapped regions until the writeout/page cache management takes it into
>> >> its head to try to clean the pages.
>> >>
>> >
>> > So if you just want synchronous errors, I think you want:
>> > fsync_bdev()
>> >
>> > which calls sync_filesystem() if it can find a filesystem, else
>> > sync_blockdev(); (sync_filesystem itself calls sync_blockdev too).
>>
>> ... which deadlocks md. ;-) writeback_inodes_sb_nr is waiting for the
>> flusher thread to write back the dirty data. The flusher thread is
>> stuck in md_write_start, here:
>>
>> 	wait_event(mddev->sb_wait,
>> 		   !test_bit(MD_CHANGE_PENDING, &mddev->flags));
>>
>> This is after reverting your change, and replacing the flush_disk call
>> in check_disk_size_change with a call to fsync_bdev. I'm not familiar
>> enough with md to really suggest a way forward. Neil?
>
> That would be quite easy to avoid.
> Just call
> md_write_start()
> before revalidate_disk, and
> md_write_end()
> afterwards.

That does not avoid the problem (if I understood your suggestion). You
instead end up with the following:

INFO: task md127_raid5:2282 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
md127_raid5 D ffff88011c72d0a0 5688 2282 2 0x00000080
ffff880118997c20 0000000000000046 ffff880100000000 0000000000000246
0000000000014d00 ffff88011c72cb10 ffff88011c72d0a0 ffff880118997fd8
ffff88011c72d0a8 0000000000014d00 ffff880118996010 0000000000014d00
Call Trace:
[<ffffffff8138bbbd>] md_write_start+0xad/0x1d0
[<ffffffff810801d0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa0311558>] raid5_finish_reshape+0x98/0x1e0 [raid456]
[<ffffffff8138a933>] reap_sync_thread+0x63/0x130
[<ffffffff8138c8b6>] md_check_recovery+0x1f6/0x6f0
[<ffffffffa03150ab>] raid5d+0x3b/0x610 [raid456]
[<ffffffff810804c9>] ? prepare_to_wait+0x59/0x90
[<ffffffff81387ee9>] md_thread+0x119/0x150
[<ffffffff810801d0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81387dd0>] ? md_thread+0x0/0x150
[<ffffffff8107fb56>] kthread+0x96/0xa0
[<ffffffff8100cc04>] kernel_thread_helper+0x4/0x10
[<ffffffff8107fac0>] ? kthread+0x0/0xa0
[<ffffffff8100cc00>] ? kernel_thread_helper+0x0/0x10
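
One possible reading of the trace, offered as a guess rather than a diagnosis: the blocked task is the md thread itself (md127_raid5), and it is sitting in md_write_start, which waits for MD_CHANGE_PENDING to clear. If clearing that flag is work the md thread is expected to do, the thread ends up waiting on itself. The toy user-space model below sketches that shape only. Every name in it (toy_mddev, toy_write_start, toy_md_thread_update_sb) is hypothetical and none of it is md code. The assumption that only the md thread clears the flag is mine and is marked in the comments.

```c
#include <assert.h>   /* for callers that want to check the results */
#include <stdbool.h>

/* Toy model only: hypothetical names, not the md driver's API. */
enum toy_actor { TOY_WRITER_TASK, TOY_MD_THREAD };

struct toy_mddev {
    bool change_pending;   /* stands in for MD_CHANGE_PENDING */
};

/* ASSUMPTION: in this model only the md thread clears the pending flag. */
static void toy_md_thread_update_sb(struct toy_mddev *m)
{
    m->change_pending = false;
}

/*
 * Models the wait in md_write_start without actually blocking.
 * Returns true if the caller would be woken, false if it would wait forever.
 */
static bool toy_write_start(struct toy_mddev *m, enum toy_actor caller)
{
    m->change_pending = true;      /* ask for a superblock update */

    if (caller == TOY_MD_THREAD) {
        /* The only task that could clear the flag is the one now waiting. */
        return false;
    }

    /* Any other caller: the md thread runs independently and clears it. */
    toy_md_thread_update_sb(m);
    return !m->change_pending;
}
```

In this model a call with TOY_WRITER_TASK returns true, while a call with TOY_MD_THREAD returns false and leaves change_pending set, which is the shape of the hang reported above.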
I'll leave this to you to work out when you have time.

Cheers,
Jeff