[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+b=bo9PsjNJ46=J-Ck1Czw34hXEC3VH8YMhUuufhcZ23w@mail.gmail.com>
Date: Mon, 20 Jun 2016 15:38:41 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Jan Kara <jack@...e.cz>
Cc: Tejun Heo <tj@...nel.org>, Jens Axboe <axboe@...com>,
Andrey Ryabinin <ryabinin.a.a@...il.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
syzkaller <syzkaller@...glegroups.com>,
Kostya Serebryany <kcc@...gle.com>,
Alexander Potapenko <glider@...gle.com>,
Ilya Dryomov <idryomov@...il.com>, Jan Kara <jack@...e.com>,
kernel-team@...com, Ted Tso <tytso@....edu>
Subject: Re: [PATCH] block: flush writeback dwork before detaching a bdev
inode from it
On Mon, Jun 20, 2016 at 3:31 PM, Jan Kara <jack@...e.cz> wrote:
> Hi,
>
> On Fri 17-06-16 12:04:05, Tejun Heo wrote:
>> 43d1c0eb7e11 ("block: detach bdev inode from its wb in
>> __blkdev_put()") detached bdev inode from its wb as the bdev inode may
>> outlive the underlying bdi and thus the wb. This is accomplished by
>> invoking inode_detach_wb() from __blkdev_put(); however, while the
>> inode can't be dirtied by the time control reaches there, that doesn't
>> guarantee that writeback isn't already in progress on the inode. This
>> can lead to the inode being disassociated from its wb while writeback
>> operation is in flight causing oopses like the following.
>
> <snip>
>
>> Fix it by flushing the wb dwork before detaching the inode. Combined
>> with the fact that the inode can no longer be dirtied, this guarantees
>> that no writeback operation can be in flight or initiated.
>
> Sorry for the late reply but now when thinking about the patch I don't
> think it is quite right. Writeback can happen from other contexts than just
> the worker one (e.g. kswapd can do writeback, or in some out-of-memory
> situations we may punt to doing writeback directly instead of calling the
> worker, or sync(2) calls fdatawrite() for block device inode directly when
> iterating through blockdev superblock). So flushing the workqueue IMHO is
> not covering 100% of cases.
>
> I wanted to suggest to use inode_wait_for_writeback() which will make sure
> writeback is done with the inode. However we effectively already do this
> by calling bdev_write_inode() which calls writeback_single_inode() in
> WB_SYNC_ALL mode. So by the time that call completes we are sure writeback
> code is not looking at the inode. *However* what I think is happening is
> that sync_blockdev() writes all the dirty pages, bdev_write_inode() writes
> the inode and clears all dirty bits, however the inode still stays in the
> b_dirty / b_io list of the wb because it has I_DIRTY_TIME set. Subsequently
> once flusher work runs, it finds the inode, looks at it and boom. And the
> problem seems to be that write_inode_now(inode, true) does not result in
> I_DIRTY_TIME being cleared.
>
> Attached patch should fix this issue - it is compile-tested only. Dmitry,
> can you check whether this patch fixes the issue for you as well?
I can't directly test it because crash happened very infrequently.
If Tejun/Ted agree that it is the right way to fix it, then I can
patch it, restart the fuzzer and leave it running for a while.
Powered by blists - more mailing lists