Message-ID: <20150824222720.GD714@dastard>
Date: Tue, 25 Aug 2015 08:27:20 +1000
From: Dave Chinner <david@...morbit.com>
To: Tejun Heo <tj@...nel.org>
Cc: Eryu Guan <eguan@...hat.com>, Jens Axboe <axboe@...nel.dk>,
Jan Kara <jack@...e.cz>, linux-kernel@...r.kernel.org,
xfs@....sgi.com, axboe@...com, Jan Kara <jack@...e.com>,
linux-fsdevel@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes

On Mon, Aug 24, 2015 at 02:10:38PM -0400, Tejun Heo wrote:
> Hello, Dave.
>
> On Fri, Aug 21, 2015 at 09:04:51AM +1000, Dave Chinner wrote:
> > > Maybe I'm misunderstanding the code but all xfs_writepage() calls are
> > > from unbound workqueues - the writeback workers - while
> > > xfs_setfilesize() are from bound workqueues, so I wondered why that
> > > was and looked at the code and the setsize functions are run off of a
> > > separate work item which is queued from the end_bio callback and I
> > > can't tell who would be waiting for them. Dave, what am I missing?
> >
> > xfs_setfilesize runs transactions, so it can't be run from IO
> > completion context as it needs to block (i.e. on log space or inode
> > locks). It also can't block log IO completion, nor metadata IO
> > completion, as only log IO completion can free log space, and the
> > inode lock might be waiting on metadata buffer IO completion (e.g.
> > during delayed allocation). Hence we have multiple IO completion
> > workqueues to keep these things separated and deadlock free. i.e.
> > they all get punted to a workqueue where they are then processed in
> > a context that can block safely.
>
> I'm still a bit confused. What prevents the following from happening?
>
> 1. io completion of last dirty page of an inode and work item for
> xfs_setfilesize() is queued.
>
> 2. inode removed from dirty list.
The inode has already been removed from the dirty list - that
happens at inode writeback submission time, not IO completion.
> 3. __sync_filesystem() invokes sync_inodes_sb(). There are no dirty
> pages, so it finishes.
There are no dirty pages, but the pages aren't clean, either, i.e.
they are still under writeback. Hence we need to invoke
wait_sb_inodes() to wait for writeback on all pages to complete
before returning.
> 4. xfs_fs_sync_fs() is called which calls _xfs_log_force() but the
> work item from #1 hasn't run yet, so the size update isn't written
> out.
The bug here is that wait_sb_inodes() has not been run, therefore
->sync_fs is being run before IO completions have been processed and
pages marked clean.
> 5. Crash.
>
> Is it that _xfs_log_force() waits for the setfilesize transaction
> created during writepage?
No, it's wait_sb_inodes() that does the waiting for data IO
completion for sync.
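
To make the ordering concrete, here is a toy model in Python. It is
only a sketch and not kernel code: every name in it (ToyInode,
ToySuperblock, toy_sync, run_scenario) is made up for illustration,
the "completion queue" is a plain deque standing in for the IO
completion workqueue, and "waiting" is modelled by simply draining
that queue. It assumes a single inode whose writeback was already
submitted before sync ran, which is the case in your step 3.

```python
from collections import deque


class ToyInode:
    def __init__(self, logged_size, pending_size):
        self.dirty = True                 # still on the dirty list
        self.under_writeback = False      # submitted, completion not yet processed
        self.logged_size = logged_size    # size recorded by a setfilesize-style update
        self.pending_size = pending_size  # size the in-flight data extends the file to


class ToySuperblock:
    def __init__(self, inodes):
        self.inodes = inodes
        self.completion_queue = deque()   # stands in for the IO completion workqueue

    def submit_writeback(self, inode):
        # The inode leaves the dirty list at submission time, not at IO completion.
        inode.dirty = False
        inode.under_writeback = True
        self.completion_queue.append(inode)

    def wait_for_writeback(self):
        # Toy stand-in for the wait step: process every queued completion.
        while self.completion_queue:
            inode = self.completion_queue.popleft()
            inode.logged_size = inode.pending_size
            inode.under_writeback = False

    def force_log(self):
        # Toy stand-in for the log force: only already-logged sizes become durable.
        return [inode.logged_size for inode in self.inodes]


def toy_sync(sb, wait_even_if_nothing_dirty):
    dirty = [inode for inode in sb.inodes if inode.dirty]
    for inode in dirty:
        sb.submit_writeback(inode)
    # The buggy variant skips the wait when it found nothing dirty to submit.
    if dirty or wait_even_if_nothing_dirty:
        sb.wait_for_writeback()
    return sb.force_log()


def run_scenario(wait_even_if_nothing_dirty):
    inode = ToyInode(logged_size=0, pending_size=4096)
    sb = ToySuperblock([inode])
    sb.submit_writeback(inode)  # background writeback got there before sync
    return toy_sync(sb, wait_even_if_nothing_dirty)
```

In this model run_scenario(False) returns the stale size because the
log is forced while the completion is still queued, and
run_scenario(True) returns the updated size because the wait happens
first, regardless of whether anything was dirty when sync started.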
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com