[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130812161455.GA19471@quack.suse.cz>
Date: Mon, 12 Aug 2013 18:14:55 +0200
From: Jan Kara <jack@...e.cz>
To: Dave Chinner <david@...morbit.com>
Cc: Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
xfs@....sgi.com, hch@...radead.org, Jeff Moyer <jmoyer@...hat.com>,
Al Viro <viro@...IV.linux.org.uk>, linux-ext4@...r.kernel.org,
Christoph Hellwig <hch@....de>
Subject: Re: [PATCH 1/2] direct-io: Implement generic deferred AIO completions
Hi Dave,
I remembered about this patch set and realized I didn't get reply from
you regarding the following question (see quoted email below for details):
Do you really need to defer completion of appending direct IO? Because
generic code makes sure appending direct IO isn't async and thus
dio_complete() -> xfs_end_io_direct_write() gets called directly from
do_blockdev_direct_IO(). I.e. from a normal context and not from interrupt.
I've already addressed rest of your comments so this is the only item that
is remaining.
On Tue 16-07-13 23:00:27, Jan Kara wrote:
> > > -static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret, bool is_async)
> > > +static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret,
> > > + bool is_async)
> > > {
> > > ssize_t transferred = 0;
> > >
> > > @@ -258,19 +262,26 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret, bool is
> > > if (ret == 0)
> > > ret = transferred;
> > >
> > > - if (dio->end_io && dio->result) {
> > > - dio->end_io(dio->iocb, offset, transferred,
> > > - dio->private, ret, is_async);
> > > - } else {
> > > - inode_dio_done(dio->inode);
> > > - if (is_async)
> > > - aio_complete(dio->iocb, ret, 0);
> > > - }
> > > + if (dio->end_io && dio->result)
> > > + dio->end_io(dio->iocb, offset, transferred, dio->private);
> >
> > Ok, so by here we are assuming all filesystem IO completion
> > processing is completed.
> Yes.
>
> > > +
> > > + inode_dio_done(dio->inode);
> > > + if (is_async)
> > > + aio_complete(dio->iocb, ret, 0);
> > >
> > > + kmem_cache_free(dio_cache, dio);
> > > return ret;
> >
> > Hmmm. I started wonder if this is valid, because XFS is supposed to
> > be doing workqueue based IO completion for appending writes and they
> > are supposed to be run in a workqueue.
> >
> > But, looking at the XFS code, there's actually a bug in the direct
> > IO completion code and we are not defering completion to a workqueue
> > like we should be for the append case. This code in
> > xfs_finish_ioend() demonstrates the intent:
> >
> > if (ioend->io_type == XFS_IO_UNWRITTEN)
> > queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
> > else if (ioend->io_append_trans ||
> > >>>>>> (ioend->io_isdirect && xfs_ioend_is_append(ioend)))
> > queue_work(mp->m_data_workqueue, &ioend->io_work);
> >
> > The problem is that we only ever call xfs_finish_ioend() if is_async
> > is true, and that will never be true for direct IO beyond the current
> > EOF. That's a bug, and is Bad(tm).
> >
> > [ Interestingly, it turns out that dio->is_async is never set for
> > writes beyond EOF because of filesystems that update file sizes
> > before data IO completion occurs (stale data exposure issues). For
> > XFS, that can't happen and so dio->is_async could always be set.... ]
> >
> > What this means is that there's a condition for work queue deferral
> > of DIO IO completion that this patch doesn't handle. Setting deferral
> > only on unwritten extents like this:
> >
> > > @@ -1292,8 +1281,10 @@ __xfs_get_blocks(
> > > if (create || !ISUNWRITTEN(&imap))
> > > xfs_map_buffer(inode, bh_result, &imap, offset);
> > > if (create && ISUNWRITTEN(&imap)) {
> > > - if (direct)
> > > + if (direct) {
> > > bh_result->b_private = inode;
> > > + set_buffer_defer_completion(bh_result);
> > > + }
> > > set_buffer_unwritten(bh_result);
> > > }
> > > }
> >
> > misses that case. I suspect Christoph's original patch predated the
> > changes to XFS that added transactional file size updates to IO
> > completion and so didn't take it into account.
> OK, thanks for catching this. Doing the i_size check in
> _xfs_get_blocks() would be somewhat cumbersome I presume so I guess we can
> handle that case by adding __blockdev_direct_IO() flag to defer dio
> completion to a workqueue. XFS can then set the flag when it sees it needs
> and i_size update. OK?
>
> > > @@ -1390,9 +1381,7 @@ xfs_end_io_direct_write(
> > > struct kiocb *iocb,
> > > loff_t offset,
> > > ssize_t size,
> > > - void *private,
> > > - int ret,
> > > - bool is_async)
> > > + void *private)
> > > {
> > > struct xfs_ioend *ioend = iocb->private;
> > >
> > > @@ -1414,17 +1403,10 @@ xfs_end_io_direct_write(
> > >
> > > ioend->io_offset = offset;
> > > ioend->io_size = size;
> > > - ioend->io_iocb = iocb;
> > > - ioend->io_result = ret;
> > > if (private && size > 0)
> > > ioend->io_type = XFS_IO_UNWRITTEN;
> > >
> > > - if (is_async) {
> > > - ioend->io_isasync = 1;
> > > - xfs_finish_ioend(ioend);
> > > - } else {
> > > - xfs_finish_ioend_sync(ioend);
> > > - }
> > > + xfs_finish_ioend_sync(ioend);
> > > }
> >
> > As i mentioned, the existing code here is buggy. What we should be
> > doing here is:
> >
> > if (is_async) {
> > ioend->io_isasync = 1;
> > xfs_finish_ioend(ioend);
> > } else if (xfs_ioend_is_append(ioend))) {
> > xfs_finish_ioend(ioend);
> > } else {
> > xfs_finish_ioend_sync(ioend);
> > }
> Umm, I started to wonder why do you actually need to defer the completion
> for appending ioend. Because if DIO isn't async, dio_complete() won't be
> called from interrupt context but from __blockdev_direct_IO(). So it seems
> you can do everything you need directly there without deferring to a
> workqueue. But maybe there's some locking subtlety I'm missing.
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists