[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120126170209.GE17113@quack.suse.cz>
Date: Thu, 26 Jan 2012 18:02:09 +0100
From: Jan Kara <jack@...e.cz>
To: Jeff Moyer <jmoyer@...hat.com>
Cc: linux-ext4@...r.kernel.org
Subject: Re: [patch|rfc] ext4: fix race between unwritten extent conversion
and truncate
Hi,
On Wed 25-01-12 15:40:56, Jeff Moyer wrote:
> The following comment in ext4_end_io_dio caught my attention:
>
> /* XXX: probably should move into the real I/O completion handler */
> inode_dio_done(inode);
>
> The truncate code takes i_mutex, then calls inode_dio_wait. Because the
> ext4 code path above will end up dropping the mutex before it is
> reacquired by the worker thread that does the extent conversion, it
> seems to me that the truncate can happen out of order. Does it matter?
> I really don't know, but I'm hoping someone here might. ;-) Anyway,
> here's a patch I cooked up to address the issue. I'm not sure what the
> result of a race would even be, so I haven't really been able to test
> that it works as intended.
>
> So, comments?
Yeah, the race looks real. We won't probably crash but we will certainly
curse into the logs which isn't nice ;). Thanks for spotting this.
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 513004f..fc2a373 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2286,7 +2286,7 @@ extern void ext4_exit_pageio(void);
> extern void ext4_ioend_wait(struct inode *);
> extern void ext4_free_io_end(ext4_io_end_t *io);
> extern ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags);
> -extern int ext4_end_io_nolock(ext4_io_end_t *io);
> +extern int ext4_end_io_nolock(ext4_io_end_t *io, bool direct);
> extern void ext4_io_submit(struct ext4_io_submit *io);
> extern int ext4_bio_write_page(struct ext4_io_submit *io,
> struct page *page,
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index 00a2cb7..f9aec9a 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -104,7 +104,7 @@ int ext4_flush_completed_IO(struct inode *inode)
> * queue work.
> */
> spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
> - ret = ext4_end_io_nolock(io);
> + ret = ext4_end_io_nolock(io, false);
This is wrong. i_completed_io_list contains work items for both direct
and buffered IO. Just in ext4_flush_completed_IO() we process the list
synchronously while ext4_end_io_work() processes the list in the
background. So what you have to do is store in ext4_io_end_t whether the IO
was direct or not and then use that in ext4_end_io_nolock() function.
> if (ret < 0)
> ret2 = ret;
> spin_lock_irqsave(&ei->i_completed_io_lock, flags);
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index feaa82f..4e76c30 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2795,9 +2795,6 @@ out:
>
> /* queue the work to convert unwritten extents to written */
> queue_work(wq, &io_end->work);
> -
> - /* XXX: probably should move into the real I/O completion handler */
> - inode_dio_done(inode);
> }
>
> static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate)
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
> index 4758518..47c4a03 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -87,7 +87,7 @@ void ext4_free_io_end(ext4_io_end_t *io)
> * Called with inode->i_mutex; we depend on this when we manipulate
> * io->flag, since we could otherwise race with ext4_flush_completed_IO()
> */
> -int ext4_end_io_nolock(ext4_io_end_t *io)
> +int ext4_end_io_nolock(ext4_io_end_t *io, bool direct)
> {
> struct inode *inode = io->inode;
> loff_t offset = io->offset;
> @@ -110,6 +110,8 @@ int ext4_end_io_nolock(ext4_io_end_t *io)
> if (io->iocb)
> aio_complete(io->iocb, io->result, 0);
>
> + if (direct)
> + inode_dio_done(inode);
> /* Wake up anyone waiting on unwritten extent conversion */
> if (atomic_dec_and_test(&EXT4_I(inode)->i_aiodio_unwritten))
> wake_up_all(ext4_ioend_wq(io->inode));
> @@ -152,7 +154,7 @@ static void ext4_end_io_work(struct work_struct *work)
> }
> list_del_init(&io->list);
> spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
> - (void) ext4_end_io_nolock(io);
> + (void) ext4_end_io_nolock(io, true);
> mutex_unlock(&inode->i_mutex);
> free:
> ext4_free_io_end(io);
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists