[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20121221224947.GA23652@quack.suse.cz>
Date: Fri, 21 Dec 2012 23:49:47 +0100
From: Jan Kara <jack@...e.cz>
To: Theodore Ts'o <tytso@....edu>
Cc: Jan Kara <jack@...e.cz>, Dmitry Monakhov <dmonakhov@...nvz.org>,
linux-ext4@...r.kernel.org
Subject: Re: Uninitialized extent races
On Fri 21-12-12 13:02:43, Ted Tso wrote:
> On Fri, Dec 21, 2012 at 05:19:29PM +0100, Jan Kara wrote:
> > No, I'm speaking about merging currently uninitialized extents. I.e.
> > suppose someone does the following on a filesystem with dioread_nolock so
> > that writeback happens via unwritten extents:
> > fd = open("file", O_RDWR);
> > pwrite(fd, buf, 4096, 0);
> > flusher thread starts writing
> > we create uninitialized extent for
> > range 0-4096
> > fallocate(fd, 0, 4096, 4096);
> > - we merge extents and now have just 1 uninitialized extent for range
> > 0-8192
> > ext4_convert_unwritten_extents() now
> > has to split the extent to finish
> > the IO.
>
> Ah, I see. Disabling the the merging that might take place as a
> result of the fallocate. Yes, I agree that's a completely sane thing
> to do.
OK, I'll write some patches.
> The alternate approach would be to add a flag in the extent status
> tree indicating that an unwritten conversion is pending, but that
> would add more complexity.
>
> Hmmm.... do we need that complexity anyway? What happens if we have a
> race between a punch (or truncate) and the flusher thread, so there is
> pending write. There are two things that would be of concern. (1)
> Will convert_unwritten_extents do the right thing if the extent in
> question has disappeared, and (2) what if the block gets reused for
> some other inode in the interim?
>
> I _think_ we're OK in the case of (2), since we're not using FUA writes
> for anything other than the commit block, so there shouldn't be any way
> that a write for the new inode could complete before the pending write
> finishes up. And (1) should be OK, although it may end up triggering a
> WARN_ON and a scarry ext4_msg() in ext4_convert_unwritten_extents().
> But it made me stop and think....
It's actually simpler than that. We wait for any pending DIO using
inode_dio_wait() and i_mutex protects from new writes to be submitted. So
that takes care of one possibility. truncate_inode_pages() waits for
PageWriteback bit so that handles waiting for IO itself. After I change
ext4 to convert extents before clearing PageWriteback, this will take care
also of extent conversion. Now a call to ext4_flush_unwritten_io() in
ext4_ext_truncate() resolves the problems. It's called after invalidating
page cache so we know all the pending IO for the truncated / punched area
is finished, just a conversion may be still pending.
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists