[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090514060015.GB7359@skywalker>
Date: Thu, 14 May 2009 11:30:15 +0530
From: "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
To: Theodore Tso <tytso@....edu>
Cc: Eric Sandeen <sandeen@...hat.com>, cmm@...ibm.com,
linux-ext4@...r.kernel.org
Subject: Re: [PATCH 2/3] ext4: Clear the unwritten buffer_head flag properly
On Wed, May 13, 2009 at 06:28:47PM -0400, Theodore Tso wrote:
> On Wed, May 13, 2009 at 01:56:56PM -0500, Eric Sandeen wrote:
> > The comment:
> >
> > /*
> > + * The above get_blocks can cause the buffer to be
> > + * marked unwritten. So clear the same.
> > + */
> > + clear_buffer_unwritten(bh);
> >
> > is imho not helpful; to me it says "we -just- set this, so clear it!"
> > It leaves me scratching my head.
>
> I've updated it the comment to say this instead.
>
> /*
> * When we call get_blocks without the create flag, the
> * BH_Unwritten flag could have gotten set if the blocks
> * requested were part of a uninitialized extent. We need to
> * clear this flag now that we are committed to convert all or
> * part of the uninitialized extent to be an initialized
> * extent. This is because we need to avoid the combination
> * of BH_Unwritten and BH_Mapped flags being simultaneously
> * set on the buffer_head.
> */
The last line in the above comment is not a problem with the latest
kernel with all the patches in the patch queue. The patch that does that
is "ext4: Mark the unwritten buffer_head as mapped during write_begin"
The unwritten and mapped state together was a problem with the code path
we had before in ext4_da_get_block_prep
we had:
ret = ext4_get_blocks(NULL, inode, iblock, 1, bh_result, 0);
.....
..
} else if (ret > 0) {
bh_result->b_size = (ret << inode->i_blkbits);
if (buffer_unwritten(bh_result)) {
bh_result.b_blocknr = ~0;
....
}
}
The above can result in
1) We do a multi block delayed alloc to prealloc space. That would get
us multiple buffer_heads marked with BH_Unwritten. (say 10, 11, 12)
2) pdflush attempt to write some pages (say mapping block 10) which
cause
a get_block call with create = 1. That would attempt to convert
uninitialized extent to initialized one. This can cause multiple blocks
to be marked initialized. ( say 10, 11 , 12)
3) We do an overwrite of block 11. That would mean we call
ext4_da_get_block_prep, which would again do a get_block for block 11
with create = 0. But remember we already have buffer_head marked with
BH_Unwritten flag. But the buffer was unmapped because it is unwritten
( We are fixing this mess in the patch for 2.6.31).
4) The get_block call will find the buffer mapped due to step b. And
mark the buffer_head mapped. There we go . We end up with buffer_head
mapped and unwritten
5) later in ext4_da_get_block_prep we check whether the buffer_head in
marked
BH_Unwritten if so we set the block number to ~0. This is introduced by
[PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten
extents
6) So now we have a buffer_head that is mapped, unwritten, with
b_blocknr = ~0. That would result in the I/O error.
Now that we have the actual block number in bh_result.b_blocknr I guess
we should be ok to have the unwritten flag set. But then i guess it is
against the VFS assumption of buffer_head state. The state unwritten
indicate the blocks are not yet written. So if we don't do a
clear_buffer_unwritten as in the patch we end up with a block that is
written ( done via create = 1 get_block call) and still marked
unwritten. (see the 6 step example above)
Now to explain what "ext4: Mark the unwritten buffer_head as mapped
during write_begin" does.
It marks the buffer_head as mapped in the write_begin phase.
And that helps in making sure we don't end up calling get_block
multiple times. So with delayed allocation we now have between
the write_begin and writepage phase a buffer_head which is mapped
and unwritten for blocks in the fallocate space. Once we do
writepage for the block we will have buffer_head which is mapped and
unwritten flag cleared. The unwritten get cleared when we do a get_block
call with create = 1.
To achieve the above we need to make sure writepage code path looks at
the unwritten flag and does a get_blocks call with create = 1. With
mainline we have in writepage code path
mpage_da_map_blocks(...)
if ((mpd->b_state & (1 << BH_Mapped)) &&
!(mpd->b_state & (1 << BH_Delay)))
return 0;
...
..
ext4_da_get_block_write()
If we start marking buffer_head mapped for fallocate blocks in the
write_begin phase, then the above code will return without doing
any get_block(create = 1) call. That means we don't convert the
uninitialized extent to initialized one. So along with marking
buffer_head mapped and unwritten in the write_begin phase we also
need changes to writepage code path which does something like below
mpage_da_map_blocks(...)
if ((mpd->b_state & (1 << BH_Mapped)) &&
!(mpd->b_state & (1 << BH_Delay)) &&
!(mpd->b_state & (1 << BH_Unwritten)))
return 0;
...
..
ext4_da_get_block_write()
this is done in patch "ext4: Mark the unwritten buffer_head as mapped
during write_begin".
-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists