Message-Id: <1240968626.5583.25.camel@BVR-FS.beaverton.ibm.com>
Date: Tue, 28 Apr 2009 18:30:26 -0700
From: Mingming <cmm@...ibm.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
Cc: tytso@....edu, sandeen@...hat.com, linux-ext4@...r.kernel.org
Subject: Re: [PATCH -V3] Fix sub-block zeroing for buffered writes into unwritten extents
On Wed, 2009-04-29 at 00:20 +0530, Aneesh Kumar K.V wrote:
> We need to mark the buffer_head mapping prealloc space
> as new during write_begin. Otherwise we don't zero out the
> page cache content properly for a partial write. This will
> cause file corruption with preallocation.
>
> Also use block number -1 as the fake block number so that
> unmap_underlying_metadata doesn't drop wrong buffer_head
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@...ux.vnet.ibm.com>
>
> ---
> fs/ext4/inode.c | 11 ++++++++++-
> 1 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e91f978..0214389 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2318,11 +2318,20 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
> /* not enough space to reserve */
> return ret;
>
> - map_bh(bh_result, inode->i_sb, 0);
> + map_bh(bh_result, inode->i_sb, -1);
> set_buffer_new(bh_result);
> set_buffer_delay(bh_result);
> } else if (ret > 0) {
> bh_result->b_size = (ret << inode->i_blkbits);
> + bh_result->b_bdev = inode->i_sb->s_bdev;
> + bh->b_blocknr = -1;
A small typo: this should be bh_result->b_blocknr.
But won't this incorrectly set up b_blocknr for a normal, successful
(allocated, non-preallocated) get_block lookup? ext4_get_blocks_wrap()
also returns 1 (>0) when it finds the block already allocated.
> + /*
> + * With sub-block writes into unwritten extents
> + * we also need to mark the buffer as new so that
> + * the unwritten parts of the buffer gets correctly zeroed.
> + */
> + if (buffer_unwritten(bh_result))
> + set_buffer_new(bh_result);
> ret = 0;
> }
>
I think it would be nicer to set up the fake block number at the same
time as set_buffer_new(), in ext4_ext_get_blocks(), where the
preallocation lookup for delalloc is handled. This avoids a
buffer_unwritten(bh_result) check on every bh result returned by
ext4_get_blocks_wrap(), and makes the logic saner.
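A rough user-space sketch of that placement, under the same caveats: mock_ext_get_blocks_lookup(), struct mock_extent and mock_count_aliased() are hypothetical, the last one standing in for the block-number match that unmap_underlying_metadata() relies on. Only the lookup of an uninitialized extent gets the fake block number and the new flag. The fake -1, stored in the unsigned sector_t, is the largest possible value, so it cannot alias a cached buffer the way block 0 can.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;              /* unsigned, like the kernel's */

enum { BH_NEW = 1u << 0, BH_UNWRITTEN = 1u << 1, BH_MAPPED = 1u << 2 };

struct mock_bh { sector_t b_blocknr; unsigned state; };

/* Hypothetical lookup result: is the extent uninitialized (preallocated),
 * and which physical block does it map to? */
struct mock_extent { int uninit; sector_t pblk; };

/* Proposed placement: only the create == 0 lookup of an uninitialized
 * extent gets the fake block number plus the new and unwritten flags. */
static void mock_ext_get_blocks_lookup(const struct mock_extent *ex,
                                       struct mock_bh *bh)
{
    assert(ex != NULL && bh != NULL);
    if (ex->uninit) {
        bh->b_blocknr = (sector_t)-1;   /* fake: matches no real block */
        bh->state |= BH_NEW | BH_UNWRITTEN;
        return;                         /* like "goto out2": not mapped */
    }
    bh->b_blocknr = ex->pblk;           /* written extent: untouched path */
    bh->state |= BH_MAPPED;
}

/* Count how many cached metadata block numbers equal blk, i.e. how many
 * buffers a match-by-block-number unmap would wrongly drop. */
static size_t mock_count_aliased(const sector_t *cached, size_t n,
                                 sector_t blk)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (cached[i] == blk)
            hits++;
    return hits;
}
```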
How about the attached patch? I tested it with my test case, and the
partial-write corruption in preallocated space is fixed.
Looking at the comment change, though, it seems the original intention
was to set the buffer unwritten so that a read from that uninitialized
block returns 0. It turns out that, for this purpose, the VFS needs the
buffer to be set new.
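Why buffer_new matters, as a simplified user-space model loosely based on the buffer_new() handling in the generic write_begin path (__block_prepare_write() in fs/buffer.c, as we understand it): when a block only partly covered by the write is new, the bytes outside the written range are zeroed; when it is not, whatever was in the page stays there. mock_prepare_partial_write() and mock_buffered_write() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Prepare one block of page-cache bytes for a write covering [from, to).
 * A "new" block gets its head and tail zeroed; a block not marked new
 * keeps its old contents, which is how stale garbage survives. */
static void mock_prepare_partial_write(unsigned char *blk, size_t blksize,
                                       size_t from, size_t to, bool is_new)
{
    assert(from <= to && to <= blksize);
    if (!is_new)
        return;                          /* nothing zeroed */
    memset(blk, 0, from);                /* head: [0, from) */
    memset(blk + to, 0, blksize - to);   /* tail: [to, blksize) */
}

/* Prepare the block, then copy the user data into [from, to). */
static void mock_buffered_write(unsigned char *blk, size_t blksize,
                                size_t from, size_t to,
                                const unsigned char *src, bool is_new)
{
    mock_prepare_partial_write(blk, blksize, from, to, is_new);
    memcpy(blk + from, src, to - from);
}
```

Writing "xy" at offset 2 of an 8-byte block filled with 0xAA yields zeros around "xy" when the block is new, and leaves the 0xAA bytes in place when it is not.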
-----------------------------------------------------------------------------
This patch fixes the garbage file data seen after a partial write into
preallocated space when delayed allocation is enabled.
The preallocated (uninitialized) buffer needs to be marked buffer_new()
so that a read of this uninitialized block returns 0. With delayed
allocation, the get_block() call made from write_begin() only does a
lookup (create == 0) on the preallocated extent, and the returned buffer
did not have buffer_new set. As a result the page was filled with
garbage, which was later written to disk.
Signed-off-by: Mingming Cao <cmm@...ibm.com>
Index: linux-2.6.28-rc6/fs/ext4/extents.c
===================================================================
--- linux-2.6.28-rc6.orig/fs/ext4/extents.c 2009-04-28 11:52:05.000000000 -0700
+++ linux-2.6.28-rc6/fs/ext4/extents.c 2009-04-28 17:35:55.000000000 -0700
@@ -2767,15 +2767,28 @@ int ext4_ext_get_blocks(handle_t *handle
if (create == EXT4_CREATE_UNINITIALIZED_EXT)
goto out;
if (!create) {
+ if (allocated > max_blocks)
+ allocated = max_blocks;
/*
- * We have blocks reserved already. We
+ * We have blocks preallocated already. For
+ * lookup (create=0) at write_begin time we
* return allocated blocks so that delalloc
* won't do block reservation for us. But
- * the buffer head will be unmapped so that
- * a read from the block returns 0s.
+ * we need to mark the buffer head new so that
+ * a read from the block returns 0s. Fake the
+ * block number -1 so that the following call
+ * of unmap_underlying_metadata doesn't drop
+ * wrong buffer_head
*/
- if (allocated > max_blocks)
- allocated = max_blocks;
+ bh_result->b_blocknr = -1;
+ bh_result->b_bdev = inode->i_sb->s_bdev;
+ set_buffer_new(bh_result);
+
+ /*
+ * We also need to mark the buffer as
+ * unwritten so we don't write these
+ * uninitialized pages
+ */
set_buffer_unwritten(bh_result);
goto out2;
}
Index: linux-2.6.28-rc6/fs/ext4/inode.c
===================================================================
--- linux-2.6.28-rc6.orig/fs/ext4/inode.c 2009-04-28 11:52:05.000000000 -0700
+++ linux-2.6.28-rc6/fs/ext4/inode.c 2009-04-28 17:34:24.000000000 -0700
@@ -2173,7 +2173,7 @@ static int ext4_da_get_block_prep(struct
/* not enough space to reserve */
return ret;
- map_bh(bh_result, inode->i_sb, 0);
+ map_bh(bh_result, inode->i_sb, -1);
set_buffer_new(bh_result);
set_buffer_delay(bh_result);
} else if (ret > 0) {