Message-Id: <1240968626.5583.25.camel@BVR-FS.beaverton.ibm.com>
Date:	Tue, 28 Apr 2009 18:30:26 -0700
From:	Mingming <cmm@...ibm.com>
To:	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
Cc:	tytso@....edu, sandeen@...hat.com, linux-ext4@...r.kernel.org
Subject: Re: [PATCH -V3] Fix sub-block zeroing for buffered writes into
	unwritten extents


On Wed, 2009-04-29 at 00:20 +0530, Aneesh Kumar K.V wrote:
> We need to mark the  buffer_head mapping prealloc space
> as new during write_begin. Otherwise we don't zero out the
> page cache content properly for a partial write. This will
> cause file corruption with preallocation.
> 
> Also use block number -1 as the fake block number so that
> unmap_underlying_metadata doesn't drop wrong buffer_head
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@...ux.vnet.ibm.com>
> 
> ---
>  fs/ext4/inode.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e91f978..0214389 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2318,11 +2318,20 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
>  			/* not enough space to reserve */
>  			return ret;
> 
> -		map_bh(bh_result, inode->i_sb, 0);
> +		map_bh(bh_result, inode->i_sb, -1);
>  		set_buffer_new(bh_result);
>  		set_buffer_delay(bh_result);
>  	} else if (ret > 0) {
>  		bh_result->b_size = (ret << inode->i_blkbits);
> +		bh_result->b_bdev = inode->i_sb->s_bdev;
> +		bh->b_blocknr = -1;

A small typo: this should be bh_result->b_blocknr.

But won't this incorrectly set b_blocknr for a normal, successful
(allocated, non-preallocated) get_block lookup? ext4_get_blocks_wrap()
will return 1 (> 0) if it finds the block already allocated.

> +		/*
> +		 * With sub-block writes into unwritten extents
> +		 * we also need to mark the buffer as new so that
> +		 * the unwritten parts of the buffer gets correctly zeroed.
> +		 */
> +		if (buffer_unwritten(bh_result))
> +			set_buffer_new(bh_result);
>  		ret = 0;
>  	}
> 

I think it would be nicer to set up the fake block number together with
set_buffer_new(), in ext4_ext_get_blocks(), at the point where it handles
the preallocation lookup for delalloc. This avoids the
buffer_unwritten(bh_result) check on every bh result returned by
ext4_get_blocks_wrap(), and makes the logic saner.

How about the attached patch? Tested with my testcase; the partial-write
preallocation corruption is fixed.

But looking at the comment change, it looks like the original intention was
to set the buffer unwritten so that a read from that uninitialized block
returns 0. It turns out the VFS needs the buffer to be set new for this
purpose.


-----------------------------------------------------------------------------

This patch fixes the file data garbage seen with a partial write to
preallocated space when delayed allocation is enabled.

The preallocated (uninitialized) buffer needs to be marked buffer_new()
so that a read from this uninitialized block returns 0. With delayed
allocation, the get_block() call from write_begin() only does a lookup
(create = 0) on the preallocated extent, and the returned buffer did not
have the buffer_new flag set. As a result the page was filled with
garbage that was later written to disk.

Signed-off-by: Mingming Cao <cmm@...ibm.com>
Index: linux-2.6.28-rc6/fs/ext4/extents.c
===================================================================
--- linux-2.6.28-rc6.orig/fs/ext4/extents.c	2009-04-28 11:52:05.000000000 -0700
+++ linux-2.6.28-rc6/fs/ext4/extents.c	2009-04-28 17:35:55.000000000 -0700
@@ -2767,15 +2767,28 @@ int ext4_ext_get_blocks(handle_t *handle
 			if (create == EXT4_CREATE_UNINITIALIZED_EXT)
 				goto out;
 			if (!create) {
+				if (allocated > max_blocks)
+					allocated = max_blocks;
 				/*
-				 * We have blocks reserved already.  We
+				 * We have blocks preallocated already. For
+				 * lookup (create=0) at write_begin time we
 				 * return allocated blocks so that delalloc
 				 * won't do block reservation for us.  But
-				 * the buffer head will be unmapped so that
-				 * a read from the block returns 0s.
+				 * we need to mark the buffer head new so that
+				 * a read from the block returns 0s. Fake the
+				 * block number -1 so that the following call
+				 * of unmap_underlying_metadata doesn't drop
+				 * the wrong buffer_head
 				 */
-				if (allocated > max_blocks)
-					allocated = max_blocks;
+				bh_result->b_blocknr = -1;
+				bh_result->b_bdev = inode->i_sb->s_bdev;
+				set_buffer_new(bh_result);
+
+				/*
+				 * We also need to mark the buffer as
+				 * unwritten so we don't write these
+				 * uninitialized pages
+				 */
 				set_buffer_unwritten(bh_result);
 				goto out2;
 			}
Index: linux-2.6.28-rc6/fs/ext4/inode.c
===================================================================
--- linux-2.6.28-rc6.orig/fs/ext4/inode.c	2009-04-28 11:52:05.000000000 -0700
+++ linux-2.6.28-rc6/fs/ext4/inode.c	2009-04-28 17:34:24.000000000 -0700
@@ -2173,7 +2173,7 @@ static int ext4_da_get_block_prep(struct
 			/* not enough space to reserve */
 			return ret;
 
-		map_bh(bh_result, inode->i_sb, 0);
+		map_bh(bh_result, inode->i_sb, -1);
 		set_buffer_new(bh_result);
 		set_buffer_delay(bh_result);
 	} else if (ret > 0) {

