lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 26 Feb 2013 22:23:42 +0800
From:	Zheng Liu <gnehzuil.liu@...il.com>
To:	Dmitry Monakhov <dmonakhov@...nvz.org>
Cc:	linux-ext4@...r.kernel.org, tytso@....edu, jack@...e.cz,
	wenqing.lz@...bao.com
Subject: [PATCH] ext4: fix wrong m_len value after unwritten extent
 conversion (Re: [PATCH 5/5] ext4: invalidate...)

Hi Dmitry and Ted,

On Mon, Feb 25, 2013 at 08:29:46PM +0400, Dmitry Monakhov wrote:
[snip]
> One BIG note about this patch.
> It fix BUG_ON(bh->b_blocknr != pblock) in fs/ext4/inode.c:1452
> but 300'th xfstest(from my repo) still failed due to data corruption in
> verifier thread. 
> LOG:
> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/vzvg-scratch_dev
> /mnt_scratch
> 
> 300 33s ... [failed, exit status 1] - output mismatch (see 300.out.bad)
>     --- 300.out      2013-02-20 12:46:24.000000000 +0400
>     +++ 300.out.bad  2013-02-25 20:18:24.000000000 +0400
>     @@ -2,3 +2,5 @@
>      
>       Start defragment activity 
>      
>     +failed: '/usr/local/bin/fio /tmp/1665-300.fio'
>     +(see 300.full for details)
>      ...
>      (Run 'diff -u 300.out 300.out.bad' to see the entire diff)
> 
> #300.full
> crc32c: verify failed at file /mnt_scratch/test2 offset 16973824, length
>     65536
>        Expected CRC: d062565e
>        Received CRC: f2cd2028
>        received data dumped as test2.16973824.received
>        expected data dumped as test2.16973824.expected
> defrag-4k: Laying out IO file(s) (1 file(s) / 3400MB)
> donor-file-fuzzer: Laying out IO file(s) (1 file(s) / 3400MB)
> 
> #DIFF:
> --- /tmp/test2.16973824.expected        2013-02-25 20:23:04.000000000
>     +0400
> +++ /tmp/test2.16973824.received        2013-02-25 20:22:55.000000000
>     +0400
> @@ -254,3844 +254,6 @@
>  0000fd0 eb95 6201 89ec 151a bd72 9675 203c 0d80
>  0000fe0 b7ae b415 b8cc 0b2f b6f5 baab 9d0e 1741
>  0000ff0 f6de fbda d64b 1bd3 5edb 040c 7cdc 18e6
> -0001000 0bdb 5114 cd95 01eb 017b a435 d128 11af
> -0001010 202f 93c9 a80a 013e a405 c261 8b1c 0dab
> -0001020 b480 c649 1032 1b0a 3690 7e89 dee1 0ac7
> -0001030 26d2 9489 3c97 0bd8 24da 5f28 3d4e 066d
> -0001040 049b b978 7815 159c 8093 b4e1 b246 1c25
> .....
> -000ffc0 bbb1 fa68 5622 022d 9776 0174 2d90 1ecb
> -000ffd0 92ee b473 64f7 0bcb 725d 2d17 9265 0fff
> -000ffe0 6e4b ec74 5bc0 0618 0dc9 e669 953d 002f
> -000fff0 a1b9 35e8 ff73 17a5 9437 15e0 24fa 150a
> +0001000 0000 0000 0000 0000 0000 0000 0000 0000
> +*
>  0010000
> 
> It seems that we data was written to uninitialized extent, but
> unwritten->initialized extent conversion was missed somewhere.
> I have not fix for that issue yet.

I take a close look at this bug, and I found that a wrong 'map->m_len'
is returned from ext4_ext_map_blocks when we try to convert an unwritten
extent with EXT4_GET_BLOCKS_IO_CONVERT_EXT flag.  Here we always assume
that the return value of ext/ind_map_blocks() is equal to m_len.  But
here it breaks this assumption.  Let us consider what happens.

When we do a dio write for a extent-based file, we will preallocate an
unwritten extent for it.  After dio write has been done, we will convert
it in ext4_end_io callback.  We use ext4_convert_unwritten_extents to do
this conversion.  Then the codepath is like beblow:

ext4_convert_unwritten_extents()
  ->ext4_map_blocks()
    ->ext4_es_lookup_extent()
      [We can lookup an unwritten extent]
    ->ext4_ext_map_blocks()
      ->ext4_ext_handle_uninitialized_extents()
    ->ext4_es_insert_extent()
      [We update status tree, but the length of written extent is wrong.
       The length of written extent is greater than the length of we
       have converted]

The following case demonstrates what happens.

1. We do a dio write.  An unwritten extent will be preallocated.

status tree: [0, 23]:unwritten
extent tree: [0, 23]:unwritten

2. We try to convert an unwritten extent, e.g. [0, 15].  But due to this
bug we will update all unwritten extent in status tree.

status tree: [0, 23]:written
extent tree: [0, 15]:written, [16, 23]:unwritten

3. Then we try to convert the following unwritten extent, but we will
return directly in ext4_map_blocks because we lookup an written extent
from status tree, and that unwritten extent will never be converted.  At
the time if e4defrag is running, the status tree could be removed, and
we could read an unwritten extent.

Certainly this is only my *guess* because in my sandbox xfstests #300 and
#301 never fail.  I am not sure this patch could fix this regression.
*But* I am pretty sure it will cause some potential bugs if it hasn't been
fixed.  I paste the patch at the below, which has been tested by xfstests
plus Dmitry's #300 and #301 test case at the following configurations.

 * Ext4 4k block
 * Ext4 4k block w/dioread_nolock
 * EXt4 4k block w/bigalloc


Dmitry, I really appreciate if you could give this patch a try.  Hope it
can fix this regression.  Please let me know if there is any update.
Thank you very much.

Regards,
						- Zheng


Subject: [PATCH] ext4: fix wrong m_len value after unwritten extent conversion

From: Zheng Liu <wenqing.lz@...bao.com>

We always assume that the return value of ext4_ext_map_blocks is equal
to map->m_len but when we try to convert an unwritten extent 'm_len'
value will break this assumption.  It is harmless until we use status
tree as a extent cache because we need to update status tree according
to 'm_len' value.

Meanwhile this commit marks EXT4_MAP_MAPPED flag after unwritten extent
conversion.  It shouldn't cause a bug because we update status tree
according to checking EXT4_MAP_UNWRITTEN flag.  But it should be fixed.

Signed-off-by: Zheng Liu <wenqing.lz@...bao.com>
Cc: "Theodore Ts'o" <tytso@....edu>
Cc: Dmitry Monakhov <dmonakhov@...nvz.org>
---
 fs/ext4/extents.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 372b2cb..0793a13 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3626,6 +3626,10 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
 						 path, map->m_len);
 		} else
 			err = ret;
+		map->m_flags |= EXT4_MAP_MAPPED;
+		if (allocated > map->m_len)
+			allocated = map->m_len;
+		map->m_len = allocated;
 		goto out2;
 	}
 	/* buffered IO case */
-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ