lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200514222120.GB4710@localhost.localdomain>
Date:   Thu, 14 May 2020 18:21:20 -0400
From:   Eric Whitney <enwlinux@...il.com>
To:     Jeffle Xu <jefflexu@...ux.alibaba.com>
Cc:     enwlinux@...il.com, linux-ext4@...r.kernel.org, tytso@....edu,
        joseph.qi@...ux.alibaba.com
Subject: Re: [PATCH RFC] ext4: fix partial cluster initialization when
 splitting extent

* Jeffle Xu <jefflexu@...ux.alibaba.com>:
> Hi Eric, would you mind explaining why the magic number '2' is used here
> when calculating the physical cluster number of the partial cluster in
> commit f4226d9ea400 ("ext4: fix partial cluster initialization") ?
> 
> ```
> +     /*
> +      * If we're going to split the extent, note that
> +      * the cluster containing the block after 'end' is
> +      * in use to avoid freeing it when removing blocks.
> +      */
> +     if (sbi->s_cluster_ratio > 1) {
> +             pblk = ext4_ext_pblock(ex) + end - ee_block + 2;
> +             partial_cluster =
> +                     -(long long) EXT4_B2C(sbi, pblk);
> +     }
> ```
> 
> As far as I understand, there we are initializing the partial cluster
> describing the beginning of the split extent after 'end'. The
> corrsponding physical block number of the first block in the split
> extent should be 'ext4_ext_pblock(ex) + end - ee_block + 1'.
> 
> This bug will cause xfstests shared/298 failure on ext4 with bigalloc
> enabled sometimes. Ext4 error messages indicate that previously freed
> blocks are being freed again, and the following fsck will fail due to
> the inconsistency of block bitmap and bg descriptor.
> 
> The following is an example case:
> 
> 1. First, Initialize a ext4 filesystem with cluster size '16K', block size
> '4K', in which case, one cluster contains four blocks.
> 
> 2. Create one file (e.g., xxx.img) on this ext4 filesystem. Now the extent
> tree of this file is like:
> 
> ...
> 36864:[0]4:220160
> 36868:[0]14332:145408
> 51200:[0]2:231424
> ...
> 
> 3. Then execute PUNCH_HOLE fallocate on this file. The hole range is
> like:
> 
> ..
> ext4_ext_remove_space: dev 254,16 ino 12 since 49506 end 49506 depth 1
> ext4_ext_remove_space: dev 254,16 ino 12 since 49544 end 49546 depth 1
> ext4_ext_remove_space: dev 254,16 ino 12 since 49605 end 49607 depth 1
> ...
> 
> 4. Then the extent tree of this file after punching is like
> 
> ...
> 49507:[0]37:158047
> 49547:[0]58:158087
> ...
> 
> 5. Detailed procedure of punching hole [49544, 49546]
> 
> 5.1. The block address space:
> ```
> lblk        ~49505  49506   49507~49543     49544~49546    49547~
> 	  ---------+------+-------------+----------------+--------
> 	    extent | hole |   extent	|	hole	 | extent
> 	  ---------+------+-------------+----------------+--------
> pblk       ~158045  158046  158047~158083  158084~158086   158087~
> ```
> 
> 5.2. The detailed layout of cluster 39521:
> ```
> 		cluster 39521
> 	<------------------------------->
> 
> 		hole		  extent
> 	<----------------------><--------
> 
> lblk      49544   49545   49546   49547
> 	+-------+-------+-------+-------+
> 	|	|	|	|	|
> 	+-------+-------+-------+-------+
> pblk     158084  1580845  158086  158087
> ```
> 
> 5.3. The ftrace output when punching hole [49544, 49546]:
> - ext4_ext_remove_space (start 49544, end 49546)
>   - ext4_ext_rm_leaf (start 49544, end 49546, last_extent [49507(158047), 40], partial [pclu 39522 lblk 0 state 2])
>     - ext4_remove_blocks (extent [49507(158047), 40], from 49544 to 49546, partial [pclu 39522 lblk 0 state 2]
>       - ext4_free_blocks: (block 158084 count 4)
>         - ext4_mballoc_free (extent 1/6753/1)
> 
> In this case, the whole cluster 39521 is freed mistakenly when freeing
> pblock 158084~158086 (i.e., the first three blocks of this cluster),
> although pblock 158087 (the last remaining block of this cluster) has
> not been freed yet.
> 
> The root cause of this isuue is that, the pclu of the partial cluster is
> calculated mistakenly in ext4_ext_remove_space(). The correct
> partial_cluster.pclu (i.e., the cluster number of the first block in the
> next extent, that is, lblock 49597 (pblock 158086)) should be 39521 rather
> than 39522.
> 
> Fixes: f4226d9ea400 ("ext4: fix partial cluster initialization")
> Signed-off-by: Jeffle Xu <jefflexu@...ux.alibaba.com>
> ---
>  fs/ext4/extents.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index f2b577b..cb74496 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2828,7 +2828,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
>  			 * in use to avoid freeing it when removing blocks.
>  			 */
>  			if (sbi->s_cluster_ratio > 1) {
> -				pblk = ext4_ext_pblock(ex) + end - ee_block + 2;
> +				pblk = ext4_ext_pblock(ex) + end - ee_block + 1;
>  				partial.pclu = EXT4_B2C(sbi, pblk);
>  				partial.state = nofree;
>  			}
> -- 
> 1.8.3.1
> 

Hi, Jeffle:

Thanks for the report and the patch.  At first glance I suspect the "2" is
simply a bug; logically we're just looking for the first block after the
extent split to set the partial cluster, as you suggest.  I'll post the
results of my review once I've had a chance to refresh my memory of the
code and run some more tests.

Thanks,
Eric

Powered by blists - more mailing lists