lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Fri, 9 Sep 2022 12:07:50 +0200
From:   Jan Kara <jack@...e.cz>
To:     Zhihao Cheng <chengzhihao1@...wei.com>
Cc:     jack@...e.cz, tytso@....edu, adilger.kernel@...ger.ca,
        linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
        yukuai3@...wei.com
Subject: Re: [PATCH] ext4: Fix dir corruption when ext4_dx_add_entry() fails

On Fri 09-09-22 14:27:36, Zhihao Cheng wrote:
> Following process may lead to fs corruption:
> 1. ext4_create(dir/foo)
>  ext4_add_nondir
>   ext4_add_entry
>    ext4_dx_add_entry
>      a. add_dirent_to_buf
>       ext4_mark_inode_dirty
>       ext4_handle_dirty_metadata   // dir inode bh is recorded into journal
>      b. ext4_append    // dx_get_count(entries) == dx_get_limit(entries)
>        ext4_bread(EXT4_GET_BLOCKS_CREATE)
>         ext4_getblk
>          ext4_map_blocks
>           ext4_ext_map_blocks
>             ext4_mb_new_blocks
>              dquot_alloc_block
>               dquot_alloc_space_nodirty
>                inode_add_bytes    // update dir's i_blocks
>             ext4_ext_insert_extent
> 	     ext4_ext_dirty  // record extent bh into journal
>               ext4_handle_dirty_metadata(bh)
> 	      // record new block into journal
>        inode->i_size += inode->i_sb->s_blocksize   // new size(in mem)
>      c. ext4_handle_dirty_dx_node(bh2)
> 	// record dir's new block(dx_node) into journal
>      d. ext4_handle_dirty_dx_node((frame - 1)->bh)
>      e. ext4_handle_dirty_dx_node(frame->bh)
>      f. do_split    // ret err!
>      g. add_dirent_to_buf
> 	 ext4_mark_inode_dirty(dir)  // update raw_inode on disk(skipped)
> 2. fsck -a /dev/sdb
>  drop last block(dx_node) which beyonds dir's i_size.
>   /dev/sdb: recovering journal
>   /dev/sdb contains a file system with errors, check forced.
>   /dev/sdb: Inode 12, end of extent exceeds allowed value
> 	(logical block 128, physical block 3938, len 1)
> 3. fsck -fn /dev/sdb
>  dx_node->entry[i].blk > dir->i_size
>   Pass 2: Checking directory structure
>   Problem in HTREE directory inode 12 (/dir): bad block number 128.
>   Clear HTree index? no
>   Problem in HTREE directory inode 12: block #3 has invalid depth (2)
>   Problem in HTREE directory inode 12: block #3 has bad max hash
>   Problem in HTREE directory inode 12: block #3 not referenced
> 
> Just like make_indexed_dir() does, update dir inode if error occurs.
> Fetch a reproducer in [Link].
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216466
> CC: stable@...r.kernel.org
> Signed-off-by: Zhihao Cheng <chengzhihao1@...wei.com>

Thanks for the analysis and the fix! In principle it looks fine but overall
we seem to be defering directory dirtying too much in ext4 directory
handling code and as a result things are too fragile as you've noticed. By
looking through fs/ext4/namei.c I've found several more places that have
exactly the same problem as you're fixing here. I think that specifically
for these problems with ext4_append() the best solution is to mark inode
dirty directly inside ext4_append(). Sure it will result in more copying of
inode data into the journal but growing directory size is not that
performance critical operation so I think the code simplicity is worth the
extra CPU cycles.

								Honza

> ---
>  fs/ext4/namei.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 3a31b662f661..f04871fa4ead 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2617,6 +2617,13 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname,
>  	 */
>  	if (restart && err == 0)
>  		goto again;
> +	/*
> +	 * Even if the dx_add_entry failed, we have to properly write
> +	 * out all the changes we did so far. Otherwise we can end up
> +	 * with corrupted filesystem.
> +	 */
> +	if (err)
> +		ext4_mark_inode_dirty(handle, dir);
>  	return err;
>  }
>  
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ