lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 19 Aug 2022 04:45:41 +0530
From:   "Ritesh Harjani (IBM)" <ritesh.list@...il.com>
To:     Jan Kara <jack@...e.cz>
Cc:     Baokun Li <libaokun1@...wei.com>, linux-ext4@...r.kernel.org,
        tytso@....edu, adilger.kernel@...ger.ca, lczerner@...hat.com,
        enwlinux@...il.com, linux-kernel@...r.kernel.org,
        yi.zhang@...wei.com, yebin10@...wei.com, yukuai3@...wei.com
Subject: Re: [PATCH 2/2] ext4: add inode table check in __ext4_get_inode_loc
 to aovid possible infinite loop

On 22/08/18 07:23PM, Jan Kara wrote:
> On Thu 18-08-22 20:13:53, Ritesh Harjani (IBM) wrote:
> > On 22/08/17 04:31PM, Jan Kara wrote:
> > > On Wed 17-08-22 21:27:01, Baokun Li wrote:
> > > > In do_writepages, if the value returned by ext4_writepages is "-ENOMEM"
> > > > and "wbc->sync_mode == WB_SYNC_ALL", retry until the condition is not met.
> > > >
> > > > In __ext4_get_inode_loc, if the bh returned by sb_getblk is NULL,
> > > > the function returns -ENOMEM.
> > > >
> > > > In __getblk_slow, if the return value of grow_buffers is less than 0,
> > > > the function returns NULL.
> > > >
> > > > When the three processes are connected in series like the following stack,
> > > > an infinite loop may occur:
> > > >
> > > > do_writepages					<--- keep retrying
> > > >  ext4_writepages
> > > >   mpage_map_and_submit_extent
> > > >    mpage_map_one_extent
> > > >     ext4_map_blocks
> > > >      ext4_ext_map_blocks
> > > >       ext4_ext_handle_unwritten_extents
> > > >        ext4_ext_convert_to_initialized
> > > >         ext4_split_extent
> > > >          ext4_split_extent_at
> > > >           __ext4_ext_dirty
> > > >            __ext4_mark_inode_dirty
> > > >             ext4_reserve_inode_write
> > > >              ext4_get_inode_loc
> > > >               __ext4_get_inode_loc		<--- return -ENOMEM
> > > >                sb_getblk
> > > >                 __getblk_gfp
> > > >                  __getblk_slow			<--- return NULL
> > > >                   grow_buffers
> > > >                    grow_dev_page		<--- return -ENXIO
> > > >                     ret = (block < end_block) ? 1 : -ENXIO;
> > > >
> > > > In this issue, bg_inode_table_hi is overwritten as an incorrect value.
> > > > As a result, `block < end_block` cannot be met in grow_dev_page.
> > > > Therefore, __ext4_get_inode_loc always returns '-ENOMEM' and do_writepages
> > > > keeps retrying. As a result, the writeback process is in the D state due
> > > > to an infinite loop.
> > > >
> > > > Add a check on inode table block in the __ext4_get_inode_loc function by
> > > > referring to ext4_read_inode_bitmap to avoid this infinite loop.
> > > >
> > > > Signed-off-by: Baokun Li <libaokun1@...wei.com>
> > >
> > > Thanks for the fixes. Normally, we check that inode table is fine in
> > > ext4_check_descriptors() (and those checks are much stricter) so it seems
> > > unnecessary to check it again here. I understand that in your case it was
> > > resize that corrupted the group descriptor after the filesystem was mounted
> > > which is nasty but there's much more metadata that can be corrupted like
> > > this and it's infeasible to check each metadata block before we use it.
> > >
> > > IMHO a proper fix to this class of issues would be for sb_getblk() to
> > > return proper error so that we can distinguish ENOMEM from other errors.
> > > But that will be a larger undertaking...
> > >
> >
> > Hi Jan,
> >
> > How about adding a wrapper around sb_getblk() which will do the basic block
> > bound checks for ext4. Then we can carefully convert all the callers of
> > sb_getblk() in ext4 to call ext4_sb_getblk().
> >
> > ext4_sb_getblk() will then return either of below -
> > 1. ERR_PTR(-EFSCORRUPTED)
> > 2. NULL
> > 3. struct buffer_head*
> >
> > It's caller can then implement the proper error handling.
> >
> > Folding a small patch to implement the simple bound check. Is this the right
> > approach?
>
> Yep, looks sensible to me. Maybe I'd just make ext4_sb_getblk() return bh
> or ERR_PTR so something like ERR_PTR(-EFSCORRUPTED), ERR_PTR(-ENXIO), or bh
> pointer.

Sure, Thanks Jan. Will do that once I clear some confusion w.r.t
	"start_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)"

At some places this is checked with "<= s_first_data_block"
	e.g. fs/ext4/ialloc.c, ext4_sb_block_valid()

while at some places I see it to be "< s_first_data_block"
	e.g. fs/ext4/mballoc.c, fs/ext4/mmp.c

Will spend sometime to understand why the difference and if there is anything
I might miss here for off-by-one check.

Adding more to the confusion would be difference w.r.t blocksize = 1024 v/s
other blocksizes. Based on the blocksize value I guess, s_first_data_block can
be different (0/1??). Or can bigalloc can change this...
...Will look more into this.

-ritesh

>
> 								Honza
>
> >
> > From: "Ritesh Harjani (IBM)" <ritesh.list@...il.com>
> > Date: Thu, 18 Aug 2022 07:53:58 -0500
> > Subject: [RFC] ext4: Add ext4_sb_getblk() wrapper for block bounds checking
> >
> > We might need more bounds checking on the block before calling sb_getblk().
> > This helper does that and if it is not valid then returns ERR_PTR(-EFSCORRUPTED)
> > Later we will need to carefully convert the callers to use ext4_sb_getblk()
> > instead of sb_getblk().
> > For e.g. __ext4_get_inode_loc()
> >
> > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@...il.com>
> > ---
> >  fs/ext4/block_validity.c |  4 +---
> >  fs/ext4/ext4.h           | 12 ++++++++++++
> >  fs/ext4/super.c          |  9 +++++++++
> >  3 files changed, 22 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/ext4/block_validity.c b/fs/ext4/block_validity.c
> > index 5504f72bbbbe..69347fab7c38 100644
> > --- a/fs/ext4/block_validity.c
> > +++ b/fs/ext4/block_validity.c
> > @@ -301,9 +301,7 @@ int ext4_sb_block_valid(struct super_block *sb, struct inode *inode,
> >  	struct rb_node *n;
> >  	int ret = 1;
> >
> > -	if ((start_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)) ||
> > -	    (start_blk + count < start_blk) ||
> > -	    (start_blk + count > ext4_blocks_count(sbi->s_es)))
> > +	if (!ext4_sb_block_valid_fastcheck(sbi->s_es, start_blk, count))
> >  		return 0;
> >
> >  	/*
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 9bca5565547b..79d0e45185d3 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -3072,6 +3072,8 @@ extern struct buffer_head *ext4_sb_bread(struct super_block *sb,
> >  					 sector_t block, blk_opf_t op_flags);
> >  extern struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
> >  						   sector_t block);
> > +extern struct buffer_head *ext4_sb_getblk(struct super_block *sb,
> > +					  sector_t block);
> >  extern void ext4_read_bh_nowait(struct buffer_head *bh, blk_opf_t op_flags,
> >  				bh_end_io_t *end_io);
> >  extern int ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags,
> > @@ -3358,6 +3360,16 @@ static inline unsigned int ext4_flex_bg_size(struct ext4_sb_info *sbi)
> >  	return 1 << sbi->s_log_groups_per_flex;
> >  }
> >
> > +static inline bool ext4_sb_block_valid_fastcheck(struct ext4_super_block *es,
> > +					sector_t start_blk, unsigned int count)
> > +{
> > +	if ((start_blk <= le32_to_cpu(es->s_first_data_block)) ||
> > +	    (start_blk + count < start_blk) ||
> > +	    (start_blk + count > ext4_blocks_count(es)))
> > +		return false;
> > +	return true;
> > +}
> > +
> >  #define ext4_std_error(sb, errno)				\
> >  do {								\
> >  	if ((errno))						\
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index 9a66abcca1a8..5b29272ad9a9 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -269,6 +269,15 @@ void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block)
> >  	}
> >  }
> >
> > +struct buffer_head *ext4_sb_getblk(struct super_block *sb, sector_t block)
> > +{
> > +	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> > +
> > +	if (!ext4_sb_block_valid_fastcheck(es, block, 1))
> > +		return ERR_PTR(-EFSCORRUPTED);
> > +	return sb_getblk(sb, block);
> > +}
> > +
> >  static int ext4_verify_csum_type(struct super_block *sb,
> >  				 struct ext4_super_block *es)
> >  {
> > --
> > 2.25.1
> >
> > -ritesh
> >
> >
> >
> > > 								Honza
> > >
> > > > ---
> > > >  fs/ext4/inode.c | 10 +++++++++-
> > > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > > index 601214453c3a..5e171879fa23 100644
> > > > --- a/fs/ext4/inode.c
> > > > +++ b/fs/ext4/inode.c
> > > > @@ -4466,9 +4466,17 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
> > > >  	inodes_per_block = EXT4_SB(sb)->s_inodes_per_block;
> > > >  	inode_offset = ((ino - 1) %
> > > >  			EXT4_INODES_PER_GROUP(sb));
> > > > -	block = ext4_inode_table(sb, gdp) + (inode_offset / inodes_per_block);
> > > >  	iloc->offset = (inode_offset % inodes_per_block) * EXT4_INODE_SIZE(sb);
> > > >
> > > > +	block = ext4_inode_table(sb, gdp);
> > > > +	if ((block <= le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block)) ||
> > > > +	    (block >= ext4_blocks_count(EXT4_SB(sb)->s_es))) {
> > > > +		ext4_error(sb, "Invalid inode table block %llu in "
> > > > +			   "block_group %u", block, iloc->block_group);
> > > > +		return -EFSCORRUPTED;
> > > > +	}
> > > > +	block += (inode_offset / inodes_per_block);
> > > > +
> > > >  	bh = sb_getblk(sb, block);
> > > >  	if (unlikely(!bh))
> > > >  		return -ENOMEM;
> > > > --
> > > > 2.31.1
> > > >
> > > --
> > > Jan Kara <jack@...e.com>
> > > SUSE Labs, CR
> --
> Jan Kara <jack@...e.com>
> SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ