linux-ext4 - Re: [PATCH 1/2 v2] e2fsck: Discard only unused parts of inode table

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.00.1202290812530.4926@dhcp-27-109.brq.redhat.com>
Date:	Wed, 29 Feb 2012 08:22:42 +0100 (CET)
From:	Lukas Czerner <lczerner@...hat.com>
To:	Eric Sandeen <sandeen@...deen.net>
cc:	Lukas Czerner <lczerner@...hat.com>, linux-ext4@...r.kernel.org,
	tytso@....edu, psusi@...ntu.com
Subject: Re: [PATCH 1/2 v2] e2fsck: Discard only unused parts of inode
 table

On Tue, 28 Feb 2012, Eric Sandeen wrote:

> On 02/23/2012 10:29 AM, Lukas Czerner wrote:
> > When calling e2fsck with '-E discard' option it might happen that
> > valid inodes are discarded accidentally. This is because we just
> > discard the part of inode table which lies past the highest used
> > inode. This is terribly wrong (sorry!).
> > 
> > This patch fixes it so only the free parts of an inode table
> > is discarded, leaving used inodes intact. This was tested with highly
> > fragmented inode tables with block size 4k and 1k.
> > 
> > Signed-off-by: Lukas Czerner <lczerner@...hat.com>
> > Reported-by: Phillip Susi <psusi@...ntu.com>
> 
> Apologies for being so iterative about this review but it's taken a while
> for this function to make sense to me.
> 
> This patch does seem to fix the original bug, but I have a few more
> nitpicks below.
> 
> > ---
> > v2: reworked so that we comply with inode number counting and adjust
> >     start in the e2fsck_discard_inodes(). Add some comments
> 
> >  e2fsck/pass5.c |   82 ++++++++++++++++++++++++++++++++++++++++---------------
> >  1 files changed, 59 insertions(+), 23 deletions(-)
> > 
> > diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
> > index 1e836e3..57a207d 100644
> > --- a/e2fsck/pass5.c
> > +++ b/e2fsck/pass5.c
> > @@ -94,6 +94,43 @@ static void e2fsck_discard_blocks(e2fsck_t ctx, io_manager manager,
> >  		ctx->options &= ~E2F_OPT_DISCARD;
> >  }
> >  
> > +/*
> > + * This will try to discard number 'count' starting at inode
> > + * number 'start'. Note that 'start' is a inode number and hence
> > + * it is counted from 1, it means that we need to adjust it
> > + * by -1 in this function to compute right offset in the
> > + * inode table.
> > + */
> 
> But that's not quite right... a casual reader would assume that we
> pass in the inode number of the starting inode we want to free;
> that's true for the first group, but not the rest:

Well, I thought that this should be obvious since we pass in 'group'
number as well. So what do you think the right comment should be ?

> 
> $ e2fsck/e2fsck -f -E discard fsfile 
> e2fsck 1.42 (29-Nov-2011)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> discard starting at 12
> discard starting at 1
> discard starting at 1
> discard starting at 1
> discard starting at 1
> discard starting at 1
> 
> ...
> 
> we are not discarding inode 1 that many times ;) 
> 
> The comment should reflect that the function expects to receive
> the inode number within this group, i.e. the Nth inode, starting
> with the 1st, i.e. 1-based.  (which is a little weird, but I don't
> see a better way).
> 
> The calling loop _does_ keep track of the actual inode number
> within the system ('i') - but that's not what's passed in here,
> and I don't think ctx->fs has the info we need to be able to
> do it that way.
> 
> > +static void e2fsck_discard_inodes(e2fsck_t ctx, int group,
> > +				  int start, int count)
> > +{
> > +	ext2_filsys fs = ctx->fs;
> > +	blk64_t blk, num;
> > +	int orig = count;
> > +
> > +	if ((ctx->options & E2F_OPT_NO) || !(ctx->options & E2F_OPT_DISCARD))
> > +		return;
> > +
> > +	/*
> > +	 * Start is inode number which starts counting from 1,
> 
> ... inode number within the group ...
> 
> > +	 * so we need to adjust it.
> > +	 */
> > +	start -= 1;
> 
> I wonder if we need defense against programming errors here; if start
> inadvertently comes in at 0, what happens?  I suppose blk goes off the end
> and other checks hopefully catch it... probably not 'til it hits the kernel
> though?

Then blk overflows and would point the start to something like 524287TB,
but I think that we can defense it here.

> 
> > +	/*
> > +	 * We can discard only blocks containing only unused
> > +	 * inodes in the table.
> > +	 */
> > +	blk = DIV_ROUND_UP(start,
> > +		EXT2_INODES_PER_BLOCK(fs->super));
> > +	count -= (blk * EXT2_INODES_PER_BLOCK(fs->super) - start);
> > +	blk += ext2fs_inode_table_loc(fs, group);
> > +	num = count / EXT2_INODES_PER_BLOCK(fs->super);
> > +
> > +	if (num > 0)
> > +		e2fsck_discard_blocks(ctx, fs->io->manager, blk, num);
> > +}
> > +
> >  #define NO_BLK ((blk64_t) -1)
> >  
> >  static void print_bitmap_problem(e2fsck_t ctx, int problem,
> > @@ -435,6 +472,7 @@ static void check_inode_bitmaps(e2fsck_t ctx)
> >  	int		skip_group = 0;
> >  	int		redo_flag = 0;
> >  	io_manager	manager = ctx->fs->io->manager;
> > +	unsigned long long	first_free = fs->super->s_inodes_per_group + 1;
> 
> >  
> >  	clear_problem_context(&pctx);
> >  	free_array = (int *) e2fsck_allocate_memory(ctx,
> > @@ -497,6 +535,7 @@ redo_counts:
> >  				 * are 0, count the free inode,
> >  				 * skip the current block group.
> >  				 */
> > +				first_free = 1;
> >  				inodes = fs->super->s_inodes_per_group - 1;
> >  				group_free = inodes;
> >  				free_inodes += inodes;
> > @@ -561,50 +600,47 @@ redo_counts:
> >  		ctx->options &= ~E2F_OPT_DISCARD;
> >  
> >  do_counts:
> > +		inodes++;
> >  		if (bitmap) {
> >  			if (ext2fs_test_inode_bitmap2(ctx->inode_dir_map, i))
> >  				dirs_count++;
> > +			if (inodes > first_free) {
> > +				e2fsck_discard_inodes(ctx, group, first_free,
> > +						      inodes - first_free);
> > +				first_free = fs->super->s_inodes_per_group + 1;
> > +			}
> >  		} else if (!skip_group || csum_flag) {
> >  			group_free++;
> >  			free_inodes++;
> > +			if (first_free > inodes)
> > +				first_free = inodes;
> >  		}
> >  
> > -		inodes++;
> >  		if ((inodes == fs->super->s_inodes_per_group) ||
> >  		    (i == fs->super->s_inodes_count)) {
> > -
> > -			free_array[group] = group_free;
> > -			dir_array[group] = dirs_count;
> > -
> > -			/* Discard inode table */
> > -			if (ctx->options & E2F_OPT_DISCARD) {
> > -				blk64_t used_blks, blk, num;
> > -
> > -				used_blks = DIV_ROUND_UP(
> > -					(EXT2_INODES_PER_GROUP(fs->super) -
> > -					group_free),
> > -					EXT2_INODES_PER_BLOCK(fs->super));
> > -
> > -				blk = ext2fs_inode_table_loc(fs, group) +
> > -				      used_blks;
> > -				num = fs->inode_blocks_per_group -
> > -				      used_blks;
> > -				e2fsck_discard_blocks(ctx, manager, blk, num);
> > -			}
> > -
> > +			/*
> > +			 * If the last inode is free, we can discard it as well.
> > +			 */
> > +			if (inodes >= first_free)
> > +				e2fsck_discard_inodes(ctx, group, first_free,
> > +						      inodes - first_free + 1);
> 
> This is unrelated to the changelog
> 
> >  			/*
> >  			 * If discard zeroes data and the group inode table
> >  			 * was not zeroed yet, set itable as zeroed
> >  			 */
> >  			if ((ctx->options & E2F_OPT_DISCARD) &&
> > -			    (io_channel_discard_zeroes_data(fs->io)) &&
> > +			    !(ctx->options & E2F_OPT_NO) &&
> 
> This is unrelated as well...

So do you want me to separate those unrelated changes to a separate
patch ? Those are pretty small changes..

> 
> Rather than continuing to check _OPT_NO here and in e2fsck_discard_blocks(), it
> might be better (in a separate patch) ;) to simply turn off OPT_DISCARD right
> after options processing if -n was specified, and not worry about it down here.

Makes sense, I can change it.

Thanks!
-Lukas

> 
> > +			    io_channel_discard_zeroes_data(fs->io) &&
> >  			    !(ext2fs_bg_flags_test(fs, group,
> > -						  EXT2_BG_INODE_ZEROED))) {
> > +						   EXT2_BG_INODE_ZEROED))) {
> >  				ext2fs_bg_flags_set(fs, group,
> >  						    EXT2_BG_INODE_ZEROED);
> >  				ext2fs_group_desc_csum_set(fs, group);
> >  			}
> >  
> > +			first_free = fs->super->s_inodes_per_group + 1;
> > +			free_array[group] = group_free;
> > +			dir_array[group] = dirs_count;
> >  			group ++;
> >  			inodes = 0;
> >  			skip_group = 0;
> 
> 

-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html