lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.00.1007281108160.3033@localhost>
Date:	Wed, 28 Jul 2010 11:13:40 +0200 (CEST)
From:	Lukas Czerner <lczerner@...hat.com>
To:	Jan Kara <jack@...e.cz>
cc:	Lukas Czerner <lczerner@...hat.com>, linux-ext4@...r.kernel.org,
	jmoyer@...hat.com, rwheeler@...hat.com, eshishki@...hat.com,
	sandeen@...hat.com, tytso@....edu
Subject: Re: [PATCH 2/3] Add batched discard support for ext3

On Tue, 27 Jul 2010, Jan Kara wrote:

> On Tue 27-07-10 14:41:53, Lukas Czerner wrote:
> > Walk through each allocation group and trim all free extents. It can be
> > invoked through TRIM ioctl on the file system. The main idea is to
> > provide a way to trim the whole file system if needed, since some SSD's
> > may suffer from performance loss after the whole device was filled (it
> > does not mean that fs is full!).
> > 
> > It search for free extents in each allocation group. When the free
> > extent is found, blocks are marked as used and then trimmed. Afterwards
> > these blocks are marked as free in per-group bitmap.
>   Thanks for the patch. It looks much better than the previous one. Some
> comments below.
> 
> > Signed-off-by: Lukas Czerner <lczerner@...hat.com>
> > ---
> >  fs/ext3/balloc.c        |  229 +++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/ext3/super.c         |    1 +
> >  include/linux/ext3_fs.h |    1 +
> >  3 files changed, 231 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
> > index a177122..97c6867 100644
> > --- a/fs/ext3/balloc.c
> > +++ b/fs/ext3/balloc.c
> > @@ -20,6 +20,7 @@
> >  #include <linux/ext3_jbd.h>
> >  #include <linux/quotaops.h>
> >  #include <linux/buffer_head.h>
> > +#include <linux/blkdev.h>
> >  
> >  /*
> >   * balloc.c contains the blocks allocation and deallocation routines
> > @@ -1876,3 +1877,231 @@ unsigned long ext3_bg_num_gdb(struct super_block *sb, int group)
> >  	return ext3_bg_num_gdb_meta(sb,group);
> >  
> >  }
> > +
> > +/**
> > + * ext3_trim_all_free -- function to trim all free space in alloc. group
> > + * @sb:			super block for file system
> > + * @group:		allocation group to trim
> > + * @gdp:		allocation group description structure
> > + * @minblocks:		minimum extent block count
> > + *
> > + * ext3_trim_all_free walks through group's block bitmap searching for free
> > + * blocks. When the free block is found, it tries to allocate this block and
> > + * consequent free block to get the biggest free extent possible, until it
> > + * reaches any used block. Then issue a TRIM command on this extent and free
> > + * the extent in the block bitmap. This is done until whole group is scanned.
> > + */
> > +ext3_grpblk_t ext3_trim_all_free(struct super_block *sb,
> > +			unsigned int group, ext3_grpblk_t minblocks)
> > +{
> > +	handle_t *handle;
> > +	ext3_grpblk_t max = EXT3_BLOCKS_PER_GROUP(sb);
> > +	ext3_grpblk_t next, count = 0, start, bit;
> > +	struct ext3_sb_info *sbi;
> > +	ext3_fsblk_t discard_block;
> > +	struct buffer_head *bitmap_bh = NULL;
> > +	struct buffer_head *gdp_bh;
> > +	ext3_grpblk_t free_blocks;
> > +	struct ext3_group_desc *gdp;
> > +	int err = 0, ret;
> > +	ext3_grpblk_t freed;
> > +
> > +	/**
> > +	 * We will update one block bitmap, and one group descriptor
> > +	 */
> > +	handle = ext3_journal_start_sb(sb, 2);
> > +	if (IS_ERR(handle)) {
> > +		err = PTR_ERR(handle);
> > +		ext3_warning(sb, __func__, "error %d on journal start", err);
>   I think there's no need to issue another warning here... When
> journal_start fails there should be notices about this in the log already.

That's right, thanks.

> 
> > +		return err;
> > +	}
> > +
> > +	bitmap_bh = read_block_bitmap(sb, group);
> > +	if (!bitmap_bh)
> > +		goto err_out;
> > +
> > +	BUFFER_TRACE(bitmap_bh, "getting undo access");
> > +	err = ext3_journal_get_undo_access(handle, bitmap_bh);
> > +	if (err)
> > +		goto err_out;
> > +
> > +	gdp = ext3_get_group_desc(sb, group, &gdp_bh);
> > +	if (!gdp)
> > +		goto err_out;
> > +
> > +	BUFFER_TRACE(gd_bh, "get_write_access");
> > +	err = ext3_journal_get_write_access(handle, gdp_bh);
> > +	if (err)
> > +		goto err_out;
> > +
> > +	free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
> > +	sbi = EXT3_SB(sb);
> > +
> > +	 /* Walk through the whole group */
> > +	start = 0;
> > +	while (start < max) {
> > +
> > +		start = bitmap_search_next_usable_block(start, bitmap_bh, max);
> > +		if (start < 0)
> > +			break;
> > +		next = start;
> > +
> > +		/**
>                 ^^^ Use just /*. /** is reserved for comments processed by
>   kernel documentation generator and they have to have special format...

Ok.

> 
> > +		 * Allocate contiguous free extents by setting bits in the
> > +		 * block bitmap
> > +		 */
> > +		while (next < max
> > +			&& claim_block(sb_bgl_lock(sbi, group),
> > +					next, bitmap_bh)) {
> > +			next++;
> > +		}
> > +
> > +		 /* We did not claimed any blocks */
>                                ^^^^ claim

Oh, that is my astonishing English skills ;). Thanks.

> 
> > +		if (next == start)
> > +			continue;
> > +
> > +		discard_block = (ext3_fsblk_t)start +
> > +				ext3_group_first_block_no(sb, group);
> > +
> > +		/* Update counters */
> > +		spin_lock(sb_bgl_lock(sbi, group));
> > +		le16_add_cpu(&gdp->bg_free_blocks_count, start - next);
> > +		spin_unlock(sb_bgl_lock(sbi, group));
> > +		percpu_counter_sub(&sbi->s_freeblocks_counter, next - start);
> > +
> > +		/* Do not issue a TRIM on extents smaller than minblocks */
> > +		if ((next - start) < minblocks)
> > +			goto free_extent;
> > +
> > +		 /* Send the TRIM command down to the device */
> > +		sb_issue_discard(sb, discard_block, next - start);
> > +		count += (next - start);
> > +		cond_resched();
>   Maybe you could cond_resched() after freeing the blocks instead of here?
> So that we don't sleep with blocks unnecessarily allocated?

Of course, that would be better.

> 
> > +
> > +free_extent:
> > +
> > +		freed = 0;
> > +		jbd_lock_bh_state(bitmap_bh);
> > +
> > +		for (bit = start; bit < next; bit++) {
> > +
> > +			/**
> > +			 * @@@ This prevents newly-allocated data from being
> > +			 * freed and then reallocated within the same
> > +			 * transaction.
> > +			 */
> > +			BUFFER_TRACE(bitmap_bh, "set in b_committed_data");
> > +			J_ASSERT_BH(bitmap_bh,
> > +				bh2jh(bitmap_bh)->b_committed_data != NULL);
> > +			ext3_set_bit_atomic(sb_bgl_lock(sbi, group), bit,
> > +				bh2jh(bitmap_bh)->b_committed_data);
>   You don't have to do this. Since you didn't really allocate the blocks
> (you are actually never going to commit bitmap changes to the journal)
> blocks can be happily reallocated just after you clear the bits in the
> bitmap. There's no problem with that. Also you don't have to hold bh_state
> lock at all.

Thanks.

> 
> > +
> > +			/**
> > +			 * We clear the bit in the bitmap after setting the
> > +			 * committed data bit, because this is the reverse
> > +			 * order to that which the allocator uses.
> > +			 */
> > +			BUFFER_TRACE(bitmap_bh, "clear bit");
> > +			if (!ext3_clear_bit_atomic(sb_bgl_lock(sbi, group),
> > +						bit, bitmap_bh->b_data)) {
> > +				jbd_unlock_bh_state(bitmap_bh);
> > +				ext3_error(sb, __func__,
> > +					"bit already cleared for block "E3FSBLK,
> > +					 (unsigned long)bit);
> > +				jbd_lock_bh_state(bitmap_bh);
> > +				BUFFER_TRACE(bitmap_bh, "bit already cleared");
> > +			} else {
> > +				freed++;
> > +			}
> > +		}
> > +
> > +		jbd_unlock_bh_state(bitmap_bh);
> > +
> > +		/* Update couters */
> > +		spin_lock(sb_bgl_lock(sbi, group));
> > +		le16_add_cpu(&gdp->bg_free_blocks_count, freed);
> > +		spin_unlock(sb_bgl_lock(sbi, group));
> > +		percpu_counter_add(&sbi->s_freeblocks_counter, next - start);
> > +
> > +		start = next;
> > +
> > +		if (signal_pending(current)) {
> > +			count = -ERESTARTSYS;
> > +			break;
> > +		}
> > +
> > +		/* No more suitable extents */
> > +		if ((free_blocks - count) < minblocks)
> > +			break;
> > +	}
> > +
> > +	/* We dirtied the bitmap block */
> > +	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
> > +	err = ext3_journal_dirty_metadata(handle, bitmap_bh);
> > +
> > +	/* And the group descriptor block */
> > +	BUFFER_TRACE(gdp_bh, "dirtied group descriptor block");
> > +	ret = ext3_journal_dirty_metadata(handle, gdp_bh);
> > +	if (!err)
> > +		err = ret;
> > +
> > +	ext3_debug("trimmed %d blocks in the group %d\n",
> > +		count, group);
> > +
> > +err_out:
> > +	if (err) {
> > +		ext3_std_error(sb, err);
> > +		count = -err;
> > +	}
> > +
> > +	ext3_journal_stop(handle);
> > +	brelse(bitmap_bh);
> > +
> > +	return count;
> > +}
>   Otherwise the patch looks OK.

Great, thanks for looking at this. I will resend the patch shortly.

> 
> 								Honza
> 

-Lukas

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ