lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 30 Jul 2008 05:36:23 -0600
From:	Andreas Dilger <adilger@....com>
To:	Mingming Cao <cmm@...ibm.com>
Cc:	tytso <tytso@....edu>, Shehjar Tikoo <shehjart@....unsw.edu.au>,
	linux-ext4@...r.kernel.org,
	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
Subject: Re: [PATCH v3]Ext4: journal credits reservation fixes for DIO,
	fallocate and delalloc writepages

On Jul 29, 2008  18:58 -0700, Mingming Cao wrote:
> + * For inserting a single extent, in the worse case extent tree depth is 5
> + * for old tree and new tree, for every level we need to reserve
> + * credits to log the bitmap and block group descriptors
> + *
> + * credit needed for the update of super block + inode block + quota files
> + * are not included here. The caller of this function need to take care of this.
>   */
> -int ext4_ext_calc_credits_for_insert(struct inode *inode,
> +int ext4_ext_calc_credits_for_single_extent(struct inode *inode,
>  						struct ext4_ext_path *path)
>  {
>  	int depth, needed;
>  
> +	depth = ext_depth(inode);
> +
>  	if (path) {
>  		/* probably there is space in leaf? */
>  		if (le16_to_cpu(path[depth].p_hdr->eh_entries)
>  				< le16_to_cpu(path[depth].p_hdr->eh_max))

Please fix code style here - '<' at end of previous line, align with "if (".

> +			/* 1 for block bitmap, 1 for group descriptor */
> +			return 2;
>  	}
>  
> +	/* add one more level in case of tree increase when insert a extent */
> +	depth += 1;

Shouldn't this only be if depth < 5?

>	/*
>	 * tree can be full, so it would need to grow in depth:
>	 * we need one credit to modify old root, credits for
>	 * new root will be added in split accounting
>	 */
>	needed += 1;

Similarly, this should only be if depth < 5?  We should have a

/*
 * given 32-bit logical block (4294967296 blocks), max. tree
 * can be 4 levels in depth -- 4 * 340^4 == 53453440000.
 * Let's also add one more level for imbalance.
 */
#define EXT4_EXT_MAX_DEPTH 5

>	/*
>	 * Index split can happen, we would need:
>	 *    allocate intermediate indexes (bitmap + group)
>	 *  + change two blocks at each level, but root (already included)
>  	 */
>  	needed += (depth * 2) + (depth * 2);

Again, this can only happen if < EXT4_EXT_MAX_DEPTH.

> +int  ext4_ext_writeblocks_trans_credits(struct inode *inode, int nrblocks)

please remove duplicate space after "int"

>  {
>  	int needed;
>  
> +	/* cost of adding a single extent:
> +	 * index blocks, leafs, bitmaps,
> +	 * groupdescp
> +	 */
> +	needed = ext4_ext_calc_credits_for_single_extent(inode, NULL);
> +	/*
> +	 * For data=journalled mode need to account for the data blocks
> +	 * Also need to add super block and inode block
> +	 */
> +	if (ext4_should_journal_data(inode))
> +		needed = nrblocks * (needed + 1)  + 2;
> +	else
> +		needed = nrblocks * needed  + 2;

It is also hard to understand why we need "nrblocks" times a single extent
insert.  That would assume we need a full tree split on EVERY block
inserted, which I don't think is reasonable.  Have you printed out some
of these values to see how large they actually are?

Instead, we shouldn't have ext4_ext_calc_credits_for_single_extent()
at all, and instead pass in the nrblocks parameter and work it out
on this basis.  That means at most nrblocks/340 extents/bitmaps/groups
at one time, plus at most 4 more levels of split in worst case.  We
don't need 4 * nrblocks for each write.

> @@ -2202,10 +2181,31 @@ static int ext4_da_writepages(struct add
> +		/*
> +		 * Estimate the worse case needed credits to write out
> +		 * to_write pages
> +		 */
> +		needed_blocks = ext4_writepages_trans_blocks(inode,
> +							     wbc->nr_to_write);
> +		while (needed_blocks > max_credit_blocks) {
> +			wbc->nr_to_write --;
> +			needed_blocks = ext4_writepages_trans_blocks(inode,
> +							     wbc->nr_to_write);
> +		}

This isn't a very efficient loop to decrement nr_to_write by one each time.
It is best to pass multiples of the RAID stripe size to mballoc to avoid
extra overhead.  I'd check something like below:

		needed_blocks = ~0U;

		while (needed_blocks > max_credit_blocks) {
			needed_blocks = ext4_writepages_trans_blocks(inode,
							     wbc->nr_to_write);
			/* We are more than twice the max_credit_blocks */
			if (needed_blocks + max_credit_blocks / 2 >
			    2 * max_credit_blocks)
				wbc->nr_to_write /= 2;
			else
				wbc->nr_to_write -=
					(needed_blocks-max_credit_blocks+3) / 4;

> +static inline int ext4_journal_max_transaction_buffers(struct inode *inode)
> +{
> +	/*
> +	 * max transaction buffers
> + 	 * calculation based on
> + 	 * journal->j_max_transaction_buffers = journal->j_maxlen / 4;
> + 	 */
> +        return (EXT4_JOURNAL(inode))->j_maxlen / 4;

Why does this not use "j_max_transaction_buffers" directly?  That is what
start_this_handle() checks against.  Also, this function should probably
be in fs/jbd2 instead of in fs/ext4.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ