Message-id: <20090423190817.GN3209@webber.adilger.int>
Date:	Thu, 23 Apr 2009 13:08:17 -0600
From:	Andreas Dilger <adilger@....com>
To:	Curt Wohlgemuth <curtw@...gle.com>
Cc:	ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: Question on block group allocation

On Apr 23, 2009  09:41 -0700, Curt Wohlgemuth wrote:
> I'm seeing a performance problem on ext4 vs ext2, and in trying to
> narrow it down, I've got a question about block allocation in ext4
> that I'm having trouble figuring out.
> 
> Using dd, I created (in this order) two 4GB files and a 10GB file in
> the mount directory.
> 
> The extent blocks are reasonably close together for the two 4GB files,
> but the extents for the 10GB file show a huge gap, which seems to hurt
> the random read performance pretty substantially.  Here's the output
> from debugfs:
> 
> BLOCKS:
> (IND):8396832, (0-106495):8282112-8388607,
> (106496-399359):11241472-11534335, (399360-888831):20482048-20971519,
> (888832-1116159):23889920-24117247,
> (1116160-1277951):71665664-71827455,
> (1277952-1767423):78678016-79167487,
> (1767424-2125823):102402048-102760447,
> (2125824-2148351):102768672-102791199,
> (2148352-2621439):102793216-103266303
> TOTAL: 2621441
> 
> Note the gap between blocks 79167487 and 102402048.

Well, there is an even larger gap elsewhere in the file, between blocks
24117247 and 71665664.

> I was lucky enough to capture the mb_history from this 10GB create:
> 
> 29109 14       735/30720/32758@...4112 735/30720/2048@...4112
> 735/30720/2048@...4112  1     0     0  1568  M     0     0
> 29109 14       736/0/32758@...6160     736/0/2048@...6160
> 2187/2048/2048@...6160  1     1     0  1568        0     0
> 29109 14       2187/4096/32758@...8208 2187/4096/2048@...8208
> 2187/4096/2048@...8208  1     0     0  1568  M     2048  4096
> 
> I've been staring at ext4_mb_regular_allocator() trying to understand
> why an allocation with a goal block of 736 ends up with a best found
> extent group of 2187, and I'm stuck -- at least without a lot of
> printk messages.  It seems to me that we just cycle through the block
> groups starting with the goal group until we find a group that fits.
> Again, according to dumpe2fs, block groups 737, 738, 739, ... all have
> 32768 free blocks.  So why we end up with a best fit group of 2187 is
> a mystery to me.

This is likely caused by the "uninit_bg" feature, which makes the allocator
skip groups that are marked BLOCK_UNINIT.  The benefit of skipping the block
bitmap read during e2fsck is probably outweighed by the cost of the extra
seeking during IO.  As the filesystem fills up, the BLOCK_UNINIT flags would
be cleared anyway, so we might as well keep the early allocations contiguous.

A simple change to verify this would be something like the following,
but it hasn't actually been tested.

--- ./fs/ext4/mballoc.c.uninit    2009-04-08 19:13:13.000000000 -0600
+++ ./fs/ext4/mballoc.c 2009-04-23 13:02:22.000000000 -0600
@@ -1742,10 +1742,6 @@ static int ext4_mb_good_group(struct ext
 	switch (cr) {
 	case 0:
 		BUG_ON(ac->ac_2order == 0);
-		/* If this group is uninitialized, skip it initially */
-		desc = ext4_get_group_desc(ac->ac_sb, group, NULL);
-		if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))
-			return 0;
 
 		bits = ac->ac_sb->s_blocksize_bits + 1;
 		for (i = ac->ac_2order; i <= bits; i++)
@@ -2039,9 +2035,7 @@ repeat:
 			ac->ac_groups_scanned++;
 			desc = ext4_get_group_desc(sb, group, NULL);
-			if (cr == 0 || (desc->bg_flags &
-				cpu_to_le16(EXT4_BG_BLOCK_UNINIT) &&
-				ac->ac_2order != 0))
+			if (cr == 0)
 				ext4_mb_simple_scan_group(ac, &e4b);
 			else if (cr == 1 &&
 					ac->ac_g_ex.fe_len == sbi->s_stripe)


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
