linux-ext4 - Re: [PATCH] ext4: add noorlov parameter to avoid spreading of directory inodes

Message-ID: <20131002170237.GB16076@kvack.org>
Date:	Wed, 2 Oct 2013 13:02:37 -0400
From:	Benjamin LaHaise <bcrl@...ck.org>
To:	Theodore Ts'o <tytso@....edu>
Cc:	Eric Sandeen <sandeen@...hat.com>, Jan Kara <jack@...e.cz>,
	Andreas Dilger <adilger.kernel@...ger.ca>,
	linux-ext4@...r.kernel.org
Subject: Re: [PATCH] ext4: add noorlov parameter to avoid spreading of directory inodes

On Wed, Oct 02, 2013 at 12:23:23PM -0400, Theodore Ts'o wrote:
> Ext3 used an orlov style allocator as well.  The main difference
> between ext4 and ext3 is the orlov allocator is now done on a
> per-flexbg basis instead of per-blockgroup basis.
> 
> That is, we do the statistics based on a flex-bg basis instead of the
> blockgroup basis.  As a result, I suspect Ben would see the inode
> allocation behavior equivalent to ext3 if he creates the file system
> using "mke2fs -t ext4 -G 1" to force the flex_bg size to 1.
> 
> Can you let me know what the size of the file system was, and mke2fs
> parameters you were using for ext3 and ext4?  I have a feeling that
> inode allocations weren't optimal for your use case even with ext3,
> but because we now spread the inodes based on flex_bg's instead of
> block groups, that's why you saw the performance degradation.

This may have been a bit misleading -- other parts of the system changed
between the version running on ext3 and the one running on ext4.
Subdirectories weren't used as much on ext3 as on ext4, so the effect
wasn't nearly as pronounced.  Further investigation showed that the
spreading of inodes for directories was resulting in the files being laid
out in different block groups, which made reading and writing files on
disk much less sequential.

The other big change in allocation between ext3 and ext4 is mballoc.
Without fallocate() on the files, the allocator in ext4 was preferentially
aligning files to power-of-2 block numbers.  In one of our tests, which
used ~9MB files, this led to gaps of ~1800 blocks between files (even in
the same directory), which degraded transfer rates to/from disk thanks to
the extra seeks.  But this aspect of the allocator's behavior was easily
worked around by doing an fallocate() for the size of the file before
writing to it.
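
As an illustration of the preallocation step described above, here is a
minimal sketch in Python.  It is not the code from the system discussed in
this message: the helper name write_preallocated and the demo file name are
hypothetical, and it uses os.posix_fallocate(), Python's wrapper around
posix_fallocate(3), rather than calling fallocate(2) directly.  The idea
is only to show the ordering: reserve the full file size first, then write.

```python
import os
import tempfile


def write_preallocated(path, data):
    """Reserve len(data) bytes for `path`, then write `data` into it.

    Hypothetical helper: preallocating tells the filesystem the final
    size before any data arrives, so the allocator can pick one extent
    sized for the whole file instead of guessing as writes trickle in.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        if data:
            # posix_fallocate() rejects a zero length, hence the guard.
            os.posix_fallocate(fd, 0, len(data))
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested; loop.
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
    return os.stat(path)


if __name__ == "__main__":
    # Demo with a file of roughly the size mentioned above (9 MiB).
    with tempfile.TemporaryDirectory() as tmp:
        st = write_preallocated(os.path.join(tmp, "demo.bin"),
                                b"\0" * (9 * 1024 * 1024))
        print("size:", st.st_size, "512-byte blocks:", st.st_blocks)
```

Whether this changes the on-disk layout depends on the filesystem and its
mount options; on an ext4 file system, filefrag -v on the resulting file
is one way to inspect the extents that were actually allocated.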

		-ben
-- 
"Thought is the essence of where you are now."