Date:	Sat, 15 Nov 2008 19:56:10 -0500
From:	Theodore Tso <>
Subject: Re: ext4 unlink performance

On Sat, Nov 15, 2008 at 02:44:23PM -0600, Bruce Guenter wrote:
> On Fri, Nov 14, 2008 at 09:59:14AM -0500, Theodore Tso wrote:
> > This is beginning to perhaps sound like a layout problem of some kind.
> To test this theory, I ran one test where I populated the filesystem
> with ext3 and then mounted as ext4 to do the unlinking.  This produced
> unlink times comparable with ext3.  That is, the degradation is
> occurring when the filesystem is populated, not when it is cleaned.

The problem is definitely in how we choose the directory and file
inode numbers for ext4.  A quick look at the free block and free inode
counts from the dumpe2fs of your ext3 and ext4 256-byte inode e2images
tells the tale.  Ext4 is using blocks and inodes packed up against the
beginning of the filesystem, while ext3 has the blocks and inodes spread
out for better locality.
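The per-group numbers can be eyeballed straight out of dumpe2fs.  A
quick sketch of doing that programmatically (the sample text below is
made up for illustration, not taken from the actual attachments):

```python
import re

# Illustrative dumpe2fs-style output; real output has more fields per
# group, but the "N free blocks, N free inodes, N directories" summary
# line is what matters for spotting a packed vs. spread-out layout.
SAMPLE = """\
Group 0: (Blocks 1-32768)
  102 free blocks, 120 free inodes, 410 directories
Group 1: (Blocks 32769-65536)
  88 free blocks, 95 free inodes, 395 directories
Group 2: (Blocks 65537-98304)
  31022 free blocks, 8100 free inodes, 3 directories
"""

GROUP_RE = re.compile(
    r"Group (\d+):.*?(\d+) free blocks, (\d+) free inodes, (\d+) directories",
    re.S)

def group_usage(text):
    """Return [(group, free_blocks, free_inodes, directories), ...]."""
    return [tuple(int(x) for x in m.groups()) for m in GROUP_RE.finditer(text)]

for grp, fb, fi, nd in group_usage(SAMPLE):
    print(f"group {grp}: {fb} free blocks, {fi} free inodes, {nd} dirs")
```

A layout like the sample, with the first groups nearly full and later
groups nearly empty, is the "packed up against the beginning" pattern.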

We didn't change ext4's inode allocation algorithms, so I'm
guessing that it's interacting very poorly with ext4's block delayed
allocation algorithms.  Bruce, how much memory did you have in your
system?  Do you have a large amount of memory, say 6-8 gigs, by any
chance?  When the filesystem creates a new directory, if the block
group is especially full, it will choose a new block group for the
directory, to spread things out.  However, if the blocks haven't been
allocated yet, then the directories won't be spread out appropriately,
and then the inodes will be allocated close to the directories, and
then things go downhill from there.  This is much more likely to
happen if you have a large number of small files, and a large amount
of memory, and when you are unpacking a tar file and so are writing
out a large number of these small files spaced very closely in time,
before they have a chance to get forced out to disk and thus
allocated, which is what would let the filesystem take block group
fullness into account when deciding how to allocate inode numbers.
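The failure mode can be shown with a toy model (hypothetical code, not
the actual kernel allocator): directory placement consults the on-disk
free-block counts, but under delayed allocation those counts only drop
at writeback, so the "emptiest" group never changes:

```python
# Toy model of how delayed block allocation can defeat directory
# spreading.  A new directory goes in the block group with the most
# free blocks *as recorded on disk*; blocks written under delayed
# allocation sit in "pending" and don't update free[] until writeback.
NGROUPS = 4
GROUP_BLOCKS = 1000

def place_dirs(ndirs, blocks_per_dir, writeback_every):
    free = [GROUP_BLOCKS] * NGROUPS      # on-disk free-block counts
    pending = [0] * NGROUPS              # delayed (unflushed) blocks
    placement = []
    for i in range(ndirs):
        g = max(range(NGROUPS), key=lambda j: free[j])  # "emptiest" group
        placement.append(g)
        pending[g] += blocks_per_dir
        if writeback_every and (i + 1) % writeback_every == 0:
            for j in range(NGROUPS):     # writeback: counts finally drop
                free[j] -= pending[j]
                pending[j] = 0
    return placement

# With frequent writeback the directories get spread across groups;
# with no writeback during the unpack, free[] never changes and every
# directory lands in group 0.
print(set(place_dirs(100, 50, writeback_every=1)))
print(set(place_dirs(100, 50, writeback_every=0)))
```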

When I have a chance I'll write a program which analyzes how close the
blocks are to inodes, and how close inodes are to their containing
directory, but I'm pretty sure what we'll find will just confirm
what's going on in greater detail.
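The core of such an analysis program is tiny.  A minimal sketch, under
the assumption that the input has already been reduced to (inode,
parent-directory-inode) pairs, e.g. extracted with debugfs:

```python
# Sketch of the proposed locality analysis (hypothetical input format:
# a list of (inode, parent_directory_inode) pairs).  A smaller average
# distance means files sit near their containing directories, as in
# the ext3 layout above.
def avg_inode_distance(pairs):
    """Mean |inode - parent_inode| over all (inode, parent) pairs."""
    return sum(abs(ino - parent) for ino, parent in pairs) / len(pairs)

# Illustrative numbers only: files allocated near their directories
# vs. files scattered far from a single directory.
near = [(12, 11), (13, 11), (8205, 8193)]
far = [(12, 11), (4000, 11), (9000, 11)]
print(avg_inode_distance(near) < avg_inode_distance(far))  # True
```

The same shape of computation works for block-to-inode distances.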

One thing is clear --- we need to rethink our block and inode
allocation algorithms in light of delayed allocation.  Maybe XFS has
some tricks up its sleeve that we can learn from?

     	       	   	       	      	    - Ted

View attachment "ext3-256-inode-usage" of type "text/plain" (11985 bytes)

View attachment "ext4-256-inode-usage" of type "text/plain" (15769 bytes)
