lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 1 Mar 2012 09:38:59 -0500
From:	Chris Mason <chris.mason@...cle.com>
To:	Theodore Tso <tytso@....EDU>
Cc:	Jacek Luczak <difrost.kernel@...il.com>,
	linux-ext4@...r.kernel.org,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>, linux-btrfs@...r.kernel.org
Subject: Re: getdents - ext4 vs btrfs performance

On Wed, Feb 29, 2012 at 11:44:31PM -0500, Theodore Tso wrote:
> You might try sorting the entries returned by readdir by inode number before you stat them.    This is a long-standing weakness in ext3/ext4, and it has to do with how we added hashed tree indexes to directories in (a) a backwards compatible way, that (b) was POSIX compliant with respect to adding and removing directory entries concurrently with reading all of the directory entries using readdir.
> 
> You might try compiling spd_readdir from the e2fsprogs source tree (in the contrib directory):
> 
> http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=blob;f=contrib/spd_readdir.c;h=f89832cd7146a6f5313162255f057c5a754a4b84;hb=d9a5d37535794842358e1cfe4faa4a89804ed209
> 
> … and then using that as a LD_PRELOAD, and see how that changes things.
> 
> The short version is that we can't easily do this in the kernel since it's a problem that primarily shows up with very big directories, and using non-swappable kernel memory to store all of the directory entries and then sort them so they can be returned in inode number just isn't practical.   It is something which can be easily done in userspace, though, and a number of programs (including mutt for its Maildir support) does do, and it helps greatly for workloads where you are calling readdir() followed by something that needs to access the inode (i.e., stat, unlink, etc.)
> 

For reading the files, the acp program I sent him tries to do something
similar.  I had forgotten about spd_readdir though, we should consider
hacking that into cp and tar.

One interesting note is the page cache used to help here.  Picture two
tests:

A) time tar cf /dev/zero /home

and

cp -a /home /new_dir_in_new_fs
unmount / flush caches
B) time tar cf /dev/zero /new_dir_in_new_fs

On ext, The time for B used to be much faster than the time for A
because the files would get written back to disk in roughly htree order.
Based on Jacek's data, that isn't true anymore.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ