Message-ID: <4FE29DA7.40405@redhat.com>
Date:	Wed, 20 Jun 2012 23:05:59 -0500
From:	Eric Sandeen <sandeen@...hat.com>
To:	Norbert Preining <preining@...ic.at>
CC:	"Ted Ts'o" <tytso@....edu>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: Ext4 slow on links

On 6/20/12 9:28 PM, Norbert Preining wrote:
> Hi Eric,
> 
> thanks a lot for looking into that.
> 
> On Wed, 20 Jun 2012, Eric Sandeen wrote:
>> so almost all reads, and no read merges; almost 35 megabytes read and every
>> one was a small 4k IO.
> 
> Ouch, that hurts.
> 
> On Wed, 20 Jun 2012, Eric Sandeen wrote:
>> Would you be willing to provide an "e2image -r" image of the filesystem?
> 
> Ok, it has been running for a few hours now and I guess I am far
> from finished, since there are 350+G on the fs, and the compressed
> image is 200M by now.
> 
> Is it fine to do it on a running system, or do I have to boot
> from USB or so?

Well, don't bother, sorry.  See below.  Zach had it right.

> If it is not too big I will try to upload it to some place where
> you can access it.
> 
> On Wed, 20 Jun 2012, Eric Sandeen wrote:
>> Oh, but Zach Brown reminds me that if we stat the entries in getdents/hash
>> order, it's roughly random w.r.t. disk location.  Newer utils will sort into
>> inode order, I think(?)  Might be interesting to strace the ls -l and see
>> if it's doing it in inode order, or not.
> 
> Ok, is there a special option to strace, or -trace=all?

If you run

# strace -v -o outfile ls -l 

you'll see things like:

getdents(3, {{d_ino=249052, d_off=186216735, d_reclen=32, d_name="file3"} {d_ino=245882, d_off=473549160, d_reclen=24, d_name="."} {d_ino=249051, d_off=516459536, d_reclen=32, d_name="file2"} {d_ino=249055, d_off=545762253, d_reclen=32, d_name="file6"} {d_ino=249049, d_off=550416647, d_reclen=32, d_name="file1"} ...

From there you can see that the entries are returned out of inode order (and therefore not in disk order).

The lstat calls that follow are out of order as well:

# grep lstat outfile
lstat("file3", {st_dev=makedev(8, 8), st_ino=249052, st_mode=S_IFLNK|0777, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=13, st_atime=2012/06/20-22:13:08, st_mtime=2012/06/20-22:13:07, st_ctime=2012/06/20-22:13:07}) = 0
lstat("file2", {st_dev=makedev(8, 8), st_ino=249051, st_mode=S_IFLNK|0777, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=13, st_atime=2012/06/20-22:13:08, st_mtime=2012/06/20-22:13:07, st_ctime=2012/06/20-22:13:07}) = 0
lstat("file6", {st_dev=makedev(8, 8), st_ino=249055, st_mode=S_IFLNK|0777, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=13, st_atime=2012/06/20-22:13:08, st_mtime=2012/06/20-22:13:07, st_ctime=2012/06/20-22:13:07}) = 0
lstat("file1", {st_dev=makedev(8, 8), st_ino=249049, st_mode=S_IFLNK|0777, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=13, st_atime=2012/06/20-22:13:08, st_mtime=2012/06/20-22:13:07, st_ctime=2012/06/20-22:13:07}) = 0
...

Later on you'll see the readlinks, in the same order:

# grep readlink outfile
readlink("file3", "../dir2/file3", 14)  = 13
readlink("file2", "../dir2/file2", 14)  = 13
readlink("file6", "../dir2/file6", 14)  = 13
readlink("file1", "../dir2/file1", 14)  = 13
...

etc.
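
If the trace is long, eyeballing the d_ino values gets tedious. Here is a minimal Python sketch that pulls the d_ino values out of a getdents line like the one above and reports whether they are in ascending order. It is not part of strace or coreutils; the function names are made up for illustration, and the sample string is just the truncated line quoted earlier in this mail, so it only covers the entries visible there.

```python
import re

# The (truncated) getdents line from the strace output quoted in this mail.
SAMPLE = (
    'getdents(3, {{d_ino=249052, d_off=186216735, d_reclen=32, d_name="file3"} '
    '{d_ino=245882, d_off=473549160, d_reclen=24, d_name="."} '
    '{d_ino=249051, d_off=516459536, d_reclen=32, d_name="file2"} '
    '{d_ino=249055, d_off=545762253, d_reclen=32, d_name="file6"} '
    '{d_ino=249049, d_off=550416647, d_reclen=32, d_name="file1"} ...'
)


def getdents_inodes(strace_line):
    """Return the d_ino values of one strace getdents line, in listed order."""
    return [int(m) for m in re.findall(r"\bd_ino=(\d+)", strace_line)]


def is_inode_ordered(inodes):
    """True if the inode numbers never decrease from one entry to the next."""
    return all(a <= b for a, b in zip(inodes, inodes[1:]))


inodes = getdents_inodes(SAMPLE)
print(inodes, "sorted" if is_inode_ordered(inodes) else "not sorted")
```

The same check could be run over the lstat or readlink lines by matching st_ino instead, assuming the trace was taken with -v so that the fields are printed.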

Hm.  Upstream coreutils fixed this for rm and some other ops:

http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=24412edeaf556a

# grep unlink /tmp/rm-strace 
unlink("file1")                         = 0
unlink("file10")                        = 0
unlink("file2")                         = 0
unlink("file3")                         = 0
unlink("file4")                         = 0
unlink("file5")                         = 0
unlink("file6")                         = 0
unlink("file7")                         = 0
unlink("file8")                         = 0
unlink("file9")                         = 0

but maybe not for ls -l.

You could see if you could get this LD_PRELOAD working:

http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=blob_plain;f=contrib/spd_readdir.c

build & enable with:

gcc -o spd_readdir.so -fPIC -shared spd_readdir.c -ldl
export LD_PRELOAD=`pwd`/spd_readdir.so

and see if that addresses the problem.

Here, it does for me:

# grep readlink outfile2 
readlink("file1", "../dir2/file1"..., 14) = 13
readlink("file10", "../dir2/file10"..., 15) = 14
readlink("file2", "../dir2/file2"..., 14) = 13
readlink("file3", "../dir2/file3"..., 14) = 13
readlink("file4", "../dir2/file4"..., 14) = 13
readlink("file5", "../dir2/file5"..., 14) = 13

I'm guessing that operating in inode order should help
you a bit, at least.  I tested on a dir w/ 10,000 long symlinks
with and without the sorting, and you can see the difference pretty
clearly.

Sorted took 2.6s, unsorted took 52s, so roughly a 20x difference.

And you can see why:

http://people.redhat.com/esandeen/sorted_unsorted.png
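
For illustration, the idea behind the sorting can be sketched in a few lines of Python: collect the inode numbers from the directory entries first, sort them, and only then lstat each entry. This is a sketch of the general technique, not the code in spd_readdir.c or in coreutils; lstat_in_inode_order and make_demo_dir are hypothetical names, and the demo directory is made-up test data.

```python
import os
import tempfile


def lstat_in_inode_order(dirpath):
    """lstat every entry of dirpath, visiting entries in ascending inode order.

    Returns a list of (name, inode) in the order the lstat calls were made,
    where inode is the number reported by the directory entry.
    """
    # scandir yields entries in the filesystem's readdir order. On Linux,
    # DirEntry.inode() comes from the directory entry itself, so collecting
    # it should not need any per-file I/O.
    with os.scandir(dirpath) as it:
        entries = [(entry.inode(), entry.name) for entry in it]

    entries.sort()  # ascending inode number

    visited = []
    for ino, name in entries:
        os.lstat(os.path.join(dirpath, name))  # the per-file call we reorder
        visited.append((name, ino))
    return visited


def make_demo_dir(count=20):
    """Create a temporary directory holding `count` symlinks (test data only)."""
    d = tempfile.mkdtemp(prefix="inode-order-")
    for i in range(count):
        os.symlink("../dir2/file%d" % i, os.path.join(d, "file%d" % i))
    return d


if __name__ == "__main__":
    for name, ino in lstat_in_inode_order(make_demo_dir()):
        print(ino, name)
```

On a small directory with a warm cache this should make no measurable difference; the effect shows up on large directories read cold, as in the 10,000-symlink test above.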

Meanwhile, I can ask Jim about coreutils & ls -l.

-Eric

> Best wishes
> 
> Norbert