linux-ext4 - Re: [RFC] Optimizing readdir()

Message-ID: <alpine.LFD.2.00.1301301227410.28271@dhcp-1-104.brq.redhat.com>
Date:	Wed, 30 Jan 2013 12:34:21 +0100 (CET)
From:	Lukáš Czerner <lczerner@...hat.com>
To:	Radek Pazdera <rpazdera@...hat.com>
cc:	Andreas Dilger <adilger@...ger.ca>,
	"Theodore Ts'o" <tytso@....edu>, linux-ext4@...r.kernel.org,
	Lukáš Czerner <lczerner@...hat.com>
Subject: Re: [RFC] Optimizing readdir()

On Tue, 29 Jan 2013, Radek Pazdera wrote:

> Date: Tue, 29 Jan 2013 17:38:46 +0100
> From: Radek Pazdera <rpazdera@...hat.com>
> To: Andreas Dilger <adilger@...ger.ca>
> Cc: Theodore Ts'o <tytso@....edu>, linux-ext4@...r.kernel.org,
>     Lukáš Czerner <lczerner@...hat.com>
> Subject: Re: [RFC] Optimizing readdir()
> 
> On Tue, Jan 15, 2013 at 03:44:57PM -0700, Andreas Dilger wrote:
> >Having an upper limit on the directory cache is OK too.  Read all
> >of the entries that fit into the cache size, sort them, and return
> >them to the caller.  When the caller has processed all of those
> >entries, read another batch, sort it, return this list, repeat.
> >
> >As long as the list is piecewise ordered, I suspect it would gain
> >most of the benefit of linear ordering (sequential inode table
> >reads, avoiding repeated lookups of blocks).  Maybe worthwhile if
> >you could test this out?
> 
> I did the tests last week. I modified the spd_readdir preload to
> read at most $SPD_READDIR_CACHE_LIMIT entries, sort them and repeat.
> The patch is here:
> 
>     http://www.stud.fit.vutbr.cz/~xpazde00/soubory/dir-index-test-ext4/
> 
> I tested it with the limit set to 0 (i.e., no limit), 1000, 10000,
> 50000, and completely without the preload. The test runs were
> performed on the same directory, so the results shouldn't be
> affected by positioning on disk.
> 
> Directory sizes went from 10k to 1.5M. The tests were run twice.
> The first run is only with metadata. In the second run, each file
> has 4096B of data.
> 
> Here are the results:
>   0B files:
>     http://www.stud.fit.vutbr.cz/~xpazde00/soubory/dir-index-test-ext4/0B-files
> 
>   4096B files:
>     http://www.stud.fit.vutbr.cz/~xpazde00/soubory/dir-index-test-ext4/4096B-files/
> 
> The times seem to decrease accordingly as the limit of the cache
> increases. The differences are bigger in case of 4096B files, where
> the data blocks start to evict the inode tables. However, copying is
> still more than two times slower for 1.5M files when 50000 entries
> are cached.
> 
> It might be interesting to test what happens when the size of the
> files in the directory increases.
> 
> Best Regards
> Radek

Hi Radek,

those are interesting results, and they support the idea that you can
get most of the performance of a completely sorted inode list by
sorting in "batches", as long as the batch size is sufficiently
large. However, I do not think that using spd_readdir is the best
approach to the problem, nor do I think that it should be part of
the generic library. Aside from its "hackish" nature and the fact
that you will never be able to tell how much memory you can actually
use for the sorting, other file systems handle this problem well
enough in comparison with ext4, and we should really focus on fixing
it rather than working around it.

Thanks!
-Lukas
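
For readers following the thread, the batching scheme discussed above (read up to a limit of entries, sort them by inode number, hand them back, repeat) might look roughly like the sketch below. It is not the spd_readdir patch itself, which is a C preload library; it is a Python illustration only. The function name batched_sorted_entries and the cache_limit parameter are hypothetical, with cache_limit standing in for $SPD_READDIR_CACHE_LIMIT (0 meaning no limit).

```python
import os
from itertools import islice


def batched_sorted_entries(path, cache_limit=0):
    """Yield (inode, name) pairs, sorted by inode within each batch.

    cache_limit is a hypothetical stand-in for $SPD_READDIR_CACHE_LIMIT:
    0 (or less) means no limit, so the whole directory is read and
    sorted in one go.
    """
    with os.scandir(path) as it:
        while True:
            # Take at most cache_limit entries in on-disk (readdir) order.
            chunk = it if cache_limit <= 0 else islice(it, cache_limit)
            # Sort this batch by inode number; the result is only
            # piecewise ordered across the whole directory.
            batch = sorted((entry.inode(), entry.name) for entry in chunk)
            if not batch:
                break
            yield from batch
```

A caller would then stat or copy the files in the order yielded, for example `for ino, name in batched_sorted_entries("/some/dir", 50000): ...`. Memory use is bounded by the batch size, at the cost of losing global inode order between batches.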