lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110315001448.GG8120@thunk.org>
Date:	Mon, 14 Mar 2011 20:14:48 -0400
From:	Ted Ts'o <tytso@....edu>
To:	Phillip Susi <psusi@....rr.com>
Cc:	Eric Sandeen <sandeen@...hat.com>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: Large directories and poor order correlation

On Mon, Mar 14, 2011 at 07:43:57PM -0400, Phillip Susi wrote:
> > what if a process opens a directory, starts calling readdir, pauses in
> > the middle, and then holds onto it for days, weeks, or months?
> 
> The same thing that happened before htree?

No, because the readdir() semantics are exactly set up to mirror a
linear directory tree.

> > It's not hard to provide library routines that do the right thing, and
> > I have written an LD_PRELOAD which intercepts opendir() and readdir()
> > calls and does the sorting in userspace.  Perhaps the right answer is
> > getting this into libc, but I have exactly two words for you: "Uhlrich
> > Drepper".
> 
> Wouldn't it be better to just have readdir() use the main directory,
> which presumably is in a more sane ordering, instead of the htree?  That
> way you don't have to burn cpu and ram sorting on every opendir().

The htree is embedded into directory blocks (they appear to ext2 and
older kernels as deleted directory entries).  The leaf blocks are in
the "main directory" as well; there is no separate htree.

The reason why we have to traverse the directory tree in htree order
is because the POSIX requirements of how readdir() works in the face
of file deletes and creations, and what needs to happen if a leaf
block needs to be split.  Even if the readdir() started three months
ago, if in the intervening time, leaf nodes have been split, readdir()
is not allowed to return the same file twice.

> Also, I have checked some smaller directories and lsattr reports they
> are NOT using indexing, yet still display poor correlation.

Well, if the file system has been around for a long time, and there
are lots of "holes" in the inode allocation bitmap, it can happen that
even without indexing.

As another example, if you have a large maildir directory w/o
indexing, and files get removed, deleted, etc., over time the order of
the directory entries will have very little to do with the inode
number.  That's why programs like mutt sort the directory entries by
inode number.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ