lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <200807212257.38160.phillips@phunq.net>
Date:	Mon, 21 Jul 2008 22:57:38 -0700
From:	Daniel Phillips <phillips@...nq.net>
To:	Theodore Tso <tytso@....edu>
Cc:	linux-kernel@...r.kernel.org, Andreas Dilger <adilger@....com>,
	linux-ext4@...r.kernel.org, ext2-devel@...ts.sourceforge.net
Subject: Re: [RFC] A better solution to the HTree telldir problem

On Monday 21 July 2008 19:56, Theodore Tso wrote:
> Even worse, since if NFS is enabled in any way, shape or form, this
> scheme is totally and completely screwed.  So we have to the old way
> of doing things if NFS is turned on.

I didn't realize it would be so difficult to detect telldir coming from
NFSD.  NFS is the entire reason for all this posturing, so if the
strategy fails for NFS then it isn't worth spending any time on it.

I wrote generic btree code for ddsnap with arbitrary depth and merge on
delete, but was never able to justify the time to port it to HTree
during my day job.  Props to clusterfs (Andreas!) for stepping up to
do this.  No doubt a commercial imperative helped achieve focus there.

But I will not use HTree for my own filesystem work, I will use a new
thing called PHTree, which is a (P)hysically stable variant of HTree,
obtained by inserting a new layer of index blocks between the index
nodes and the dirent blocks, the "terminal" index blocks.  Each
terminal index block is populated with { hash : block } pairs, each of
which indicates that there is a dirent in <block> with hash <hash>.

This requires one extra lookup per operation which is regretable, but
it solves the physical stability problem.  Directory entries will just
stay in the leaf blocks in which they are first created, and only
index nodes will ever be split.  Because leaf blocks will typically be
full instead of 75% full in HTree, the space usage ends up about the
same.  A more efficient hash algorithm can be used, and deletion and
other operations will tend to be faster due to less thrashing of the
inode table.

Free space management becomes an issue: when dirents are deleted we
need to be able to re-use them for new entries.  To support efficient
detection of available dirents of appropriate size, I will use a lazy
method of recording the maximum sized dirent available in any leaf
block.  The actual largest free direct may be smaller than that, which
will be detected when a search fails, causing the lazy limit to be
updated.  We can safely skip searching for free space in any block
for which the lazy max is less than the size we require.  One byte is
sufficient for the lazy max, so one 4K block is sufficient to keep
track of 2^12 * 2^12 bytes worth of directory blocks, a 16 meg
directory with about half a million entries.  For larger directories
this structure becomes a radix tree, with lazy max recorded also at
each index pointer for quick free space searching.

Mind you, I haven't coded this yet and it will be a good few months
before I do, so I don't have performance numbers.  I just thought I
should mention it now, because I expect it to meet or beat HTree
overall in performance while supporting brain damaged NFS semantics in
a much nicer way.  Hopefully we will find out for sure before Ext4
indexing becomes cast in stone.

Regards,

Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ