Message-ID: <17949.51797.386833.917451@notabene.brown>
Date: Thu, 12 Apr 2007 15:57:41 +1000
From: Neil Brown <neilb@...e.de>
To: Jörn Engel <joern@...ybastard.org>
Cc: Theodore Tso <tytso@....edu>, "H. Peter Anvin" <hpa@...or.com>,
Christoph Hellwig <hch@...radead.org>,
Ulrich Drepper <drepper@...il.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: If not readdir() then what?
On Thursday April 12, joern@...ybastard.org wrote:
> On Thu, 12 April 2007 11:46:41 +1000, Neil Brown wrote:
> >
> > I could argue that nfs came before ext3+dirindex, so ext3 should have
> > been designed to work properly with NFS. You could argue that fixing
> > it in nfsd fixes it for all filesystems. But I'm not sure either of
> > those arguments are likely to be at all convincing...
>
> Caring about a non-ext3 filesystem, I sure would like an nfs solution as
> well. :)
I have a non-ext3 filesystem I care about too.....
But my perspective is that a solution in nfsd is, at best, a work-around.
Caching the whole 'struct file' when there is just a small bit that we
might want seems like a heavy hammer. The filesystem is in the best
place to know what needs to be cached, and it should be the one doing
the caching.
>
> > Hmmm. I wonder. Which is more likely?
> > - That two 64bit hashes from some set are the same
> > - or that 65536 48bit hashes from a set of equal size are the same.
>
> The former. Each bit going from hash strength to collision chain length
> reduces the likelihood of an overflow. In the extreme case of a 0bit
> hash and 64bit collision chain, you need 2^64 entries compared to 2^32
> for the other extreme.
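(A quick back-of-the-envelope check, assuming the standard birthday
arithmetic - my numbers, not from the thread: with n entries and a
b-bit hash,

    P(some collision) ~= C(n,2) / 2^b ~= n^2 / 2^(b+1)

so a collision becomes likely near n ~= 2^(b/2). That is about 2^32
entries for a full 64bit hash, while overflowing a 64bit chain hanging
off a 0bit hash needs more than 2^64 entries - matching the two
extremes above.)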
>
> However, the collision chain gives me quite a bit of headache. One
> would have to store each entry's position on the chain, deal with older
> entries getting deleted, newer entries getting removed, etc. All this
> requires a lot of complicated code that basically never gets tested in
> the wild.
This is a simple consequence of the design decision to use hashes as
the search key. They aren't dense and they will collide. So the
solution will be a bit fuzzy around the edges. And maybe that is an
acceptable tradeoff. But the filesystem should take full
responsibility for it, whether in performance or correctness :-)
>
> Just settling for a 64bit hash and returning -EEXIST when someone causes
> a collision on creat() sounds more appealing. Directories with 4
> billion entries will cause problems, but that is hardly news to anyone.
>
I think you want -EFBIG or -ENOSPC. -EEXIST sounds just wrong.
But there are alternatives, e.g. internal chaining.
Insist on a unique 64bit hash for every file. If the hash is in use,
increment and try again. On lookup, if the hash leads you to a file
with the wrong name, increment and try again until you find a hole
(hash value that is not stored). When you delete an entry, leave a
place holder if the next hash is in use. Conversely if the next hash
is not in use, delete the entry and delete the previous one if it is a
place holder.
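That is essentially open addressing with linear probing plus
tombstones. A minimal userspace sketch to make the insert/lookup/
delete rules concrete - the tiny table, the toy hash and every name
here are invented for illustration; a real filesystem would key its
directory index by the full 64bit hash rather than a flat array:

#include <stdio.h>
#include <string.h>

#define NSLOTS 16                 /* demo-sized hash space; the real thing is 2^64 */

enum state { EMPTY, USED, TOMB }; /* TOMB is the "place holder" above */

struct slot {
    enum state st;
    char name[32];
};

static struct slot dir[NSLOTS];

/* Toy stand-in for the real directory hash. */
static int hash_name(const char *name)
{
    unsigned h = 5381;
    while (*name)
        h = h * 33 + (unsigned char)*name++;
    return h % NSLOTS;
}

/* creat(): claim the first free hash at or after hash_name(name). */
static int dir_insert(const char *name)
{
    int h = hash_name(name), free_slot = -1, i;

    for (i = 0; i < NSLOTS; i++) {
        struct slot *s = &dir[(h + i) % NSLOTS];

        if (s->st == USED && strcmp(s->name, name) == 0)
            return -1;                         /* -EEXIST: name already present */
        if (s->st == TOMB && free_slot < 0)
            free_slot = (h + i) % NSLOTS;      /* reusable slot; keep scanning */
        if (s->st == EMPTY) {
            if (free_slot < 0)
                free_slot = (h + i) % NSLOTS;
            break;                             /* hole: end of this chain */
        }
    }
    if (free_slot < 0)
        return -2;                             /* hash space full: -ENOSPC */
    dir[free_slot].st = USED;
    snprintf(dir[free_slot].name, sizeof(dir[free_slot].name), "%s", name);
    return 0;
}

/* lookup: probe until the name matches or we hit a hole. */
static int dir_lookup(const char *name)
{
    int h = hash_name(name), i;

    for (i = 0; i < NSLOTS; i++) {
        struct slot *s = &dir[(h + i) % NSLOTS];

        if (s->st == EMPTY)
            return -1;                         /* hole: not found */
        if (s->st == USED && strcmp(s->name, name) == 0)
            return (h + i) % NSLOTS;           /* doubles as the stable cookie */
    }
    return -1;
}

/* unlink: leave a place holder if the next hash is in use; otherwise
 * punch a hole, then clear any place holders immediately before us. */
static void dir_delete(const char *name)
{
    int i = dir_lookup(name);

    if (i < 0)
        return;
    if (dir[(i + 1) % NSLOTS].st != EMPTY) {
        dir[i].st = TOMB;                      /* keep the chain unbroken */
        return;
    }
    dir[i].st = EMPTY;
    while (dir[(i + NSLOTS - 1) % NSLOTS].st == TOMB) {
        i = (i + NSLOTS - 1) % NSLOTS;
        dir[i].st = EMPTY;                     /* trailing place holders are dead */
    }
}

int main(void)
{
    dir_insert("alpha");
    dir_insert("beta");
    printf("duplicate insert: %d\n", dir_insert("beta"));   /* -1 */
    dir_delete("alpha");
    printf("beta cookie=%d, alpha %s\n", dir_lookup("beta"),
           dir_lookup("alpha") < 0 ? "gone" : "found");
    return 0;
}

(On disk you would keep the entries indexed by hash rather than in a
flat array, of course, but the place-holder dance is the same, and
dir_lookup()'s return value is exactly the stable cookie readdir
needs.)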
Then you get 100% correct semantics and a performance hit in the face
of hash collisions that is probably no worse than that which ext3
currently gets. It probably does cost you a bit of storage to store
those 64bit hashes, though I suspect some clever compression can help
out there (you only need one bit more than the filename when there is
no chaining).
You have to require 64bit cookies/fpos, but I think that today, that
is a reasonable thing to require (5 years ago it might not have been).
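For what it is worth, the cookie in question is just what telldir()
and seekdir() traffic in from userspace. An illustrative snippet:

#include <dirent.h>
#include <stdio.h>

int main(void)
{
    DIR *d = opendir(".");
    struct dirent *de;

    if (!d)
        return 1;
    /* telldir() hands back the position/cookie after each entry */
    while ((de = readdir(d)) != NULL)
        printf("%-24s cookie=%ld\n", de->d_name, telldir(d));
    closedir(d);
    return 0;
}

With internal chaining that value can simply be the entry's unique
hash, so resuming after any entry is exact; the squeeze is 32bit
userspace, where telldir()'s long is only 32 bits wide.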
NeilBrown