Message-ID: <17949.51797.386833.917451@notabene.brown>
Date: Thu, 12 Apr 2007 15:57:41 +1000
From: Neil Brown <neilb@...e.de>
To: Jörn Engel <joern@...ybastard.org>
Cc: Theodore Tso <tytso@....edu>, "H. Peter Anvin" <hpa@...or.com>,
Christoph Hellwig <hch@...radead.org>,
Ulrich Drepper <drepper@...il.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: If not readdir() then what?
On Thursday April 12, joern@...ybastard.org wrote:
> On Thu, 12 April 2007 11:46:41 +1000, Neil Brown wrote:
> >
> > I could argue that nfs came before ext3+dirindex, so ext3 should have
> > been designed to work properly with NFS. You could argue that fixing
> > it in nfsd fixes it for all filesystems. But I'm not sure either of
> > those arguments are likely to be at all convincing...
>
> Caring about a non-ext3 filesystem, I sure would like an nfs solution as
> well. :)
I have a non-ext3 filesystem I care about too.....
But my perspective is that a solution in nfsd is, at best, a work-around.
Caching the whole 'struct file' when there is just a small bit that we
might want seems like a heavy hammer. The filesystem is in the best
place to know what needs to be cached, and it should be the one doing
the caching.
>
> > Hmmm. I wonder. Which is more likely?
> > - That two 64bit hashes from some set are the same
> > - or that 65536 48bit hashes from a set of equal size are the same.
>
> The former. Each bit going from hash strength to collision chain length
> reduces the likelihood of an overflow. In the extreme case of a 0bit
> hash and 64bit collision chain, you need 2^64 entries compared to 2^32
> for the other extreme.
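(A quick back-of-the-envelope check, assuming the standard birthday
arithmetic - my numbers, not from the thread: with n entries and a
b-bit hash,

    P(some collision) ~= C(n,2) / 2^b ~= n^2 / 2^(b+1)

so a collision becomes likely near n ~= 2^(b/2). That is about 2^32
entries for a full 64bit hash, while overflowing a 64bit chain hanging
off a 0bit hash needs more than 2^64 entries - matching the two
extremes above.)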
>
> However, the collision chain gives me quite a bit of headache. One
> would have to store each entry's position on the chain, deal with older
> entries getting deleted, newer entries getting removed, etc. All this
> requires a lot of complicated code that basically never gets tested in
> the wild.
This is a simple consequence of the design decision to use hashes as
the search key. They aren't dense and they will collide. So the
solution will be a bit fuzzy around the edges. And maybe that is an
acceptable tradeoff. But the filesystem should take full
responsibility for it, whether in performance or correctness :-)
>
> Just settling for a 64bit hash and returning -EEXIST when someone causes
> a collision on creat() sounds more appealing. Directories with 4
> billion entries will cause problems, but that is hardly news to anyone.
>
I think you want -EFBIG or -ENOSPC. -EEXIST sounds just wrong.
But there are alternatives, e.g. internal chaining.
Insist on a unique 64bit hash for every file. If the hash is in use,
increment and try again. On lookup, if the hash leads you to a file
with the wrong name, increment and try again until you find a hole
(hash value that is not stored). When you delete an entry, leave a
place holder if the next hash is in use. Conversely if the next hash
is not in use, delete the entry and delete the previous one if it is a
place holder.
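That is essentially open addressing with linear probing plus
tombstones. A minimal userspace sketch to make the insert/lookup/
delete rules concrete - the tiny table, the toy hash and every name
here are invented for illustration; a real filesystem would key its
directory index by the full 64bit hash rather than a flat array:

#include <stdio.h>
#include <string.h>

#define NSLOTS 16                 /* demo-sized hash space; the real thing is 2^64 */

enum state { EMPTY, USED, TOMB }; /* TOMB is the "place holder" above */

struct slot {
    enum state st;
    char name[32];
};

static struct slot dir[NSLOTS];

/* Toy stand-in for the real directory hash. */
static int hash_name(const char *name)
{
    unsigned h = 5381;
    while (*name)
        h = h * 33 + (unsigned char)*name++;
    return h % NSLOTS;
}

/* creat(): claim the first free hash at or after hash_name(name). */
static int dir_insert(const char *name)
{
    int h = hash_name(name), free_slot = -1, i;

    for (i = 0; i < NSLOTS; i++) {
        struct slot *s = &dir[(h + i) % NSLOTS];

        if (s->st == USED && strcmp(s->name, name) == 0)
            return -1;                         /* -EEXIST: name already present */
        if (s->st == TOMB && free_slot < 0)
            free_slot = (h + i) % NSLOTS;      /* reusable slot; keep scanning */
        if (s->st == EMPTY) {
            if (free_slot < 0)
                free_slot = (h + i) % NSLOTS;
            break;                             /* hole: end of this chain */
        }
    }
    if (free_slot < 0)
        return -2;                             /* hash space full: -ENOSPC */
    dir[free_slot].st = USED;
    snprintf(dir[free_slot].name, sizeof(dir[free_slot].name), "%s", name);
    return 0;
}

/* lookup: probe until the name matches or we hit a hole. */
static int dir_lookup(const char *name)
{
    int h = hash_name(name), i;

    for (i = 0; i < NSLOTS; i++) {
        struct slot *s = &dir[(h + i) % NSLOTS];

        if (s->st == EMPTY)
            return -1;                         /* hole: not found */
        if (s->st == USED && strcmp(s->name, name) == 0)
            return (h + i) % NSLOTS;           /* doubles as the stable cookie */
    }
    return -1;
}

/* unlink: leave a place holder if the next hash is in use; otherwise
 * punch a hole, then clear any place holders immediately before us. */
static void dir_delete(const char *name)
{
    int i = dir_lookup(name);

    if (i < 0)
        return;
    if (dir[(i + 1) % NSLOTS].st != EMPTY) {
        dir[i].st = TOMB;                      /* keep the chain unbroken */
        return;
    }
    dir[i].st = EMPTY;
    while (dir[(i + NSLOTS - 1) % NSLOTS].st == TOMB) {
        i = (i + NSLOTS - 1) % NSLOTS;
        dir[i].st = EMPTY;                     /* trailing place holders are dead */
    }
}

int main(void)
{
    dir_insert("alpha");
    dir_insert("beta");
    printf("duplicate insert: %d\n", dir_insert("beta"));   /* -1 */
    dir_delete("alpha");
    printf("beta cookie=%d, alpha %s\n", dir_lookup("beta"),
           dir_lookup("alpha") < 0 ? "gone" : "found");
    return 0;
}

(On disk you would keep the entries indexed by hash rather than in a
flat array, of course, but the place-holder dance is the same, and
dir_lookup()'s return value is exactly the stable cookie readdir
needs.)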
Then you get 100% correct semantics and a performance hit in the face
of hash collisions that is probably no worse than that which ext3
currently gets. It probably does cost you a bit of storage to store
those 64bit hashes, though I suspect some clever compression can help
out there (you only need one bit more than the filename when there is
no chaining).
You have to require 64bit cookies/fpos, but I think that today, that
is a reasonable thing to require (5 years ago it might not have been).
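For what it is worth, the cookie in question is just what telldir()
and seekdir() traffic in from userspace. An illustrative snippet:

#include <dirent.h>
#include <stdio.h>

int main(void)
{
    DIR *d = opendir(".");
    struct dirent *de;

    if (!d)
        return 1;
    /* telldir() hands back the position/cookie after each entry */
    while ((de = readdir(d)) != NULL)
        printf("%-24s cookie=%ld\n", de->d_name, telldir(d));
    closedir(d);
    return 0;
}

With internal chaining that value can simply be the entry's unique
hash, so resuming after any entry is exact; the squeeze is 32bit
userspace, where telldir()'s long is only 32 bits wide.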
NeilBrown