[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <20090915144141.GS2537@webber.adilger.int>
Date: Tue, 15 Sep 2009 08:41:41 -0600
From: Andreas Dilger <adilger@....com>
To: Zhang Huan <zhhuan@...il.com>
Cc: linux-ext4@...r.kernel.org
Subject: Re: Question on readdir implementation
On Sep 15, 2009 17:57 +0800, Zhang Huan wrote:
> I'm reading EXT4 codes and has some questions about readdir
> implementation.
>
> Why traverse the directory in hash order? This brings lots of code to
> build and traverse a red-black tree. Why not just plainly traverse the
> directory's blocks?
Because htree does not maintain constant ordering of directory entries.
Consider a readdir running at the same time as files are being added:
readdir proc create proc
read block 0
read block 1
create new file
hash of filename puts entry into block 0
block 0 is full, split it
add new block 3
copy 1/2 of block 0 entries to block 3
add new filename to block 0
read block 2
read block 3
When the readdir returns block 3, 1/2 of the entries in that block
are the same as were returned with the original block 0 read. This
is a violation of POSIX readdir() semantics that require each existing
entry only be returned a single time (it doesn't matter if the new
filename is returned or not).
> Since the red-black tree is built every time a NFS readdir request comes
> in, in case of hash collision, the nfs client may receive duplicate dir
> entries if the buffer is not large enough to return all entries with the
> same hash value in once.
This is because NFSv2 only has a 32-bit cookie value, and if there is a
whole buffer full of values with the same 32-bit hash there isn't anything
that can be done about it except drop those extra duplicates (a very rare
case). It would have a much more serious problem if there was a directory
larger than 2^32 bytes in size, which would result in all entries beyond
2GB being lost.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists