Message-ID: <20070920170250.GF30221@thunk.org>
Date: Thu, 20 Sep 2007 13:02:50 -0400
From: Theodore Tso <tytso@....edu>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org
Subject: Re: Enabling h-trees too early?
On Thu, Sep 20, 2007 at 06:19:04PM +0200, Jan Kara wrote:
>     if (EXT4_HAS_COMPAT_FEATURE(inode->i_sb, EXT4_FEATURE_COMPAT_DIR_INDEX) &&
>         ((EXT4_I(inode)->i_flags & EXT4_INDEX_FL) ||
>          ((inode->i_size >> sb->s_blocksize_bits) == 1))) {
>             error = ext4_dx_readdir(filp, dirent, filldir);
>             if (error != ERR_BAD_DX_DIR) {
>                     ret = error;
>                     goto out;
>             }
>             /*
>              * We don't set the inode dirty flag since it's not
>              * critical that it get flushed back to the disk.
>              */
>             EXT4_I(filp->f_path.dentry->d_inode)->i_flags &= ~EXT4_INDEX_FL;
>     }
> It calls ext4_dx_readdir() for *every* directory with 1 block (we have
> 1326 of them in the kernel tree). Now ext4_dx_readdir() calls
> ext4_htree_fill_tree() which finds out the directory is not an h-tree and
> calls htree_dirblock_to_tree(). So even for 4KB directories we end up
> deleting inodes in hash order! And as a bonus we burn some cycles building
> trees etc. What is the point of this?
That was added so we wouldn't get screwed when a directory that was
previously non-htree becomes an htree directory while a directory fd
is open. The failure case is one where you do opendir(), readdir()
on 25% of the directory, sleep for 2 hours, and in the meantime 200
files are added to the directory and it gets converted into an htree
index. The entries already returned by readdir() came back in
directory order, the rest would come back in hash order, and the
combined results end up completely screwed up. (All of the
readdir/telldir/seekdir POSIX requirements cause filesystem designers
to tear their hair out.)
What we would need to do to avoid needing this is to read in the
entire directory leaf page into the rbtree, sorted by inode number,
and then to keep that rbtree for the entire life of the open directory
file descriptor. We would also have to change telldir/seekdir to use
something else as a telldir cookie, and readdir would have to be set
up to *only* use the rbtree, and never look at the on-disk directory.
This would also mean that all of the files created or deleted after
the initial opendir() would never be reflected in results returned by
readdir(), but that's allowed by POSIX. And if we do this for a
single block 4k directory, we might as well do it for a 32k or 64k
htree directory as well.
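To make the idea concrete, here is a rough user-space sketch of the
snapshot approach, not ext4 code. Everything in it is an assumption
made up for illustration: the toy_dirent and dir_snapshot structs, the
snap_*() names, and the use of a sorted array where the kernel would
use an rbtree. The cookie is simply an index into the private copy, so
it stays valid no matter what happens to the directory afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one directory entry. */
struct toy_dirent {
    unsigned long ino;
    char name[32];
};

/* Private per-open-fd snapshot; reads never go back to the live data. */
struct dir_snapshot {
    struct toy_dirent *ents;  /* copy taken at open time, sorted by ino */
    size_t count;
    size_t pos;               /* index into ents; doubles as the cookie */
};

static int cmp_ino(const void *a, const void *b)
{
    const struct toy_dirent *x = a, *y = b;
    return (x->ino > y->ino) - (x->ino < y->ino);
}

/* "opendir": copy the live entries and sort the copy by inode number. */
struct dir_snapshot *snap_open(const struct toy_dirent *live, size_t n)
{
    struct dir_snapshot *s = malloc(sizeof(*s));
    if (!s)
        return NULL;
    s->ents = malloc(n ? n * sizeof(*live) : 1);
    if (!s->ents) {
        free(s);
        return NULL;
    }
    if (n)
        memcpy(s->ents, live, n * sizeof(*live));
    qsort(s->ents, n, sizeof(*live), cmp_ino);
    s->count = n;
    s->pos = 0;
    return s;
}

/* "readdir": next entry from the snapshot only, NULL at the end. */
const struct toy_dirent *snap_read(struct dir_snapshot *s)
{
    assert(s != NULL);
    return s->pos < s->count ? &s->ents[s->pos++] : NULL;
}

/* "telldir": the cookie is just the position in the snapshot. */
long snap_tell(const struct dir_snapshot *s)
{
    return (long)s->pos;
}

/* "seekdir": accept only cookies that point inside the snapshot. */
int snap_seek(struct dir_snapshot *s, long cookie)
{
    if (cookie < 0 || (size_t)cookie > s->count)
        return -1;
    s->pos = (size_t)cookie;
    return 0;
}

void snap_close(struct dir_snapshot *s)
{
    if (s) {
        free(s->ents);
        free(s);
    }
}
```

A caller would snap_open() once, then loop on snap_read() until it
returns NULL. Entries added to or removed from the live array after
snap_open() never show up, which is the POSIX-permitted behavior
described above.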
Does that make sense?
- Ted