Message-id: <20090823031039.GF5931@webber.adilger.int>
Date: Sat, 22 Aug 2009 21:10:39 -0600
From: Andreas Dilger <adilger@....com>
To: Andreas Schlick <schlick@...abit.com>
Cc: Theodore Tso <tytso@....edu>, linux-ext4@...r.kernel.org
Subject: Re: [PATCH 1/1] dir shrink (was Re: ext3/ext4 directories don't shrink
after deleting lots of files)
On Aug 22, 2009 16:20 +0200, Andreas Schlick wrote:
> I'd like to try it. It looks like a nice starting project.
> Following your outline the first version of the patch tries to remove an
> empty block at the end of a non-htree directory.
> I'd appreciate it if you checked it and gave me suggestions for improving it.
Adding the extra "dc" to each of the functions probably isn't necessary,
and it makes the API messier. A better approach would probably be to do
this entirely within ext4_delete_entry(), analogous to how ext4_add_entry()
might add a new block at any time.
It would be even better if this could be done repeatedly if there are
more empty blocks at the end (i.e. they were not previously at the end
of the file), but that gets into trouble with the transactions. It isn't
easy to remove an intermediate block, because this will result in a hole
in the directory (a no-no), and there is no safe way to reorder the
blocks in the directory.
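
To make the "no holes" constraint concrete, here is a minimal sketch in
plain userspace C, not kernel code. It assumes a hypothetical per-block
"is empty" map (is_empty[]) and a made-up helper name; it only counts how
many blocks could be truncated from the end. Empty blocks in the middle
are left alone, and block 0 is always kept because "." and ".." live there.
It says nothing about how to fit repeated truncation into a transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the empty blocks at the tail of a directory.
 * is_empty[i] is true when logical block i holds no in-use entries.
 * Stops at the first non-empty block seen from the end, so an empty block
 * in the middle of the directory is never counted (removing it would
 * leave a hole). Block 0 is never counted.
 */
size_t toy_trailing_empty_blocks(const bool *is_empty, size_t nblocks)
{
    size_t n = 0;

    assert(is_empty != NULL && nblocks > 0);
    while (n + 1 < nblocks && is_empty[nblocks - 1 - n])
        n++;
    return n;
}
```

With the map { used, empty, used, empty, empty } only the last two blocks
qualify; the empty block at index 1 has to stay where it is.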
> At the moment I am looking at the dir_index code, so I can extend it to htree
> directories. Please let me know if you want me to port it to ext3, although
> personally I think it is better to do so at a later point.
For dir_index what is important is that you don't have any holes in the
hash space, nor in the logical directory blocks. One possibility: when
the direntry being removed is the last one in its block[*], remove that
block, move the last block of the directory to the freed logical offset,
and update the htree index to reflect this.
Note that the htree index only records the starting hash value for each
block, so all that would need to be done to remove any mention of the
deleted block is to memmove() the following index entries over the
deleted block's entry, and the hash buckets will still be correct. Also,
the logical block number recorded in the index for the last block would
need to be changed to reflect its new position.
[*] This is easily determined in ext4_delete_entry() because it always
walks the block until it finds the entry, and if there are valid
entries before the one being deleted the block is not empty. Tracking
this takes basically no extra effort. If no valid entries are before
the one being deleted, and if the length of the entry after it fills
the rest of the space in the block then the block is empty.
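
A rough sketch of that check, again as a userspace toy rather than the
real ext4_delete_entry(): struct toy_dirent, toy_put_entry() and
toy_block_empty_after_delete() are invented for illustration, the entry
header is reduced to inode and rec_len in host byte order, and "fills the
rest of the space" is read here as "the rec_len of the entry being
deleted reaches the end of the block". That reading is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy directory entry header; inode == 0 marks an unused slot. */
struct toy_dirent {
    uint32_t inode;
    uint16_t rec_len;   /* bytes from this entry to the next one */
};

/* Test helper: write an entry header at byte offset "off". */
void toy_put_entry(unsigned char *block, size_t off,
                   uint32_t inode, uint16_t rec_len)
{
    struct toy_dirent de;

    memset(&de, 0, sizeof(de));
    de.inode = inode;
    de.rec_len = rec_len;
    memcpy(block + off, &de, sizeof(de));
}

/*
 * Walk the block up to the entry at "target_off", as a delete would.
 * Returns true if deleting that entry leaves the block empty: no in-use
 * entry was seen before it and its record extends to the block's end.
 */
bool toy_block_empty_after_delete(const unsigned char *block,
                                  size_t block_size, size_t target_off)
{
    size_t off = 0;
    bool live_before = false;

    while (off < block_size) {
        struct toy_dirent de;

        assert(block_size - off >= sizeof(de));
        memcpy(&de, block + off, sizeof(de));
        assert(de.rec_len >= sizeof(de) && de.rec_len <= block_size - off);

        if (off == target_off)
            return !live_before && off + de.rec_len == block_size;
        if (de.inode != 0)
            live_before = true;
        off += de.rec_len;
    }
    return false;   /* target offset is not an entry boundary */
}
```

The only state carried along the walk is one flag, which matches the
point that tracking this takes basically no extra effort.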
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.