Message-ID: <20250928034638.GC200463@mit.edu>
Date: Sat, 27 Sep 2025 23:46:38 -0400
From: "Theodore Ts'o" <tytso@....edu>
To: Julian Sun <sunjunchao2870@...il.com>
Cc: Harshad Shirwadkar <harshadshirwadkar@...il.com>, adilger@...ger.ca,
jack@...e.cz, Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH v2 3/3] ext4: reimplement ext4_empty_dir() using
is_dirent_block_empty
On Sat, Sep 27, 2025 at 04:11:54PM +0800, Julian Sun wrote:
> >
> > I’ve recently been looking into the ext4 directory shrinking problem
> > and was considering trying to add this feature myself. To my surprise,
> > I found that this patch set had already implemented it and even
> > received Reviewed-by. I’m curious whether it was never merged, or if it
> > was merged and later reverted?
> >
> > If possible, is there anything I could do to contribute to moving this
> > patch set forward toward being merged?
I *think* there were one or two test regressions that Harshad was
working on, but the real problem was that the original business
imperative for the project became less compelling, and we moved on to
focus on other priorities.
So if you'd like to contribute to moving this forward, what we'd need
to do is to forward port the patch set to the latest kernel. I've
taken a quick look at the patches, and they predate the addition of
support for 3-level htrees (the incompat_largedir feature). There is
also some hardening against maliciously fuzzed file systems that will
prevent the patches from applying cleanly.
Then we'd need to run regression tests on a variety of different ext4
configurations to see if there are any regressions, and if so, they
would need to be fixed.
Also, please note that this first set of changes doesn't really make a
big difference for real-world use cases, since a directory block
won't get dropped until it is completely empty. For example, if we
assume an average directory entry size of 32 bytes, there can be up to 128
entries in a 4k block. If we assume that the average leaf block is
75% filled, there will be 96 directory entries. All 96 directory
entries have to be deleted before that block can be removed. If the
directory is 4MB, there will be roughly 100,000 directory entries and
1024 blocks. Assume a random distribution and random deletion
(which is a fair assumption given that we're using a hash of the file
name). I will leave it as an exercise to the reader what percentage
of directory entries need to be deleted before the probability that at
least one 4k directory block is emptied is at least, say, 80%. But in
practice, you have to delete most of the files in the directory before
the directory starts shrinking.
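A rough way to work that exercise is sketched below. It is a simplified model, not anything from the patch set: it assumes every one of the 1024 leaf blocks holds exactly 96 entries and that each entry is deleted independently with probability p, so a given block is empty with probability p^96. The function names are made up for illustration.

```python
# Simplified model (assumption): every block holds exactly ENTRIES_PER_BLOCK
# entries and each entry is deleted independently with probability p.
ENTRIES_PER_BLOCK = 96   # 4k block, 32-byte entries, 75% full
BLOCKS = 1024            # 4MB directory


def prob_some_block_empty(p, entries=ENTRIES_PER_BLOCK, blocks=BLOCKS):
    """Probability that at least one block is completely empty."""
    block_empty = p ** entries              # all entries of one block deleted
    return 1.0 - (1.0 - block_empty) ** blocks


def fraction_needed(target, entries=ENTRIES_PER_BLOCK, blocks=BLOCKS):
    """Deletion fraction p at which prob_some_block_empty(p) == target."""
    block_empty = 1.0 - (1.0 - target) ** (1.0 / blocks)
    return block_empty ** (1.0 / entries)


if __name__ == "__main__":
    p = fraction_needed(0.80)
    print(f"fraction of entries to delete: {p:.3f}")
    print(f"check: P(at least one empty block) = {prob_some_block_empty(p):.3f}")
```

Under these assumptions the answer comes out at roughly 93% of the entries, which matches the point above: most of the files have to go before the first block can be freed.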
So this is why we really need to implement the next step (which
is not in this patch series), and that is merging two adjacent leaf
blocks once they fall below some threshold, say 25%. We will
also need to merge two adjacent index nodes if they are mostly empty.
Cheers,
- Ted