Message-ID: <CAHB1NaifpACESRtCMsbF3f8EACD__gnM0bsXyyi4sQ0HYcJs=A@mail.gmail.com>
Date: Sun, 28 Sep 2025 14:51:09 +0800
From: Julian Sun <sunjunchao2870@...il.com>
To: "Theodore Ts'o" <tytso@....edu>
Cc: Harshad Shirwadkar <harshadshirwadkar@...il.com>, adilger@...ger.ca, jack@...e.cz,
Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH v2 3/3] ext4: reimplement ext4_empty_dir() using is_dirent_block_empty
Hi,
On Sun, Sep 28, 2025 at 11:46 AM Theodore Ts'o <tytso@....edu> wrote:
>
> On Sat, Sep 27, 2025 at 04:11:54PM +0800, Julian Sun wrote:
> > >
> > > I’ve recently been looking into the ext4 directory shrinking problem
> > > and was considering trying to add this feature myself. To my surprise,
> > > I found that this patch set had already implemented it and even
> > > received Reviewed-by. I’m curious whether it was never merged, or if it
> > > was merged and later reverted?
> > >
> > > If possible, is there anything I could do to contribute to moving this
> > > patch set forward toward being merged?
>
> I *think* there were one or two test regressions that Harshad was
> working on, but the real problem was that the original business
> imperative for the project was no longer as compelling, and we moved
> on to focus on other priorities.
>
> So if you'd like to contribute to moving this forward, what we'd need
> to do is to forward port the patch set to the latest kernel. I've
> taken a quick look at the patches, and they predate the addition of
> support for 3-level htrees (the incompat_largedir feature).
Emm. I checked the code and found that support for 3-level htrees was
added in 2017 via commit e08ac99fa2a2 ("ext4: add largedir feature"),
but this patch was submitted in 2020. Did I make a mistake somewhere?
> There is
> also some hardening against maliciously fuzzed file systems that will
> prevent the patches from applying cleanly.
Is this included in the xfstests test suite?
> Then we'd need to run regression tests on a variety of different ext4
> configurations to see if there are any regressions, and if so, they
> would need to be fixed.
Is testing with xfstests sufficient? Or are there any other test
suites that can be used to test this patch set?
>
> Also, please note that this first set of changes doesn't really make a
> big difference for real-world use cases, since a directory block
> won't get dropped until it is completely empty. For example, if we
> assume an average directory entry size of 32, there can be up to 128
> entries in a 4k block. If we assume that the average leaf block is
> 75% filled, there will be 96 directory entries. All 96 directory
> entries have to be deleted before that block can be removed. If the
> directory is 4MB, there will be roughly 100,000 directory entries and
> 1024 blocks. If we assume a random distribution and random deletion
> (which is a fair assumption given that we're using a hash of the file
> name), I will leave it as an exercise to the reader what percentage
> of directory entries need to be deleted before the probability that at
> least one 4k directory block is emptied is at least, say, 80%. But in
> practice, you have to delete most of the files in the directory before
> the directory starts shrinking.
Yes, I think the biggest beneficiary is rm -rf-type workloads.
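For what it's worth, here is a rough sketch of the exercise above, using the
numbers from the example (1024 leaf blocks, 96 entries per block). It assumes
each entry is deleted independently with probability p, which only
approximates "delete a random fraction p of the entries"; the function names
are made up for illustration.

```python
import math

def prob_some_block_empty(p, entries_per_block=96, num_blocks=1024):
    """Probability that at least one block is fully emptied when each
    entry is deleted independently with probability p (an assumption)."""
    q = p ** entries_per_block          # one given block is completely empty
    if q >= 1.0:
        return 1.0
    # 1 - (1 - q)**num_blocks, computed stably for very small q
    return -math.expm1(num_blocks * math.log1p(-q))

def min_deleted_fraction(target=0.8, entries_per_block=96, num_blocks=1024):
    """Smallest p for which prob_some_block_empty(p) reaches target."""
    # Solve 1 - (1 - q)**num_blocks = target for q, then p = q**(1/entries)
    q = -math.expm1(math.log(1.0 - target) / num_blocks)
    return q ** (1.0 / entries_per_block)

if __name__ == "__main__":
    p = min_deleted_fraction()
    print(f"delete about {p:.1%} of entries for an 80% chance of one empty block")
```

Under these assumptions the answer comes out at roughly 93% of the entries,
which matches the point that most files have to be deleted before the
directory starts shrinking.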
>
> So this is why we really need to implement the next step (which
> is not in this patch series), and that is to merge two adjacent leaf
> blocks once they fall below some threshold --- say, 25%. We will
> also need to merge two adjacent index nodes if they are mostly empty.
Sounds great. Thanks for your kind and detailed explanation, Ted.
>
> Cheers,
>
> - Ted
Thanks,
--
Julian Sun <sunjunchao2870@...il.com>