Message-ID: <20250908194053.GA3620067@mit.edu>
Date: Mon, 8 Sep 2025 15:40:53 -0400
From: "Theodore Ts'o" <tytso@....edu>
To: Rogier Wolff <R.E.Wolff@...wizard.nl>
Cc: linux-kernel@...r.kernel.org
Subject: Re: Deleting a bunch of files takes long.
On Mon, Sep 08, 2025 at 06:18:51PM +0200, Rogier Wolff wrote:
> Is the "logging file system" a contributing factor? Maybe after each
> rm or after each rmdir that something needs to be written to the log?
Avoiding the need to run fsck after a crash isn't free, so there is
always a certain amount of overhead in journalling. Comparing Linux
with Minix is comparing apples and oranges, since with Minix, you
have to run fsck after a crash or power failure.
You *can* run ext4 without the journal. If the file system has been
cleanly unmounted, or you've run fsck, you can mount the file system
using -o noload to disable journalling. Or you can format the file
system without the journal. ("mkfs.ext4 -O ^has_journal") BTW, this is
something that Google contributed some 15+ years ago. Google uses a
cluster file system (back then, GFS), because at very large scales
they need to make sure data isn't lost when a hard drive dies, when a
power supply on a particular server dies, or when the router at the
top of a rack gives up the ghost. So if data gets lost after a crash
or power failure, the cluster file system can recover, since it has
to handle much worse (e.g., an entire rack of servers becoming
inaccessible because a router dies, or the power management unit
taking out multiple racks in a power failure domain), and so the ext4
journal was unnecessary overhead.
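Concretely, the two options look something like this (the device name
and mount point below are just placeholders):

    # format the file system without a journal
    mkfs.ext4 -O ^has_journal /dev/sdX1

    # or, if an existing file system was cleanly unmounted (or fsck'ed),
    # skip the journal at mount time
    mount -o noload /dev/sdX1 /mnt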
So if you don't care about reliable recovery after a power failure, by
all means, you can disable the journal with ext4. That *will* make
certain workloads faster. But users tend to get cranky when they lose
data after a crash, unless you have some kind of higher-level data
recovery (e.g., like a cluster-level file system which has erasure
coding or replication across different servers that are in different
failure domains).
The other thing which ext4 does is it spreads the files across the
entire file system, which reduces file fragmentation, but it does mean
that if you create a huge number of files, and then you want to delete
a huge number of files, a larger number of block groups will need to
be updated compared to Minix. But this was a deliberate design
decision, because reducing performance degradation over the long-term
is something that we considered far more important than optimizing for
"rm -rf".
Cheers,
- Ted