Message-ID: <20251124051545.GE13687@macsyma-3.local>
Date: Sun, 23 Nov 2025 23:15:45 -0600
From: "Theodore Tso" <tytso@....edu>
To: bugzilla-daemon@...nel.org
Cc: linux-ext4@...r.kernel.org
Subject: Re: [Bug 220594] Online defragmentation has broken in 6.16

So it's not that all files can't be defragged; just *some* files.  Is
that correct?

And when I ask whether or not it's reproducible: if you take a
snapshot of your file system and then mount the snapshot, does the
exact same file that failed before also fail on the snapshot?

And for the files that were failing, if you unmount the file system
and remount it, can you then defrag the file in question?  If the
answer is yes, this is why bug reports of the form "Online
defragmentation in 6.16 is broken" is not particularly useful.  And
it's why I've not spent a lot of time on this bug.  We have
defragmentation tests in fstests, and they are passing, and I've tried
running defrag on the snapshot that you sent me, And It Works For Me.
So a broad "it's broken" without any further data, when it most
manifestly is not broken in my tests, means that if you really want it
to be fixed, you're going to have to do more of the debugging.

But now that we know that it's an EBUSY error, it sounds like it's
some kind of transient thing, and that's why I didn't see it when I
tried running defrag on your snapshot.

For example, one of the places where you can get EBUSY in the MOVE_EXT
ioctl is here:

                if (!filemap_release_folio(folio[0], 0) ||
                    !filemap_release_folio(folio[1], 0)) {
                        *err = -EBUSY;
                        goto drop_data_sem;
                }

... and this ultimately calls ext4_release_folio:

static bool ext4_release_folio(struct folio *folio, gfp_t wait)
{
	struct inode *inode = folio->mapping->host;
	journal_t *journal = EXT4_JOURNAL(inode);

	trace_ext4_release_folio(inode, folio);

	/* Page has dirty journalled data -> cannot release */
	if (folio_test_checked(folio))
		return false;
	if (journal)
		return jbd2_journal_try_to_free_buffers(journal, folio);
	else
		return try_to_free_buffers(folio);
}

What this means is that if the file has pages which still need to be
written out to their final location on disk, the MOVE_EXT ioctl will
return EBUSY.  That can happen if you are in data=journal mode and the
modified data has been written (or scheduled to be written) to the
journal but not *yet* checkpointed to its final location on disk, or
if delayed allocation is enabled and the file was written recently
enough that its blocks have been allocated but not yet written back.
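(As an aside, and purely as a sketch of the userspace side: the usual
workaround for a transient EBUSY from the ioctl is to force writeback
and retry.  The snippet below is not how e4defrag structures its loop,
the donor file setup is omitted, and the struct/ioctl definitions are
copied here only to keep the example self-contained; note also that in
data=journal mode an fsync alone may not be enough until the
journalled data has been checkpointed.)

/* Sketch only: retry EXT4_IOC_MOVE_EXT once after forcing writeback.
 * Assumes orig_fd and donor_fd are already open, and that the donor
 * file has been preallocated with contiguous blocks (omitted here).
 */
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/types.h>

/* Mirrors the kernel's definitions, copied for self-containment. */
struct move_extent {
	__u32 reserved;		/* must be zero */
	__u32 donor_fd;		/* donor file descriptor */
	__u64 orig_start;	/* logical start (blocks) in the original file */
	__u64 donor_start;	/* logical start (blocks) in the donor file */
	__u64 len;		/* number of blocks to move */
	__u64 moved_len;	/* filled in by the kernel */
};
#define EXT4_IOC_MOVE_EXT	_IOWR('f', 15, struct move_extent)

static int move_blocks(int orig_fd, int donor_fd, __u64 start, __u64 len)
{
	struct move_extent me = {
		.donor_fd    = donor_fd,
		.orig_start  = start,
		.donor_start = start,
		.len         = len,
	};

	if (ioctl(orig_fd, EXT4_IOC_MOVE_EXT, &me) == 0)
		return 0;
	if (errno != EBUSY)
		return -1;

	/* EBUSY: dirty journalled data or delalloc blocks not yet
	 * written back.  Force writeback and try one more time. */
	if (fsync(orig_fd) != 0)
		return -1;
	me.moved_len = 0;
	return ioctl(orig_fd, EXT4_IOC_MOVE_EXT, &me);
}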

This is not new behaviour; we've always had this.  Now, 6.16 is when
large folio support landed for ext4, and this can result in some
really wonderful performance improvements.  This may have resulted in
a change in how often recently written files might end up getting
EBUSY when you try to defrag them --- but quite frankly, if this is a
very tiny fraction of the files in your file system, and a subsequent
defrag run will take care of them --- I'd probably think that is a
fair tradeoff.

So... for the files where the MOVE_EXT call failed, can you take a
look at the timestamps and see whether they are relatively recently
written files?
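(If it helps, here's a trivial sketch for checking that; nothing
ext4-specific, just stat'ing the files that got EBUSY and printing how
long ago they were last modified:)

#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Print how many seconds ago each file given on the command line was
 * last modified, to correlate MOVE_EXT EBUSY failures with recent
 * writes. */
int main(int argc, char **argv)
{
	time_t now = time(NULL);

	for (int i = 1; i < argc; i++) {
		struct stat st;

		if (stat(argv[i], &st) != 0) {
			perror(argv[i]);
			continue;
		}
		printf("%s: mtime %ld seconds ago\n",
		       argv[i], (long)(now - st.st_mtime));
	}
	return 0;
}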

Also, for future reference: if you had disclosed that this was only
happening on a tiny percentage of all of the files in your file
system, had checked whether that small set of failing files could be
defragged later, and had checked their timestamps, that would have
been really useful data which would have allowed you (and me) to
waste a lot less time.

Cheers,

					- Ted
