Message-ID: <bug-220594-13602-a5cboOdpR3@https.bugzilla.kernel.org/>
Date: Mon, 24 Nov 2025 16:13:27 +0000
From: bugzilla-daemon@...nel.org
To: linux-ext4@...r.kernel.org
Subject: [Bug 220594] Online defragmentation has broken in 6.16

https://bugzilla.kernel.org/show_bug.cgi?id=220594

--- Comment #12 from Artem S. Tashkinov (aros@....com) ---
(In reply to Theodore Tso from comment #11)
> So it's not that all files can't be defragged; just *some* files.  Is
> that correct?

That's correct.

> 
> And when I ask whether or not it's reproducible, can you take a
> snapshot of your file system, and then remount the snapshot, and will
> the exact same file that failed before fail on the snapshot?

It still fails on the snapshot.

> 
> And for the files that were failing, if you unmount the file system
> and remount it, can you then defrag the file in question?  If the

No. Tried that thrice.

> answer is yes, this is why bug reports of the form "Online
> defragmentation in 6.16 is broken" are not particularly useful.  And
> it's why I've not spent a lot of time on this bug.  We have
> defragmentation tests in fstests, and they are passing, and I've tried
> running defrag on the snapshot that you sent me, And It Works For Me.

It still doesn't work with Fedora's kernel (now running 6.17.8-200.fc42.x86_64).

> So a broad "it's broken" without any further data, when it most
> manifestly is not broken in my tests, means that if you really want it
> to be fixed, you're going to have to do more of the debugging.

I'd love to help however I can to get it fixed.

> 
> But now that we know that it's an EBUSY error, it sounds like it's
> some kind of transient thing, and that's why I'm not seeing it when I
> tried running it on your snapshot.
> 
> For example, one of the places where you can get EBUSY in the MOVE_EXT
> ioctl is here:
> 
>                 if (!filemap_release_folio(folio[0], 0) ||
>                     !filemap_release_folio(folio[1], 0)) {
>                         *err = -EBUSY;
>                         goto drop_data_sem;
>                 }
> 
> ... and this ultimately calls ext4_release_folio:
> 
> static bool ext4_release_folio(struct folio *folio, gfp_t wait)
> {
>       struct inode *inode = folio->mapping->host;
>       journal_t *journal = EXT4_JOURNAL(inode);
> 
>       trace_ext4_release_folio(inode, folio);
> 
>       /* Page has dirty journalled data -> cannot release */
>       if (folio_test_checked(folio))
>               return false;
>       if (journal)
>               return jbd2_journal_try_to_free_buffers(journal, folio);
>       else
>               return try_to_free_buffers(folio);
> }
> 
> What this means is that if the file has pages which need to be written
> out to the final location on disk (e.g., if you are in data=journal

Journalling is disabled on all my ext4 partitions.

> mode, and the modified file may have been written or scheduled to be
> written to the journal, but not *yet* to the final location on disk,
> or you are using delayed allocation and the file was just recently
> written, so blocks have been allocated but not yet written back) ---
> then the MOVE_EXT ioctl will return EBUSY.
> 
> This is not new behaviour; we've always had this.  Now, 6.16 is when
> large folio support landed for ext4, and this can result in some
> really wonderful performance improvements.  This may have resulted in
> a change in how often recently written files might end up getting
> EBUSY when you try to defrag them --- but quite frankly, if this is a
> very tiny fraction of the files in your file system, and a subsequent
> defrag run will take care of them --- I'd probably think that is a
> fair tradeoff.

6.15 didn't have the issue.

Subsequent defrag runs don't help. I've tried rebooting multiple times, and
tried to defrag in single-user mode (booted with `1`), with only systemd
running and journald disabled altogether, so only ~/.bash_history is open
for writing, nothing else. There are no dirty buffers to speak of; `sync`
does nothing as there's nothing to flush.
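
For what it's worth, here's the kind of minimal reproducer I intend to run
against one of the failing files, to take e4defrag's own logic out of the
picture entirely (the struct layout and ioctl number are copied from
fs/ext4/ext4.h, the way e4defrag carries them, since they aren't exported
via uapi; the donor handling and block-count math are only illustrative):

/* moveext.c - minimal EXT4_IOC_MOVE_EXT sketch, not a drop-in tool.
 * Build: gcc -o moveext moveext.c
 * Usage: ./moveext <file> <donor>   (donor is created on the same fs)
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/types.h>

/* Copied from fs/ext4/ext4.h (not exported via uapi). */
struct move_extent {
	__u32 reserved;		/* must be zero */
	__u32 donor_fd;		/* fd of the donor file */
	__u64 orig_start;	/* logical start block in the original file */
	__u64 donor_start;	/* logical start block in the donor file */
	__u64 len;		/* number of blocks to move */
	__u64 moved_len;	/* filled in by the kernel */
};
#define EXT4_IOC_MOVE_EXT	_IOWR('f', 15, struct move_extent)

int main(int argc, char **argv)
{
	struct move_extent me = { 0 };
	struct stat st;
	int orig, donor;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <file> <donor>\n", argv[0]);
		return 1;
	}
	orig = open(argv[1], O_RDWR);	/* MOVE_EXT needs the orig fd r/w */
	donor = open(argv[2], O_RDWR | O_CREAT | O_EXCL, 0600);
	if (orig < 0 || donor < 0) {
		perror("open");
		return 1;
	}
	if (fstat(orig, &st) < 0) {
		perror("fstat");
		return 1;
	}
	/* The donor must already own blocks covering the range. */
	if (fallocate(donor, 0, 0, st.st_size) < 0) {
		perror("fallocate");
		return 1;
	}
	me.donor_fd = (__u32)donor;
	me.len = (st.st_size + st.st_blksize - 1) / st.st_blksize;

	if (ioctl(orig, EXT4_IOC_MOVE_EXT, &me) < 0)
		fprintf(stderr, "MOVE_EXT: %s (moved %llu of %llu blocks)\n",
			strerror(errno), (unsigned long long)me.moved_len,
			(unsigned long long)me.len);
	else
		printf("moved %llu blocks\n", (unsigned long long)me.moved_len);
	return 0;
}

If this still fails with EBUSY on an otherwise idle system, that should
rule out anything e4defrag does on top of the raw ioctl.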

> 
> So... for the files where the MOVE_EXT call failed --- can you take a
> look at the timestamps and see if they are
> relatively recently written files?

I'll check it.
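
Something like this quick sketch should do it (the paths are whatever
files fail MOVE_EXT; nothing ext4-specific here):

/* mtimes.c - print how recently each given file was written.
 * Usage: ./mtimes FILE...   (pass the files that fail MOVE_EXT)
 */
#include <stdio.h>
#include <time.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
	time_t now = time(NULL);
	int i;

	for (i = 1; i < argc; i++) {
		struct stat st;

		if (stat(argv[i], &st) == 0)
			printf("%s: written %ld seconds ago\n",
			       argv[i], (long)(now - st.st_mtime));
		else
			perror(argv[i]);
	}
	return 0;
}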

> 
> Also, for future reference, if you had disclosed that this was only
> happening on a tiny percentage of all of the files in your file
> system, and if you checked to see if the specific small number of
> files (by percentage) that were failing could be defragged later, and
> checked the timestamps, that would have been really useful data which
> would have allowed you (and me) to waste a lot less time.
> 
> Cheers,
> 
>                                       - Ted

Thanks!

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.
