linux-ext4 - Re: [PATCH v3 1/2] ext4: fix fast commit inode enqueueing during a full journal commit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <877cdusk75.fsf@brahms.olymp>
Date: Tue, 09 Jul 2024 15:39:58 +0100
From: Luis Henriques <luis.henriques@...ux.dev>
To: "Theodore Ts'o" <tytso@....edu>
Cc: Andreas Dilger <adilger@...ger.ca>,  Jan Kara <jack@...e.cz>,  Harshad
 Shirwadkar <harshadshirwadkar@...il.com>,  linux-ext4@...r.kernel.org,
  linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/2] ext4: fix fast commit inode enqueueing during a
 full journal commit

On Mon, Jul 08 2024, Theodore Ts'o wrote:

> On Wed, May 29, 2024 at 10:20:29AM +0100, Luis Henriques (SUSE) wrote:
>> When a full journal commit is on-going, any fast commit has to be enqueued
>> into a different queue: FC_Q_STAGING instead of FC_Q_MAIN.  This enqueueing
>> is done only once, i.e. if an inode is already queued in a previous fast
>> commit entry it won't be enqueued again.  However, if a full commit starts
>> _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to
>> be done into FC_Q_STAGING.  And this is not being done in function
>> ext4_fc_track_template().
>> 
>> This patch fixes the issue by re-enqueuing an inode into the STAGING queue
>> during the fast commit clean-up callback if it has a tid (i_sync_tid)
>> greater than the one being handled.  The STAGING queue will then be spliced
>> back into MAIN.
>> 
>> This bug was found using fstest generic/047.  This test creates several 32k
>> bytes files, sync'ing each of them after it's creation, and then shutting
>> down the filesystem.  Some data may be loss in this operation; for example a
>> file may have it's size truncated to zero.
>> 
>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@...ux.dev>
>
> This patch is causing a regression for the test generic/472
> generic/496 generic/643 if fast_commit is enabled.  So using the
> ext4/adv or ext4/fast_commit configuration, e.g:
>
> % kvm-xfstests  -c ext4/fast_commit  generic/472 generic/496 generic/643
>
> For all of these test, the failures seem to involve the swapon command
> erroring out:
>
>     --- tests/generic/496.out   2024-06-13 18:57:39.000000000 -0400
>     +++ /results/ext4/results-fast_commit/generic/496.out.bad   2024-07-08 23:46:39.720
>     @@ -1,3 +1,4 @@
>      QA output created by 496
>      fallocate swap
>      mixed swap
>     +swapon: Invalid argument
>     ...
>
> but it's unclear why this patch would affect swapon.

OK, that's... embarrassing.  I should have caught these failures :-(

> I've never been able to see generic/047 failure in any of my ext4/dev
> testing, nor in any of my daily fs-next CI testing.  So for that
> reason, I'm going to drop this patch from my tree.

There's nothing special about my test environment.  I can reproduce the
generic/047 failure (although not 100% of the times) by running it
manually in a virtme-ng test environment, using MKFS_OPTIONS="-O fast_commit".
Here's what I see when running it:

FSTYP         -- ext4
PLATFORM      -- Linux/x86_64 virtme-ng 6.10.0-rc7+ #269 SMP PREEMPT_DYNAMIC Tue Jul  9 14:24:22 WEST 2024
MKFS_OPTIONS  -- -F -O fast_commit /dev/vdb1
MOUNT_OPTIONS -- -o acl,user_xattr /dev/vdb1 /tmp/mnt/scratch

generic/047 162s ... - output mismatch (see [...]/testing/xfstests-dev/results//generic/047.out.bad)
    --- tests/generic/047.out   2021-01-11 12:08:14.972458324 +0000
    +++ [...]/testing/xfstests-dev/results//generic/047.out.bad   2024-07-09 14:28:36.626435948 +0100
    @@ -1 +1,2 @@
     QA output created by 047
    +file /tmp/mnt/scratch/944 has incorrect size - fsync failed
    ...
    (Run 'diff -u [...]/testing/xfstests-dev/tests/generic/047.out [...]/testing/xfstests-dev/results//generic/047.out.bad'  to see the entire diff)
Ran: generic/047
Failures: generic/047
Failed 1 of 1 tests

> The second patch in this series appears to be independent at least
> from a logical perspective --- although a minor change is needed to
> resolve a merge conflict after dropping this change.
>
> Luis, Harshad, could you look in this failure and then resubmit once
> it's been fixed?  Thanks!!  Also, Luis, can you give more details
> about the generic/047 failure that you had seen?  Is it one of those
> flaky tests where you need to run the test dozens or hundreds of time
> to see the failure?


So, I've done some quick tests, but I'll need some more time to dig into
it.  And this is what I _think_ it's happening:

When activating a swap file, the kernel forces an fsync, calling
ext4_sync_file() which will then call ext4_fc_commit() and, eventually,
the ext4_fc_cleanup().

With this patch an inode may be re-enqueued into the STAGING queue and
then spliced back into MAIN; and that's exactly what I see happening.

Later, still on the swap activation path, ext4_set_iomap() will be called
and will do this:

	if (ext4_inode_datasync_dirty(inode) ||
	    offset + length > i_size_read(inode))
		iomap->flags |= IOMAP_F_DIRTY;

'ext4_inode_datasync_dirty()' will be true because '->i_fc_list' is not
empty.  And that's why the swapoff will fail.

Anyway, I'll try to figure out what's missing here (or what's wrong with
my patch).

Cheers,
-- 
Luís