linux-kernel - Re: [PATCH v7 04/12] ltp/fsx.c: Add atomic writes support to fsx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aOTkVmyEV8i_eQx6@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>
Date: Tue, 7 Oct 2025 15:28:46 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: Zorro Lang <zlang@...hat.com>
Cc: fstests@...r.kernel.org, Ritesh Harjani <ritesh.list@...il.com>,
        djwong@...nel.org, john.g.garry@...cle.com, tytso@....edu,
        linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-ext4@...r.kernel.org
Subject: Re: [PATCH v7 04/12] ltp/fsx.c: Add atomic writes support to fsx

On Mon, Oct 06, 2025 at 06:50:03PM +0530, Ojaswin Mujoo wrote:
> On Sun, Oct 05, 2025 at 11:39:56PM +0800, Zorro Lang wrote:
> > On Sun, Oct 05, 2025 at 06:27:24PM +0530, Ojaswin Mujoo wrote:
> > > On Sat, Oct 04, 2025 at 01:19:32AM +0800, Zorro Lang wrote:
> > > > On Thu, Oct 02, 2025 at 11:26:45PM +0530, Ojaswin Mujoo wrote:
> > > > > On Sun, Sep 28, 2025 at 09:19:24PM +0800, Zorro Lang wrote:
> > > > > > On Fri, Sep 19, 2025 at 12:17:57PM +0530, Ojaswin Mujoo wrote:
> > > > > > > Implement atomic write support to help fuzz atomic writes
> > > > > > > with fsx.
> > > > > > > 
> > > > > > > Suggested-by: Ritesh Harjani (IBM) <ritesh.list@...il.com>
> > > > > > > Reviewed-by: Darrick J. Wong <djwong@...nel.org>
> > > > > > > Reviewed-by: John Garry <john.g.garry@...cle.com>
> > > > > > > Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
> > > > > > > ---
> > > > > > 
> > > > > > Hmm... this patch causes more regular fsx test cases fail on old kernel,
> > > > > > (e.g. g/760, g/617, g/263 ...) except set "FSX_AVOID=-a". Is there a way
> > > > > > to disable "atomic write" automatically if it's not supported by current
> > > > > > system?
> > > > > 
> > > > > Hi Zorro, 
> > > > > Sorry for being late, I've been on vacation this week.
> > > > > 
> > > > > Yes so by design we should be automatically disabling atomic writes when
> > > > > they are not supported by the stack but seems like the issue is that
> > > > > when we do disable it we print some extra messages to stdout/err which
> > > > > show up in the xfstests output causing failure.
> > > > > 
> > > > > I can think of 2 ways around this:
> > > > > 
> > > > > 1. Don't print anything and just silently drop atomic writes if stack
> > > > > doesn't support them.
> > > > > 
> > > > > 2. Make atomic writes as a default off instead of default on feature but
> > > > > his loses a bit of coverage as existing tests wont get atomic write
> > > > > testing free of cost any more.
> > > > 
> > > > Hi Ojaswin,
> > > > 
> > > > Please have a nice vacation :)
> > > > 
> > > > It's not the "extra messages" cause failure, those "quiet" failures can be fixed
> > > > by:
> > > 
> > > Oh okay got it.
> > > 
> > > > 
> > > > diff --git a/ltp/fsx.c b/ltp/fsx.c
> > > > index bdb87ca90..0a035b37b 100644
> > > > --- a/ltp/fsx.c
> > > > +++ b/ltp/fsx.c
> > > > @@ -1847,8 +1847,9 @@ int test_atomic_writes(void) {
> > > >         struct statx stx;
> > > >  
> > > >         if (o_direct != O_DIRECT) {
> > > > -               fprintf(stderr, "main: atomic writes need O_DIRECT (-Z), "
> > > > -                               "disabling!\n");
> > > > +               if (!quiet)
> > > > +                       fprintf(stderr, "main: atomic writes need O_DIRECT (-Z), "
> > > > +                                       "disabling!\n");
> > > >                 return 0;
> > > >         }
> > > >  
> > > > @@ -1867,8 +1868,9 @@ int test_atomic_writes(void) {
> > > >                 return 1;
> > > >         }
> > > >  
> > > > -       fprintf(stderr, "main: IO Stack does not support "
> > > > -                       "atomic writes, disabling!\n");
> > > > +       if (!quiet)
> > > > +               fprintf(stderr, "main: IO Stack does not support "
> > > > +                               "atomic writes, disabling!\n");
> > > >         return 0;
> > > >  }
> > > 
> > > > 
> > > > But I hit more read or write failures e.g. [1], this failure can't be
> > > > reproduced with FSX_AVOID=-a. Is it a atomic write bug or an unexpected
> > > > test failure?
> > > > 
> > > > Thanks,
> > > > Zorro
> > > > 
> > > 
> > > <...>
> > > 
> > > > +244(244 mod 256): SKIPPED (no operation)
> > > > +245(245 mod 256): FALLOC   0x695c5 thru 0x6a2e6	(0xd21 bytes) INTERIOR
> > > > +246(246 mod 256): MAPWRITE 0x5ac00 thru 0x5b185	(0x586 bytes)
> > > > +247(247 mod 256): WRITE    0x31200 thru 0x313ff	(0x200 bytes)
> > > > +248(248 mod 256): SKIPPED (no operation)
> > > > +249(249 mod 256): TRUNCATE DOWN	from 0x78242 to 0xf200	******WWWW
> > > > +250(250 mod 256): FALLOC   0x65000 thru 0x66f26	(0x1f26 bytes) PAST_EOF
> > > > +251(251 mod 256): WRITE    0x45400 thru 0x467ff	(0x1400 bytes) HOLE	***WWWW
> > > > +252(252 mod 256): SKIPPED (no operation)
> > > > +253(253 mod 256): SKIPPED (no operation)
> > > > +254(254 mod 256): MAPWRITE 0x4be00 thru 0x4daee	(0x1cef bytes)
> > > > +255(255 mod 256): MAPREAD  0xc000 thru 0xcae9	(0xaea bytes)
> > > > +256(  0 mod 256): READ     0x3e000 thru 0x3efff	(0x1000 bytes)
> > > > +257(  1 mod 256): SKIPPED (no operation)
> > > > +258(  2 mod 256): INSERT 0x45000 thru 0x45fff	(0x1000 bytes)
> > > > +259(  3 mod 256): ZERO     0x1d7d5 thru 0x1f399	(0x1bc5 bytes)	******ZZZZ
> > > > +260(  4 mod 256): TRUNCATE DOWN	from 0x4eaef to 0x11200	******WWWW
> > > > +261(  5 mod 256): WRITE    0x43000 thru 0x43fff	(0x1000 bytes) HOLE	***WWWW
> > > > +262(  6 mod 256): WRITE    0x2200 thru 0x31ff	(0x1000 bytes)
> > > > +263(  7 mod 256): WRITE    0x15000 thru 0x15fff	(0x1000 bytes)
> > > > +264(  8 mod 256): WRITE    0x2e400 thru 0x2e7ff	(0x400 bytes)
> > > > +265(  9 mod 256): COPY 0xd000 thru 0xdfff	(0x1000 bytes) to 0x1d800 thru 0x1e7ff	******EEEE
> > > > +266( 10 mod 256): CLONE 0x2a000 thru 0x2afff	(0x1000 bytes) to 0x21000 thru 0x21fff
> > > > +267( 11 mod 256): MAPREAD  0x31000 thru 0x31d0a	(0xd0b bytes)
> > > > +268( 12 mod 256): SKIPPED (no operation)
> > > > +269( 13 mod 256): WRITE    0x25000 thru 0x25fff	(0x1000 bytes)
> > > > +270( 14 mod 256): SKIPPED (no operation)
> > > > +271( 15 mod 256): MAPREAD  0x30000 thru 0x30577	(0x578 bytes)
> > > > +272( 16 mod 256): PUNCH    0x1a267 thru 0x1c093	(0x1e2d bytes)
> > > > +273( 17 mod 256): MAPREAD  0x1f000 thru 0x1f9c9	(0x9ca bytes)
> > > > +274( 18 mod 256): WRITE    0x40800 thru 0x40dff	(0x600 bytes)
> > > > +275( 19 mod 256): SKIPPED (no operation)
> > > > +276( 20 mod 256): MAPWRITE 0x20600 thru 0x22115	(0x1b16 bytes)
> > > > +277( 21 mod 256): MAPWRITE 0x3d000 thru 0x3ee5a	(0x1e5b bytes)
> > > > +278( 22 mod 256): WRITE    0x2ee00 thru 0x2efff	(0x200 bytes)
> > > > +279( 23 mod 256): WRITE    0x76200 thru 0x769ff	(0x800 bytes) HOLE
> > > > +280( 24 mod 256): SKIPPED (no operation)
> > > > +281( 25 mod 256): SKIPPED (no operation)
> > > > +282( 26 mod 256): MAPREAD  0xa000 thru 0xa5e7	(0x5e8 bytes)
> > > > +283( 27 mod 256): SKIPPED (no operation)
> > > > +284( 28 mod 256): SKIPPED (no operation)
> > > > +285( 29 mod 256): SKIPPED (no operation)
> > > > +286( 30 mod 256): SKIPPED (no operation)
> > > > +287( 31 mod 256): COLLAPSE 0x11000 thru 0x11fff	(0x1000 bytes)
> > > > +288( 32 mod 256): COPY 0x5d000 thru 0x5dfff	(0x1000 bytes) to 0x4ca00 thru 0x4d9ff
> > > > +289( 33 mod 256): TRUNCATE DOWN	from 0x75a00 to 0x1e400
> > > > +290( 34 mod 256): MAPREAD  0x1c000 thru 0x1d802	(0x1803 bytes)	***RRRR***
> > > > +Log of operations saved to "/mnt/xfstests/test/junk.fsxops"; replay with --replay-ops
> > > > +Correct content saved for comparison
> > > > +(maybe hexdump "/mnt/xfstests/test/junk" vs "/mnt/xfstests/test/junk.fsxgood")
> > > > 
> > > > Thanks,
> > > > Zorro
> > > 
> > > Hi Zorro, just to confirm is this on an older kernel that doesnt support
> > > RWF_ATOMIC or on a kernle that does support it.
> > 
> > I tested on linux 6.16 and current latest linux v6.17+ (will be 6.18-rc1 later).
> > About the RWF_ATOMIC flag in my system:
> > 
> > # grep -rsn RWF_ATOMIC /usr/include/
> > /usr/include/bits/uio-ext.h:51:#define RWF_ATOMIC       0x00000040 /* Write is to be issued with torn-write
> > /usr/include/linux/fs.h:424:#define RWF_ATOMIC  ((__kernel_rwf_t)0x00000040)
> > /usr/include/linux/fs.h:431:                     RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC |\
> > /usr/include/xfs/linux.h:236:#ifndef RWF_ATOMIC
> > /usr/include/xfs/linux.h:237:#define RWF_ATOMIC ((__kernel_rwf_t)0x00000040)
> 
> Hi Zorro, thanks for checking this. So correct me if im wrong but I
> understand that you have run this test on an atomic writes enabled 
> kernel where the stack also supports atomic writes.
> 
> Looking at the bad data log:
> 
> 	+READ BAD DATA: offset = 0x1c000, size = 0x1803, fname = /mnt/xfstests/test/junk
> 	+OFFSET      GOOD    BAD     RANGE
> 	+0x1c000     0x0000  0xcdcd  0x0
> 	+operation# (mod 256) for the bad data may be 205
> 
> We see that 0x0000 was expected but we got 0xcdcd. Now the operation
> that caused this is indicated to be 205, but looking at that operation:
> 
> +205(205 mod 256): ZERO     0x6dbe6 thru 0x6e6aa	(0xac5 bytes)
> 
> This doesn't even overlap the range that is bad. (0x1c000 to 0x1c00f).
> Infact, it does seem like an unlikely coincidence that the actual data
> in the bad range is 0xcdcd which is something xfs_io -c "pwrite" writes
> to default (fsx writes random data in even offsets and operation num in
> odd).
> 
> I am able to replicate this but only on XFS but not on ext4 (atleast not
> in 20 runs).  I'm trying to better understand if this is a test issue or
> not. Will keep you update.
> 
> I'm not sure how this will affect the upcoming release, if you want
> shall I send a small patch to make the atomic writes feature default off
> instead of default on till we root cause this?
> 
> Regards,
> Ojaswin

Hi Zorro,

So I'm able to narrow down the opoerations and replicate it via the
following replay file:

# -----
# replay.fsxops
# -----
write_atomic 0x57000 0x1000 0x69690
write_atomic 0x66000 0x1000 0x4de00
write_atomic 0x18000 0x1000 0x2c800
copy_range 0x20000 0x1000 0xe00 0x70e00
write_atomic 0x18000 0x1000 0x70e00
copy_range 0x21000 0x1000 0x23000 0x74218
truncate 0x0 0x11200 0x4daef *
write_atomic 0x43000 0x1000 0x11200 *
write_atomic 0x15000 0x1000 0x44000
copy_range 0xd000 0x1000 0x1d800 0x44000
mapread 0x1c000 0x1803 0x1e400 *


Command: ./ltp/fsx -N 10000 -o 8192 -l 500000 -r 4096 -t 512 -w 512 -Z -FKuHzI --replay-ops replay.fsxops $MNT/junk

$MNT/junk is always opened O_TRUNC and is an on an XFS FS where the
disk is non-atomic so all RWF_ATOMIC writes are software emulated.

Here are the logs generated for this run:

Seed set to 1
main: filesystem does not support exchange range, disabling!

READ BAD DATA: offset = 0x1c000, size = 0x1803, fname = /mnt/test/junk
OFFSET      GOOD    BAD     RANGE
0x1d000     0x0000  0xf322  0x0
operation# (mod 256) for the bad data may be 243
0x1d001     0x0000  0x22f3  0x1
operation# (mod 256) for the bad data may be 243
0x1d002     0x0000  0xf391  0x2
operation# (mod 256) for the bad data may be 243
0x1d003     0x0000  0x91f3  0x3
<... a few more such lines ..>

LOG DUMP (11 total operations):
openat(AT_FDCWD, "/mnt/test/junk.fsxops", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 7
1(  1 mod 256): WRITE    0x57000 thru 0x57fff   (0x1000 bytes) HOLE     ***WWWW ATOMIC
2(  2 mod 256): WRITE    0x66000 thru 0x66fff   (0x1000 bytes) HOLE ATOMIC
3(  3 mod 256): WRITE    0x18000 thru 0x18fff   (0x1000 bytes) ATOMIC
4(  4 mod 256): COPY 0x20000 thru 0x20fff       (0x1000 bytes) to 0xe00 thru 0x1dff
5(  5 mod 256): WRITE    0x18000 thru 0x18fff   (0x1000 bytes) ATOMIC
6(  6 mod 256): COPY 0x21000 thru 0x21fff       (0x1000 bytes) to 0x23000 thru 0x23fff
7(  7 mod 256): TRUNCATE DOWN   from 0x67000 to 0x11200 ******WWWW
8(  8 mod 256): WRITE    0x43000 thru 0x43fff   (0x1000 bytes) HOLE     ***WWWW ATOMIC
9(  9 mod 256): WRITE    0x15000 thru 0x15fff   (0x1000 bytes) ATOMIC
10( 10 mod 256): COPY 0xd000 thru 0xdfff        (0x1000 bytes) to 0x1d800 thru 0x1e7ff
11( 11 mod 256): MAPREAD  0x1c000 thru 0x1d802  (0x1803 bytes)  ***RRRR***
Log of operations saved to "/mnt/test/junk.fsxops"; replay with --replay-ops
Correct content saved for comparison
(maybe hexdump "/mnt/test/junk" vs "/mnt/test/junk.fsxgood")
+++ exited with 110 +++

We can see that the bad data is detected in the final MAPREAD operation
and and bad offset is at 0x1d000. If we look at the operations dump
above its clear that none of the operations should be modifying the
0x1d000 so we should have been reading 0s but yet we see some junk data
there in the file:

$ hexdump /mnt/test/junk -s 0x1c000 -n0x1020
001c000 0000 0000 0000 0000 0000 0000 0000 0000
*
001d000 22f3 91f3 7ff3 3af3 39f3 23f3 6df3 c2f3
001d010 c5f3 f6f3 a6f3 1ef3 58f3 40f3 32f3 5ff3
001d020

Another thing to not is that I can't reproduce the above on scsi-debug
device.  @Darrick, @John, could this be an issue in kernel?

Regards,
ojaswin
> 
> > 
> > Thanks,
> > Zorro
> > 
> > > 
> > > Regards,
> > > ojaswin
> > > 
> >