Message-ID: <aOPCAzx0diQy7lFN@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>
Date: Mon, 6 Oct 2025 18:50:03 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: Zorro Lang <zlang@...hat.com>
Cc: fstests@...r.kernel.org, Ritesh Harjani <ritesh.list@...il.com>,
djwong@...nel.org, john.g.garry@...cle.com, tytso@....edu,
linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-ext4@...r.kernel.org
Subject: Re: [PATCH v7 04/12] ltp/fsx.c: Add atomic writes support to fsx
On Sun, Oct 05, 2025 at 11:39:56PM +0800, Zorro Lang wrote:
> On Sun, Oct 05, 2025 at 06:27:24PM +0530, Ojaswin Mujoo wrote:
> > On Sat, Oct 04, 2025 at 01:19:32AM +0800, Zorro Lang wrote:
> > > On Thu, Oct 02, 2025 at 11:26:45PM +0530, Ojaswin Mujoo wrote:
> > > > On Sun, Sep 28, 2025 at 09:19:24PM +0800, Zorro Lang wrote:
> > > > > On Fri, Sep 19, 2025 at 12:17:57PM +0530, Ojaswin Mujoo wrote:
> > > > > > Implement atomic write support to help fuzz atomic writes
> > > > > > with fsx.
> > > > > >
> > > > > > Suggested-by: Ritesh Harjani (IBM) <ritesh.list@...il.com>
> > > > > > Reviewed-by: Darrick J. Wong <djwong@...nel.org>
> > > > > > Reviewed-by: John Garry <john.g.garry@...cle.com>
> > > > > > Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
> > > > > > ---
> > > > >
> > > > > Hmm... this patch causes more regular fsx test cases fail on old kernel,
> > > > > (e.g. g/760, g/617, g/263 ...) except set "FSX_AVOID=-a". Is there a way
> > > > > to disable "atomic write" automatically if it's not supported by current
> > > > > system?
> > > >
> > > > Hi Zorro,
> > > > Sorry for being late, I've been on vacation this week.
> > > >
> > > > Yes so by design we should be automatically disabling atomic writes when
> > > > they are not supported by the stack but seems like the issue is that
> > > > when we do disable it we print some extra messages to stdout/err which
> > > > show up in the xfstests output causing failure.
> > > >
> > > > I can think of 2 ways around this:
> > > >
> > > > 1. Don't print anything and just silently drop atomic writes if stack
> > > > doesn't support them.
> > > >
> > > > 2. Make atomic writes a default-off instead of a default-on feature, but
> > > > this loses a bit of coverage, as existing tests won't get atomic write
> > > > testing free of cost any more.
> > >
> > > Hi Ojaswin,
> > >
> > > Please have a nice vacation :)
> > >
> > > It's not the "extra messages" cause failure, those "quiet" failures can be fixed
> > > by:
> >
> > Oh okay got it.
> >
> > >
> > > diff --git a/ltp/fsx.c b/ltp/fsx.c
> > > index bdb87ca90..0a035b37b 100644
> > > --- a/ltp/fsx.c
> > > +++ b/ltp/fsx.c
> > > @@ -1847,8 +1847,9 @@ int test_atomic_writes(void) {
> > > struct statx stx;
> > >
> > > if (o_direct != O_DIRECT) {
> > > - fprintf(stderr, "main: atomic writes need O_DIRECT (-Z), "
> > > - "disabling!\n");
> > > + if (!quiet)
> > > + fprintf(stderr, "main: atomic writes need O_DIRECT (-Z), "
> > > + "disabling!\n");
> > > return 0;
> > > }
> > >
> > > @@ -1867,8 +1868,9 @@ int test_atomic_writes(void) {
> > > return 1;
> > > }
> > >
> > > - fprintf(stderr, "main: IO Stack does not support "
> > > - "atomic writes, disabling!\n");
> > > + if (!quiet)
> > > + fprintf(stderr, "main: IO Stack does not support "
> > > + "atomic writes, disabling!\n");
> > > return 0;
> > > }
> >
> > >
> > > But I hit more read or write failures e.g. [1], this failure can't be
> > > reproduced with FSX_AVOID=-a. Is it an atomic write bug or an unexpected
> > > test failure?
> > >
> > > Thanks,
> > > Zorro
> > >
> >
> > <...>
> >
> > > +244(244 mod 256): SKIPPED (no operation)
> > > +245(245 mod 256): FALLOC 0x695c5 thru 0x6a2e6 (0xd21 bytes) INTERIOR
> > > +246(246 mod 256): MAPWRITE 0x5ac00 thru 0x5b185 (0x586 bytes)
> > > +247(247 mod 256): WRITE 0x31200 thru 0x313ff (0x200 bytes)
> > > +248(248 mod 256): SKIPPED (no operation)
> > > +249(249 mod 256): TRUNCATE DOWN from 0x78242 to 0xf200 ******WWWW
> > > +250(250 mod 256): FALLOC 0x65000 thru 0x66f26 (0x1f26 bytes) PAST_EOF
> > > +251(251 mod 256): WRITE 0x45400 thru 0x467ff (0x1400 bytes) HOLE ***WWWW
> > > +252(252 mod 256): SKIPPED (no operation)
> > > +253(253 mod 256): SKIPPED (no operation)
> > > +254(254 mod 256): MAPWRITE 0x4be00 thru 0x4daee (0x1cef bytes)
> > > +255(255 mod 256): MAPREAD 0xc000 thru 0xcae9 (0xaea bytes)
> > > +256( 0 mod 256): READ 0x3e000 thru 0x3efff (0x1000 bytes)
> > > +257( 1 mod 256): SKIPPED (no operation)
> > > +258( 2 mod 256): INSERT 0x45000 thru 0x45fff (0x1000 bytes)
> > > +259( 3 mod 256): ZERO 0x1d7d5 thru 0x1f399 (0x1bc5 bytes) ******ZZZZ
> > > +260( 4 mod 256): TRUNCATE DOWN from 0x4eaef to 0x11200 ******WWWW
> > > +261( 5 mod 256): WRITE 0x43000 thru 0x43fff (0x1000 bytes) HOLE ***WWWW
> > > +262( 6 mod 256): WRITE 0x2200 thru 0x31ff (0x1000 bytes)
> > > +263( 7 mod 256): WRITE 0x15000 thru 0x15fff (0x1000 bytes)
> > > +264( 8 mod 256): WRITE 0x2e400 thru 0x2e7ff (0x400 bytes)
> > > +265( 9 mod 256): COPY 0xd000 thru 0xdfff (0x1000 bytes) to 0x1d800 thru 0x1e7ff ******EEEE
> > > +266( 10 mod 256): CLONE 0x2a000 thru 0x2afff (0x1000 bytes) to 0x21000 thru 0x21fff
> > > +267( 11 mod 256): MAPREAD 0x31000 thru 0x31d0a (0xd0b bytes)
> > > +268( 12 mod 256): SKIPPED (no operation)
> > > +269( 13 mod 256): WRITE 0x25000 thru 0x25fff (0x1000 bytes)
> > > +270( 14 mod 256): SKIPPED (no operation)
> > > +271( 15 mod 256): MAPREAD 0x30000 thru 0x30577 (0x578 bytes)
> > > +272( 16 mod 256): PUNCH 0x1a267 thru 0x1c093 (0x1e2d bytes)
> > > +273( 17 mod 256): MAPREAD 0x1f000 thru 0x1f9c9 (0x9ca bytes)
> > > +274( 18 mod 256): WRITE 0x40800 thru 0x40dff (0x600 bytes)
> > > +275( 19 mod 256): SKIPPED (no operation)
> > > +276( 20 mod 256): MAPWRITE 0x20600 thru 0x22115 (0x1b16 bytes)
> > > +277( 21 mod 256): MAPWRITE 0x3d000 thru 0x3ee5a (0x1e5b bytes)
> > > +278( 22 mod 256): WRITE 0x2ee00 thru 0x2efff (0x200 bytes)
> > > +279( 23 mod 256): WRITE 0x76200 thru 0x769ff (0x800 bytes) HOLE
> > > +280( 24 mod 256): SKIPPED (no operation)
> > > +281( 25 mod 256): SKIPPED (no operation)
> > > +282( 26 mod 256): MAPREAD 0xa000 thru 0xa5e7 (0x5e8 bytes)
> > > +283( 27 mod 256): SKIPPED (no operation)
> > > +284( 28 mod 256): SKIPPED (no operation)
> > > +285( 29 mod 256): SKIPPED (no operation)
> > > +286( 30 mod 256): SKIPPED (no operation)
> > > +287( 31 mod 256): COLLAPSE 0x11000 thru 0x11fff (0x1000 bytes)
> > > +288( 32 mod 256): COPY 0x5d000 thru 0x5dfff (0x1000 bytes) to 0x4ca00 thru 0x4d9ff
> > > +289( 33 mod 256): TRUNCATE DOWN from 0x75a00 to 0x1e400
> > > +290( 34 mod 256): MAPREAD 0x1c000 thru 0x1d802 (0x1803 bytes) ***RRRR***
> > > +Log of operations saved to "/mnt/xfstests/test/junk.fsxops"; replay with --replay-ops
> > > +Correct content saved for comparison
> > > +(maybe hexdump "/mnt/xfstests/test/junk" vs "/mnt/xfstests/test/junk.fsxgood")
> > >
> > > Thanks,
> > > Zorro
> >
> > Hi Zorro, just to confirm, is this on an older kernel that doesn't support
> > RWF_ATOMIC or on a kernel that does support it?
>
> I tested on linux 6.16 and current latest linux v6.17+ (will be 6.18-rc1 later).
> About the RWF_ATOMIC flag in my system:
>
> # grep -rsn RWF_ATOMIC /usr/include/
> /usr/include/bits/uio-ext.h:51:#define RWF_ATOMIC 0x00000040 /* Write is to be issued with torn-write
> /usr/include/linux/fs.h:424:#define RWF_ATOMIC ((__kernel_rwf_t)0x00000040)
> /usr/include/linux/fs.h:431: RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC |\
> /usr/include/xfs/linux.h:236:#ifndef RWF_ATOMIC
> /usr/include/xfs/linux.h:237:#define RWF_ATOMIC ((__kernel_rwf_t)0x00000040)
Hi Zorro, thanks for checking this. Correct me if I'm wrong, but I
understand that you ran this test on an atomic-writes-enabled
kernel where the stack also supports atomic writes.
Looking at the bad data log:
+READ BAD DATA: offset = 0x1c000, size = 0x1803, fname = /mnt/xfstests/test/junk
+OFFSET GOOD BAD RANGE
+0x1c000 0x0000 0xcdcd 0x0
+operation# (mod 256) for the bad data may be 205
We see that 0x0000 was expected but we got 0xcdcd. Now the operation
that caused this is indicated to be 205, but looking at that operation:
+205(205 mod 256): ZERO 0x6dbe6 thru 0x6e6aa (0xac5 bytes)
This doesn't even overlap the bad range (0x1c000 to 0x1c00f).
In fact, it seems like an unlikely coincidence that the actual data
in the bad range is 0xcdcd, which is what xfs_io -c "pwrite" writes
by default (fsx writes random data at even offsets and the operation
number at odd offsets).
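
For reference, the "doesn't overlap" check above can be written down as a
small helper. This is only a sketch for reasoning about the log, not code
from fsx: the names byte_range and ranges_overlap are made up for
illustration, and the ranges are treated as inclusive, the way fsx prints
them ("0xA thru 0xB").

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of ltp/fsx.c.
 * Inclusive byte range, matching the "START thru END" form in fsx logs.
 */
struct byte_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/*
 * Two inclusive ranges overlap if and only if each one starts no later
 * than the other one ends.
 */
static bool ranges_overlap(struct byte_range a, struct byte_range b)
{
	return a.start <= b.end && b.start <= a.end;
}
```

Feeding it the ZERO from operation 205 (0x6dbe6 thru 0x6e6aa) and the
failing MAPREAD (0x1c000 thru 0x1d802) returns false, which matches the
observation that the operation fsx blames never touched the bad offset.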
I am able to replicate this, but only on XFS and not on ext4 (at least not
in 20 runs). I'm trying to better understand whether this is a test issue
or not. Will keep you updated.
I'm not sure how this will affect the upcoming release. If you want,
shall I send a small patch to make the atomic writes feature default off
instead of default on until we root-cause this?
Regards,
Ojaswin
>
> Thanks,
> Zorro
>
> >
> > Regards,
> > ojaswin
> >
>