Message-ID: <aOPCAzx0diQy7lFN@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>
Date: Mon, 6 Oct 2025 18:50:03 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: Zorro Lang <zlang@...hat.com>
Cc: fstests@...r.kernel.org, Ritesh Harjani <ritesh.list@...il.com>,
        djwong@...nel.org, john.g.garry@...cle.com, tytso@....edu,
        linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-ext4@...r.kernel.org
Subject: Re: [PATCH v7 04/12] ltp/fsx.c: Add atomic writes support to fsx

On Sun, Oct 05, 2025 at 11:39:56PM +0800, Zorro Lang wrote:
> On Sun, Oct 05, 2025 at 06:27:24PM +0530, Ojaswin Mujoo wrote:
> > On Sat, Oct 04, 2025 at 01:19:32AM +0800, Zorro Lang wrote:
> > > On Thu, Oct 02, 2025 at 11:26:45PM +0530, Ojaswin Mujoo wrote:
> > > > On Sun, Sep 28, 2025 at 09:19:24PM +0800, Zorro Lang wrote:
> > > > > On Fri, Sep 19, 2025 at 12:17:57PM +0530, Ojaswin Mujoo wrote:
> > > > > > Implement atomic write support to help fuzz atomic writes
> > > > > > with fsx.
> > > > > > 
> > > > > > Suggested-by: Ritesh Harjani (IBM) <ritesh.list@...il.com>
> > > > > > Reviewed-by: Darrick J. Wong <djwong@...nel.org>
> > > > > > Reviewed-by: John Garry <john.g.garry@...cle.com>
> > > > > > Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
> > > > > > ---
> > > > > 
> > > > > Hmm... this patch causes more regular fsx test cases to fail on old
> > > > > kernels (e.g. g/760, g/617, g/263 ...) unless "FSX_AVOID=-a" is set. Is
> > > > > there a way to disable "atomic write" automatically if it's not
> > > > > supported by the current system?
> > > > 
> > > > Hi Zorro, 
> > > > Sorry for being late, I've been on vacation this week.
> > > > 
> > > > Yes so by design we should be automatically disabling atomic writes when
> > > > they are not supported by the stack but seems like the issue is that
> > > > when we do disable it we print some extra messages to stdout/err which
> > > > show up in the xfstests output causing failure.
> > > > 
> > > > I can think of 2 ways around this:
> > > > 
> > > > 1. Don't print anything and just silently drop atomic writes if stack
> > > > doesn't support them.
> > > > 
> > > > 2. Make atomic writes a default-off instead of default-on feature, but
> > > > this loses a bit of coverage as existing tests won't get atomic write
> > > > testing free of cost any more.
> > > 
> > > Hi Ojaswin,
> > > 
> > > Please have a nice vacation :)
> > > 
> > > It's not the "extra messages" that cause the failure; those "quiet"
> > > failures can be fixed by:
> > 
> > Oh okay got it.
> > 
> > > 
> > > diff --git a/ltp/fsx.c b/ltp/fsx.c
> > > index bdb87ca90..0a035b37b 100644
> > > --- a/ltp/fsx.c
> > > +++ b/ltp/fsx.c
> > > @@ -1847,8 +1847,9 @@ int test_atomic_writes(void) {
> > >         struct statx stx;
> > >  
> > >         if (o_direct != O_DIRECT) {
> > > -               fprintf(stderr, "main: atomic writes need O_DIRECT (-Z), "
> > > -                               "disabling!\n");
> > > +               if (!quiet)
> > > +                       fprintf(stderr, "main: atomic writes need O_DIRECT (-Z), "
> > > +                                       "disabling!\n");
> > >                 return 0;
> > >         }
> > >  
> > > @@ -1867,8 +1868,9 @@ int test_atomic_writes(void) {
> > >                 return 1;
> > >         }
> > >  
> > > -       fprintf(stderr, "main: IO Stack does not support "
> > > -                       "atomic writes, disabling!\n");
> > > +       if (!quiet)
> > > +               fprintf(stderr, "main: IO Stack does not support "
> > > +                               "atomic writes, disabling!\n");
> > >         return 0;
> > >  }
> > 
> > > 
> > > But I hit more read or write failures, e.g. [1]. This failure can't be
> > > reproduced with FSX_AVOID=-a. Is it an atomic write bug or an unexpected
> > > test failure?
> > > 
> > > Thanks,
> > > Zorro
> > > 
> > 
> > <...>
> > 
> > > +244(244 mod 256): SKIPPED (no operation)
> > > +245(245 mod 256): FALLOC   0x695c5 thru 0x6a2e6	(0xd21 bytes) INTERIOR
> > > +246(246 mod 256): MAPWRITE 0x5ac00 thru 0x5b185	(0x586 bytes)
> > > +247(247 mod 256): WRITE    0x31200 thru 0x313ff	(0x200 bytes)
> > > +248(248 mod 256): SKIPPED (no operation)
> > > +249(249 mod 256): TRUNCATE DOWN	from 0x78242 to 0xf200	******WWWW
> > > +250(250 mod 256): FALLOC   0x65000 thru 0x66f26	(0x1f26 bytes) PAST_EOF
> > > +251(251 mod 256): WRITE    0x45400 thru 0x467ff	(0x1400 bytes) HOLE	***WWWW
> > > +252(252 mod 256): SKIPPED (no operation)
> > > +253(253 mod 256): SKIPPED (no operation)
> > > +254(254 mod 256): MAPWRITE 0x4be00 thru 0x4daee	(0x1cef bytes)
> > > +255(255 mod 256): MAPREAD  0xc000 thru 0xcae9	(0xaea bytes)
> > > +256(  0 mod 256): READ     0x3e000 thru 0x3efff	(0x1000 bytes)
> > > +257(  1 mod 256): SKIPPED (no operation)
> > > +258(  2 mod 256): INSERT 0x45000 thru 0x45fff	(0x1000 bytes)
> > > +259(  3 mod 256): ZERO     0x1d7d5 thru 0x1f399	(0x1bc5 bytes)	******ZZZZ
> > > +260(  4 mod 256): TRUNCATE DOWN	from 0x4eaef to 0x11200	******WWWW
> > > +261(  5 mod 256): WRITE    0x43000 thru 0x43fff	(0x1000 bytes) HOLE	***WWWW
> > > +262(  6 mod 256): WRITE    0x2200 thru 0x31ff	(0x1000 bytes)
> > > +263(  7 mod 256): WRITE    0x15000 thru 0x15fff	(0x1000 bytes)
> > > +264(  8 mod 256): WRITE    0x2e400 thru 0x2e7ff	(0x400 bytes)
> > > +265(  9 mod 256): COPY 0xd000 thru 0xdfff	(0x1000 bytes) to 0x1d800 thru 0x1e7ff	******EEEE
> > > +266( 10 mod 256): CLONE 0x2a000 thru 0x2afff	(0x1000 bytes) to 0x21000 thru 0x21fff
> > > +267( 11 mod 256): MAPREAD  0x31000 thru 0x31d0a	(0xd0b bytes)
> > > +268( 12 mod 256): SKIPPED (no operation)
> > > +269( 13 mod 256): WRITE    0x25000 thru 0x25fff	(0x1000 bytes)
> > > +270( 14 mod 256): SKIPPED (no operation)
> > > +271( 15 mod 256): MAPREAD  0x30000 thru 0x30577	(0x578 bytes)
> > > +272( 16 mod 256): PUNCH    0x1a267 thru 0x1c093	(0x1e2d bytes)
> > > +273( 17 mod 256): MAPREAD  0x1f000 thru 0x1f9c9	(0x9ca bytes)
> > > +274( 18 mod 256): WRITE    0x40800 thru 0x40dff	(0x600 bytes)
> > > +275( 19 mod 256): SKIPPED (no operation)
> > > +276( 20 mod 256): MAPWRITE 0x20600 thru 0x22115	(0x1b16 bytes)
> > > +277( 21 mod 256): MAPWRITE 0x3d000 thru 0x3ee5a	(0x1e5b bytes)
> > > +278( 22 mod 256): WRITE    0x2ee00 thru 0x2efff	(0x200 bytes)
> > > +279( 23 mod 256): WRITE    0x76200 thru 0x769ff	(0x800 bytes) HOLE
> > > +280( 24 mod 256): SKIPPED (no operation)
> > > +281( 25 mod 256): SKIPPED (no operation)
> > > +282( 26 mod 256): MAPREAD  0xa000 thru 0xa5e7	(0x5e8 bytes)
> > > +283( 27 mod 256): SKIPPED (no operation)
> > > +284( 28 mod 256): SKIPPED (no operation)
> > > +285( 29 mod 256): SKIPPED (no operation)
> > > +286( 30 mod 256): SKIPPED (no operation)
> > > +287( 31 mod 256): COLLAPSE 0x11000 thru 0x11fff	(0x1000 bytes)
> > > +288( 32 mod 256): COPY 0x5d000 thru 0x5dfff	(0x1000 bytes) to 0x4ca00 thru 0x4d9ff
> > > +289( 33 mod 256): TRUNCATE DOWN	from 0x75a00 to 0x1e400
> > > +290( 34 mod 256): MAPREAD  0x1c000 thru 0x1d802	(0x1803 bytes)	***RRRR***
> > > +Log of operations saved to "/mnt/xfstests/test/junk.fsxops"; replay with --replay-ops
> > > +Correct content saved for comparison
> > > +(maybe hexdump "/mnt/xfstests/test/junk" vs "/mnt/xfstests/test/junk.fsxgood")
> > > 
> > > Thanks,
> > > Zorro
> > 
> > Hi Zorro, just to confirm: is this on an older kernel that doesn't support
> > RWF_ATOMIC, or on a kernel that does support it?
> 
> I tested on linux 6.16 and current latest linux v6.17+ (will be 6.18-rc1 later).
> About the RWF_ATOMIC flag in my system:
> 
> # grep -rsn RWF_ATOMIC /usr/include/
> /usr/include/bits/uio-ext.h:51:#define RWF_ATOMIC       0x00000040 /* Write is to be issued with torn-write
> /usr/include/linux/fs.h:424:#define RWF_ATOMIC  ((__kernel_rwf_t)0x00000040)
> /usr/include/linux/fs.h:431:                     RWF_APPEND | RWF_NOAPPEND | RWF_ATOMIC |\
> /usr/include/xfs/linux.h:236:#ifndef RWF_ATOMIC
> /usr/include/xfs/linux.h:237:#define RWF_ATOMIC ((__kernel_rwf_t)0x00000040)

Hi Zorro, thanks for checking this. Correct me if I'm wrong, but I
understand that you ran this test on an atomic-writes-enabled
kernel where the stack also supports atomic writes.

Looking at the bad data log:

	+READ BAD DATA: offset = 0x1c000, size = 0x1803, fname = /mnt/xfstests/test/junk
	+OFFSET      GOOD    BAD     RANGE
	+0x1c000     0x0000  0xcdcd  0x0
	+operation# (mod 256) for the bad data may be 205

We see that 0x0000 was expected but we got 0xcdcd. Now the operation
that caused this is indicated to be 205, but looking at that operation:

+205(205 mod 256): ZERO     0x6dbe6 thru 0x6e6aa	(0xac5 bytes)

This doesn't even overlap the bad range (0x1c000 to 0x1c00f).
In fact, it seems like an unlikely coincidence that the actual data
in the bad range is 0xcdcd, which is the pattern xfs_io -c "pwrite"
writes by default (fsx writes random data in even offsets and the
operation number in odd ones).

I am able to replicate this, but only on XFS and not on ext4 (at least
not in 20 runs). I'm trying to better understand whether this is a test
issue or not. Will keep you updated.

I'm not sure how this will affect the upcoming release. If you want,
shall I send a small patch to make the atomic writes feature default off
instead of default on until we root-cause this?

Regards,
Ojaswin

> 
> Thanks,
> Zorro
> 
> > 
> > Regards,
> > ojaswin
> > 
> 
