Message-ID: <aPdgR5gdA3l3oTLQ@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>
Date: Tue, 21 Oct 2025 15:58:23 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: John Garry <john.g.garry@...cle.com>
Cc: Zorro Lang <zlang@...hat.com>, fstests@...r.kernel.org,
        Ritesh Harjani <ritesh.list@...il.com>, djwong@...nel.org,
        tytso@....edu, linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-ext4@...r.kernel.org
Subject: Re: [PATCH v7 04/12] ltp/fsx.c: Add atomic writes support to fsx

On Mon, Oct 20, 2025 at 11:33:40AM +0100, John Garry wrote:
> On 06/10/2025 14:20, Ojaswin Mujoo wrote:
> > Hi Zorro, thanks for checking this. So correct me if I'm wrong, but I
> > understand that you have run this test on an atomic-writes-enabled
> > kernel where the stack also supports atomic writes.
> > 
> > Looking at the bad data log:
> > 
> > 	+READ BAD DATA: offset = 0x1c000, size = 0x1803, fname = /mnt/xfstests/test/junk
> > 	+OFFSET      GOOD    BAD     RANGE
> > 	+0x1c000     0x0000  0xcdcd  0x0
> > 	+operation# (mod 256) for the bad data may be 205
> > 
> > We see that 0x0000 was expected but we got 0xcdcd. Now the operation
> > that caused this is indicated to be 205, but looking at that operation:
> > 
> > +205(205 mod 256): ZERO     0x6dbe6 thru 0x6e6aa	(0xac5 bytes)
> > 
> > This doesn't even overlap the range that is bad (0x1c000 to 0x1c00f).
> > In fact, it seems like an unlikely coincidence that the actual data
> > in the bad range is 0xcdcd, which is what xfs_io -c "pwrite" writes
> > by default (fsx writes random data at even offsets and the operation
> > number at odd ones).
> > 
> > I am able to replicate this, but only on XFS and not on ext4 (at least
> > not in 20 runs). I'm trying to better understand whether this is a test
> > issue or not. Will keep you updated.
> 
> 
> Hi Ojaswin,
> 
> Sorry for the very slow response.
> 
> Are you still checking this issue?
> 
> To replicate, should I just take the latest xfs kernel and run this series
> on top of the latest xfstests? Is it 100% reproducible?
> 
> Thanks,
> John

Hi John,

Yes, I'm looking into it, but I'm now running into some reflink/CoW
concepts that are taking time to understand. Let me share what I have
so far:

The test.sh that I'm using can be found here [1]; it just uses an fsx
replay file (which replays all the operations) present in the same repo
[2]. If you look at the replay file, there are a bunch of random
operations followed by these last 2 commented-out operations:

# copy_range 0xd000 0x1000 0x1d800 0x44000   <--- # format: <start> <len> <copy destination> <filesize (can be ignored)>
# mapread 0x1e000 0x1000 0x1e400 *

The copy_range here is the one that causes (or exposes) the corruption
at 0x1e800 (the end of the copy_range destination gets corrupted).

To have more control, I commented out these 2 operations and am doing
them by hand in the test.sh file with xfs_io (a rough sketch is below).
I'm also using a device without atomic write support, so we only have
the S/W fallback.
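
For reference, the by-hand replay looks roughly like the below. The file
path is just what my test.sh happens to use and the exact xfs_io
invocations are from memory, so treat this as a sketch:

  # replay the commented-out copy_range by hand; source and destination
  # are the same file
  xfs_io -c "copy_range -s 0xd000 -d 0x1d800 -l 0x1000 /mnt/xfstests/test/junk" /mnt/xfstests/test/junk

  # and the mapread, via an mmap of the whole file followed by mread
  xfs_io -c "mmap -r 0 0x44000" -c "mread -v 0x1e000 0x1000" /mnt/xfstests/test/junk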

Now some observations:

1. The copy_range operation is actually copying from a hole to a hole,
so we should be reading all 0s. But what I see is the following happening:

  vfs_copy_file_range
   do_splice_direct
    do_splice_direct_actor
     do_splice_read
       # Adds the folio at src offset to the pipe. I confirmed this is all 0x0.
     splice_direct_to_actor
      direct_splice_actor
       do_splice_from
        iter_file_splice_write
         xfs_file_write_iter
          xfs_file_buffered_write
           iomap_file_buffered_write
            iomap_iter
             xfs_buffered_write_iomap_begin
               # Here we correctly see that there is nothing at the
               # destination in data fork, but somehow we find a mapped
               # extent in cow fork which is returned to iomap.
             iomap_write_iter
              __iomap_write_begin
                # Here we notice folio is not uptodate and call
                # iomap_read_folio_range() to read from the cow_fork
                # mapping we found earlier. This results in folio having
                # incorrect data at 0x1e800 offset.

 So it seems like the fsx operations might be corrupting the cow fork state
 somehow leading to stale data exposure. 

2. If we disable atomic writes, we don't hit the issue.

3. If I do a -c pread of the destination range before doing the
copy_range operation, then I don't see the corruption any more (example
below).
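
The pread I do before the copy_range for point 3 is roughly the
following (again, the path is just from my setup):

  xfs_io -c "pread -v 0x1d800 0x1000" /mnt/xfstests/test/junk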

I'm now trying to figure out why the mapping returned is not IOMAP_HOLE
as it should be. I don't know the CoW path in XFS well, so there are some
gaps in my understanding. Let me know if you need any other information;
I'm reliably able to replicate this on 6.17.0-rc4.
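
For what it's worth, one way to check whether a stale extent in the CoW
fork covers the destination range is to dump the CoW fork mappings
around the copy_range. Assuming I'm reading the xfs_io man page right
that "bmap -c" shows the CoW fork, something like this should do it:

  # data fork first, then the CoW fork, for the file under test
  xfs_io -c "bmap -vp" -c "bmap -cvp" /mnt/xfstests/test/junk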

[1] https://github.com/OjaswinM/fsx-aw-issue/tree/master

[2] https://github.com/OjaswinM/fsx-aw-issue/blob/master/repro.fsxops

regards,
ojaswin
