Message-ID: <aPdgR5gdA3l3oTLQ@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>
Date: Tue, 21 Oct 2025 15:58:23 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: John Garry <john.g.garry@...cle.com>
Cc: Zorro Lang <zlang@...hat.com>, fstests@...r.kernel.org,
Ritesh Harjani <ritesh.list@...il.com>, djwong@...nel.org,
tytso@....edu, linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-ext4@...r.kernel.org
Subject: Re: [PATCH v7 04/12] ltp/fsx.c: Add atomic writes support to fsx
On Mon, Oct 20, 2025 at 11:33:40AM +0100, John Garry wrote:
> On 06/10/2025 14:20, Ojaswin Mujoo wrote:
> > Hi Zorro, thanks for checking this. So correct me if I'm wrong, but I
> > understand that you have run this test on an atomic writes enabled
> > kernel where the stack also supports atomic writes.
> >
> > Looking at the bad data log:
> >
> > +READ BAD DATA: offset = 0x1c000, size = 0x1803, fname = /mnt/xfstests/test/junk
> > +OFFSET GOOD BAD RANGE
> > +0x1c000 0x0000 0xcdcd 0x0
> > +operation# (mod 256) for the bad data may be 205
> >
> > We see that 0x0000 was expected but we got 0xcdcd. Now the operation
> > that caused this is indicated to be 205, but looking at that operation:
> >
> > +205(205 mod 256): ZERO 0x6dbe6 thru 0x6e6aa (0xac5 bytes)
> >
> > This doesn't even overlap the range that is bad (0x1c000 to 0x1c00f).
> > In fact, it seems like an unlikely coincidence that the actual data
> > in the bad range is 0xcdcd, which is what xfs_io -c "pwrite" writes
> > by default (fsx writes random data at even offsets and the operation
> > number at odd ones).
> >
> > I am able to replicate this on XFS but not on ext4 (at least not in
> > 20 runs). I'm trying to better understand whether this is a test issue
> > or not. Will keep you updated.
>
>
> Hi Ojaswin,
>
> Sorry for the very slow response.
>
> Are you still checking this issue?
>
> To replicate, should I just take latest xfs kernel and run this series on
> top of latest xfstests? Is it 100% reproducible?
>
> Thanks,
> John
Hi John,
Yes, I'm looking into it, but I'm now starting to run into some
reflink/CoW concepts that are taking time to understand. Let me share
what I have so far:
The test.sh that I'm using can be found here [1]; it just uses an fsx
replay file (which replays all operations) present in the same repo
[2]. If you look at the replay file, there are a bunch of random
operations followed by the last 2 commented-out operations:
# copy_range 0xd000 0x1000 0x1d800 0x44000 <--- # operations <start> <len> <dest of copy> <filesize (can be ignored)>
# mapread 0x1e000 0x1000 0x1e400 *
The copy_range here is the one which causes (or exposes) the corruption
at 0x1e800 (the end of the copy_range destination gets corrupted).
To have more control, I commented out these 2 operations and am doing
them by hand in the test.sh file with xfs_io. I'm also using a
non-atomic-write device, so we only have the S/W fallback.
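Roughly, the by-hand version of those two ops with xfs_io looks like the
sketch below (path illustrative, offsets taken from the replay file; the
mapread is approximated here with a plain pread for simplicity):

    # copy 0x1000 bytes from a hole at 0xd000 to a hole at 0x1d800 (same file)
    $ xfs_io -c "copy_range -s 0xd000 -d 0x1d800 -l 0x1000 /mnt/xfstests/test/junk" \
        /mnt/xfstests/test/junk

    # read back the range the replay's mapread covers; the stale data
    # shows up around 0x1e800
    $ xfs_io -c "pread -v 0x1e000 0x1000" /mnt/xfstests/test/junk
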
Now some observations:
1. The copy_range operation is actually copying from a hole to a hole,
so we should be reading all 0s. But what I see is the following happening:
vfs_copy_file_range
do_splice_direct
do_splice_direct_actor
do_splice_read
    # Adds the folio at the src offset to the pipe. I confirmed this is all 0x0.
splice_direct_to_actor
direct_splice_actor
do_splice_from
iter_file_splice_write
xfs_file_write_iter
xfs_file_buffered_write
iomap_file_buffered_write
iomap_iter
xfs_buffered_write_iomap_begin
    # Here we correctly see that there is nothing at the
    # destination in the data fork, but somehow we find a mapped
    # extent in the cow fork, which is returned to iomap.
iomap_write_iter
__iomap_write_begin
    # Here we notice the folio is not uptodate and call
    # iomap_read_folio_range() to read from the cow fork
    # mapping we found earlier. This results in the folio having
    # incorrect data at offset 0x1e800.
So it seems like the fsx operations might be corrupting the cow fork state
somehow leading to stale data exposure.
2. If we disable atomic writes, we don't hit the issue.
3. If I do a -c pread of the destination range before doing the
copy_range operation, then I don't see the corruption any more.
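In other words, with the by-hand sequence above, inserting a pread of
the destination range first hides the problem (again, path and the extra
pread length are illustrative):

    $ xfs_io -c "pread 0x1d800 0x1000" \
        -c "copy_range -s 0xd000 -d 0x1d800 -l 0x1000 /mnt/xfstests/test/junk" \
        -c "pread -v 0x1e000 0x1000" \
        /mnt/xfstests/test/junk
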
I'm now trying to figure out why the mapping returned is not IOMAP_HOLE
as it should be. I don't know the CoW path in XFS well, so there are some
gaps in my understanding. Let me know if you need any other information,
since I'm reliably able to replicate this on 6.17.0-rc4.
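For reference, the CoW fork state can also be dumped from userspace with
xfs_io's bmap around the copy_range; a rough sketch (the -c flag is what
bmap's help lists for the CoW fork; path illustrative):

    # data fork extents, then CoW fork extents, for the fsx file
    $ xfs_io -c "bmap -v" -c "bmap -cv" /mnt/xfstests/test/junk
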
[1] https://github.com/OjaswinM/fsx-aw-issue/tree/master
[2] https://github.com/OjaswinM/fsx-aw-issue/blob/master/repro.fsxops
regards,
ojaswin