Message-ID: <aPdu2DtLSNrI7gfp@bfoster>
Date: Tue, 21 Oct 2025 07:30:32 -0400
From: Brian Foster <bfoster@...hat.com>
To: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
Cc: John Garry <john.g.garry@...cle.com>, Zorro Lang <zlang@...hat.com>,
fstests@...r.kernel.org, Ritesh Harjani <ritesh.list@...il.com>,
djwong@...nel.org, tytso@....edu, linux-xfs@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: [PATCH v7 04/12] ltp/fsx.c: Add atomic writes support to fsx

On Tue, Oct 21, 2025 at 03:58:23PM +0530, Ojaswin Mujoo wrote:
> On Mon, Oct 20, 2025 at 11:33:40AM +0100, John Garry wrote:
> > On 06/10/2025 14:20, Ojaswin Mujoo wrote:
> > > Hi Zorro, thanks for checking this. So correct me if I'm wrong, but I
> > > understand that you have run this test on an atomic-writes-enabled
> > > kernel where the stack also supports atomic writes.
> > >
> > > Looking at the bad data log:
> > >
> > > +READ BAD DATA: offset = 0x1c000, size = 0x1803, fname = /mnt/xfstests/test/junk
> > > +OFFSET GOOD BAD RANGE
> > > +0x1c000 0x0000 0xcdcd 0x0
> > > +operation# (mod 256) for the bad data may be 205
> > >
> > > We see that 0x0000 was expected but we got 0xcdcd. Now the operation
> > > that caused this is indicated to be 205, but looking at that operation:
> > >
> > > +205(205 mod 256): ZERO 0x6dbe6 thru 0x6e6aa (0xac5 bytes)
> > >
> > > This doesn't even overlap the bad range (0x1c000 to 0x1c00f). In
> > > fact, it seems like too much of a coincidence that the actual data
> > > in the bad range is 0xcdcd, which is what xfs_io -c "pwrite" writes
> > > by default (fsx writes random data in even offsets and the operation
> > > number in odd).
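> > >
> > > As far as I recall, fsx stamps the operation number (mod 256) into
> > > the data it writes, and check_buffers() recovers the suspected
> > > operation from the bad bytes it read back -- note that 0xcd is 205.
> > > A minimal sketch of that reporting, written from memory rather than
> > > copied from ltp/fsx.c, so details may differ:
> > >
> > > #include <stdio.h>
> > >
> > > /* simplified stand-in for fsx's per-mismatch reporting */
> > > static void report_bad_word(unsigned offset, unsigned good,
> > > 			    unsigned bad)
> > > {
> > > 	printf("0x%05x\t0x%04x\t0x%04x\n", offset, good, bad);
> > > 	/*
> > > 	 * fsx writes (operation# % 256) into the file, so the suspect
> > > 	 * operation is recovered from the bad data itself.  If the bad
> > > 	 * data came from somewhere else entirely (say an xfs_io pwrite
> > > 	 * pattern), this number is meaningless.
> > > 	 */
> > > 	printf("operation# (mod 256) for the bad data may be %u\n",
> > > 	       bad & 0xffu);
> > > }
> > >
> > > int main(void)
> > > {
> > > 	/* values from the log above: good 0x0000, bad 0xcdcd */
> > > 	report_bad_word(0x1c000, 0x0000, 0xcdcd);	/* prints 205 */
> > > 	return 0;
> > > }
> > >
> > > If that recollection is right, the "205" hint is derived from the
> > > stray 0xcd bytes themselves rather than pointing at any real fsx
> > > operation.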
> > >
> > > I am able to replicate this on XFS, but not on ext4 (at least not in
> > > 20 runs). I'm trying to better understand whether this is a test
> > > issue or not. Will keep you updated.
> >
> >
> > Hi Ojaswin,
> >
> > Sorry for the very slow response.
> >
> > Are you still checking this issue?
> >
> > To replicate, should I just take latest xfs kernel and run this series on
> > top of latest xfstests? Is it 100% reproducible?
> >
> > Thanks,
> > John
>
> Hi John,
>
> Yes, I'm looking into it, but I'm now starting to run into some
> reflink/CoW concepts that are taking time to understand. Let me share
> what I have so far:
>
> So the test.sh that I'm using can be found here [1]; it just uses an
> fsx replay file (which replays all operations) present in the same
> repo [2]. If you look at the replay file, there are a bunch of random
> operations followed by these last 2 commented-out operations:
>
> # copy_range 0xd000 0x1000 0x1d800 0x44000 <--- # operation <start> <len> <dest of copy> <filesize (can be ignored)>
> # mapread 0x1e000 0x1000 0x1e400 *
>
> The copy_range here is the one that causes (or exposes) the corruption
> at 0x1e800 (the end of the copy_range destination gets corrupted).
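>
> For reference, a minimal sketch (not my actual test.sh) of replaying
> those two operations by hand with raw syscalls -- copy_file_range(2)
> plus an mmap read to mirror fsx's mapread -- with the junk file path
> simply taken from the log above:
>
> #define _GNU_SOURCE		/* for copy_file_range() */
> #include <fcntl.h>
> #include <stdio.h>
> #include <sys/mman.h>
> #include <unistd.h>
>
> int main(void)
> {
> 	const char *path = "/mnt/xfstests/test/junk";	/* path from the log */
> 	off_t src = 0xd000, dst = 0x1d800;
> 	int fd = open(path, O_RDWR);
>
> 	if (fd < 0) {
> 		perror("open");
> 		return 1;
> 	}
>
> 	/* copy_range 0xd000 0x1000 0x1d800: a hole-to-hole copy */
> 	if (copy_file_range(fd, &src, fd, &dst, 0x1000, 0) < 0)
> 		perror("copy_file_range");
>
> 	/* mapread 0x1e000 0x1000: fsx reads via mmap, so do the same */
> 	char *p = mmap(NULL, 0x1000, PROT_READ, MAP_SHARED, fd, 0x1e000);
> 	if (p == MAP_FAILED) {
> 		perror("mmap");
> 		return 1;
> 	}
>
> 	/* both ranges are holes, so any non-zero byte is the corruption */
> 	for (size_t i = 0; i < 0x1000; i++)
> 		if (p[i])
> 			printf("non-zero byte 0x%02x at offset 0x%zx\n",
> 			       (unsigned char)p[i], (size_t)0x1e000 + i);
>
> 	munmap(p, 0x1000);
> 	close(fd);
> 	return 0;
> }
>
> (The corruption still depends on the earlier fsx ops having been
> replayed first to set up the cow fork state.)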
>
> To have more control, I commented out these 2 operations and am doing
> them by hand in the test.sh file with xfs_io. I'm also using a device
> without atomic write support, so we only exercise the S/W fallback.
>
> Now some observations:
>
> 1. The copy_range operation is actually copying from a hole to a hole,
> so we should be reading all 0s. But what I see is the following
> happening:
>
> vfs_copy_file_range
> do_splice_direct
> do_splice_direct_actor
> do_splice_read
> # Adds the folio at src offset to the pipe. I confirmed this is all 0x0.
> splice_direct_to_actor
> direct_splice_actor
> do_splice_from
> iter_file_splice_write
> xfs_file_write_iter
> xfs_file_buffered_write
> iomap_file_buffered_write
> iomap_iter
> xfs_buffered_write_iomap_begin
> # Here we correctly see that there is nothing at the
> # destination in the data fork, but somehow we find a mapped
> # extent in the cow fork, which is returned to iomap.
> iomap_write_iter
> __iomap_write_begin
> # Here we notice the folio is not uptodate and call
> # iomap_read_folio_range() to read from the cow fork
> # mapping we found earlier. This results in the folio
> # having incorrect data at offset 0x1e800.
>
> So it seems like the fsx operations might be corrupting the cow fork
> state somehow, leading to stale data exposure.
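>
> To make that concrete, here is an illustrative-only sketch of the
> choice the write-begin path has to make for the part of the folio the
> write does not cover (made-up simplified types and stubbed I/O, not
> the real iomap code):
>
> #include <stdio.h>
>
> enum map_type { MAP_HOLE, MAP_MAPPED };
>
> struct mapping {		/* stand-in for struct iomap */
> 	enum map_type type;
> };
>
> static int folio_uptodate;	/* stand-in for the folio's uptodate state */
>
> static void zero_folio_range(unsigned long off, unsigned long len)
> {
> 	printf("zero 0x%lx+0x%lx (correct for a hole)\n", off, len);
> }
>
> static void read_into_folio(unsigned long off, unsigned long len)
> {
> 	/*
> 	 * If the mapping we were handed wrongly points at cow fork
> 	 * blocks holding old 0xcd data, that stale data is what lands
> 	 * in the page cache here.
> 	 */
> 	printf("read 0x%lx+0x%lx from the returned mapping\n", off, len);
> }
>
> /* roughly the choice made for each range the write does not cover */
> static void write_begin_fill(const struct mapping *map,
> 			     unsigned long off, unsigned long len)
> {
> 	if (folio_uptodate)
> 		return;		/* page cache already valid, nothing to do */
>
> 	if (map->type == MAP_HOLE)
> 		zero_folio_range(off, len);
> 	else
> 		read_into_folio(off, len);
> }
>
> int main(void)
> {
> 	/* a mapped (cow fork) extent instead of the expected hole; the
> 	 * offset/length are just an example for the folio tail past
> 	 * the copy destination */
> 	struct mapping got = { MAP_MAPPED };
>
> 	write_begin_fill(&got, 0x1e800, 0x800);
> 	return 0;
> }
>
> If that is roughly what is going on, it would also fit observation 3
> below: a pread of the range first leaves the folio uptodate, so the
> stale read-in never happens.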
>
> 2. If we disable atomic writes, we don't hit the issue.
>
> 3. If I do a -c pread of the destination range before doing the
> copy_range operation, then I don't see the corruption anymore.
>
> I'm now trying to figure out why the mapping returned is not IOMAP_HOLE
> as it should be. I don't know the CoW path in XFS, so there are some
> gaps in my understanding. Let me know if you need any other
> information, since I'm reliably able to replicate this on 6.17.0-rc4.
>

I haven't followed your issue closely, but just on this hole vs. COW
thing, XFS has a bit of a quirk where speculative COW fork preallocation
can expand out over holes in the data fork. If iomap lookup for buffered
write sees COW fork blocks present, it reports those blocks as the
primary mapping even if the data fork happens to be a hole (since
there's no point in allocating blocks to the data fork when we can just
remap).
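
As a rough illustration of that rule (made-up types here, not the real
xfs_buffered_write_iomap_begin(), just the shape of the decision):

#include <stdio.h>

enum ext_state { EXT_HOLE, EXT_MAPPED };

struct extent {			/* stand-in for struct xfs_bmbt_irec */
	enum ext_state state;
	unsigned long long startblock;
};

/*
 * Pick the mapping reported to iomap for a buffered write: if the COW
 * fork already has blocks covering the offset, report those, even when
 * the data fork is a hole -- no point allocating data fork blocks that
 * a later remap would replace anyway.
 */
static struct extent pick_write_mapping(struct extent data_fork,
					struct extent cow_fork)
{
	if (cow_fork.state == EXT_MAPPED)
		return cow_fork;	/* COW fork wins, data fork hole or not */
	return data_fork;
}

int main(void)
{
	/* data fork hole plus speculative COW preallocation over it */
	struct extent data = { EXT_HOLE, 0 };
	struct extent cow  = { EXT_MAPPED, 1234 };
	struct extent m = pick_write_mapping(data, cow);

	printf("buffered write sees: %s\n",
	       m.state == EXT_MAPPED ? "mapped (COW fork)" : "hole");
	return 0;
}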

Again, I've no idea if this relates to your issue or what you're
referring to as a hole (i.e. data fork only?), but just pointing it
out.

The latest iomap/xfs patches I posted a few days ago kind of dance
around this a bit, but I was somewhat hoping that maybe the cleanups
there would trigger some thoughts on better iomap reporting in that
regard.

Brian

> [1] https://github.com/OjaswinM/fsx-aw-issue/tree/master
>
> [2] https://github.com/OjaswinM/fsx-aw-issue/blob/master/repro.fsxops
>
> regards,
> ojaswin
>