lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJnrk1anodUxD5GR18N8w8239S_kbgijQyZC48Nsa4isb-e5JA@mail.gmail.com>
Date: Mon, 9 Feb 2026 11:08:50 -0800
From: Joanne Koong <joannelkoong@...il.com>
To: Wei Gao <wegao@...e.com>
Cc: Sasha Levin <sashal@...nel.org>, willy@...radead.org, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate
 and folio_end_read

On Fri, Feb 6, 2026 at 11:16 PM Wei Gao <wegao@...e.com> wrote:
>
> On Tue, Dec 23, 2025 at 08:31:57PM -0500, Sasha Levin wrote:
> > On Tue, Dec 23, 2025 at 05:12:09PM -0800, Joanne Koong wrote:
> > > On Tue, Dec 23, 2025 at 2:30 PM Sasha Levin <sashal@...nel.org> wrote:
> > > >
> > >
> > > Hi Sasha,
> > >
> > > Thanks for your patch and for the detailed writeup.
> >
> > Thanks for looking into this!
> >
> > > > When iomap uses large folios, per-block uptodate tracking is managed via
> > > > iomap_folio_state (ifs). A race condition can cause the ifs uptodate bits
> > > > to become inconsistent with the folio's uptodate flag.
> > > >
> > > > The race occurs because folio_end_read() uses XOR semantics to atomically
> > > > set the uptodate bit and clear the locked bit:
> > > >
> > > >   Thread A (read completion):          Thread B (concurrent write):
> > > >   --------------------------------     --------------------------------
> > > >   iomap_finish_folio_read()
> > > >     spin_lock(state_lock)
> > > >     ifs_set_range_uptodate() -> true
> > > >     spin_unlock(state_lock)
> > > >                                        iomap_set_range_uptodate()
> > > >                                          spin_lock(state_lock)
> > > >                                          ifs_set_range_uptodate() -> true
> > > >                                          spin_unlock(state_lock)
> > > >                                          folio_mark_uptodate(folio)
> > > >     folio_end_read(folio, true)
> > > >       folio_xor_flags()  // XOR CLEARS uptodate!
> > >
> > > The part I'm confused about here is how this can happen between a
> > > concurrent read and write. My understanding is that the folio is
> > > locked when the read occurs and locked when the write occurs and both
> > > locks get dropped only when the read or write finishes. Looking at
> > > iomap code, I see iomap_set_range_uptodate() getting called in
> > > __iomap_write_begin() and __iomap_write_end() for the writes, but in
> > > both those places the folio lock is held while this is called. I'm not
> > > seeing how the read and write race in the diagram can happen, but
> > > maybe I'm missing something here?
> >
> > Hmm, you're right... The folio lock should prevent concurrent read/write
> > access. Looking at this again, I suspect that FUSE was calling
> > folio_clear_uptodate() and folio_mark_uptodate() directly without updating the
> > ifs bits. For example, in fuse_send_write_pages() on write error, it calls
> > folio_clear_uptodate(folio) which clears the folio flag but leaves ifs still
> > showing all blocks uptodate?
>
> Hi Sasha
> On PowerPC with 64KB page size, msync04 fails with SIGBUS on NTFS-FUSE. The issue stems from a state inconsistency between
> the iomap_folio_state (ifs) bitmap and the folio's Uptodate flag.
> tst_test.c:1985: TINFO: === Testing on ntfs ===
> tst_test.c:1290: TINFO: Formatting /dev/loop0 with ntfs opts='' extra opts=''
> Failed to set locale, using default 'C'.
> The partition start sector was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> The number of sectors per track was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> The number of heads was not specified for /dev/loop0 and it could not be obtained automatically.  It has been set to 0.
> To boot from a device, Windows needs the 'partition start sector', the 'sectors per track' and the 'number of heads' to be set.
> Windows will not be able to boot from this device.
> tst_test.c:1302: TINFO: Mounting /dev/loop0 to /tmp/LTP_msy3ljVxi/msync04 fstyp=ntfs flags=0
> tst_test.c:1302: TINFO: Trying FUSE...
> tst_test.c:1953: TBROK: Test killed by SIGBUS!
>
> Root Cause Analysis: When a page fault triggers fuse_read_folio, the iomap_read_folio_iter handles the request. For a 64KB page,
> after fetching 4KB via fuse_iomap_read_folio_range_async, the remaining 60KB (61440 bytes) is zero-filled via iomap_block_needs_zeroing,
> then iomap_set_range_uptodate marks the folio as Uptodate globally, after folio_xor_flags folio's uptodate become 0 again, finally trigger
> an SIGBUS issue in filemap_fault.

Hi Wei,

Thanks for your report. afaict, this scenario occurs only if the
server is a fuseblk server with a block size different from the memory
page size and if the file size is less than the size of the folio
being read in.

Could you verify that this snippet from Sasha's patch fixes the issue?:

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e5c1ca440d93..7ceda24cf6a7 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -87,12 +86,50 @@ static void iomap_set_range_uptodate(struct folio
*folio, size_t off,
  if (ifs) {
          spin_lock_irqsave(&ifs->state_lock, flags);
          uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
          + /*
          + * If a read is in progress, we must NOT call folio_mark_uptodate
          + * here. The read completion path (iomap_finish_folio_read or
          + * iomap_read_end) will call folio_end_read() which uses XOR
          + * semantics to set the uptodate bit. If we set it here, the XOR
          + * in folio_end_read() will clear it, leaving the folio not
          + * uptodate while the ifs says all blocks are uptodate.
          + */
         + if (uptodate && ifs->read_bytes_pending)
                   + uptodate = false;
        spin_unlock_irqrestore(&ifs->state_lock, flags);
  }

Thanks,
Joanne

>
> So your iomap_set_range_uptodate patch can fix above failed case since it block mark folio's uptodate to 1.
> Hope my findings are helpful.
>
> >
> > --
> > Thanks,
> > Sasha
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ