linux-kernel - Re: [PATCH 5.15] fuse: Fix race condition in writethrough path A race

Message-ID: <CAFS-8+V6j-yunnt5yQSa=+P0mXVSg5jrfsBGWrEAbYGm21y8wg@mail.gmail.com>
Date: Mon, 20 Oct 2025 18:10:16 +0800
From: lu gu <giveme.gulu@...il.com>
To: Miklos Szeredi <miklos@...redi.hu>
Cc: Brian Foster <bfoster@...hat.com>, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Bernd Schubert <bernd@...ernd.com>, 
	Joanne Koong <joannelkoong@...il.com>
Subject: Re: [PATCH 5.15] fuse: Fix race condition in writethrough path A race

I tried to backport the fix to my 5.15 environment.
After further investigation and comparing the code across kernel
versions, I now believe I understand why the straightforward backport
failed.

My understanding is that in kernel 5.15, FUSE's writeback detection
(e.g., in fuse_wait_on_page_writeback) relies on its own tracking
mechanism: the fi->writepages red-black tree, which is checked via
fuse_find_writeback(). In contrast, the fix in the mainline kernel
appears to rely on the generic VFS/MM mechanism, where
folio_wait_writeback() directly checks the PG_writeback flag on the
folio itself.

By simply backporting the logic that sets the PG_writeback flag
without also adding a corresponding entry to the fi->writepages
red-black tree, I created an inconsistent state: the page was marked
as under writeback, but FUSE's own checking functions were completely
unaware of it. I believe this inconsistency is what caused the
deadlock.
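
To make the mismatch concrete, here is a minimal userspace sketch of the two
detection styles. It is only a toy model under my own assumptions, not kernel
code: every toy_* name is hypothetical, a small array stands in for the
fi->writepages red-black tree, and a bool stands in for PG_writeback. It shows
the two checks disagreeing after a flag-only backport; it does not reproduce
the deadlock itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page; NOT the kernel's struct page. */
struct toy_page {
	unsigned long index;
	bool pg_writeback;	/* models the PG_writeback flag */
};

/* Toy stand-in for fi->writepages: a small array instead of an rb-tree. */
#define TOY_MAX_TRACKED 8
struct toy_inode {
	unsigned long tracked[TOY_MAX_TRACKED];
	size_t nr_tracked;
};

/* Models the 5.15-style check: only FUSE's own tracking is consulted. */
static bool toy_fuse_page_is_writeback(const struct toy_inode *fi,
				       const struct toy_page *page)
{
	for (size_t i = 0; i < fi->nr_tracked; i++)
		if (fi->tracked[i] == page->index)
			return true;
	return false;
}

/* Models the mainline-style check: only the flag on the page is consulted. */
static bool toy_generic_page_is_writeback(const struct toy_page *page)
{
	return page->pg_writeback;
}

/* Models the naive backport: sets the flag, adds no tracking entry. */
static void toy_naive_start_writeback(struct toy_page *page)
{
	page->pg_writeback = true;
}

/* Models a consistent variant: sets the flag AND records the page. */
static void toy_consistent_start_writeback(struct toy_inode *fi,
					   struct toy_page *page)
{
	assert(fi->nr_tracked < TOY_MAX_TRACKED);
	page->pg_writeback = true;
	fi->tracked[fi->nr_tracked++] = page->index;
}
```

With the naive variant, toy_generic_page_is_writeback() reports the page as
under writeback while toy_fuse_page_is_writeback() does not, so any waiter
built on the second check never sees the writeback. With the consistent
variant both checks agree.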

Therefore, a proper fix for 5.15 will require a more sophisticated approach.

On Thu, Oct 16, 2025 at 4:28 AM Joanne Koong <joannelkoong@...il.com> wrote:
>
> On Wed, Oct 15, 2025 at 12:44 PM Brian Foster <bfoster@...hat.com> wrote:
> >
> > On Wed, Oct 15, 2025 at 10:19:15AM -0700, Joanne Koong wrote:
> > > On Wed, Oct 15, 2025 at 7:09 AM Miklos Szeredi <miklos@...redi.hu> wrote:
> > > >
> > > > On Wed, 15 Oct 2025 at 06:00, lu gu <giveme.gulu@...il.com> wrote:
> > > > >
> > > > > >  Attaching a test patch, minimally tested.
> > > > > Since I only have a test environment for kernel 5.15, I ported this
> > > > > patch to the FUSE module in 5.15. I ran the previous LTP test cases
> > > > > more than ten times, and the data inconsistency issue did not reoccur.
> > > > > > However, a deadlock occurred. Below is the specific stack trace.
> > > >
> > > > > This does not reproduce for me on 6.17 even after running the test
> > > > for hours.  Without seeing your backport it is difficult to say
> > > > anything about the reason for the deadlock.
> > > >
> > > > Attaching an updated patch that takes care of i_wb initialization on
> > > > CONFIG_CGROUP_WRITEBACK=y.
> > >
> > > I think now we'll also need to always set
> > > mapping_set_writeback_may_deadlock_on_reclaim(), eg
> > >
> > > @@ -3125,8 +3128,7 @@ void fuse_init_file_inode(struct inode *inode,
> > > unsigned int flags)
> > >
> > >         inode->i_fop = &fuse_file_operations;
> > >         inode->i_data.a_ops = &fuse_file_aops;
> > > -       if (fc->writeback_cache)
> > > -               mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);
> > > +       mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);
> > >
> > >
> > > Does this completely get rid of the race? There's a fair chance I'm
> > > wrong here but doesn't the race still happen if the read invalidation
> > > happens before the write grabs the folio lock? This is the scenario
> > > I'm thinking of:
> > >
> > > Thread A (read):
> > > > read, w/ auto inval and an outdated mtime triggers invalidate_inode_pages2()
> > > generic_file_read_iter() is called, which calls filemap_read() ->
> > > filemap_get_pages() -> triggers read_folio/readahead
> > > read_folio/readahead fetches data (stale) from the server, unlocks folios
> > >
> > > Thread B (writethrough write):
> > > fuse_perform_write() -> fuse_fill_write_pages():
> > > grabs the folio lock and copies new write data to page cache, sets
> > > writeback flag and unlocks folio, sends request to server
> > >
> > > Thread A (read):
> > > the read data that was fetched from the server gets copied to the page
> > > cache in filemap_read()
> > > overwrites the write data in the page cache with the stale data
> > >
> > > > Am I misanalyzing something in this sequence?
> > >
> >
> > Maybe I misread the description, but I think folios are locked across
> > read I/O, so I don't follow how we could race with readahead in this
> > way. Hm?
>
> Ah I see where my analysis went wrong - the "copy_folio_to_iter()"
> call in filemap_read() copies the data into the client's user buffer,
> not into the page cache. The data gets copied to the page
> cache in the fuse code in fuse_copy_out_args() (through
> fuse_dev_do_write()), which has to be under the folio lock. Yeah
> you're right, there's no race condition here then. Thanks for clearing
> this up.
>
> >
> > Brian
> >
> > > Thanks,
> > > Joanne
> > > >
> > > > Thanks,
> > > > Miklos
> > >
> >