Message-ID: <CAJnrk1b=UMb9GrU0oiah986of_dgwLiRsZKvodwBoO1PSUaP7w@mail.gmail.com>
Date: Wed, 15 Oct 2025 10:19:15 -0700
From: Joanne Koong <joannelkoong@...il.com>
To: Miklos Szeredi <miklos@...redi.hu>
Cc: lu gu <giveme.gulu@...il.com>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, Bernd Schubert <bernd@...ernd.com>,
Brian Foster <bfoster@...hat.com>
Subject: Re: [PATCH 5.15] fuse: Fix race condition in writethrough path A race
On Wed, Oct 15, 2025 at 7:09 AM Miklos Szeredi <miklos@...redi.hu> wrote:
>
> On Wed, 15 Oct 2025 at 06:00, lu gu <giveme.gulu@...il.com> wrote:
> >
> > > Attaching a test patch, minimally tested.
> > Since I only have a test environment for kernel 5.15, I ported this
> > patch to the FUSE module in 5.15. I ran the previous LTP test cases
> > more than ten times, and the data inconsistency issue did not reoccur.
> > However, a deadlock occurred. Below is the specific stack trace.
>
> This does not reproduce for me on 6.17 even after running the test
> for hours. Without seeing your backport it is difficult to say
> anything about the reason for the deadlock.
>
> Attaching an updated patch that takes care of i_wb initialization on
> CONFIG_CGROUP_WRITEBACK=y.

I think now we'll also need to always set
mapping_set_writeback_may_deadlock_on_reclaim(), e.g.:

@@ -3125,8 +3128,7 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 	inode->i_fop = &fuse_file_operations;
 	inode->i_data.a_ops = &fuse_file_aops;
-	if (fc->writeback_cache)
-		mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);
+	mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);

Does this completely get rid of the race? There's a fair chance I'm
wrong here, but doesn't the race still happen if the read invalidation
happens before the write grabs the folio lock? This is the scenario
I'm thinking of:

Thread A (read):
    read w/ auto inval and an outdated mtime triggers
        invalidate_inode_pages2()
    generic_file_read_iter() is called, which calls filemap_read() ->
        filemap_get_pages() -> triggers read_folio/readahead
    read_folio/readahead fetches (stale) data from the server and
        unlocks the folios

Thread B (writethrough write):
    fuse_perform_write() -> fuse_fill_write_pages():
        grabs the folio lock, copies the new write data to the page
        cache, sets the writeback flag, unlocks the folio, and sends
        the request to the server

Thread A (read):
    the read data that was fetched from the server gets copied to the
        page cache in filemap_read()
    this overwrites the write data in the page cache with the stale
        data

Am I misanalyzing something in this sequence?
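
To make the ordering concrete, here is a toy user-space model of the
sequence above. It is only a sketch, not kernel code: every name in it
(toy_state, toy_read_fetch(), toy_writethrough(), toy_read_install())
is made up for illustration, and it simply assumes that the read reply
can be installed in the page cache after the write has finished, which
is the very step I'm unsure about.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_LEN 8

/* Hypothetical stand-ins for illustration; not kernel structures. */
struct toy_state {
	char server[TOY_PAGE_LEN];     /* data held by the FUSE server */
	char page_cache[TOY_PAGE_LEN]; /* cached copy of the same range */
	char reply[TOY_PAGE_LEN];      /* in-flight reply to thread A's read */
};

void toy_init(struct toy_state *s)
{
	memcpy(s->server, "old....", TOY_PAGE_LEN);
	memset(s->page_cache, 0, TOY_PAGE_LEN); /* cache was just invalidated */
	memset(s->reply, 0, TOY_PAGE_LEN);
}

/* Thread A, first half: the server answers with whatever it holds now. */
void toy_read_fetch(struct toy_state *s)
{
	memcpy(s->reply, s->server, TOY_PAGE_LEN);
}

/* Thread B: writethrough updates the page cache, then the server. */
void toy_writethrough(struct toy_state *s, const char *data)
{
	assert(data != NULL);
	memcpy(s->page_cache, data, TOY_PAGE_LEN);
	memcpy(s->server, data, TOY_PAGE_LEN);
}

/* Thread A, second half: the (possibly stale) reply lands in the cache. */
void toy_read_install(struct toy_state *s)
{
	memcpy(s->page_cache, s->reply, TOY_PAGE_LEN);
}

/* Ordering from the scenario: fetch, write, install. Returns 1 if stale. */
int toy_replay_racy(void)
{
	struct toy_state s;

	toy_init(&s);
	toy_read_fetch(&s);
	toy_writethrough(&s, "new....");
	toy_read_install(&s);
	return memcmp(s.page_cache, s.server, TOY_PAGE_LEN) != 0;
}

/* Serialized ordering: the read completes before the write starts. */
int toy_replay_serialized(void)
{
	struct toy_state s;

	toy_init(&s);
	toy_read_fetch(&s);
	toy_read_install(&s);
	toy_writethrough(&s, "new....");
	return memcmp(s.page_cache, s.server, TOY_PAGE_LEN) != 0;
}
```

In this model toy_replay_racy() reports a page cache that differs from
the server, while toy_replay_serialized() does not. Whether the real
read path can actually interleave like this is what I'm asking about.
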
Thanks,
Joanne
>
> Thanks,
> Miklos