[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aO06hoYuvDGiCBc7@bfoster>
Date: Mon, 13 Oct 2025 13:44:38 -0400
From: Brian Foster <bfoster@...hat.com>
To: Miklos Szeredi <miklos@...redi.hu>
Cc: lu gu <giveme.gulu@...il.com>, Joanne Koong <joannelkoong@...il.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
Bernd Schubert <bernd@...ernd.com>
Subject: Re: [PATCH 5.15] fuse: Fix race condition in writethrough path A race
On Mon, Oct 13, 2025 at 03:39:48PM +0200, Miklos Szeredi wrote:
> On Fri, 10 Oct 2025 at 10:46, Miklos Szeredi <miklos@...redi.hu> wrote:
>
> > My idea is to introduce FUSE_I_MTIME_UNSTABLE (which would work
> > similarly to FUSE_I_SIZE_UNSTABLE) and when fetching old_mtime, verify
> > that it hasn't been invalidated. If old_mtime is invalid or if
> > FUSE_I_MTIME_UNSTABLE signals that a write is in progress, the page
> > cache is not invalidated.
>
> [Adding Brian Foster, the author of FUSE_AUTO_INVAL_DATA patches.
> Link to complete thread:
> https://lore.kernel.org/all/20251009110623.3115511-1-giveme.gulu@gmail.com/#r]
>
> In summary: auto_inval_data invalidates data cache even if the
> modification was done in a cache consistent manner (i.e. write
> through). This is not generally a consistency problem, because the
> backing file and the cache should be in sync. The exception is when
> the writeback to the backing file hasn't yet finished and a getattr()
> call triggers invalidation (mtime change could be from a previous
> write), and the not yet written data is invalidated and replaced with
> stale data.
>
Heh, well that's an old one. ;) I'm probably not going to recall all the
details, but from a quick look at the commits this was to facilitate
support for glusterfs. The original fuse code did an inval across i_size
changes and this patch updated that to try and accommodate overwrites by
doing a similar thing for mtime differences.
If I follow the report correctly, we're basically producing an internal
inconsistency between mtime and cache state that falsely presents as a
remote change, so one of these attr change checks can race with a write
in progress and invalidate cache. Do I have that right?
But still a few questions..
1. Do we know where exactly the mtime update comes from? Is it the write
in progress that updates the file mtime on the backend and creates the
inconsistency?
2. Is it confirmed that auto_inval is the culprit here? It seems logical
to me, but it can also be disabled dynamically so couldn't hurt to
confirm that if there's a reproducer.
3. I don't think we should be able to invalidate "dirty" folios like
this. On a quick look though, it seems we don't mark folios dirty in
this write path. Is that right?
If so, I'm a little curious if that's more of a "no apparent need" thing
since the writeback occurs right in that path vs. that is an actual
wrong thing to do for some reason. Hm?
If the former (and if there is simple confirmation of the auto inval
thing), I'm at least a little curious if marking folios
dirty/writeback/clean here would provide enough serialization against
the inval to prevent this problem.
> The proposed fix was to exclude concurrent reads and writes to the same region.
>
> But the real issue here is that mtime changes triggered by this client
> should not cause data to be invalidated. It's not only racy, but it's
> fundamentally wrong. Unfortunately this is hard to do this correctly.
> Best I can come up with is that any request that expects mtime to be
> modified returns the mtime after the request has completed.
>
Agreed in general. IIUC, this is ultimately a heuristic that isn't
guaranteed to necessarily get things right for the backing fs. ISTM that
maybe fuse is trying too hard to handle the distributed case correctly
where the backing fs should be the one to implement this sort of thing
through exposed mechanisms. OTOH so long as the heuristic exists we
should probably at least work to make it internally consistent.
> This would be much easier to implement in the fuse server: perform the
> "file changed remotely" check when serving a FUSE_GETATTR request and
> return a flag indicating whether the data needs to be invalidated or
> not.
>
Indeed something along those lines sounds more elegant long term, IMO.
Brian
> Thoughts?
>
> Thanks,
> Miklos
>
Powered by blists - more mailing lists