[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251106001914.GI196358@frogsfrogsfrogs>
Date: Wed, 5 Nov 2025 16:19:14 -0800
From: "Darrick J. Wong" <djwong@...nel.org>
To: Bernd Schubert <bernd@...ernd.com>
Cc: Joanne Koong <joannelkoong@...il.com>, miklos@...redi.hu,
neal@...pa.dev, linux-ext4@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 1/5] fuse: flush pending fuse events before aborting the
connection
On Tue, Nov 04, 2025 at 10:47:52PM +0100, Bernd Schubert wrote:
>
>
> On 11/4/25 20:22, Joanne Koong wrote:
> > On Mon, Nov 3, 2025 at 2:13 PM Darrick J. Wong <djwong@...nel.org> wrote:
> >>
> >> On Mon, Nov 03, 2025 at 09:20:26AM -0800, Joanne Koong wrote:
> >>> On Tue, Oct 28, 2025 at 5:43 PM Darrick J. Wong <djwong@...nel.org> wrote:
> >>>>
> >>>> From: Darrick J. Wong <djwong@...nel.org>
> >>>>
> >>>> generic/488 fails with fuse2fs in the following fashion:
> >>>>
> >>>> generic/488 _check_generic_filesystem: filesystem on /dev/sdf is inconsistent
> >>>> (see /var/tmp/fstests/generic/488.full for details)
> >>>>
> >>>> This test opens a large number of files, unlinks them (which really just
> >>>> renames them to fuse hidden files), closes the program, unmounts the
> >>>> filesystem, and runs fsck to check that there aren't any inconsistencies
> >>>> in the filesystem.
> >>>>
> >>>> Unfortunately, the 488.full file shows that there are a lot of hidden
> >>>> files left over in the filesystem, with incorrect link counts. Tracing
> >>>> fuse_request_* shows that there are a large number of FUSE_RELEASE
> >>>> commands that are queued up on behalf of the unlinked files at the time
> >>>> that fuse_conn_destroy calls fuse_abort_conn. Had the connection not
> >>>> aborted, the fuse server would have responded to the RELEASE commands by
> >>>> removing the hidden files; instead they stick around.
> >>>>
> >>>> For upper-level fuse servers that don't use fuseblk mode this isn't a
> >>>> problem because libfuse responds to the connection going down by pruning
> >>>> its inode cache and calling the fuse server's ->release for any open
> >>>> files before calling the server's ->destroy function.
> >>>>
> >>>> For fuseblk servers this is a problem, however, because the kernel sends
> >>>> FUSE_DESTROY to the fuse server, and the fuse server has to close the
> >>>> block device before returning. This means that the kernel must flush
> >>>> all pending FUSE_RELEASE requests before issuing FUSE_DESTROY.
> >>>>
> >>>> Create a function to push all the background requests to the queue and
> >>>> then wait for the number of pending events to hit zero, and call this
> >>>> before sending FUSE_DESTROY. That way, all the pending events are
> >>>> processed by the fuse server and we don't end up with a corrupt
> >>>> filesystem.
> >>>>
> >>>> Note that we use a wait_event_timeout() loop to cause the process to
> >>>> schedule at least once per second to avoid a "task blocked" warning:
> >>>>
> >>>> INFO: task umount:1279 blocked for more than 20 seconds.
> >>>> Not tainted 6.17.0-rc7-xfsx #rc7
> >>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this messag.
> >>>> task:umount state:D stack:11984 pid:1279 tgid:1279 ppid:10690
> >>>>
> >>>> Earlier in the threads about this patch there was a (self-inflicted)
> >>>> dispute as to whether it was necessary to call touch_softlockup_watchdog
> >>>> in the loop body. Because the process goes to sleep, it's not necessary
> >>>> to touch the softlockup watchdog because we're not preventing another
> >>>> process from being scheduled on a CPU.
> >>>>
> >>>> Signed-off-by: "Darrick J. Wong" <djwong@...nel.org>
> >>>> ---
> >>>> fs/fuse/fuse_i.h | 5 +++++
> >>>> fs/fuse/dev.c | 35 +++++++++++++++++++++++++++++++++++
> >>>> fs/fuse/inode.c | 11 ++++++++++-
> >>>> 3 files changed, 50 insertions(+), 1 deletion(-)
> >>>>
> >>>>
> >>>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> >>>> index c2f2a48156d6c5..aaa8574fd72775 100644
> >>>> --- a/fs/fuse/fuse_i.h
> >>>> +++ b/fs/fuse/fuse_i.h
> >>>> @@ -1274,6 +1274,11 @@ void fuse_request_end(struct fuse_req *req);
> >>>> void fuse_abort_conn(struct fuse_conn *fc);
> >>>> void fuse_wait_aborted(struct fuse_conn *fc);
> >>>>
> >>>> +/**
> >>>> + * Flush all pending requests and wait for them.
> >>>> + */
> >>>> +void fuse_flush_requests_and_wait(struct fuse_conn *fc);
> >>>> +
> >>>> /* Check if any requests timed out */
> >>>> void fuse_check_timeout(struct work_struct *work);
> >>>>
> >>>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> >>>> index 132f38619d7072..ecc0a5304c59d1 100644
> >>>> --- a/fs/fuse/dev.c
> >>>> +++ b/fs/fuse/dev.c
> >>>> @@ -24,6 +24,7 @@
> >>>> #include <linux/splice.h>
> >>>> #include <linux/sched.h>
> >>>> #include <linux/seq_file.h>
> >>>> +#include <linux/nmi.h>
> >>>>
> >>>> #include "fuse_trace.h"
> >>>>
> >>>> @@ -2430,6 +2431,40 @@ static void end_polls(struct fuse_conn *fc)
> >>>> }
> >>>> }
> >>>>
> >>>> +/*
> >>>> + * Flush all pending requests and wait for them. Only call this function when
> >>>> + * it is no longer possible for other threads to add requests.
> >>>> + */
> >>>> +void fuse_flush_requests_and_wait(struct fuse_conn *fc)
> >>>> +{
> >>>> + spin_lock(&fc->lock);
> >>>
> >>> Do we need to grab the fc lock? fc->connected is protected under the
> >>> bg_lock, afaict from fuse_abort_conn().
> >>
> >> Oh, heh. Yeah, it does indeed take both fc->lock and fc->bg_lock.
> >> Will fix that, thanks. :)
> >>
> >> FWIW I don't think it's a big deal if we see a stale connected==1 value
> >> because the events will all get cancelled and the wait loop won't run
> >> anyway, but I agree with being consistent about lock ordering. :)
> >>
> >>>> + if (!fc->connected) {
> >>>> + spin_unlock(&fc->lock);
> >>>> + return;
> >>>> + }
> >>>> +
> >>>> + /* Push all the background requests to the queue. */
> >>>> + spin_lock(&fc->bg_lock);
> >>>> + fc->blocked = 0;
> >>>> + fc->max_background = UINT_MAX;
> >>>> + flush_bg_queue(fc);
> >>>> + spin_unlock(&fc->bg_lock);
> >>>> + spin_unlock(&fc->lock);
> >>>> +
> >>>> + /*
> >>>> + * Wait for all pending fuse requests to complete or abort. The fuse
> >>>> + * server could take a significant amount of time to complete a
> >>>> + * request, so run this in a loop with a short timeout so that we don't
> >>>> + * trip the soft lockup detector.
> >>>> + */
> >>>> + smp_mb();
> >>>> + while (wait_event_timeout(fc->blocked_waitq,
> >>>> + !fc->connected || atomic_read(&fc->num_waiting) == 0,
> >>>> + HZ) == 0) {
> >>>> + /* empty */
> >>>> + }
> >>>
> >>> I'm wondering if it's necessary to wait here for all the pending
> >>> requests to complete or abort?
> >>
> >> I'm not 100% sure what the fuse client shutdown sequence is supposed to
> >> be. If someone kills a program with a large number of open unlinked
> >> files and immediately calls umount(), then the fuse client could be in
> >> the process of sending FUSE_RELEASE requests to the server.
> >>
> >> [background info, feel free to speedread this paragraph]
> >> For a non-fuseblk server, unmount aborts all pending requests and
> >> disconnects the fuse device. This means that the fuse server won't see
> >> all the FUSE_REQUESTs before libfuse calls ->destroy having observed the
> >> fusedev shutdown. The end result is that (on fuse2fs anyway) you end up
> >> with a lot of .fuseXXXXX files that nobody cleans up.
> >>
> >> If you make ->destroy release all the remaining open files, now you run
> >> into a second problem, which is that if there are a lot of open unlinked
> >> files, freeing the inodes can collectively take enough time that the
> >> FUSE_DESTROY request times out.
> >>
> >> On a fuseblk server with libfuse running in multithreaded mode, there
> >> can be several threads reading fuse requests from the fusedev. The
> >> kernel actually sends its own FUSE_DESTROY request, but there's no
> >> coordination between the fuse workers, which means that the fuse server
> >> can process FUSE_DESTROY at the same time it's processing FUSE_RELEASE.
> >> If ->destroy closes the filesystem before the FUSE_RELEASE requests are
> >> processed, you end up with the same .fuseXXXXX file cleanup problem.
> >
> > imo it is the responsibility of the server to coordinate this and make
> > sure it has handled all the requests it has received before it starts
> > executing the destruction logic. imo the only responsibility of the
> > kernel is to actually send the background requests before it sends the
> > FUSE_DESTROY. I think non-fuseblk servers should also receive the
> > FUSE_DESTROY request.
>
> Hmm, good idea, I guess we can add that in libfuse, maybe with some kind
> of timeout.
>
> There is something I don't understand though, how can FUSE_DESTROY
> happen before FUSE_RELEASE is completed?
>
> ->release / fuse_release
> fuse_release_common
> fuse_file_release
> fuse_file_put
> fuse_simple_background
> <userspace>
> <userspace-reply>
> fuse_release_end
> iput()
>
> I.e. how can it release the superblock (which triggers FUSE_DESTROY)
The short answer is that fuse_file_put doesn't wait for the backgrounded
release request to complete and returns; and that FUSE_DESTROY is sent
synchronously and with args->force = true so it jumps the queue.
(See my longer reply to Joanne for more details)
--D
>
> Thanks,
> Bernd
>
Powered by blists - more mailing lists