linux-ext4 - Re: [PATCH 1/5] fuse: flush pending fuse events before aborting the connection

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a9c0c66e-c3ce-4cdd-bd83-dd04bc5f9379@bsbernd.com>
Date: Tue, 4 Nov 2025 22:47:52 +0100
From: Bernd Schubert <bernd@...ernd.com>
To: Joanne Koong <joannelkoong@...il.com>, "Darrick J. Wong"
 <djwong@...nel.org>
Cc: miklos@...redi.hu, neal@...pa.dev, linux-ext4@...r.kernel.org,
 linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 1/5] fuse: flush pending fuse events before aborting the
 connection



On 11/4/25 20:22, Joanne Koong wrote:
> On Mon, Nov 3, 2025 at 2:13 PM Darrick J. Wong <djwong@...nel.org> wrote:
>>
>> On Mon, Nov 03, 2025 at 09:20:26AM -0800, Joanne Koong wrote:
>>> On Tue, Oct 28, 2025 at 5:43 PM Darrick J. Wong <djwong@...nel.org> wrote:
>>>>
>>>> From: Darrick J. Wong <djwong@...nel.org>
>>>>
>>>> generic/488 fails with fuse2fs in the following fashion:
>>>>
>>>> generic/488       _check_generic_filesystem: filesystem on /dev/sdf is inconsistent
>>>> (see /var/tmp/fstests/generic/488.full for details)
>>>>
>>>> This test opens a large number of files, unlinks them (which really just
>>>> renames them to fuse hidden files), closes the program, unmounts the
>>>> filesystem, and runs fsck to check that there aren't any inconsistencies
>>>> in the filesystem.
>>>>
>>>> Unfortunately, the 488.full file shows that there are a lot of hidden
>>>> files left over in the filesystem, with incorrect link counts.  Tracing
>>>> fuse_request_* shows that there are a large number of FUSE_RELEASE
>>>> commands that are queued up on behalf of the unlinked files at the time
>>>> that fuse_conn_destroy calls fuse_abort_conn.  Had the connection not
>>>> aborted, the fuse server would have responded to the RELEASE commands by
>>>> removing the hidden files; instead they stick around.
>>>>
>>>> For upper-level fuse servers that don't use fuseblk mode this isn't a
>>>> problem because libfuse responds to the connection going down by pruning
>>>> its inode cache and calling the fuse server's ->release for any open
>>>> files before calling the server's ->destroy function.
>>>>
>>>> For fuseblk servers this is a problem, however, because the kernel sends
>>>> FUSE_DESTROY to the fuse server, and the fuse server has to close the
>>>> block device before returning.  This means that the kernel must flush
>>>> all pending FUSE_RELEASE requests before issuing FUSE_DESTROY.
>>>>
>>>> Create a function to push all the background requests to the queue and
>>>> then wait for the number of pending events to hit zero, and call this
>>>> before sending FUSE_DESTROY.  That way, all the pending events are
>>>> processed by the fuse server and we don't end up with a corrupt
>>>> filesystem.
>>>>
>>>> Note that we use a wait_event_timeout() loop to cause the process to
>>>> schedule at least once per second to avoid a "task blocked" warning:
>>>>
>>>> INFO: task umount:1279 blocked for more than 20 seconds.
>>>>       Not tainted 6.17.0-rc7-xfsx #rc7
>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this messag.
>>>> task:umount          state:D stack:11984 pid:1279  tgid:1279  ppid:10690
>>>>
>>>> Earlier in the threads about this patch there was a (self-inflicted)
>>>> dispute as to whether it was necessary to call touch_softlockup_watchdog
>>>> in the loop body.  Because the process goes to sleep, it's not necessary
>>>> to touch the softlockup watchdog because we're not preventing another
>>>> process from being scheduled on a CPU.
>>>>
>>>> Signed-off-by: "Darrick J. Wong" <djwong@...nel.org>
>>>> ---
>>>>  fs/fuse/fuse_i.h |    5 +++++
>>>>  fs/fuse/dev.c    |   35 +++++++++++++++++++++++++++++++++++
>>>>  fs/fuse/inode.c  |   11 ++++++++++-
>>>>  3 files changed, 50 insertions(+), 1 deletion(-)
>>>>
>>>>
>>>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
>>>> index c2f2a48156d6c5..aaa8574fd72775 100644
>>>> --- a/fs/fuse/fuse_i.h
>>>> +++ b/fs/fuse/fuse_i.h
>>>> @@ -1274,6 +1274,11 @@ void fuse_request_end(struct fuse_req *req);
>>>>  void fuse_abort_conn(struct fuse_conn *fc);
>>>>  void fuse_wait_aborted(struct fuse_conn *fc);
>>>>
>>>> +/**
>>>> + * Flush all pending requests and wait for them.
>>>> + */
>>>> +void fuse_flush_requests_and_wait(struct fuse_conn *fc);
>>>> +
>>>>  /* Check if any requests timed out */
>>>>  void fuse_check_timeout(struct work_struct *work);
>>>>
>>>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>>>> index 132f38619d7072..ecc0a5304c59d1 100644
>>>> --- a/fs/fuse/dev.c
>>>> +++ b/fs/fuse/dev.c
>>>> @@ -24,6 +24,7 @@
>>>>  #include <linux/splice.h>
>>>>  #include <linux/sched.h>
>>>>  #include <linux/seq_file.h>
>>>> +#include <linux/nmi.h>
>>>>
>>>>  #include "fuse_trace.h"
>>>>
>>>> @@ -2430,6 +2431,40 @@ static void end_polls(struct fuse_conn *fc)
>>>>         }
>>>>  }
>>>>
>>>> +/*
>>>> + * Flush all pending requests and wait for them.  Only call this function when
>>>> + * it is no longer possible for other threads to add requests.
>>>> + */
>>>> +void fuse_flush_requests_and_wait(struct fuse_conn *fc)
>>>> +{
>>>> +       spin_lock(&fc->lock);
>>>
>>> Do we need to grab the fc lock? fc->connected is protected under the
>>> bg_lock, afaict from fuse_abort_conn().
>>
>> Oh, heh.  Yeah, it does indeed take both fc->lock and fc->bg_lock.
>> Will fix that, thanks. :)
>>
>> FWIW I don't think it's a big deal if we see a stale connected==1 value
>> because the events will all get cancelled and the wait loop won't run
>> anyway, but I agree with being consistent about lock ordering. :)
>>
>>>> +       if (!fc->connected) {
>>>> +               spin_unlock(&fc->lock);
>>>> +               return;
>>>> +       }
>>>> +
>>>> +       /* Push all the background requests to the queue. */
>>>> +       spin_lock(&fc->bg_lock);
>>>> +       fc->blocked = 0;
>>>> +       fc->max_background = UINT_MAX;
>>>> +       flush_bg_queue(fc);
>>>> +       spin_unlock(&fc->bg_lock);
>>>> +       spin_unlock(&fc->lock);
>>>> +
>>>> +       /*
>>>> +        * Wait for all pending fuse requests to complete or abort.  The fuse
>>>> +        * server could take a significant amount of time to complete a
>>>> +        * request, so run this in a loop with a short timeout so that we don't
>>>> +        * trip the soft lockup detector.
>>>> +        */
>>>> +       smp_mb();
>>>> +       while (wait_event_timeout(fc->blocked_waitq,
>>>> +                       !fc->connected || atomic_read(&fc->num_waiting) == 0,
>>>> +                       HZ) == 0) {
>>>> +               /* empty */
>>>> +       }
>>>
>>> I'm wondering if it's necessary to wait here for all the pending
>>> requests to complete or abort?
>>
>> I'm not 100% sure what the fuse client shutdown sequence is supposed to
>> be.  If someone kills a program with a large number of open unlinked
>> files and immediately calls umount(), then the fuse client could be in
>> the process of sending FUSE_RELEASE requests to the server.
>>
>> [background info, feel free to speedread this paragraph]
>> For a non-fuseblk server, unmount aborts all pending requests and
>> disconnects the fuse device.  This means that the fuse server won't see
>> all the FUSE_REQUESTs before libfuse calls ->destroy having observed the
>> fusedev shutdown.  The end result is that (on fuse2fs anyway) you end up
>> with a lot of .fuseXXXXX files that nobody cleans up.
>>
>> If you make ->destroy release all the remaining open files, now you run
>> into a second problem, which is that if there are a lot of open unlinked
>> files, freeing the inodes can collectively take enough time that the
>> FUSE_DESTROY request times out.
>>
>> On a fuseblk server with libfuse running in multithreaded mode, there
>> can be several threads reading fuse requests from the fusedev.  The
>> kernel actually sends its own FUSE_DESTROY request, but there's no
>> coordination between the fuse workers, which means that the fuse server
>> can process FUSE_DESTROY at the same time it's processing FUSE_RELEASE.
>> If ->destroy closes the filesystem before the FUSE_RELEASE requests are
>> processed, you end up with the same .fuseXXXXX file cleanup problem.
> 
> imo it is the responsibility of the server to coordinate this and make
> sure it has handled all the requests it has received before it starts
> executing the destruction logic. imo the only responsibility of the
> kernel is to actually send the background requests before it sends the
> FUSE_DESTROY. I think non-fuseblk servers should also receive the
> FUSE_DESTROY request.

Hmm, good idea, I guess we can add that in libfuse, maybe with some kind
of timeout.

There is something I don't understand though, how can FUSE_DESTROY
happen before FUSE_RELEASE is completed?

->release / fuse_release
   fuse_release_common
      fuse_file_release
         fuse_file_put
            fuse_simple_background
            <userspace>
            <userspace-reply>
               fuse_release_end
                  iput()

I.e. how can it release the superblock (which triggers FUSE_DESTROY)


Thanks,
Bernd