Message-ID: <9736bed0-3221-4e47-aed4-b46150b253a8@bsbernd.com>
Date: Mon, 10 Nov 2025 19:22:57 +0100
From: Bernd Schubert <bernd@...ernd.com>
To: "Darrick J. Wong" <djwong@...nel.org>
Cc: Joanne Koong <joannelkoong@...il.com>, miklos@...redi.hu, neal@...pa.dev,
linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 1/5] fuse: flush pending fuse events before aborting the
connection
On 11/10/25 18:56, Darrick J. Wong wrote:
> On Fri, Nov 07, 2025 at 04:02:54PM -0800, Darrick J. Wong wrote:
>> On Fri, Nov 07, 2025 at 11:03:24PM +0100, Bernd Schubert wrote:
>>>
>>>
>>> On 11/7/25 05:26, Darrick J. Wong wrote:
>>>> [I read this email backwards, like I do]
>>>>
>>>> On Thu, Nov 06, 2025 at 10:37:41AM -0800, Joanne Koong wrote:
>>>>> On Wed, Nov 5, 2025 at 4:17 PM Darrick J. Wong <djwong@...nel.org> wrote:
>>>>>>
>>>>>> On Tue, Nov 04, 2025 at 11:22:26AM -0800, Joanne Koong wrote:
>>>>>>
>>>>>> <snipping here because this thread has gotten very long>
>>>>>>
>>>>>>>>>> + while (wait_event_timeout(fc->blocked_waitq,
>>>>>>>>>> + !fc->connected || atomic_read(&fc->num_waiting) == 0,
>>>>>>>>>> + HZ) == 0) {
>>>>>>>>>> + /* empty */
>>>>>>>>>> + }
>>>>>>>>>
>>>>>>>>> I'm wondering if it's necessary to wait here for all the pending
>>>>>>>>> requests to complete or abort?
>>>>>>>>
>>>>>>>> I'm not 100% sure what the fuse client shutdown sequence is supposed to
>>>>>>>> be. If someone kills a program with a large number of open unlinked
>>>>>>>> files and immediately calls umount(), then the fuse client could be in
>>>>>>>> the process of sending FUSE_RELEASE requests to the server.
>>>>>>>>
>>>>>>>> [background info, feel free to speedread this paragraph]
>>>>>>>> For a non-fuseblk server, unmount aborts all pending requests and
>>>>>>>> disconnects the fuse device. This means that the fuse server won't see
>>>>>>>> all the FUSE_REQUESTs before libfuse calls ->destroy having observed the
>>>>>>>> fusedev shutdown. The end result is that (on fuse2fs anyway) you end up
>>>>>>>> with a lot of .fuseXXXXX files that nobody cleans up.
>>>>>>>>
>>>>>>>> If you make ->destroy release all the remaining open files, now you run
>>>>>>>> into a second problem, which is that if there are a lot of open unlinked
>>>>>>>> files, freeing the inodes can collectively take enough time that the
>>>>>>>> FUSE_DESTROY request times out.
>>>>>>>>
>>>>>>>> On a fuseblk server with libfuse running in multithreaded mode, there
>>>>>>>> can be several threads reading fuse requests from the fusedev. The
>>>>>>>> kernel actually sends its own FUSE_DESTROY request, but there's no
>>>>>>>> coordination between the fuse workers, which means that the fuse server
>>>>>>>> can process FUSE_DESTROY at the same time it's processing FUSE_RELEASE.
>>>>>>>> If ->destroy closes the filesystem before the FUSE_RELEASE requests are
>>>>>>>> processed, you end up with the same .fuseXXXXX file cleanup problem.
>>>>>>>
>>>>>>> imo it is the responsibility of the server to coordinate this and make
>>>>>>> sure it has handled all the requests it has received before it starts
>>>>>>> executing the destruction logic.
>>>>>>
>>>>>> I think we're all saying that some sort of fuse request reordering
>>>>>> barrier is needed here, but there's at least three opinions about where
>>>>>> that barrier should be implemented. Clearly I think the barrier should
>>>>>> be in the kernel, but let me think more about where it could go if it
>>>>>> were somewhere else.
>>>>>>
>>>>>> First, Joanne's suggestion for putting it in the fuse server itself:
>>>>>>
>>>>>> I don't see how it's generally possible for the fuse server to know that
>>>>>> it's processed all the requests that the kernel might have sent it.
>>>>>> AFAICT each libfuse thread does roughly this:
>>>>>>
>>>>>> 1. read() a request from the fusedev fd
>>>>>> 2. decode the request data and maybe do some allocations or transform it
>>>>>> 3. call fuse server with request
>>>>>> 4. fuse server does ... something with the request
>>>>>> 5. fuse server finishes, hops back to libfuse / calls fuse_reply_XXX
>>>>>>
>>>>>> Let's say thread 1 is at step 4 with a FUSE_DESTROY. How does it find
>>>>>> out if there are other fuse worker threads that are somewhere in steps
>>>>>> 2 or 3? AFAICT the library doesn't keep track of the number of threads
>>>>>> that are waiting in fuse_session_receive_buf_internal, so fuse servers
>>>>>> can't ask the library about that either.
>>>>>>
>>>>>> Taking a narrower view, it might be possible for the fuse server to
>>>>>> figure this out by maintaining an open resource count. It would
>>>>>> increment this counter when a FUSE_{OPEN,CREATE} request succeeds and
>>>>>> decrement it when FUSE_RELEASE comes in. Assuming that FUSE_RELEASE is
>>>>>> the only kind of request that can be pending when a FUSE_DESTROY comes
>>>>>> in, then destroy just has to wait for the counter to hit zero.
>>>>>
>>>>> I was thinking this logic could be in libfuse's fuse_loop_mt.c. Where
>>>>> if there are X worker threads that are all running fuse_do_work( )
>>>>> then if you get a FUSE_DESTROY on one of those threads that thread can
>>>>> set some se->destroyed field. At this point the other threads will
>>>>> have already called fuse_session_receive_buf_internal() on all the
>>>>> flushed background requests, so after they process it and return from
>>>>> fuse_session_process_buf_internal(), then they check if se->destroyed
>>>>> was set, and if it is they exit the thread, while in the thread that
>>>>> got the FUSE_DESTROY it sleeps until all the threads have completed
>>>>> and then it executes the destroy logic. That to me seems like the
>>>>> cleanest approach.
>>>>
>>>> Hrm. Well now (scrolling to the bottom and back) that I know that the
>>>> FUSE_DESTROY won't get put on the queue ahead of the FUSE_RELEASEs, I
>>>> think that /could/ work.
>>>>
>>>> One tricky thing with having worker threads check a flag and exit is
>>>> that they can be sleeping in the kernel (from _fuse_session_receive_buf)
>>>> when the "just go away" flag gets set. If the thread never wakes up,
>>>> then it'll never exit. In theory you could have the FUSE_DESTROY thread
>>>> call pthread_cancel on all the other worker threads to eliminate them
>>>> once they emerge from PTHREAD_CANCEL_DISABLE state, but I still have
>>>> nightmares from adventures in pthread_cancel at Sun in 2002. :P
>>>>
>>>> Maybe an easier approach would be to have fuse_do_work increment a
>>>> counter when it receives a buffer and decrement it when it finishes with
>>>> that buffer. The FUSE_DESTROY thread merely has to wait for that
>>>> counter to reach 1, at which point it's the only thread with a request
>>>> to process, so it can call do_destroy. That at least would avoid adding
>>>> a new user of pthread_cancel() into the mt loop code.
>>>
>>> I will read through the rest (too tired right now) during the weekend.
>>> I was also thinking about a counter. And let's please do this right by
>>> also handling io-uring, i.e. all CQEs need to have been handled.
>>> Without io-uring it would probably be a counter decreased in
>>> fuse_free_req(); with io-uring it is a bit more complex.
>>
>> Oh right, the uring backend.
>>
>> Assuming that it's really true that the only requests pending during an
>> unmount are going to be FUSE_RELEASE (nobody's actually said that's
>> true) then it's *much* easier to count the number of open files in
>> fuse_session and make _do_destroy in the lowlevel library wait until the
>> open file count reaches zero.
>
> FWIW I tried this out over the weekend with the patch below and the
> wait_event() turned off in the kernel. It seems to work (though I only
> tried it cursorily with iouring) so if the kernel fuse developers would
> rather not have a wait_event() in the unmount path then I suppose this
> is a way to move ahead with this topic.
>
> --D
>
> From: Darrick J. Wong <djwong@...nel.org>
> Subject: [PATCH] libfuse: wait in do_destroy until all open files are closed
>
> Joanne suggests that libfuse should defer a FUSE_DESTROY request until
> all FUSE_RELEASEs have completed. Let's see if that works by tracking
> the count of open files.
>
> Signed-off-by: "Darrick J. Wong" <djwong@...nel.org>
> ---
> lib/fuse_i.h | 4 ++++
> lib/fuse_lowlevel.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++---
> 2 files changed, 55 insertions(+), 3 deletions(-)
>
> diff --git a/lib/fuse_i.h b/lib/fuse_i.h
> index 0ce2c0134ed879..dfe9d9f067498e 100644
> --- a/lib/fuse_i.h
> +++ b/lib/fuse_i.h
> @@ -117,6 +117,10 @@ struct fuse_session {
> */
> uint32_t conn_want;
> uint64_t conn_want_ext;
> +
> + /* destroy has to wait for all the open files to go away */
> + pthread_cond_t zero_open_files;
> + uint64_t open_files;
> };
>
> struct fuse_chan {
> diff --git a/lib/fuse_lowlevel.c b/lib/fuse_lowlevel.c
> index 12724ed66bdcc8..f12c6db0eb0e60 100644
> --- a/lib/fuse_lowlevel.c
> +++ b/lib/fuse_lowlevel.c
> @@ -52,6 +52,30 @@
> #define PARAM(inarg) (((char *)(inarg)) + sizeof(*(inarg)))
> #define OFFSET_MAX 0x7fffffffffffffffLL
>
> +static inline void inc_open_files(struct fuse_session *se)
> +{
> + pthread_mutex_lock(&se->lock);
> + se->open_files++;
> + pthread_mutex_unlock(&se->lock);
> +}
> +
> +static inline void dec_open_files(struct fuse_session *se)
> +{
> + pthread_mutex_lock(&se->lock);
> + se->open_files--;
> + if (!se->open_files)
> + pthread_cond_broadcast(&se->zero_open_files);
> + pthread_mutex_unlock(&se->lock);
> +}
I think open_files only decreases when destroy is received? Doesn't that
give us the chance to use a C11 atomic and to broadcast only once
FUSE_DESTROY has been received? I.e. I would use an atomic for
'open_files', set something like 'se->destroy_received', and trigger the
broadcast only when that flag is set.
(A further optimization later, for io-uring, would be to keep that
counter per queue.)
Thanks,
Bernd