[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e0b83d5f-d6b2-4383-a90f-437437d4cb75@bsbernd.com>
Date: Fri, 7 Nov 2025 23:03:24 +0100
From: Bernd Schubert <bernd@...ernd.com>
To: "Darrick J. Wong" <djwong@...nel.org>,
Joanne Koong <joannelkoong@...il.com>
Cc: miklos@...redi.hu, neal@...pa.dev, linux-ext4@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 1/5] fuse: flush pending fuse events before aborting the
connection
On 11/7/25 05:26, Darrick J. Wong wrote:
> [I read this email backwards, like I do]
>
> On Thu, Nov 06, 2025 at 10:37:41AM -0800, Joanne Koong wrote:
>> On Wed, Nov 5, 2025 at 4:17 PM Darrick J. Wong <djwong@...nel.org> wrote:
>>>
>>> On Tue, Nov 04, 2025 at 11:22:26AM -0800, Joanne Koong wrote:
>>>
>>> <snipping here because this thread has gotten very long>
>>>
>>>>>>> + while (wait_event_timeout(fc->blocked_waitq,
>>>>>>> + !fc->connected || atomic_read(&fc->num_waiting) == 0,
>>>>>>> + HZ) == 0) {
>>>>>>> + /* empty */
>>>>>>> + }
>>>>>>
>>>>>> I'm wondering if it's necessary to wait here for all the pending
>>>>>> requests to complete or abort?
>>>>>
>>>>> I'm not 100% sure what the fuse client shutdown sequence is supposed to
>>>>> be. If someone kills a program with a large number of open unlinked
>>>>> files and immediately calls umount(), then the fuse client could be in
>>>>> the process of sending FUSE_RELEASE requests to the server.
>>>>>
>>>>> [background info, feel free to speedread this paragraph]
>>>>> For a non-fuseblk server, unmount aborts all pending requests and
>>>>> disconnects the fuse device. This means that the fuse server won't see
>>>>> all the FUSE_REQUESTs before libfuse calls ->destroy having observed the
>>>>> fusedev shutdown. The end result is that (on fuse2fs anyway) you end up
>>>>> with a lot of .fuseXXXXX files that nobody cleans up.
>>>>>
>>>>> If you make ->destroy release all the remaining open files, now you run
>>>>> into a second problem, which is that if there are a lot of open unlinked
>>>>> files, freeing the inodes can collectively take enough time that the
>>>>> FUSE_DESTROY request times out.
>>>>>
>>>>> On a fuseblk server with libfuse running in multithreaded mode, there
>>>>> can be several threads reading fuse requests from the fusedev. The
>>>>> kernel actually sends its own FUSE_DESTROY request, but there's no
>>>>> coordination between the fuse workers, which means that the fuse server
>>>>> can process FUSE_DESTROY at the same time it's processing FUSE_RELEASE.
>>>>> If ->destroy closes the filesystem before the FUSE_RELEASE requests are
>>>>> processed, you end up with the same .fuseXXXXX file cleanup problem.
>>>>
>>>> imo it is the responsibility of the server to coordinate this and make
>>>> sure it has handled all the requests it has received before it starts
>>>> executing the destruction logic.
>>>
>>> I think we're all saying that some sort of fuse request reordering
>>> barrier is needed here, but there's at least three opinions about where
>>> that barrier should be implemented. Clearly I think the barrier should
>>> be in the kernel, but let me think more about where it could go if it
>>> were somewhere else.
>>>
>>> First, Joanne's suggestion for putting it in the fuse server itself:
>>>
>>> I don't see how it's generally possible for the fuse server to know that
>>> it's processed all the requests that the kernel might have sent it.
>>> AFAICT each libfuse thread does roughly this:
>>>
>>> 1. read() a request from the fusedev fd
>>> 2. decode the request data and maybe do some allocations or transform it
>>> 3. call fuse server with request
>>> 4. fuse server does ... something with the request
>>> 5. fuse server finishes, hops back to libfuse / calls fuse_reply_XXX
>>>
>>> Let's say thread 1 is at step 4 with a FUSE_DESTROY. How does it find
>>> out if there are other fuse worker threads that are somewhere in steps
>>> 2 or 3? AFAICT the library doesn't keep track of the number of threads
>>> that are waiting in fuse_session_receive_buf_internal, so fuse servers
>>> can't ask the library about that either.
>>>
>>> Taking a narrower view, it might be possible for the fuse server to
>>> figure this out by maintaining an open resource count. It would
>>> increment this counter when a FUSE_{OPEN,CREATE} request succeeds and
>>> decrement it when FUSE_RELEASE comes in. Assuming that FUSE_RELEASE is
>>> the only kind of request that can be pending when a FUSE_DESTROY comes
>>> in, then destroy just has to wait for the counter to hit zero.
>>
>> I was thinking this logic could be in libfuse's fuse_loop_mt.c. Where
>> if there are X worker threads that are all running fuse_do_work( )
>> then if you get a FUSE_DESTROY on one of those threads that thread can
>> set some se->destroyed field. At this point the other threads will
>> have already called fuse_session_receive_buf_internal() on all the
>> flushed background requests, so after they process it and return from
>> fuse_session_process_buf_internal(), then they check if se->destroyed
>> was set, and if it is they exit the thread, while in the thread that
>> got the FUSE_DESTROY it sleeps until all the threads have completed
>> and then it executes the destroy logic.That to me seems like the
>> cleanest approach.
>
> Hrm. Well now (scrolling to the bottom and back) that I know that the
> FUSE_DESTROY won't get put on the queue ahead of the FUSE_RELEASEs, I
> think that /could/ work.
>
> One tricky thing with having worker threads check a flag and exit is
> that they can be sleeping in the kernel (from _fuse_session_receive_buf)
> when the "just go away" flag gets set. If the thread never wakes up,
> then it'll never exit. In theory you could have the FUSE_DESTROY thread
> call pthread_cancel on all the other worker threads to eliminate them
> once they emerge from PTHREAD_CANCEL_DISABLE state, but I still have
> nightmares from adventures in pthread_cancel at Sun in 2002. :P
>
> Maybe an easier approach would be to have fuse_do_work increment a
> counter when it receives a buffer and decrement it when it finishes with
> that buffer. The FUSE_DESTROY thread merely has to wait for that
> counter to reach 1, at which point it's the only thread with a request
> to process, so it can call do_destroy. That at least would avoid adding
> a new user of pthread_cancel() into the mt loop code.
I will read through the rest (too tired right now) durig the weekend.
I was also thinking about counter. And let's please also do this right
also handling io-uring. I.e. all CQEs needs to have been handled.
Without io-uring it would be probably a counter in decreased in
fuse_free_req(), with io-uring it is a bit more complex.
Thanks,
Bernd
Powered by blists - more mailing lists