Message-ID: <2e57be4f-e61b-4a37-832d-14bdea315126@bsbernd.com>
Date: Fri, 12 Sep 2025 14:29:03 +0200
From: Bernd Schubert <bernd@...ernd.com>
To: Amir Goldstein <amir73il@...il.com>
Cc: Luis Henriques <luis@...lia.com>, "Darrick J. Wong" <djwong@...nel.org>,
Theodore Ts'o <tytso@....edu>, Miklos Szeredi <miklos@...redi.hu>,
Bernd Schubert <bschubert@....com>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, Kevin Chen <kchen@....com>
Subject: Re: [RFC] Another take at restarting FUSE servers
On 9/12/25 13:41, Amir Goldstein wrote:
> On Fri, Sep 12, 2025 at 12:31 PM Bernd Schubert <bernd@...ernd.com> wrote:
>>
>>
>>
>> On 8/1/25 12:15, Luis Henriques wrote:
>>> On Thu, Jul 31 2025, Darrick J. Wong wrote:
>>>
>>>> On Thu, Jul 31, 2025 at 09:04:58AM -0400, Theodore Ts'o wrote:
>>>>> On Tue, Jul 29, 2025 at 04:38:54PM -0700, Darrick J. Wong wrote:
>>>>>>
>>>>>> Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
>>>>>> could restart itself. It's unclear if doing so will actually enable us
>>>>>> to clear the condition that caused the failure in the first place, but I
>>>>>> suppose fuse2fs /does/ have e2fsck -fy at hand. So maybe restarts
>>>>>> aren't totally crazy.
>>>>>
>>>>> I'm trying to understand what the failure scenario is here. Is this
>>>>> if the userspace fuse server (i.e., fuse2fs) has crashed? If so, what
>>>>> is supposed to happen with respect to open files, metadata and data
>>>>> modifications which were in transit, etc.? Sure, fuse2fs could run
>>>>> e2fsck -fy, but if there are dirty inodes on the system, that's going
>>>>> potentially to be out of sync, right?
>>>>>
>>>>> What are the recovery semantics that we hope to be able to provide?
>>>>
>>>> <echoing what we said on the ext4 call this morning>
>>>>
>>>> With iomap, most of the dirty state is in the kernel, so I think the new
>>>> fuse2fs instance would poke the kernel with FUSE_NOTIFY_RESTARTED, which
>>>> would initiate GETATTR requests on all the cached inodes to validate
>>>> that they still exist; and then resend all the unacknowledged requests
>>>> that were pending at the time. It might be the case that you have to
>>>> do that in the reverse order; I only know enough about the design of fuse
>>>> to suspect that to be true.
>>>>
>>>> Anyhow once those are complete, I think we can resume operations with
>>>> the surviving inodes. The ones that fail the GETATTR revalidation are
>>>> fuse_make_bad'd, which effectively revokes them.
>>>
>>> Ah! Interesting, I have been playing a bit with sending LOOKUP requests,
>>> but probably GETATTR is a better option.
>>>
>>> So, are you currently working on any of this? Are you implementing this
>>> new NOTIFY_RESTARTED request? I guess it's time for me to have a closer
>>> look at fuse2fs too.
>>
>> Sorry for joining the discussion late, I was totally occupied, day and
>> night. Added Kevin to CC, who is going to work on recovery on our
>> DDN side.
>>
>> Issue with GETATTR and LOOKUP is that they need a path, but on fuse
>> server restart we want kernel to recover inodes and their lookup count.
>> Now inode recovery might be hard, because we currently only have a
>> 64-bit node-id - which is used by most fuse applications as a memory
>> pointer.
>>
>> As Luis wrote, my issue with FUSE_NOTIFY_RESEND is that it just re-sends
>> outstanding requests. And that ends up in most cases in sending requests
>> with invalid node-IDs, that are cast and might provoke random memory
>> access on restart. Kind of the same issue why fuse nfs export or
>> open_by_handle_at don't work well right now.
>>
>> So IMHO, what we really want is something like FUSE_LOOKUP_FH, which
>> would not return a 64-bit node ID, but a max 128 byte file handle.
>> And then FUSE_REVALIDATE_FH on server restart.
>> The file handles could be stored into the fuse inode and also used for
>> NFS export.
>>
>> I *think* Amir had a similar idea, but I can't find the link quickly.
>> Adding Amir to CC.
>
> Or maybe it was Miklos' idea. Hard to keep track of this rolling thread:
> https://lore.kernel.org/linux-fsdevel/CAJfpegvNZ6Z7uhuTdQ6quBaTOYNkAP8W_4yUY4L2JRAEKxEwOQ@mail.gmail.com/
Thanks for the reference, Amir! I had even been in that thread.
>
>>
>> Our short term plan is to add something like FUSE_NOTIFY_RESTART, which
>> will iterate over all superblock inodes and mark them with fuse_make_bad.
>> Any objections against that?
>
> IDK, it seems much more ugly than implementing LOOKUP_HANDLE
> and I am not sure that LOOKUP_HANDLE is that hard to implement, when
> comparing to this alternative.
>
> I mean a restartable server is going to be a new implementation anyway, right?
> So it makes sense to start with a cleaner and more adequate protocol,
> does it not?
Definitely, if we agree on the LOOKUP_HANDLE approach and on using it
for recovery, adding that op seems simple. And reading through the
thread you posted above, only the implementation was missing.
So let's go ahead with this approach.
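
To illustrate why a file handle can survive a server restart where a
pointer-valued node-id cannot, here is a rough userspace sketch. Nothing
in it is real FUSE or libfuse API: struct demo_fh, demo_encode_fh(),
demo_revalidate_fh() and the demo_disk table are hypothetical names made
up for this example, and the (ino, generation) encoding is only one
possible handle layout. The only detail taken from the discussion is the
128-byte upper bound.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_FH 128     /* upper bound discussed in this thread */
#define DEMO_TABLE_SIZE 4

/* Hypothetical opaque handle; the kernel would store it per inode. */
struct demo_fh {
	uint8_t len;
	uint8_t data[DEMO_MAX_FH];
};

/* Hypothetical persistent object; stands in for on-disk state that
 * outlives the server process, unlike a heap pointer. */
struct demo_object {
	uint64_t ino;
	uint32_t generation;
	int in_use;
};

static struct demo_object demo_disk[DEMO_TABLE_SIZE] = {
	{ .ino = 11, .generation = 1, .in_use = 1 },
	{ .ino = 12, .generation = 3, .in_use = 1 },
};

/* Build a handle from persistent identity only (no memory addresses). */
static void demo_encode_fh(const struct demo_object *obj, struct demo_fh *fh)
{
	assert(sizeof(obj->ino) + sizeof(obj->generation) <= DEMO_MAX_FH);

	memset(fh, 0, sizeof(*fh));
	memcpy(fh->data, &obj->ino, sizeof(obj->ino));
	memcpy(fh->data + sizeof(obj->ino), &obj->generation,
	       sizeof(obj->generation));
	fh->len = sizeof(obj->ino) + sizeof(obj->generation);
}

/* What a restarted server could do on a revalidate request:
 * returns 0 if the handle still names a live object, -1 if stale. */
static int demo_revalidate_fh(const struct demo_fh *fh)
{
	uint64_t ino;
	uint32_t gen;

	if (fh->len != sizeof(ino) + sizeof(gen))
		return -1;

	memcpy(&ino, fh->data, sizeof(ino));
	memcpy(&gen, fh->data + sizeof(ino), sizeof(gen));

	for (size_t i = 0; i < DEMO_TABLE_SIZE; i++) {
		const struct demo_object *obj = &demo_disk[i];

		if (obj->in_use && obj->ino == ino && obj->generation == gen)
			return 0;
	}
	return -1;
}
```

A handle encoded before the "restart" can still be checked afterwards
because it is decoded against persistent state. If the object was
replaced in the meantime (generation bumped), revalidation fails and the
kernel side could mark that inode bad instead of dereferencing a stale
value.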
Thanks,
Bernd