linux-kernel - Re: [RFC] Another take at restarting FUSE servers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOQ4uxg8sFdFRxKUcAFoCPMXaNY18m4e1PfBXo+GdGxGcKDaFg@mail.gmail.com>
Date: Mon, 15 Sep 2025 10:41:31 +0200
From: Amir Goldstein <amir73il@...il.com>
To: Bernd Schubert <bernd@...ernd.com>
Cc: "Darrick J. Wong" <djwong@...nel.org>, Luis Henriques <luis@...lia.com>, "Theodore Ts'o" <tytso@....edu>, 
	Miklos Szeredi <miklos@...redi.hu>, Bernd Schubert <bschubert@....com>, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Kevin Chen <kchen@....com>
Subject: Re: [RFC] Another take at restarting FUSE servers

On Mon, Sep 15, 2025 at 10:27 AM Bernd Schubert <bernd@...ernd.com> wrote:
>
>
>
> On 9/15/25 09:07, Amir Goldstein wrote:
> > On Fri, Sep 12, 2025 at 4:58 PM Darrick J. Wong <djwong@...nel.org> wrote:
> >>
> >> On Fri, Sep 12, 2025 at 02:29:03PM +0200, Bernd Schubert wrote:
> >>>
> >>>
> >>> On 9/12/25 13:41, Amir Goldstein wrote:
> >>>> On Fri, Sep 12, 2025 at 12:31 PM Bernd Schubert <bernd@...ernd.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 8/1/25 12:15, Luis Henriques wrote:
> >>>>>> On Thu, Jul 31 2025, Darrick J. Wong wrote:
> >>>>>>
> >>>>>>> On Thu, Jul 31, 2025 at 09:04:58AM -0400, Theodore Ts'o wrote:
> >>>>>>>> On Tue, Jul 29, 2025 at 04:38:54PM -0700, Darrick J. Wong wrote:
> >>>>>>>>>
> >>>>>>>>> Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
> >>>>>>>>> could restart itself.  It's unclear if doing so will actually enable us
> >>>>>>>>> to clear the condition that caused the failure in the first place, but I
> >>>>>>>>> suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
> >>>>>>>>> aren't totally crazy.
> >>>>>>>>
> >>>>>>>> I'm trying to understand what the failure scenario is here.  Is this
> >>>>>>>> if the userspace fuse server (i.e., fuse2fs) has crashed?  If so, what
> >>>>>>>> is supposed to happen with respect to open files, metadata and data
> >>>>>>>> modifications which were in transit, etc.?  Sure, fuse2fs could run
> >>>>>>>> e2fsck -fy, but if there are dirty inode on the system, that's going
> >>>>>>>> potentally to be out of sync, right?
> >>>>>>>>
> >>>>>>>> What are the recovery semantics that we hope to be able to provide?
> >>>>>>>
> >>>>>>> <echoing what we said on the ext4 call this morning>
> >>>>>>>
> >>>>>>> With iomap, most of the dirty state is in the kernel, so I think the new
> >>>>>>> fuse2fs instance would poke the kernel with FUSE_NOTIFY_RESTARTED, which
> >>>>>>> would initiate GETATTR requests on all the cached inodes to validate
> >>>>>>> that they still exist; and then resend all the unacknowledged requests
> >>>>>>> that were pending at the time.  It might be the case that you have to
> >>>>>>> that in the reverse order; I only know enough about the design of fuse
> >>>>>>> to suspect that to be true.
> >>>>>>>
> >>>>>>> Anyhow once those are complete, I think we can resume operations with
> >>>>>>> the surviving inodes.  The ones that fail the GETATTR revalidation are
> >>>>>>> fuse_make_bad'd, which effectively revokes them.
> >>>>>>
> >>>>>> Ah! Interesting, I have been playing a bit with sending LOOKUP requests,
> >>>>>> but probably GETATTR is a better option.
> >>>>>>
> >>>>>> So, are you currently working on any of this?  Are you implementing this
> >>>>>> new NOTIFY_RESTARTED request?  I guess it's time for me to have a closer
> >>>>>> look at fuse2fs too.
> >>>>>
> >>>>> Sorry for joining the discussion late, I was totally occupied, day and
> >>>>> night. Added Kevin to CC, who is going to work on recovery on our
> >>>>> DDN side.
> >>>>>
> >>>>> Issue with GETATTR and LOOKUP is that they need a path, but on fuse
> >>>>> server restart we want kernel to recover inodes and their lookup count.
> >>>>> Now inode recovery might be hard, because we currently only have a
> >>>>> 64-bit node-id - which is used my most fuse application as memory
> >>>>> pointer.
> >>>>>
> >>>>> As Luis wrote, my issue with FUSE_NOTIFY_RESEND is that it just re-sends
> >>>>> outstanding requests. And that ends up in most cases in sending requests
> >>>>> with invalid node-IDs, that are casted and might provoke random memory
> >>>>> access on restart. Kind of the same issue why fuse nfs export or
> >>>>> open_by_handle_at doesn't work well right now.
> >>>>>
> >>>>> So IMHO, what we really want is something like FUSE_LOOKUP_FH, which
> >>>>> would not return a 64-bit node ID, but a max 128 byte file handle.
> >>>>> And then FUSE_REVALIDATE_FH on server restart.
> >>>>> The file handles could be stored into the fuse inode and also used for
> >>>>> NFS export.
> >>>>>
> >>>>> I *think* Amir had a similar idea, but I don't find the link quickly.
> >>>>> Adding Amir to CC.
> >>>>
> >>>> Or maybe it was Miklos' idea. Hard to keep track of this rolling thread:
> >>>> https://lore.kernel.org/linux-fsdevel/CAJfpegvNZ6Z7uhuTdQ6quBaTOYNkAP8W_4yUY4L2JRAEKxEwOQ@mail.gmail.com/
> >>>
> >>> Thanks for the reference Amir! I even had been in that thread.
> >>>
> >>>>
> >>>>>
> >>>>> Our short term plan is to add something like FUSE_NOTIFY_RESTART, which
> >>>>> will iterate over all superblock inodes and mark them with fuse_make_bad.
> >>>>> Any objections against that?
> >>
> >> What if you actually /can/ reuse a nodeid after a restart?  Consider
> >> fuse4fs, where the nodeid is the on-disk inode number.  After a restart,
> >> you can reconnect the fuse_inode to the ondisk inode, assuming recovery
> >> didn't delete it, obviously.
> >
> > FUSE_LOOKUP_HANDLE is a contract.
> > If fuse4fs can reuse nodeid after restart then by all means, it should sign
> > this contract, otherwise there is no way for client to know that the
> > nodeids are persistent.
> > If fuse4fs_handle := nodeid, that will make implementing the lookup_handle()
> > API trivial.
> >
> >>
> >> I suppose you could just ask for refreshed stat information and either
> >> the server gives it to you and the fuse_inode lives; or the server
> >> returns ENOENT and then we mark it bad.  But I'd have to see code
> >> patches to form a real opinion.
> >>
> >
> > You could make fuse4fs_handle := <nodeid:fuse_instance_id>
> > where fuse_instance_id can be its start time or random number.
> > for auto invalidate, or maybe the fuse_instance_id should be
> > a native part of FUSE protocol so that client knows to only invalidate
> > attr cache in case of fuse_instance_id change?
> >
> > In any case, instead of a storm of revalidate messages after
> > server restart, do it lazily on demand.
>
> For a network file system, probably. For fuse4fs or other block
> based file systems, not sure. Darrick has the example of fsck.
> Let's assume fuse4fs runs with attribute and dentry timeouts > 0,
> fuse-server gets restarted, fsck'ed and some files get removed.
> Now reading these inodes would still work - wouldn't it
> be better to invalidate the cache before going into operation
> again?

Forgive me, I was making a wrong assumption that fuse4fs
was using ext4 filehandle as nodeid, but of course it does not.

The reason I made this wrong assumption is because fuse4fs *can*
already use ext4 (64bit) file handle as nodeid, with existing FUSE protocol
which is what my fuse passthough library [1] does.

My claim was that although fuse4fs could support safe restart, which
cannot read from recycled inode number with current FUSE protocol,
doing so with FUSE_HANDLE protocol would express a commitment
to this behavior.

Thanks,
Amir.

[1] https://github.com/amir73il/libfuse/commits/fuse_passthrough