linux-kernel - Re: [RFC] Another take at restarting FUSE servers

Message-ID: <CAOQ4uxg1zXPTB1_pFB=hyqjAGjk=AC34qP1k9C043otxcwqJGg@mail.gmail.com>
Date: Fri, 12 Sep 2025 13:41:20 +0200
From: Amir Goldstein <amir73il@...il.com>
To: Bernd Schubert <bernd@...ernd.com>
Cc: Luis Henriques <luis@...lia.com>, "Darrick J. Wong" <djwong@...nel.org>, "Theodore Ts'o" <tytso@....edu>, 
	Miklos Szeredi <miklos@...redi.hu>, Bernd Schubert <bschubert@....com>, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Kevin Chen <kchen@....com>
Subject: Re: [RFC] Another take at restarting FUSE servers

On Fri, Sep 12, 2025 at 12:31 PM Bernd Schubert <bernd@...ernd.com> wrote:
>
>
>
> On 8/1/25 12:15, Luis Henriques wrote:
> > On Thu, Jul 31 2025, Darrick J. Wong wrote:
> >
> >> On Thu, Jul 31, 2025 at 09:04:58AM -0400, Theodore Ts'o wrote:
> >>> On Tue, Jul 29, 2025 at 04:38:54PM -0700, Darrick J. Wong wrote:
> >>>>
> >>>> Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
> >>>> could restart itself.  It's unclear if doing so will actually enable us
> >>>> to clear the condition that caused the failure in the first place, but I
> >>>> suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
> >>>> aren't totally crazy.
> >>>
> >>> I'm trying to understand what the failure scenario is here.  Is this
> >>> if the userspace fuse server (i.e., fuse2fs) has crashed?  If so, what
> >>> is supposed to happen with respect to open files, metadata and data
> >>> modifications which were in transit, etc.?  Sure, fuse2fs could run
> >>> e2fsck -fy, but if there are dirty inodes on the system, those are
> >>> potentially going to be out of sync, right?
> >>>
> >>> What are the recovery semantics that we hope to be able to provide?
> >>
> >> <echoing what we said on the ext4 call this morning>
> >>
> >> With iomap, most of the dirty state is in the kernel, so I think the new
> >> fuse2fs instance would poke the kernel with FUSE_NOTIFY_RESTARTED, which
> >> would initiate GETATTR requests on all the cached inodes to validate
> >> that they still exist; and then resend all the unacknowledged requests
> >> that were pending at the time.  It might be the case that you have to
> >> do that in the reverse order; I only know enough about the design of fuse
> >> to suspect that to be true.
> >>
> >> Anyhow once those are complete, I think we can resume operations with
> >> the surviving inodes.  The ones that fail the GETATTR revalidation are
> >> fuse_make_bad'd, which effectively revokes them.
> >
> > Ah! Interesting, I have been playing a bit with sending LOOKUP requests,
> > but probably GETATTR is a better option.
> >
> > So, are you currently working on any of this?  Are you implementing this
> > new NOTIFY_RESTARTED request?  I guess it's time for me to have a closer
> > look at fuse2fs too.
>
> Sorry for joining the discussion late, I was totally occupied, day and
> night. Added Kevin to CC, who is going to work on recovery on our
> DDN side.
>
> Issue with GETATTR and LOOKUP is that they need a path, but on fuse
> server restart we want kernel to recover inodes and their lookup count.
> Now inode recovery might be hard, because we currently only have a
> 64-bit node-id - which is used by most fuse applications as a memory
> pointer.
>
> As Luis wrote, my issue with FUSE_NOTIFY_RESEND is that it just re-sends
> outstanding requests. And that ends up in most cases in sending requests
> with invalid node-IDs, which are cast to pointers and might provoke random
> memory access on restart. Kind of the same reason why fuse nfs export or
> open_by_handle_at doesn't work well right now.
>
> So IMHO, what we really want is something like FUSE_LOOKUP_FH, which
> would not return a 64-bit node ID, but a max 128 byte file handle.
> And then FUSE_REVALIDATE_FH on server restart.
> The file handles could be stored into the fuse inode and also used for
> NFS export.
>
> I *think* Amir had a similar idea, but I can't find the link quickly.
> Adding Amir to CC.

Or maybe it was Miklos' idea. Hard to keep track of this rolling thread:
https://lore.kernel.org/linux-fsdevel/CAJfpegvNZ6Z7uhuTdQ6quBaTOYNkAP8W_4yUY4L2JRAEKxEwOQ@mail.gmail.com/

>
> Our short term plan is to add something like FUSE_NOTIFY_RESTART, which
> will iterate over all superblock inodes and mark them with fuse_make_bad.
> Any objections against that?

IDK, it seems much uglier than implementing LOOKUP_HANDLE,
and I am not sure that LOOKUP_HANDLE is that hard to implement
compared to this alternative.

I mean a restartable server is going to be a new implementation anyway, right?
So it makes sense to start with a cleaner and more adequate protocol,
does it not?

Thanks,
Amir.
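
For readers following the thread, here is a toy model of the trade-off being discussed. It is only a sketch under stated assumptions: `ToyServer`, `lookup`, `getattr_by_nodeid`, `revalidate_handle` and `recover` are hypothetical names invented for illustration, not FUSE or libfuse APIs, and the real protocol messages floated above (FUSE_LOOKUP_FH / LOOKUP_HANDLE, FUSE_REVALIDATE_FH) did not exist at the time of this message. The model shows why a node ID that only has meaning inside one server process cannot be revalidated after a restart, while an opaque handle backed by persistent state can, and how inodes that fail revalidation would be the ones marked bad.

```python
# Toy model only: none of these names are real FUSE or libfuse APIs.
MAX_HANDLE_LEN = 128  # upper bound suggested in the thread for a file handle


class ToyServer:
    """One run of a hypothetical FUSE server process."""

    def __init__(self, disk):
        self.disk = disk        # persistent state: handle bytes -> path
        self.by_nodeid = {}     # volatile state, lost when the process dies
        self.next_nodeid = 2    # node ID 1 is the FUSE root

    def lookup(self, path):
        """Return (nodeid, handle) for a path that exists on 'disk'."""
        handle = next(h for h, p in self.disk.items() if p == path)
        nodeid = self.next_nodeid
        self.next_nodeid += 1
        self.by_nodeid[nodeid] = path
        return nodeid, handle

    def getattr_by_nodeid(self, nodeid):
        # Many real servers cast the node ID to a pointer, so a stale ID
        # is a wild pointer. Here it is merely a failed dictionary lookup.
        return self.by_nodeid.get(nodeid)

    def revalidate_handle(self, handle):
        """Resolve a handle using only persistent state."""
        if len(handle) > MAX_HANDLE_LEN:
            raise ValueError("handle too long")
        return self.disk.get(handle)


def recover(cached, server):
    """Split cached inodes into survivors and ones to mark bad."""
    alive, bad = [], []
    for name, handle in cached.items():
        if server.revalidate_handle(handle) is not None:
            alive.append(name)
        else:
            bad.append(name)
    return alive, bad


# First server instance hands out node IDs and handles.
disk = {b"ino:12": "/a", b"ino:13": "/b"}
old = ToyServer(disk)
nodeid_a, handle_a = old.lookup("/a")
nodeid_b, handle_b = old.lookup("/b")

# The server crashes; assume a repair pass drops /b before the restart.
del disk[b"ino:13"]
new = ToyServer(disk)

stale = new.getattr_by_nodeid(nodeid_a)   # old node ID means nothing now
alive, bad = recover({"a": handle_a, "b": handle_b}, new)
```

In this model the short-term plan from the quoted text corresponds to putting every cached inode in `bad` unconditionally, whereas a handle-based lookup lets the surviving ones stay usable.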