Message-ID: <20250912145857.GQ8117@frogsfrogsfrogs>
Date: Fri, 12 Sep 2025 07:58:57 -0700
From: "Darrick J. Wong" <djwong@...nel.org>
To: Bernd Schubert <bernd@...ernd.com>
Cc: Amir Goldstein <amir73il@...il.com>, Luis Henriques <luis@...lia.com>,
	Theodore Ts'o <tytso@....edu>, Miklos Szeredi <miklos@...redi.hu>,
	Bernd Schubert <bschubert@....com>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, Kevin Chen <kchen@....com>
Subject: Re: [RFC] Another take at restarting FUSE servers

On Fri, Sep 12, 2025 at 02:29:03PM +0200, Bernd Schubert wrote:
> 
> 
> On 9/12/25 13:41, Amir Goldstein wrote:
> > On Fri, Sep 12, 2025 at 12:31 PM Bernd Schubert <bernd@...ernd.com> wrote:
> >>
> >>
> >>
> >> On 8/1/25 12:15, Luis Henriques wrote:
> >>> On Thu, Jul 31 2025, Darrick J. Wong wrote:
> >>>
> >>>> On Thu, Jul 31, 2025 at 09:04:58AM -0400, Theodore Ts'o wrote:
> >>>>> On Tue, Jul 29, 2025 at 04:38:54PM -0700, Darrick J. Wong wrote:
> >>>>>>
> >>>>>> Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
> >>>>>> could restart itself.  It's unclear if doing so will actually enable us
> >>>>>> to clear the condition that caused the failure in the first place, but I
> >>>>>> suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
> >>>>>> aren't totally crazy.
> >>>>>
> >>>>> I'm trying to understand what the failure scenario is here.  Is this
> >>>>> if the userspace fuse server (i.e., fuse2fs) has crashed?  If so, what
> >>>>> is supposed to happen with respect to open files, metadata and data
> >>>>> modifications which were in transit, etc.?  Sure, fuse2fs could run
> >>>>> e2fsck -fy, but if there are dirty inodes on the system, that's
> >>>>> potentially going to be out of sync, right?
> >>>>>
> >>>>> What are the recovery semantics that we hope to be able to provide?
> >>>>
> >>>> <echoing what we said on the ext4 call this morning>
> >>>>
> >>>> With iomap, most of the dirty state is in the kernel, so I think the new
> >>>> fuse2fs instance would poke the kernel with FUSE_NOTIFY_RESTARTED, which
> >>>> would initiate GETATTR requests on all the cached inodes to validate
> >>>> that they still exist; and then resend all the unacknowledged requests
> >>>> that were pending at the time.  It might be the case that you have to
> >>>> that in the reverse order; I only know enough about the design of fuse
> >>>> to suspect that to be true.
> >>>>
> >>>> Anyhow once those are complete, I think we can resume operations with
> >>>> the surviving inodes.  The ones that fail the GETATTR revalidation are
> >>>> fuse_make_bad'd, which effectively revokes them.
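
Roughly, the revalidation walk after a FUSE_NOTIFY_RESTARTED could look
something like this untested sketch.  fuse_make_bad() and the s_inodes
walk exist in the kernel today; FUSE_NOTIFY_RESTARTED itself and
fuse_restart_getattr() below are only hypothetical names for the pieces
being discussed here, and the resend of unacknowledged requests would be
a separate step:

/* fs/fuse/inode.c (sketch only) */
#include <linux/fs.h>
#include "fuse_i.h"

/* Hypothetical helper: send a synchronous FUSE_GETATTR for this nodeid. */
static int fuse_restart_getattr(struct inode *inode);

static void fuse_restart_revalidate(struct super_block *sb)
{
	struct inode *inode, *toput = NULL;

	spin_lock(&sb->s_inode_list_lock);
	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
		spin_lock(&inode->i_lock);
		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
			spin_unlock(&inode->i_lock);
			continue;
		}
		__iget(inode);
		spin_unlock(&inode->i_lock);
		spin_unlock(&sb->s_inode_list_lock);

		/* Ask the restarted server whether this inode still exists. */
		if (fuse_restart_getattr(inode))
			fuse_make_bad(inode);	/* effectively revoke it */

		iput(toput);
		toput = inode;	/* keep the list cursor pinned until relocked */
		spin_lock(&sb->s_inode_list_lock);
	}
	spin_unlock(&sb->s_inode_list_lock);
	iput(toput);
}
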
> >>>
> >>> Ah! Interesting, I have been playing a bit with sending LOOKUP requests,
> >>> but probably GETATTR is a better option.
> >>>
> >>> So, are you currently working on any of this?  Are you implementing this
> >>> new NOTIFY_RESTARTED request?  I guess it's time for me to have a closer
> >>> look at fuse2fs too.
> >>
> >> Sorry for joining the discussion late; I was totally occupied, day and
> >> night. I've added Kevin to CC, who is going to work on recovery on our
> >> DDN side.
> >>
> >> The issue with GETATTR and LOOKUP is that they need a path, but on fuse
> >> server restart we want the kernel to recover inodes and their lookup
> >> count. Now inode recovery might be hard, because we currently only have
> >> a 64-bit node-id - which is used by most fuse applications as a memory
> >> pointer.
> >>
> >> As Luis wrote, my issue with FUSE_NOTIFY_RESEND is that it just re-sends
> >> outstanding requests. In most cases that ends up sending requests with
> >> invalid node-IDs, which are cast to pointers and might provoke random
> >> memory access on restart. It's kind of the same reason why fuse NFS
> >> export and open_by_handle_at don't work well right now.
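
To make the casting problem concrete: in the common libfuse low-level
pattern (passthrough_ll.c does exactly this with its lo_inode), the
nodeid handed to the kernel is a userspace pointer, so a request resent
to a restarted server dereferences memory that no longer exists.  A
minimal illustration, with made-up names:

#define FUSE_USE_VERSION 31
#include <fuse_lowlevel.h>
#include <stdint.h>
#include <sys/stat.h>

struct srv_inode {
	ino_t backing_ino;
	/* ... per-inode server state ... */
};

/* The nodeid is just this process's pointer value, cast back and forth. */
static struct srv_inode *srv_inode(fuse_ino_t nodeid)
{
	return (struct srv_inode *)(uintptr_t)nodeid;
}

static void srv_getattr(fuse_req_t req, fuse_ino_t ino,
			struct fuse_file_info *fi)
{
	/* A resent request after a restart still carries the old pointer,
	 * so this dereference lands in freed or unmapped memory. */
	struct srv_inode *node = srv_inode(ino);
	struct stat st = { .st_ino = node->backing_ino,
			   .st_mode = S_IFREG | 0644 };

	(void)fi;
	fuse_reply_attr(req, &st, 1.0);
}
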
> >>
> >> So IMHO, what we really want is something like FUSE_LOOKUP_FH, which
> >> would not return a 64-bit node ID, but a file handle of at most 128
> >> bytes, and then FUSE_REVALIDATE_FH on server restart.
> >> The file handles could be stored in the fuse inode and also used for
> >> NFS export.
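
As a strawman for the wire format (nothing like this is in
include/uapi/linux/fuse.h today; the names and layout below are
invented, only struct fuse_entry_out is the existing LOOKUP reply):

#include <stdint.h>
#include <linux/fuse.h>		/* struct fuse_entry_out */

#define FUSE_FILE_HANDLE_MAX	128	/* invented, per the "max 128 byte" idea */

struct fuse_lookup_fh_out {
	struct fuse_entry_out	entry;		/* existing LOOKUP reply */
	uint32_t		handle_len;	/* bytes used in handle[] */
	uint32_t		padding;
	/* Opaque, server-defined handle: stored in the fuse inode, replayed
	 * via FUSE_REVALIDATE_FH after a server restart, and reusable for
	 * NFS export / open_by_handle_at. */
	uint8_t			handle[FUSE_FILE_HANDLE_MAX];
};
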
> >>
> >> I *think* Amir had a similar idea, but I can't find the link quickly.
> >> Adding Amir to CC.
> > 
> > Or maybe it was Miklos' idea. Hard to keep track of this rolling thread:
> > https://lore.kernel.org/linux-fsdevel/CAJfpegvNZ6Z7uhuTdQ6quBaTOYNkAP8W_4yUY4L2JRAEKxEwOQ@mail.gmail.com/
> 
> Thanks for the reference, Amir! I had even been in that thread.
> 
> > 
> >>
> >> Our short-term plan is to add something like FUSE_NOTIFY_RESTART, which
> >> will iterate over all superblock inodes and mark them with
> >> fuse_make_bad(). Any objections to that?
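
For reference, an untested sketch of what that short-term
FUSE_NOTIFY_RESTART could boil down to on the kernel side (locking
simplified; fuse_make_bad() and the s_inodes list are real, the notify
opcode and this helper are not):

/* fs/fuse/inode.c (sketch only) */
#include <linux/fs.h>
#include "fuse_i.h"

static void fuse_notify_restart_sb(struct super_block *sb)
{
	struct inode *inode;

	spin_lock(&sb->s_inode_list_lock);
	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))
			continue;
		if (inode == d_inode(sb->s_root))
			continue;		/* keep the root usable */
		fuse_make_bad(inode);		/* later ops on it fail cleanly */
	}
	spin_unlock(&sb->s_inode_list_lock);
}
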

What if you actually /can/ reuse a nodeid after a restart?  Consider
fuse4fs, where the nodeid is the on-disk inode number.  After a restart,
you can reconnect the fuse_inode to the on-disk inode, assuming recovery
didn't delete it, obviously.

I suppose you could just ask for refreshed stat information: either the
server gives it to you and the fuse_inode lives, or the server returns
ENOENT and then we mark it bad.  But I'd have to see code patches to
form a real opinion.
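
For concreteness, here's an untested low-level sketch of the
"nodeid == on-disk inode number" idea for an ext* server.  (fuse2fs
itself uses the libfuse high-level API, so this only illustrates the
reconnect-or-ENOENT step, it is not its actual code):

#define FUSE_USE_VERSION 31
#include <fuse_lowlevel.h>
#include <ext2fs/ext2fs.h>
#include <errno.h>
#include <sys/stat.h>

static ext2_filsys fs;	/* opened with ext2fs_open() at mount time */

static void op_getattr(fuse_req_t req, fuse_ino_t ino,
		       struct fuse_file_info *fi)
{
	/* The nodeid survives a server restart because it is the on-disk
	 * inode number; FUSE_ROOT_ID (1) maps to the ext* root inode (2). */
	ext2_ino_t dino = (ino == FUSE_ROOT_ID) ? EXT2_ROOT_INO
						: (ext2_ino_t)ino;
	struct ext2_inode raw;
	struct stat st = { 0 };

	(void)fi;
	if (ext2fs_read_inode(fs, dino, &raw) || raw.i_links_count == 0) {
		/* Recovery deleted it (or it never existed): the kernel can
		 * now fuse_make_bad() the cached inode. */
		fuse_reply_err(req, ENOENT);
		return;
	}

	st.st_ino = dino;
	st.st_mode = raw.i_mode;
	st.st_nlink = raw.i_links_count;
	st.st_uid = raw.i_uid;
	st.st_gid = raw.i_gid;
	st.st_size = EXT2_I_SIZE(&raw);
	fuse_reply_attr(req, &st, 1.0);
}
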

It's very nice of fuse to have implemented revoke() ;)

--D

> > IDK, it seems much uglier than implementing LOOKUP_HANDLE, and I am
> > not sure that LOOKUP_HANDLE is that hard to implement compared to
> > this alternative.
> > 
> > I mean a restartable server is going to be a new implementation anyway, right?
> > So it makes sense to start with a cleaner and more adequate protocol,
> > does it not?
> 
> Definitely. If we agree on the LOOKUP_HANDLE approach and on using it
> for recovery, adding that op seems simple. And reading through the
> thread you posted above, only the implementation was missing.
> So let's go ahead with this approach.
> 
> 
> Thanks,
> Bernd
> 
> 
> 
