Message-ID: <39818613-c10b-4ed2-b596-23b70c749af1@bsbernd.com>
Date: Fri, 12 Sep 2025 12:31:17 +0200
From: Bernd Schubert <bernd@...ernd.com>
To: Luis Henriques <luis@...lia.com>, "Darrick J. Wong" <djwong@...nel.org>
Cc: Theodore Ts'o <tytso@....edu>, Miklos Szeredi <miklos@...redi.hu>,
 Bernd Schubert <bschubert@....com>, linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org, Kevin Chen <kchen@....com>,
 Amir Goldstein <amir73il@...il.com>
Subject: Re: [RFC] Another take at restarting FUSE servers



On 8/1/25 12:15, Luis Henriques wrote:
> On Thu, Jul 31 2025, Darrick J. Wong wrote:
> 
>> On Thu, Jul 31, 2025 at 09:04:58AM -0400, Theodore Ts'o wrote:
>>> On Tue, Jul 29, 2025 at 04:38:54PM -0700, Darrick J. Wong wrote:
>>>>
>>>> Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
>>>> could restart itself.  It's unclear if doing so will actually enable us
>>>> to clear the condition that caused the failure in the first place, but I
>>>> suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
>>>> aren't totally crazy.
>>>
>>> I'm trying to understand what the failure scenario is here.  Is this
>>> if the userspace fuse server (i.e., fuse2fs) has crashed?  If so, what
>>> is supposed to happen with respect to open files, metadata and data
>>> modifications which were in transit, etc.?  Sure, fuse2fs could run
>>> e2fsck -fy, but if there are dirty inodes on the system, that's
>>> potentially going to be out of sync, right?
>>>
>>> What are the recovery semantics that we hope to be able to provide?
>>
>> <echoing what we said on the ext4 call this morning>
>>
>> With iomap, most of the dirty state is in the kernel, so I think the new
>> fuse2fs instance would poke the kernel with FUSE_NOTIFY_RESTARTED, which
>> would initiate GETATTR requests on all the cached inodes to validate
>> that they still exist; and then resend all the unacknowledged requests
>> that were pending at the time.  It might be the case that you have to
>> do that in the reverse order; I only know enough about the design of
>> fuse to suspect that to be true.
>>
>> Anyhow once those are complete, I think we can resume operations with
>> the surviving inodes.  The ones that fail the GETATTR revalidation are
>> fuse_make_bad'd, which effectively revokes them.
> 
> Ah! Interesting, I have been playing a bit with sending LOOKUP requests,
> but probably GETATTR is a better option.
> 
> So, are you currently working on any of this?  Are you implementing this
> new NOTIFY_RESTARTED request?  I guess it's time for me to have a closer
> look at fuse2fs too.

Sorry for joining the discussion late; I was totally occupied, day and
night. I've added Kevin to CC, who is going to work on recovery on our
DDN side.

The issue with GETATTR and LOOKUP is that they need a path, but on fuse
server restart we want the kernel to recover inodes and their lookup
counts. Inode recovery might be hard, because we currently only have a
64-bit node-id - which most fuse servers use as a memory pointer.

As Luis wrote, my issue with FUSE_NOTIFY_RESEND is that it just re-sends
outstanding requests. In most cases that ends up sending requests with
invalid node-IDs, which are cast back to pointers and might provoke
random memory accesses after a restart. It's essentially the same issue
that keeps fuse NFS export and open_by_handle_at from working well right
now.
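
To make the problem concrete, here is a minimal sketch of the usual
pattern in low-level fuse servers ("struct my_inode" and "get_inode"
are made-up names for illustration):

#define FUSE_USE_VERSION 31
#include <stdint.h>
#include <fuse_lowlevel.h>

struct my_inode {
	uint64_t nlookup;	/* lookup count the kernel holds */
	/* ... server-private state ... */
};

static struct my_inode *get_inode(fuse_ino_t ino)
{
	/* The 64-bit node-id handed to the kernel is just the address
	 * of the server's in-memory inode object.  After a server
	 * restart that address points into the *old* process's heap,
	 * so a blindly re-sent request would make the new process
	 * dereference garbage here. */
	return (struct my_inode *)(uintptr_t)ino;
}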

So IMHO, what we really want is something like FUSE_LOOKUP_FH, which
would return not a 64-bit node ID but a file handle of at most 128
bytes, plus a FUSE_REVALIDATE_FH request on server restart.
The file handles could be stored in the fuse inode and also be used for
NFS export.
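
On the wire that could look roughly like this - entirely hypothetical,
names and sizes included, nothing of this exists in the protocol today:

#define FUSE_FH_MAX_SIZE 128

struct fuse_lookup_fh_out {
	uint32_t	fh_len;			/* bytes used in fh[] */
	uint32_t	padding;
	uint8_t		fh[FUSE_FH_MAX_SIZE];	/* opaque handle, stable
						   across server restarts */
};

The kernel would keep the handle in the fuse inode and present it back
to the server in FUSE_REVALIDATE_FH (and to the NFS export code)
instead of a pointer-sized node-id.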

I *think* Amir had a similar idea, but I can't find the link quickly.
Adding Amir to CC.

Our short-term plan is to add something like FUSE_NOTIFY_RESTART, which
will iterate over all superblock inodes and mark them with fuse_make_bad.
Any objections to that?
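
A rough sketch of the kernel side - this is what we would implement,
not existing code, and locking details such as skipping I_NEW/I_FREEING
inodes are glossed over:

static void fuse_sb_make_all_bad(struct super_block *sb)
{
	struct inode *inode;

	spin_lock(&sb->s_inode_list_lock);
	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
		/* Mark the cached inode bad so its stale node-id is
		 * never sent to the restarted server. */
		fuse_make_bad(inode);
	}
	spin_unlock(&sb->s_inode_list_lock);
}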


Thanks,
Bernd
