[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1451743b-69a0-4e32-9dd7-a68c14c8f1a1@ustc.edu>
Date: Sat, 17 Jan 2026 23:26:41 +0800
From: Chunsheng Luo <luochunsheng@...c.edu>
To: Amir Goldstein <amir73il@...il.com>
Cc: miklos@...redi.hu, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, paullawrence@...gle.com
Subject: Re: [RFC 2/2] fuse: Add new flag to reuse the backing file of
fuse_inode
On 1/17/26 3:08 AM, Amir Goldstein wrote:
> On Fri, Jan 16, 2026 at 3:43 AM Chunsheng Luo <luochunsheng@...c.edu> wrote:
>>
>>
>>
>> On 1/15/26 11:31 PM, Amir Goldstein wrote:
>>> On Thu, Jan 15, 2026 at 3:35 PM Chunsheng Luo <luochunsheng@...c.edu> wrote:
>>>>
>>>>
>>>>
>>>> On 1/15/26 9:09 PM, Amir Goldstein wrote:
>>>>> Hi Chunsheng,
>>>>>
>>>>> Please CC me for future fuse passthrough patch sets.
>>>>>
>>>> Ok.
>>>>
>>>>> On Thu, Jan 15, 2026 at 03:20:31PM +0800, Chunsheng Luo wrote:
>>>>>> To simplify crash recovery and reduce performance impact, backing_ids
>>>>>> are not persisted across daemon restarts. However, this creates a
>>>>>> problem: when the daemon restarts and a process opens the same FUSE
>>>>>> file, a new backing_id may be allocated for the same backing file. If
>>>>>> the inode already has a cached backing file from before the restart,
>>>>>> subsequent open requests with the new backing_id will fail in
>>>>>> fuse_inode_uncached_io_start() due to fb mismatch, even though both
>>>>>> IDs reference the identical underlying file.
>>>>>
>>>>> I don't think that your proposal makes this guaranty.
>>>>>
>>>>
>>>> Yes, this proposal does not apply to all situations.
>>>>
>>>>>>
>>>>>> Introduce the FOPEN_PASSTHROUGH_INODE_CACHE flag to address this
>>>>>> issue. When set, the kernel reuses the backing file already cached in
>>>>>> the inode.
>>>>>>
>>>>>> Signed-off-by: Chunsheng Luo <luochunsheng@...c.edu>
>>>>>> ---
>>>>>> fs/fuse/iomode.c | 2 +-
>>>>>> fs/fuse/passthrough.c | 11 +++++++++++
>>>>>> include/uapi/linux/fuse.h | 2 ++
>>>>>> 3 files changed, 14 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
>>>>>> index 3728933188f3..b200bb248598 100644
>>>>>> --- a/fs/fuse/iomode.c
>>>>>> +++ b/fs/fuse/iomode.c
>>>>>> @@ -163,7 +163,7 @@ static void fuse_file_uncached_io_release(struct fuse_file *ff,
>>>>>> */
>>>>>> #define FOPEN_PASSTHROUGH_MASK \
>>>>>> (FOPEN_PASSTHROUGH | FOPEN_DIRECT_IO | FOPEN_PARALLEL_DIRECT_WRITES | \
>>>>>> - FOPEN_NOFLUSH)
>>>>>> + FOPEN_NOFLUSH | FOPEN_PASSTHROUGH_INODE_CACHE)
>>>>>>
>>>>>> static int fuse_file_passthrough_open(struct inode *inode, struct file *file)
>>>>>> {
>>>>>> diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
>>>>>> index 72de97c03d0e..fde4ac0c5737 100644
>>>>>> --- a/fs/fuse/passthrough.c
>>>>>> +++ b/fs/fuse/passthrough.c
>>>>>> @@ -147,16 +147,26 @@ ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma)
>>>>>> /*
>>>>>> * Setup passthrough to a backing file.
>>>>>> *
>>>>>> + * If fuse inode backing is provided and FOPEN_PASSTHROUGH_INODE_CACHE flag
>>>>>> + * is set, try to reuse it first before looking up backing_id.
>>>>>> + *
>>>>>> * Returns an fb object with elevated refcount to be stored in fuse inode.
>>>>>> */
>>>>>> struct fuse_backing *fuse_passthrough_open(struct file *file, int backing_id)
>>>>>> {
>>>>>> struct fuse_file *ff = file->private_data;
>>>>>> struct fuse_conn *fc = ff->fm->fc;
>>>>>> + struct fuse_inode *fi = get_fuse_inode(file->f_inode);
>>>>>> struct fuse_backing *fb = NULL;
>>>>>> struct file *backing_file;
>>>>>> int err;
>>>>>>
>>>>>> + if (ff->open_flags & FOPEN_PASSTHROUGH_INODE_CACHE) {
>>>>>> + fb = fuse_backing_get(fuse_inode_backing(fi));
>>>>>> + if (fb)
>>>>>> + goto do_open;
>>>>>> + }
>>>>>> +
>>>>>
>>>>> Maybe an explicit FOPEN_PASSTHROUGH_INODE_CACHE flag is a good idea,
>>>>> but just FYI, I intentionally reserved backing_id 0 for this purpose.
>>>>> For example, for setting up the backing id on lookup [1] and then
>>>>> open does not need to specify the backing_id.
>>>>>
>>>>> [1] https://lore.kernel.org/linux-fsdevel/20250804173228.1990317-1-paullawrence@google.com/
>>>>>
>>>>
>>>> This is a great idea. However, we need to consider the lifecycle
>>>> management of the backing file associated with a FUSE inode.
>>>> Specifically, will the same backing_idbe retained for the entire
>>>> lifetime of the FUSE inode until it is deleted?
>>>
>>> It's not a good fit for servers that want to change the backing file
>>> (like re-download). For these servers we have the existing file
>>> open-to-close life cycle.
>>>
>>>>
>>>> Additionally, since each backing_idcorresponds to an open file
>>>> descriptor (fd) for the backing file, if a fuse_inode holds onto a
>>>> backing_id indefinitely without a suitable release mechanism, could this
>>>> accumulation of file descriptors cause the process to exceed its open
>>>> files limit?
>>>>
>>>
>>> There is no such accumulation.
>>> fuse_inode refers to a single fuse_backing object.
>>> fuse_file refers to a single fuse_backing object.
>>> It can be the same (refcounted) object.
>>>
>>
>> Sorry, I wasn't referring to `fuse_backing` refs.
>>
>> If the lifecycle of `fuse_backing` is the same as `fuse_inode`, and
>> there are a large number of FUSE files on the file system, then when I
>> iterate through and open the backing files, register the `fuse_backing`,
>> and then set it to the `fuse_inode`, the FUSE service will hold a large
>> number of backing file file descriptors (FDs). These backing file FDs
>> will only be released when the FUSE files are deleted.
>>
>
> Not until files are deleted. Until fuse inodes are evicted from inode cache.
> FWIW, fuse_inode is ~900 bytes and filp is ~256 bytes, and when memory
> is needed, memory shrinker will evict fuse inodes and backing file, so sure
> backing files take up memory but not a game changer.
>
>> For example, if there are 1000 FUSE files on the file system, and I
>> iterate through and set the backing file for each `fuse_inode`, then the
>> FUSE service will hold 1000 backing file FDs for a long time. Extending
>> this further, if there are even more files, could the FUSE service
>> process exceed the `ulimit` configuration for open files?
>>
>> ```shell
>> [root@...alhost home]# ulimit -a |grep "open files"
>> open files (-n) 1024
>> ```
>>
>
> backing files do not account for the open files limit of the FUSE server
> that's one of the design issues, but it is by design.
>
Thank you for the explanation.
I understand.
>>>>> But what you are proposing is a little bit odd API IMO:
>>>>> "Use this backing_id with this backing file, unless you find another
>>>>> backing file so use that one instead" - this sounds a bit awkward to me.
>>>>>
>>>>> I think it would be saner and simpler to relax the check in
>>>>> fuse_inode_uncached_io_start() to check that old and new fuse_backing
>>>>> objects refer to the same backing inode:
>>>>>
>>>>> diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
>>>>> index 3728933188f30..c6070c361d855 100644
>>>>> --- a/fs/fuse/iomode.c
>>>>> +++ b/fs/fuse/iomode.c
>>>>> @@ -88,9 +88,9 @@ int fuse_inode_uncached_io_start(struct fuse_inode *fi, struct fuse_backing *fb)
>>>>> int err = 0;
>>>>>
>>>>> spin_lock(&fi->lock);
>>>>> - /* deny conflicting backing files on same fuse inode */
>>>>> + /* deny conflicting backing inodes on same fuse inode */
>>>>> oldfb = fuse_inode_backing(fi);
>>>>> - if (fb && oldfb && oldfb != fb) {
>>>>> + if (fb && oldfb && file_inode(oldfb->file) != file_inode(fb->file)) {
>>>>> err = -EBUSY;
>>>>> goto unlock;
>>>>> }
>>>>> --
>>>>>
>>>>> I don't think that this requires opt-in flag.
>>>>>
>>>>> Thanks,
>>>>> Amir.
>>>>
>>>> I agree that modifying the condition to `file_inode(oldfb->file) !=
>>>> file_inode(fb->file)` is a reasonable fix, and it does address the first
>>>> scenario I described.
>>>>
>>>> However, it doesn't fully resolve the second scenario: in a read-only
>>>> FUSE filesystem, the backing file itself might be cleaned up and
>>>> re-downloaded (resulting in a new inode with identical content). In this
>>>> case, reusing the cached fuse_inode's fb after a daemon restart still be
>>>> safe, but the inode comparison would incorrectly reject it. Is there a
>>>> more robust approach for handling this scenario?
>>>>
>>>
>>> There is a reason we added the restriction against associating
>>> fuse file to different backing inodes.
>>>
>>> mmap and reads from different files to the same inode need to be
>>> cache coherent.
>>>
>>> IOW, we intentionally do not support this setup without server restart
>>> there is no reason for us to allow that after server restarts because
>>> the consequense will be the same.
>>>
>>> It does not sound like a good idea for the server to cleanup files
>>> that are currently opened via fuse passthrough - is that something
>>> that happens intentionally? after server restarts?
>>>
>>> You could try to take a write lease to check if the file is currently
>>> open for read/write to avoid cleanup in this case?
>>>
>>> Thanks,
>>> Amir.
>>>
>>>
>>
>> Yes, it happened after the fuse service crash recovery restart, because
>> the refs of the backup files were cleaned up, causing them to be
>> mistakenly garbage collected.
>>
>> I will consider how to prevent it from being mistakenly garbage
>> collected by the fuse server.
>
> See here https://github.com/amir73il/fsnotify-utils/wiki/Hierarchical-Storage-Management-API#evicting-file-content
> explanation how you can use F_SETLEASE to synchronize evicting
> file content with file content readers.
>
> Thanks,
> Amir.
>
>
This is very helpful.
Thanks,
Chunsheng Luo
Powered by blists - more mailing lists