linux-kernel - Re: [RFC 2/2] fuse: Add new flag to reuse the backing file of fuse

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOQ4uxjv=EZ-W-L=o8m2V+399PcBLLedz7T4z=5XKZhkwYitWw@mail.gmail.com>
Date: Fri, 16 Jan 2026 20:08:28 +0100
From: Amir Goldstein <amir73il@...il.com>
To: Chunsheng Luo <luochunsheng@...c.edu>
Cc: miklos@...redi.hu, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, paullawrence@...gle.com
Subject: Re: [RFC 2/2] fuse: Add new flag to reuse the backing file of fuse_inode

On Fri, Jan 16, 2026 at 3:43 AM Chunsheng Luo <luochunsheng@...c.edu> wrote:
>
>
>
> On 1/15/26 11:31 PM, Amir Goldstein wrote:
> > On Thu, Jan 15, 2026 at 3:35 PM Chunsheng Luo <luochunsheng@...c.edu> wrote:
> >>
> >>
> >>
> >> On 1/15/26 9:09 PM, Amir Goldstein wrote:
> >>> Hi Chunsheng,
> >>>
> >>> Please CC me for future fuse passthrough patch sets.
> >>>
> >> Ok.
> >>
> >>> On Thu, Jan 15, 2026 at 03:20:31PM +0800, Chunsheng Luo wrote:
> >>>> To simplify crash recovery and reduce performance impact, backing_ids
> >>>> are not persisted across daemon restarts. However, this creates a
> >>>> problem: when the daemon restarts and a process opens the same FUSE
> >>>> file, a new backing_id may be allocated for the same backing file. If
> >>>> the inode already has a cached backing file from before the restart,
> >>>> subsequent open requests with the new backing_id will fail in
> >>>> fuse_inode_uncached_io_start() due to fb mismatch, even though both
> >>>> IDs reference the identical underlying file.
> >>>
> >>> I don't think that your proposal makes this guaranty.
> >>>
> >>
> >> Yes, this proposal does not apply to all situations.
> >>
> >>>>
> >>>> Introduce the FOPEN_PASSTHROUGH_INODE_CACHE flag to address this
> >>>> issue. When set, the kernel reuses the backing file already cached in
> >>>> the inode.
> >>>>
> >>>> Signed-off-by: Chunsheng Luo <luochunsheng@...c.edu>
> >>>> ---
> >>>>    fs/fuse/iomode.c          |  2 +-
> >>>>    fs/fuse/passthrough.c     | 11 +++++++++++
> >>>>    include/uapi/linux/fuse.h |  2 ++
> >>>>    3 files changed, 14 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
> >>>> index 3728933188f3..b200bb248598 100644
> >>>> --- a/fs/fuse/iomode.c
> >>>> +++ b/fs/fuse/iomode.c
> >>>> @@ -163,7 +163,7 @@ static void fuse_file_uncached_io_release(struct fuse_file *ff,
> >>>>     */
> >>>>    #define FOPEN_PASSTHROUGH_MASK \
> >>>>       (FOPEN_PASSTHROUGH | FOPEN_DIRECT_IO | FOPEN_PARALLEL_DIRECT_WRITES | \
> >>>> -     FOPEN_NOFLUSH)
> >>>> +     FOPEN_NOFLUSH | FOPEN_PASSTHROUGH_INODE_CACHE)
> >>>>
> >>>>    static int fuse_file_passthrough_open(struct inode *inode, struct file *file)
> >>>>    {
> >>>> diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
> >>>> index 72de97c03d0e..fde4ac0c5737 100644
> >>>> --- a/fs/fuse/passthrough.c
> >>>> +++ b/fs/fuse/passthrough.c
> >>>> @@ -147,16 +147,26 @@ ssize_t fuse_passthrough_mmap(struct file *file, struct vm_area_struct *vma)
> >>>>    /*
> >>>>     * Setup passthrough to a backing file.
> >>>>     *
> >>>> + * If fuse inode backing is provided and FOPEN_PASSTHROUGH_INODE_CACHE flag
> >>>> + * is set, try to reuse it first before looking up backing_id.
> >>>> + *
> >>>>     * Returns an fb object with elevated refcount to be stored in fuse inode.
> >>>>     */
> >>>>    struct fuse_backing *fuse_passthrough_open(struct file *file, int backing_id)
> >>>>    {
> >>>>       struct fuse_file *ff = file->private_data;
> >>>>       struct fuse_conn *fc = ff->fm->fc;
> >>>> +    struct fuse_inode *fi = get_fuse_inode(file->f_inode);
> >>>>       struct fuse_backing *fb = NULL;
> >>>>       struct file *backing_file;
> >>>>       int err;
> >>>>
> >>>> +    if (ff->open_flags & FOPEN_PASSTHROUGH_INODE_CACHE) {
> >>>> +            fb = fuse_backing_get(fuse_inode_backing(fi));
> >>>> +            if (fb)
> >>>> +                    goto do_open;
> >>>> +    }
> >>>> +
> >>>
> >>> Maybe an explicit FOPEN_PASSTHROUGH_INODE_CACHE flag is a good idea,
> >>> but just FYI, I intentionally reserved backing_id 0 for this purpose.
> >>> For example, for setting up the backing id on lookup [1] and then
> >>> open does not need to specify the backing_id.
> >>>
> >>> [1] https://lore.kernel.org/linux-fsdevel/20250804173228.1990317-1-paullawrence@google.com/
> >>>
> >>
> >> This is a great idea. However, we need to consider the lifecycle
> >> management of the backing file associated with a FUSE inode.
> >> Specifically, will the same backing_idbe retained for the entire
> >> lifetime of the FUSE inode until it is deleted?
> >
> > It's not a good fit for servers that want to change the backing file
> > (like re-download). For these servers we have the existing file
> > open-to-close life cycle.
> >
> >>
> >> Additionally, since each backing_idcorresponds to an open file
> >> descriptor (fd) for the backing file, if a fuse_inode holds onto a
> >> backing_id indefinitely without a suitable release mechanism, could this
> >> accumulation of file descriptors cause the process to exceed its open
> >> files limit?
> >>
> >
> > There is no such accumulation.
> > fuse_inode refers to a single fuse_backing object.
> > fuse_file refers to a single fuse_backing object.
> > It can be the same (refcounted) object.
> >
>
> Sorry, I wasn't referring to `fuse_backing` refs.
>
> If the lifecycle of `fuse_backing` is the same as `fuse_inode`, and
> there are a large number of FUSE files on the file system, then when I
> iterate through and open the backing files, register the `fuse_backing`,
> and then set it to the `fuse_inode`, the FUSE service will hold a large
> number of backing file file descriptors (FDs).  These backing file FDs
> will only be released when the FUSE files are deleted.
>

Not until files are deleted. Until fuse inodes are evicted from inode cache.
FWIW, fuse_inode is ~900 bytes and filp is ~256 bytes, and when memory
is needed, memory shrinker will evict fuse inodes and backing file, so sure
backing files take up memory but not a game changer.

> For example, if there are 1000 FUSE files on the file system, and I
> iterate through and set the backing file for each `fuse_inode`, then the
> FUSE service will hold 1000 backing file FDs for a long time.  Extending
> this further, if there are even more files, could the FUSE service
> process exceed the `ulimit` configuration for open files?
>
> ```shell
> [root@...alhost home]# ulimit -a |grep "open files"
> open files                          (-n) 1024
> ```
>

backing files do not account for the open files limit of the FUSE server
that's one of the design issues, but it is by design.

> >>> But what you are proposing is a little bit odd API IMO:
> >>> "Use this backing_id with this backing file, unless you find another
> >>>    backing file so use that one instead" - this sounds a bit awkward to me.
> >>>
> >>> I think it would be saner and simpler to relax the check in
> >>> fuse_inode_uncached_io_start() to check that old and new fuse_backing
> >>> objects refer to the same backing inode:
> >>>
> >>> diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
> >>> index 3728933188f30..c6070c361d855 100644
> >>> --- a/fs/fuse/iomode.c
> >>> +++ b/fs/fuse/iomode.c
> >>> @@ -88,9 +88,9 @@ int fuse_inode_uncached_io_start(struct fuse_inode *fi, struct fuse_backing *fb)
> >>>        int err = 0;
> >>>
> >>>        spin_lock(&fi->lock);
> >>> -     /* deny conflicting backing files on same fuse inode */
> >>> +     /* deny conflicting backing inodes on same fuse inode */
> >>>        oldfb = fuse_inode_backing(fi);
> >>> -     if (fb && oldfb && oldfb != fb) {
> >>> +     if (fb && oldfb && file_inode(oldfb->file) != file_inode(fb->file)) {
> >>>                err = -EBUSY;
> >>>                goto unlock;
> >>>        }
> >>> --
> >>>
> >>> I don't think that this requires opt-in flag.
> >>>
> >>> Thanks,
> >>> Amir.
> >>
> >> I agree that modifying the condition to `file_inode(oldfb->file) !=
> >> file_inode(fb->file)` is a reasonable fix, and it does address the first
> >> scenario I described.
> >>
> >> However, it doesn't fully resolve the second scenario: in a read-only
> >> FUSE filesystem, the backing file itself might be cleaned up and
> >> re-downloaded (resulting in a new inode with identical content). In this
> >> case, reusing the cached fuse_inode's fb after a daemon restart still be
> >> safe, but the inode comparison would incorrectly reject it. Is there a
> >> more robust approach for handling this scenario?
> >>
> >
> > There is a reason we added the restriction against associating
> > fuse file to different backing inodes.
> >
> > mmap and reads from different files to the same inode need to be
> > cache coherent.
> >
> > IOW, we intentionally do not support this setup without server restart
> > there is no reason for us to allow that after server restarts because
> > the consequense will be the same.
> >
> > It does not sound like a good idea for the server to cleanup files
> > that are currently opened via fuse passthrough - is that something
> > that happens intentionally? after server restarts?
> >
> > You could try to take a write lease to check if the file is currently
> > open for read/write to avoid cleanup in this case?
> >
> > Thanks,
> > Amir.
> >
> >
>
> Yes, it happened after the fuse service crash recovery restart, because
> the refs of the backup files were cleaned up, causing them to be
> mistakenly garbage collected.
>
> I will consider how to prevent it from being mistakenly garbage
> collected by the fuse server.

See here https://github.com/amir73il/fsnotify-utils/wiki/Hierarchical-Storage-Management-API#evicting-file-content
explanation how you can use F_SETLEASE to synchronize evicting
file content with file content readers.

Thanks,
Amir.