lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c73wbrsbijzlcfoptr4d6ryuf2mliectblna2hek5pxcuxfgla@7dbxympec26j>
Date: Fri, 4 Jul 2025 15:30:16 -0500
From: John Groves <John@...ves.net>
To: Amir Goldstein <amir73il@...il.com>
Cc: Dan Williams <dan.j.williams@...el.com>, 
	Miklos Szeredi <miklos@...redi.hu>, Bernd Schubert <bschubert@....com>, 
	John Groves <jgroves@...ron.com>, Jonathan Corbet <corbet@....net>, 
	Vishal Verma <vishal.l.verma@...el.com>, Dave Jiang <dave.jiang@...el.com>, 
	Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>, 
	Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, 
	"Darrick J . Wong" <djwong@...nel.org>, Randy Dunlap <rdunlap@...radead.org>, 
	Jeff Layton <jlayton@...nel.org>, Kent Overstreet <kent.overstreet@...ux.dev>, 
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, nvdimm@...ts.linux.dev, 
	linux-cxl@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	Jonathan Cameron <Jonathan.Cameron@...wei.com>, Stefan Hajnoczi <shajnocz@...hat.com>, 
	Joanne Koong <joannelkoong@...il.com>, Josef Bacik <josef@...icpanda.com>, 
	Aravind Ramesh <arramesh@...ron.com>, Ajay Joshi <ajayjoshi@...ron.com>
Subject: Re: [RFC V2 12/18] famfs_fuse: Plumb the GET_FMAP message/response

On 25/07/04 10:54AM, Amir Goldstein wrote:
> On Thu, Jul 3, 2025 at 8:51 PM John Groves <John@...ves.net> wrote:
> >
> > Upon completion of an OPEN, if we're in famfs-mode we do a GET_FMAP to
> > retrieve and cache up the file-to-dax map in the kernel. If this
> > succeeds, read/write/mmap are resolved direct-to-dax with no upcalls.
> >
> > GET_FMAP has a variable-size response payload, and the allocated size
> > is sent in the in_args[0].size field. If the fmap would overflow the
> > message, the fuse server sends a reply of size 'sizeof(uint32_t)' which
> > specifies the size of the fmap message. Then the kernel can realloc a
> > large enough buffer and try again.
> >
> > Signed-off-by: John Groves <john@...ves.net>
> > ---
> >  fs/fuse/file.c            | 84 +++++++++++++++++++++++++++++++++++++++
> >  fs/fuse/fuse_i.h          | 36 ++++++++++++++++-
> >  fs/fuse/inode.c           | 19 +++++++--
> >  fs/fuse/iomode.c          |  2 +-
> >  include/uapi/linux/fuse.h | 18 +++++++++
> >  5 files changed, 154 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 93b82660f0c8..8616fb0a6d61 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -230,6 +230,77 @@ static void fuse_truncate_update_attr(struct inode *inode, struct file *file)
> >         fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
> >  }
> >
> > +#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
> 
> We generally try to avoid #ifdef blocks in c files
> keep them mostly in h files and use in c files
>    if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX))
> 
> also #if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
> it a bit strange for a bool Kconfig because it looks too
> much like the c code, so I prefer
> #ifdef CONFIG_FUSE_FAMFS_DAX
> when you have to use it
> 
> If you need entire functions compiled out, why not put them in famfs.c?

Perhaps moving fuse_get_fmap() to famfs.c is the best approach. Will try that
first.

Regarding '#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)', vs.
'#ifdef CONFIG_FUSE_FAMFS_DAX' vs. '#if CONFIG_FUSE_FAMFS_DAX'...

I've learned to be cautious there because the latter two are undefined if
CONFIG_FUSE_FAMFS_DAX=m. I've been burned by this.

My original thinking was that famfs made sense as a module, but I'm leaning
the other way now - and in this series fs/fuse/Kconfig makes it a bool - 
meaning all three macro tests will work because a bool can't be set to 'm'. 

So to the extent that I need conditional compilation macros I can switch
to '#ifdef...'.


> 
> > +
> > +#define FMAP_BUFSIZE 4096
> > +
> > +static int
> > +fuse_get_fmap(struct fuse_mount *fm, struct inode *inode, u64 nodeid)
> > +{
> > +       struct fuse_get_fmap_in inarg = { 0 };
> > +       size_t fmap_bufsize = FMAP_BUFSIZE;
> > +       ssize_t fmap_size;
> > +       int retries = 1;
> > +       void *fmap_buf;
> > +       int rc;
> > +
> > +       FUSE_ARGS(args);
> > +
> > +       fmap_buf = kcalloc(1, FMAP_BUFSIZE, GFP_KERNEL);
> > +       if (!fmap_buf)
> > +               return -EIO;
> > +
> > + retry_once:
> > +       inarg.size = fmap_bufsize;
> > +
> > +       args.opcode = FUSE_GET_FMAP;
> > +       args.nodeid = nodeid;
> > +
> > +       args.in_numargs = 1;
> > +       args.in_args[0].size = sizeof(inarg);
> > +       args.in_args[0].value = &inarg;
> > +
> > +       /* Variable-sized output buffer
> > +        * this causes fuse_simple_request() to return the size of the
> > +        * output payload
> > +        */
> > +       args.out_argvar = true;
> > +       args.out_numargs = 1;
> > +       args.out_args[0].size = fmap_bufsize;
> > +       args.out_args[0].value = fmap_buf;
> > +
> > +       /* Send GET_FMAP command */
> > +       rc = fuse_simple_request(fm, &args);
> > +       if (rc < 0) {
> > +               pr_err("%s: err=%d from fuse_simple_request()\n",
> > +                      __func__, rc);
> > +               return rc;
> > +       }
> > +       fmap_size = rc;
> > +
> > +       if (retries && fmap_size == sizeof(uint32_t)) {
> > +               /* fmap size exceeded fmap_bufsize;
> > +                * actual fmap size returned in fmap_buf;
> > +                * realloc and retry once
> > +                */
> > +               fmap_bufsize = *((uint32_t *)fmap_buf);
> > +
> > +               --retries;
> > +               kfree(fmap_buf);
> > +               fmap_buf = kcalloc(1, fmap_bufsize, GFP_KERNEL);
> > +               if (!fmap_buf)
> > +                       return -EIO;
> > +
> > +               goto retry_once;
> > +       }
> > +
> > +       /* Will call famfs_file_init_dax() when that gets added */
> > +
> > +       kfree(fmap_buf);
> > +       return 0;
> > +}
> > +#endif
> > +
> >  static int fuse_open(struct inode *inode, struct file *file)
> >  {
> >         struct fuse_mount *fm = get_fuse_mount(inode);
> > @@ -263,6 +334,19 @@ static int fuse_open(struct inode *inode, struct file *file)
> >
> >         err = fuse_do_open(fm, get_node_id(inode), file, false);
> >         if (!err) {
> > +#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
> > +               if (fm->fc->famfs_iomap) {
> 
> That should be
> > +               if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX) &&
> > +                   fm->fc->famfs_iomap) {
> 
> > +                       if (S_ISREG(inode->i_mode)) {
> > +                               int rc;
> > +                               /* Get the famfs fmap */
> > +                               rc = fuse_get_fmap(fm, inode,
> > +                                                  get_node_id(inode));
> > +                               if (rc)
> > +                                       pr_err("%s: fuse_get_fmap err=%d\n",
> > +                                              __func__, rc);
> > +                       }
> > +               }
> > +#endif
> >                 ff = file->private_data;
> >                 err = fuse_finish_open(inode, file);
> >                 if (err)
> > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > index f4ee61046578..e01d6e5c6e93 100644
> > --- a/fs/fuse/fuse_i.h
> > +++ b/fs/fuse/fuse_i.h
> > @@ -193,6 +193,10 @@ struct fuse_inode {
> >         /** Reference to backing file in passthrough mode */
> >         struct fuse_backing *fb;
> >  #endif
> > +
> > +#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
> > +       void *famfs_meta;
> > +#endif
> >  };
> >
> >  /** FUSE inode state bits */
> > @@ -945,6 +949,8 @@ struct fuse_conn {
> >  #endif
> >
> >  #if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
> > +       struct rw_semaphore famfs_devlist_sem;
> > +       struct famfs_dax_devlist *dax_devlist;
> >         char *shadow;
> >  #endif
> >  };
> > @@ -1435,11 +1441,14 @@ void fuse_free_conn(struct fuse_conn *fc);
> >
> >  /* dax.c */
> >
> > +static inline int fuse_file_famfs(struct fuse_inode *fi); /* forward */
> > +
> >  /* This macro is used by virtio_fs, but now it also needs to filter for
> >   * "not famfs"
> >   */
> >  #define FUSE_IS_VIRTIO_DAX(fuse_inode) (IS_ENABLED(CONFIG_FUSE_DAX)    \
> > -                                       && IS_DAX(&fuse_inode->inode))
> > +                                       && IS_DAX(&fuse_inode->inode)   \
> > +                                       && !fuse_file_famfs(fuse_inode))
> >
> >  ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
> >  ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
> > @@ -1550,4 +1559,29 @@ extern void fuse_sysctl_unregister(void);
> >  #define fuse_sysctl_unregister()       do { } while (0)
> >  #endif /* CONFIG_SYSCTL */
> >
> > +/* famfs.c */
> > +static inline struct fuse_backing *famfs_meta_set(struct fuse_inode *fi,
> > +                                                      void *meta)
> > +{
> > +#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
> > +       return xchg(&fi->famfs_meta, meta);
> > +#else
> > +       return NULL;
> > +#endif
> > +}
> > +
> > +static inline void famfs_meta_free(struct fuse_inode *fi)
> > +{
> > +       /* Stub wil be connected in a subsequent commit */
> > +}
> > +
> > +static inline int fuse_file_famfs(struct fuse_inode *fi)
> > +{
> > +#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
> > +       return (READ_ONCE(fi->famfs_meta) != NULL);
> > +#else
> > +       return 0;
> > +#endif
> > +}
> > +
> >  #endif /* _FS_FUSE_I_H */
> > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > index a7e1cf8257b0..b071d16f7d04 100644
> > --- a/fs/fuse/inode.c
> > +++ b/fs/fuse/inode.c
> > @@ -117,6 +117,9 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
> >         if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
> >                 fuse_inode_backing_set(fi, NULL);
> >
> > +       if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX))
> > +               famfs_meta_set(fi, NULL);
> > +
> >         return &fi->inode;
> >
> >  out_free_forget:
> > @@ -138,6 +141,13 @@ static void fuse_free_inode(struct inode *inode)
> >         if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
> >                 fuse_backing_put(fuse_inode_backing(fi));
> >
> > +#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
> > +       if (S_ISREG(inode->i_mode) && fi->famfs_meta) {
> > +               famfs_meta_free(fi);
> > +               famfs_meta_set(fi, NULL);
> > +       }
> > +#endif
> > +
> >         kmem_cache_free(fuse_inode_cachep, fi);
> >  }
> >
> > @@ -1002,6 +1012,9 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
> >         if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
> >                 fuse_backing_files_init(fc);
> >
> > +       if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX))
> > +               pr_notice("%s: Kernel is FUSE_FAMFS_DAX capable\n", __func__);
> > +
> >         INIT_LIST_HEAD(&fc->mounts);
> >         list_add(&fm->fc_entry, &fc->mounts);
> >         fm->fc = fc;
> > @@ -1036,9 +1049,8 @@ void fuse_conn_put(struct fuse_conn *fc)
> >                 }
> >                 if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
> >                         fuse_backing_files_free(fc);
> > -#if IS_ENABLED(CONFIG_FUSE_FAMFS_DAX)
> > -               kfree(fc->shadow);
> > -#endif
> > +               if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX))
> > +                       kfree(fc->shadow);
> >                 call_rcu(&fc->rcu, delayed_release);
> >         }
> >  }
> > @@ -1425,6 +1437,7 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
> >                                  * those capabilities, they are held here).
> >                                  */
> >                                 fc->famfs_iomap = 1;
> > +                               init_rwsem(&fc->famfs_devlist_sem);
> >                         }
> >                 } else {
> >                         ra_pages = fc->max_read / PAGE_SIZE;
> > diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
> > index aec4aecb5d79..443b337b0c05 100644
> > --- a/fs/fuse/iomode.c
> > +++ b/fs/fuse/iomode.c
> > @@ -204,7 +204,7 @@ int fuse_file_io_open(struct file *file, struct inode *inode)
> >          * io modes are not relevant with DAX and with server that does not
> >          * implement open.
> >          */
> > -       if (FUSE_IS_VIRTIO_DAX(fi) || !ff->args)
> > +       if (FUSE_IS_VIRTIO_DAX(fi) || fuse_file_famfs(fi) || !ff->args)
> >                 return 0;
> 
> This is where fuse_is_dax() helper would be handy.

Roger that. Thinking through it...

> 
> Thanks,
> Amir.

Thank you,
John


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ